Discussion:
Multiple Language Support
Hilary Cotter
2006-12-05 11:30:58 UTC
You need to use the ms.locale meta tag, with the content value set to the
locale identifier, e.g. EN-US, FR, etc.
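
If the pages in each subweb do not carry the tag yet, it can be stamped in with a small script. The following is only a minimal sketch in Python, not anything Index Server ships with: the function name add_locale_meta is hypothetical, the exact attribute casing is an assumption, and "TR" is an assumed locale value for a Turkish subweb.

```python
import re

def add_locale_meta(html: str, locale: str) -> str:
    """Return html with an MS.LOCALE meta tag inserted after the <head> tag.

    Hypothetical helper: the tag layout below is an assumption based on the
    ms.locale meta tag described in this thread.
    """
    # Leave the page untouched if it already declares a locale.
    if re.search(r'<meta\s+name=["\']?ms\.locale', html, re.IGNORECASE):
        return html
    tag = f'<meta name="MS.LOCALE" content="{locale}">'
    # Match <head> or <head ...> but not <header>; only the first match is
    # changed. A page without a <head> tag is returned unchanged.
    return re.sub(
        r'(<head(?:\s[^>]*)?>)',
        lambda m: m.group(1) + "\n" + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )

if __name__ == "__main__":
    page = "<html><head><title>Merhaba</title></head><body></body></html>"
    print(add_locale_meta(page, "TR"))
```

Running it once per subweb, with that subweb's locale, keeps the tagging consistent; pages that already have the tag are skipped, so the script can be rerun safely.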

Now the problem is that when you query across different languages you will
get false cognates, or false friends. For example, querying for "gift" in
English will also give hits on the German word "Gift", which means poison.

So you can restrict your search by subwebs as you have suggested or by
checking the value of the ms.locale tag.
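
As an illustration of the second option, here is a minimal Python sketch of filtering hits by locale after the query has run. The Hit record and filter_by_locale function are hypothetical and are not part of Index Server or ixsso; the sketch assumes each hit carries the ms.locale value of its page.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    path: str     # virtual path of the matching page
    locale: str   # value of the page's ms.locale meta tag (assumed available)

def filter_by_locale(hits, locale):
    """Keep hits whose locale equals `locale` or is a sub-locale of it.

    For example "EN" keeps "EN" and "EN-US" but drops "DE".
    """
    wanted = locale.lower()
    return [
        h for h in hits
        if h.locale.lower() == wanted
        or h.locale.lower().startswith(wanted + "-")
    ]

if __name__ == "__main__":
    hits = [
        Hit("/en/presents.htm", "EN-US"),  # English "gift"
        Hit("/de/chemie.htm", "DE"),       # German "Gift"
    ]
    for hit in filter_by_locale(hits, "EN"):
        print(hit.path)
```

Filtering inside the query itself, on the ms.locale property, would avoid fetching the unwanted hits in the first place; the post-filter above only shows the comparison logic.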
--
Hilary Cotter

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
Hello,
on our web server we are running subwebs in various languages, such as
English, Spanish, German, Russian, Turkish, Japanese and others. All web
pages are UTF-8 encoded and do not contain any further language/character
set information. Is there any document describing what I have to do to
make Index Server support correct searching in different languages?
First of all, I have to insert a meta tag with the character set used, I
guess. But then: do I have to set up a different index for each subweb?
Where do I get the required files, e.g. for word stemming, which Index
Server needs for non-(Western-)European languages? How do I search the
proper index if, for example, the search expression is in Turkish but my
browser's language preference is set to Swedish?
The Index Server documentation on MSDN is very cryptic here, so any help is
welcome.
Regards
Kallo
Hilary Cotter
2006-12-18 14:10:10 UTC
You can use the meta tag in ixsso queries as well as in .idq queries. To
search on Turkish characters you need to install the Far East word breaker.
Search the Microsoft site for this.
--
Hilary Cotter

1.) I had found information about specifying ms.locale in the meta tags,
but to be able to use it from Index Server, one has to define a new
property name for the Web catalog in the .idq file. The problem is that we
are not using .idq queries but the ixsso object.
2.) I have copied our Turkish subweb and modified several pages so that
they contain
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1254">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-9">
The original Turkish web, containing only
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
is indexed by Index Server, while the copy of this web with the Turkish
charset specification is not indexed at all!
3.) If I search the original Turkish web (the one which had been indexed)
using words from the Western European character set, it works fine, but if
the query contains Turkish characters which are not part of the Western
European set, the message "Error 80041605 - The query contained only
ignored words." is returned via the Query Form in the Index Server MMC. The
same happens with queries in Russian and Japanese.
Any hints, anybody?
Regards
kb