On 03.07.2009 00:49 Paul Libbrecht wrote:

[I'll try to address the other responses as well]

> I believe the proper way is for the server to compute a list of  
> accepted languages in order of preferences.
> The web-platform language (e.g. the user-setting), and the values in  
> the Accept-Language http header (which are from the browser or  
> platform).

All this is not going to help much because the main application is a
scientific search portal for books and articles with many users
searching across languages. The most typical use case is a German user
searching in several languages at once, so the query itself may even be
multilingual, e.g. TITLE:cancer OR TITLE:krebs. There is no point in
looking at the Accept-Language header or a language select field here
(the latter would be left on "any" in most cases). Another popular use
case is citations (in whatever language) cut and pasted into the search
field.
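For the record, the ranking Paul suggests (platform setting first, then
the Accept-Language entries ordered by their q-values) could be sketched
roughly like this. The function name rank_languages and the rule that
the user setting always wins are my own assumptions, not anything Solr
provides, and it only pays off when the query is actually written in one
of those languages.

```python
def rank_languages(user_setting, accept_language):
    """Return language codes in order of preference (hypothetical helper).

    Assumption: an explicit platform/user setting outranks anything the
    browser sends; Accept-Language entries are then ordered by q-value.
    """
    weighted = []
    for position, part in enumerate(accept_language.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip().lower()
        if not tag or tag == "*":
            continue  # the wildcard says nothing about a concrete language
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Keep only the primary subtag ("de-DE" -> "de"): stemmers are per language.
            weighted.append((q, position, tag.split("-")[0]))
    # Highest q first; the original position breaks ties.
    weighted.sort(key=lambda item: (-item[0], item[1]))
    ranked = [user_setting.lower()] if user_setting else []
    for _, _, lang in weighted:
        if lang not in ranked:
            ranked.append(lang)
    return ranked


print(rank_languages("de", "en-US,en;q=0.8,fr;q=0.5,de;q=0.3"))  # ['de', 'en', 'fr']
```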

> Then you expand your query for surfing waves (say) to:
> - phrase query: surfing waves exactly (^2.0)
> - two terms, no stemming: surfing waves (^1.5)
> - iterate through the languages and query for stemmed variants:
>    - english: surf wav ^1.0
>    - german surfing wave ^0.9
>    - ....
> - then maybe even try the phonetic analyzer (matched in a separate  
> field probably)

This is an even more sophisticated variant of the multiple "OR" I came
up with. Oh well...
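A rough sketch of that expansion, under a few assumptions of mine: the
field names (title, title_en, title_de, title_phonetic) are
hypothetical, each title_<lang> field is assumed to have a stemming
analyzer for that language in schema.xml (so the unmodified words are
sent to every field and Solr does the stemming), the boosts mirror
Paul's example, and the 0.5 for the phonetic field is my own guess. In
my case the language list would simply be every indexed language, since
the query language is unknown.

```python
# Characters with a special meaning in the Lucene query parser syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(text):
    """Backslash-escape query syntax characters inside a single word."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def expand_query(user_query, languages, field="title", phonetic_boost=0.5):
    """Build one OR query: phrase, plain terms, per-language fields, phonetic.

    Hypothetical schema: `field` is unstemmed, `field_<lang>` is stemmed for
    <lang>, and `field_phonetic` uses a phonetic filter.
    """
    words = user_query.split()
    terms = " ".join(escape_term(w) for w in words)
    # Inside a quoted phrase only backslashes and double quotes need escaping.
    phrase = " ".join(words).replace("\\", "\\\\").replace('"', '\\"')
    clauses = [
        f'{field}:"{phrase}"^2.0',  # exact phrase, no stemming
        f"{field}:({terms})^1.5",  # the words anywhere, no stemming
    ]
    for i, lang in enumerate(languages):
        # The field's analyzer stems; boosts drop by 0.1 per language, floor 0.1.
        boost = max(1.0 - 0.1 * i, 0.1)
        clauses.append(f"{field}_{lang}:({terms})^{boost:.1f}")
    if phonetic_boost:
        clauses.append(f"{field}_phonetic:({terms})^{phonetic_boost:.1f}")
    return " OR ".join(clauses)


print(expand_query("surfing waves", ["en", "de"]))
```

For "surfing waves" with ["en", "de"] this prints
title:"surfing waves"^2.0 OR title:(surfing waves)^1.5 OR
title_en:(surfing waves)^1.0 OR title_de:(surfing waves)^0.9 OR
title_phonetic:(surfing waves)^0.5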

> I think this is a common pattern on the web where the users, browsers,  
> and servers are all somewhat multilingual.

Indeed, and often users are not even aware of it: especially in a
scientific context they use their native tongue and English almost
interchangeably, and they expect the search engine to cope with it.

I think the best approach would be to process the data according to its
language but make no assumptions about the query language, and I am
totally lost as to how to get a clever schema.xml out of all this.
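The direction I have in mind, sketched roughly: at index time each
record's title goes into an unstemmed field plus the field for its own
language; at query time every language field is searched through the
dismax parser's qf/pf parameters. The field names and the
INDEXED_LANGUAGES list are hypothetical, the record's language is
assumed to come from the catalogue metadata (no language detection is
shown), and the matching field types would still have to be defined in
schema.xml.

```python
# Hypothetical: languages that get their own stemmed field in schema.xml
# (e.g. title_en with an English stemmer, title_de with a German one).
INDEXED_LANGUAGES = ["en", "de"]


def to_solr_doc(record_id, title, language):
    """Build a document that indexes the title according to its own language.

    Assumes the language comes from the record's metadata; detection is not shown.
    """
    doc = {"id": record_id, "title": title}  # unstemmed copy for every record
    if language in INDEXED_LANGUAGES:
        doc["title_" + language] = title  # analyzed by that language's stemmer
    return doc


def search_params(user_query):
    """Dismax parameters that make no assumption about the query language."""
    # Search the unstemmed field and every language field at once.
    qf = " ".join(["title^1.5"] + ["title_" + lang for lang in INDEXED_LANGUAGES])
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": qf,
        "pf": "title^2.0",  # extra boost when the words occur as a phrase
    }


print(to_solr_doc("42", "Krebsforschung heute", "de"))
print(search_params("cancer krebs"))
```

A record in a language without its own field (say French) would then
only be found through the unstemmed title field.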

Thanks everyone for listening; I am still open to good suggestions on
how to deal with this problem!

-Michael
