[I'll try to address the other responses as well]

On 03.07.2009 00:49, Paul Libbrecht wrote:
> I believe the proper way is for the server to compute a list of
> accepted languages in order of preferences.
> The web-platform language (e.g. the user-setting), and the values in
> the Accept-Language http header (which are from the browser or
> platform).

All this is not going to help much, because the main application is a
scientific search portal for books and articles with many users
searching across languages. The most typical use case is a German user
searching in several languages at once, so the query itself may even be
multilingual, e.g. TITLE:cancer OR TITLE:krebs. There is no way to rely
on Accept-Language headers or a language select field here (the latter
would be left on "any" in most cases). Another popular use case is
citations (in whatever language) cut and pasted into the search field.

> Then you expand your query for surfing waves (say) to:
> - phrase query: surfing waves exactly (^2.0)
> - two terms, no stemming: surfing waves (^1.5)
> - iterate through the languages and query for stemmed variants:
>   - english: surf wav ^1.0
>   - german surfing wave ^0.9
>   - ....
> - then maybe even try the phonetic analyzer (matched in a separate
>   field probably)

This is an even more sophisticated variant of the multiple "OR" I came
up with. Oh well...

> I think this is a common pattern on the web where the users, browsers,
> and servers are all somewhat multilingual.

Indeed, and often users are not even aware of it. Especially in a
scientific context, they use their native tongue and English almost
interchangeably, and they expect the search engine to cope with it.

I think the best approach would be to process the data according to its
language but not to make any assumptions about the query language, and
I am totally lost as to how to get a clever schema.xml out of all this.

Thanks everyone for listening; I am still open to good suggestions for
dealing with this problem!

-Michael
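For illustration, Paul's expansion can be sketched in Python as a function that builds a query string in Lucene/Solr standard query-parser syntax. This is a minimal sketch under stated assumptions: the stemmers are hypothetical toy functions (a real setup would use proper per-language analyzers such as Snowball), the per-language field names TITLE_en and TITLE_de are hypothetical, requiring all terms within each clause (AND) is an assumption, and the 0.1 boost step per language just mirrors the ^1.0 / ^0.9 in the example. If TITLE_en were analyzed with an English stemmer, Solr would stem the raw terms at query time anyway; the explicit stems here only make the expansion visible.

```python
from typing import Callable, List, Tuple

Stemmer = Callable[[str], str]


def _strip_suffix(word: str, suffixes: Tuple[str, ...]) -> str:
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_english_stem(word: str) -> str:
    # Hypothetical stand-in for a real English stemmer (not Snowball).
    return _strip_suffix(word, ("ing", "es", "s", "e"))


def toy_german_stem(word: str) -> str:
    # Hypothetical stand-in for a real German stemmer (not Snowball).
    return _strip_suffix(word, ("ern", "en", "er", "e", "s"))


def expand_query(text: str, field: str, stemmers: List[Tuple[str, Stemmer]]) -> str:
    """OR together a boosted phrase, the raw terms, and per-language stems."""
    terms = text.split()
    clauses = [
        # Exact phrase gets the highest boost.
        f'{field}:"{text}"^2.0',
        # All unstemmed terms, slightly lower boost.
        "(" + " AND ".join(f"{field}:{t}" for t in terms) + ")^1.5",
    ]
    # One clause per language, against a hypothetical field "<field>_<lang>",
    # with the boost dropping by 0.1 for each further language.
    for i, (lang, stem) in enumerate(stemmers):
        boost = 1.0 - 0.1 * i
        stemmed = " AND ".join(f"{field}_{lang}:{stem(t)}" for t in terms)
        clauses.append(f"({stemmed})^{boost:.1f}")
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(expand_query("surfing waves", "TITLE",
                       [("en", toy_english_stem), ("de", toy_german_stem)]))
```

Since the query language is unknown, every language clause is always sent; the order of the stemmers list (e.g. by share of the collection) is the only place a language preference enters.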