Re: Preparing the ground for a real multilang index

2009-07-11 Thread Jan Høydahl
Michael, you're of course right, copyfield would copy from source. The lack of built-in language awareness in Solr is unfortunate :( I have not tried Lucid's BasisTech lemmatizer implementation, but check with them whether they can support multiple languages in the same field. -- Jan Høydahl On 8. j

Re: Preparing the ground for a real multilang index

2009-07-08 Thread Paul Libbrecht
Can't the copy field use a different analyzer? Both for query and indexing? Otherwise you need to craft your own analyzer which reads the language from the field-name... there are several classes ready for this. paul On 08 Jul 2009 at 02:36, Michael Lackhoff wrote: On 08.07.2009 00:50 Jan H

Re: Preparing the ground for a real multilang index

2009-07-07 Thread Michael Lackhoff
On 08.07.2009 00:50 Jan Høydahl wrote: > itself and do not need to know the query language. You may then want > to do a copyfield from all your text_ -> text for convenient one- > field-to-rule-them-all search. Would that really help? As I understand it, copyfield takes the raw, not yet analyz

Re: Preparing the ground for a real multilang index

2009-07-07 Thread Benson Margulies
There is an alternative to knowing the language at query time: multiply-process for stems or lemmas of all the possible languages. This may well be a cure much worse than the disease. Yes, LI can sell you our lemma-production capability. --benson margulies basis technology On Tue, Jul 7, 2009 at 6

Re: Preparing the ground for a real multilang index

2009-07-07 Thread Jan Høydahl
When using stemming, you have to know the query language. For your project, perhaps you should look into switching to a lemmatizer instead. I believe Lucid can provide integration with a commercial lemmatizer. This way you can expand the document field itself and do not need to know the quer

Re: Preparing the ground for a real multilang index

2009-07-03 Thread Paul Libbrecht
On 03 Jul 2009 at 07:43, Michael Lackhoff wrote: On 03.07.2009 00:49 Paul Libbrecht wrote: [I'll try to address the other responses as well] I believe the proper way is for the server to compute a list of accepted languages in order of preferences. The web-platform language (e.g. the user-

Re: Preparing the ground for a real multilang index

2009-07-02 Thread Michael Lackhoff
On 03.07.2009 00:49 Paul Libbrecht wrote: [I'll try to address the other responses as well] > I believe the proper way is for the server to compute a list of > accepted languages in order of preferences. > The web-platform language (e.g. the user-setting), and the values in > the Accept-Langu

Re: Preparing the ground for a real multilang index

2009-07-02 Thread Paul Libbrecht
I believe the proper way is for the server to compute a list of accepted languages in order of preferences. The web-platform language (e.g. the user-setting), and the values in the Accept-Language http header (which are from the browser or platform). Then you expand your query for surfing w
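
A minimal sketch of one way to read the "expand your query" step above, assuming one analyzed field per language named text_<lang> (the text_ prefix appears elsewhere in this thread; the language codes, the boost values and the helper name build_language_query are hypothetical). It turns an ordered list of preferred languages into DisMax request parameters, boosting the fields of more-preferred languages. The snippet above is cut off, so this is not necessarily what Paul had in mind.

```python
from urllib.parse import urlencode


def build_language_query(user_query, languages, field_prefix="text_",
                         top_boost=2.0, step=0.5):
    """Build Solr DisMax parameters that search one field per preferred language.

    `languages` is assumed to be ordered from most to least preferred.
    The field naming scheme and the boost values are illustrative assumptions.
    """
    boosted_fields = []
    for rank, lang in enumerate(languages):
        # Earlier (more preferred) languages get a higher boost.
        boost = max(top_boost - rank * step, 0.1)
        boosted_fields.append(f"{field_prefix}{lang}^{boost:g}")
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": " ".join(boosted_fields),
    }


params = build_language_query("haus", ["de", "en", "fr"])
print(params["qf"])
print(urlencode(params))  # query string to append to the select URL
```

Because each text_<lang> field keeps its own analyzer, the user's terms are analyzed once per language field, which avoids having to pick a single stemmer for the whole query.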

Re: Preparing the ground for a real multilang index

2009-07-02 Thread Walter Underwood
Not to mention Americans who call themselves "wunder". Or brand names, like LaserJet, which are the same in all languages. Queries are far too short for effective language id. You can get language preferences from the HTTP request headers, then allow people to override them. I think the header is A
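
A rough sketch of the "read the header, then let people override" idea, assuming the header in question is Accept-Language (named elsewhere in this thread). The function name parse_accept_language and the override argument are hypothetical; the parsing is deliberately simple (it handles q-values but collapses regional variants such as de-DE to de) and is not a full implementation of the HTTP specification.

```python
def parse_accept_language(header, override=None):
    """Return language codes ordered by preference.

    `header` is the raw Accept-Language value, e.g. "de-DE,de;q=0.9,en;q=0.8".
    `override` is an optional explicit user choice that always comes first.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        quality = 1.0  # default weight when no q-value is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        if quality > 0:
            # Sort key: highest quality first, then original header order.
            prefs.append((-quality, position, tag.split("-")[0]))

    ordered = []
    if override:
        ordered.append(override.lower())
    for _, _, lang in sorted(prefs):
        if lang not in ordered:
            ordered.append(lang)
    return ordered


print(parse_accept_language("de-DE,de;q=0.9,en;q=0.8,fr;q=0.5"))
print(parse_accept_language("de-DE,de;q=0.9,en;q=0.8", override="fr"))
```

Putting the explicit override first reflects the point above: the header is only a default, and a user setting or pulldown should win when present.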

Re: Preparing the ground for a real multilang index

2009-07-02 Thread Otis Gospodnetic
Michael, I think you really ought to know the language of the query (from a pulldown, from the browser, from user settings, somewhere) and pass that to the backend unless your queries are sufficiently long that their language can be identified. Here is a handy tool for playing with langua