Michael, you're of course right, copyfield would copy from source.
The lack of built-in language awareness in Solr is unfortunate :(
I have not tried Lucid's BasisTech lemmatizer implementation, but check
with them whether they can support multiple languages in the same field.
--
Jan Høydahl
On 8. j
Can't the copy field use a different analyzer?
Both for query and indexing?
Otherwise you need to craft your own analyzer which reads the language
from the field-name... there are several classes ready for this.
paul
On 08 Jul 09 at 02:36, Michael Lackhoff wrote:
On 08.07.2009 00:50 Jan Høydahl wrote:
> itself and do not need to know the query language. You may then want
> to do a copyfield from all your text_ -> text for convenient one-
> field-to-rule-them-all search.
Would that really help? As I understand it, copyfield takes the raw, not
yet analyz
There is an alternative to knowing the language at query:
multiply-process for stems or lemmas of all the possible languages.
This may well be a cure much worse than the disease.
Yes, LI can sell you our lemma-production capability.
--benson margulies
basis technology
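
A minimal sketch of the "process the query for every possible language" alternative, assuming one field per language named with the `text_` prefix mentioned elsewhere in this thread (`text_en`, `text_de` and so on are hypothetical names). The same terms are sent to each field so that each field's own analyzer can stem them; `escape_term` and `expand_query` are illustrative helpers, not part of Solr.

```python
import re

# Characters and operators with special meaning in the Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape query syntax characters in a single term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def expand_query(user_query, languages, field_prefix="text_"):
    """Build one OR clause per language field (field names are assumed)."""
    terms = " ".join(escape_term(t) for t in user_query.split())
    if not terms:
        return ""
    clauses = [f"{field_prefix}{lang}:({terms})" for lang in languages]
    return " OR ".join(clauses)


print(expand_query("laser printer", ["en", "de"]))
```

The number of clauses grows with the number of languages, which is one reason this may be a cure worse than the disease.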
On Tue, Jul 7, 2009 at 6
When using stemming, you have to know the query language.
For your project, perhaps you should look into switching to a
lemmatizer instead. I believe Lucid can provide integration with a
commercial lemmatizer. This way you can expand the document field
itself and do not need to know the quer
On 03 Jul 09 at 07:43, Michael Lackhoff wrote:
On 03.07.2009 00:49 Paul Libbrecht wrote:
[I'll try to address the other responses as well]
I believe the proper way is for the server to compute a list of
accepted languages in order of preferences.
The web-platform language (e.g. the user-
On 03.07.2009 00:49 Paul Libbrecht wrote:
[I'll try to address the other responses as well]
> I believe the proper way is for the server to compute a list of
> accepted languages in order of preferences.
> The web-platform language (e.g. the user-setting), and the values in
> the Accept-Langu
I believe the proper way is for the server to compute a list of
accepted languages in order of preferences.
The web-platform language (e.g. the user-setting), and the values in
the Accept-Language http header (which are from the browser or
platform).
Then you expand your query for surfing w
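
As a rough illustration of the ordered language list described here, the following sketch puts the user setting first and then the languages from the Accept-Language header, sorted by their q-values. The function names are made up for this example, and reducing tags such as `de-DE` to their primary subtag is an assumption, not something stated in the thread.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not accepted"
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def preferred_languages(user_setting, header):
    """User setting first, then header languages, without duplicates."""
    candidates = ([user_setting.lower()] if user_setting else [])
    candidates += parse_accept_language(header)
    ordered = []
    for tag in candidates:
        primary = tag.split("-")[0]  # assumption: "de-DE" counts as "de"
        if primary not in ordered:
            ordered.append(primary)
    return ordered


print(preferred_languages("fr", "de-DE,de;q=0.9,en;q=0.7,fr;q=0.5"))
```

The resulting list can then be used to decide which language-specific processing to apply to the query, with an explicit user override always taking precedence.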
Not to mention Americans who call themselves "wunder". Or brand names, like
LaserJet, which are the same in all languages. Queries are far too short for
effective language id.
You can get language preferences from the HTTP request headers, then allow
people to override them. I think the header is A
Michael,
I think you really ought to know the language of the query (from a pulldown,
from the browser, from user settings, somewhere) and pass that to the
backend unless your queries are sufficiently long that their language can
be identified.
Here is a handy tool for playing with langua