Proper nouns are the worst for language ID. What language is "Laserjet" or "Obama"?

--wunder
On Jul 7, 2013, at 10:47 AM, Jack Krupansky wrote:

> The problem at query time is simple: a typical query has too few terms to
> reliably identify the language using statistical techniques, especially for a
> language like English which is famous for "borrowing" words from other
> languages. I mean, is "raison d'être" REALLY French anymore? Or, are
> "sombrero" or "poncho" or "mañana" really strictly Spanish anymore?
>
> Multi-lingual support is an art/craft; don't expect cookbook answers that
> will apply to all apps in all environments.
>
> That said, Edismax searching of multiple fields, one for each language, is
> probably the best you're going to do without doing something
> super-sophisticated.
>
> -- Jack Krupansky
>
> -----Original Message----- From: adfel70
> Sent: Sunday, July 07, 2013 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: Why shouldn't lang-id component work at query-time?
>
> Hi,
> I'm trying to integrate Solr's lang-id component in my Solr environment.
> In my scenario, I have documents in many different languages. I want to
> index them in the same Solr collection, in different fields, and apply
> a language-specific analyzer to each field according to its language.
>
> So far the lang-id component does exactly what I need.
>
> The problem is that in all the recipes I've read, at query time I
> eventually have to indicate which language I'm querying,
> either by specifying the field I want to search:
> /solr/collection/select?q=text_it:abc abc
> or by creating a language-specific request handler, which I would have to use
> like this:
> /solr/collection/selectIT?q=text:abc abc
>
> Either way, I must tell Solr the language, which in my case (a web
> client plus many different languages) is quite problematic.
>
> I was wondering why the lang-id component shouldn't provide full support for
> indexing and querying multiple languages, with the language transparent to
> the client both at index time and at query time.
> This could be achieved by applying the same language-detection tool at query
> time.
>
> Any insights?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Why-shouldn-t-lang-id-component-work-at-query-time-tp4076057.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org
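
For readers of the archive, here is a minimal sketch of the Edismax approach suggested in the quoted reply: send the user's query to every per-language field at once, so the client never has to name a language. The field names text_en and text_fr and the helper build_multilingual_query are hypothetical, chosen for illustration; text_it and the /solr/collection/select path come from the original question. Treat it as one possible shape under those assumptions, not a recommended configuration.

```python
from urllib.parse import urlencode


def build_multilingual_query(user_query, language_fields,
                             base_url="/solr/collection/select"):
    """Build a Solr select URL that searches all language fields via edismax.

    `language_fields` is an assumed list of per-language field names
    (for example text_en, text_it); adjust it to match your own schema.
    """
    params = {
        "defType": "edismax",             # use the Extended DisMax query parser
        "q": user_query,                  # raw user input, no field prefix needed
        "qf": " ".join(language_fields),  # query fields: one per language
    }
    return base_url + "?" + urlencode(params)


# Hypothetical field names; only text_it appears in the original thread.
url = build_multilingual_query("abc abc", ["text_en", "text_it", "text_fr"])
print(url)
```

Each field still applies its own language-specific analyzer at query time, so the same input is analyzed once per language and the best-matching field drives the score. This sidesteps language detection on short queries rather than solving it.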