Proper nouns are the worst for language ID. What language is "Laserjet" or 
"Obama"?  --wunder

On Jul 7, 2013, at 10:47 AM, Jack Krupansky wrote:

> The problem at query time is simple: a typical query has too few terms to 
> reliably identify the language using statistical techniques, especially for a 
> language like English which is famous for "borrowing" words from other 
> languages. I mean, is "raison d'être" REALLY French anymore? Or, are 
> "sombrero" or "poncho" or "mañana" really strictly Spanish anymore?
> 
> Multi-lingual support is an art/craft; don't expect cookbook answers that 
> will apply to all apps in all environments.
> 
> That said, Edismax searching across multiple fields, one for each language, 
> is probably the best you're going to do without doing something 
> super-sophisticated.
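
A rough sketch of what that multi-field Edismax request could look like, written as a small Python helper that only builds the URL. The field names text_en and text_fr are assumptions for illustration (only text_it appears in this thread), and the handler path is the one from the original question. defType and qf are the standard Edismax parameters; anything else, such as per-field boosts, is left out.

```python
from urllib.parse import urlencode

# Hypothetical per-language fields. Only text_it is mentioned in the thread;
# text_en and text_fr are assumed for the sake of the example.
LANGUAGE_FIELDS = ["text_en", "text_it", "text_fr"]


def build_edismax_query(user_query, fields=LANGUAGE_FIELDS,
                        handler="/solr/collection/select"):
    """Build a select URL that searches every per-language field at once."""
    params = {
        "q": user_query,           # raw user input, no language prefix needed
        "defType": "edismax",      # use the Extended DisMax query parser
        "qf": " ".join(fields),    # query fields: one per language
    }
    return handler + "?" + urlencode(params)


url = build_edismax_query("abc abc")
print(url)
```

The client never has to name a language: each field analyzes the query with its own language-specific chain, and whichever field matches best drives the score.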
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: adfel70
> Sent: Sunday, July 07, 2013 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: Why shouldn't lang-id component work at query-time?
> 
> Hi,
> I'm trying to integrate Solr's lang-id component into my Solr environment.
> In my scenario, I have documents in many different languages. I want to
> index them in the same Solr collection, in a separate field per language,
> and apply a language-specific analyzer to each field.
> 
> So far lang-id component does exactly what I need.
> 
> The problem is that in all the recipes I've read, I eventually have to
> indicate at query time which language I'm querying.
> Either by specifying the field I want to search:
> /solr/collection/select?q=text_it:abc abc
> Or by creating a language-specific request handler which I would have to use
> like this:
> /solr/collection/selectIT?q=text:abc abc
> 
> Either way, I must tell Solr the language, which in my case (a web
> client plus many different languages) is quite problematic.
> 
> I was wondering why the lang-id component shouldn't fully support indexing
> and querying across multiple languages, with the language transparent to
> the client both at index time and at query time.
> This could be achieved by applying the same language-detection tool at query
> time.
> 
> Any insights?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Why-shouldn-t-lang-id-component-work-at-query-time-tp4076057.html
> Sent from the Solr - User mailing list archive at Nabble.com. 

--
Walter Underwood
wun...@wunderwood.org


