I had meant to also include a link to a blog post of mine that lists some useful links:
http://fschiettecatte.wordpress.com/2008/07/23/language-recognition/

François

On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:

> You are looking for a language identification tool. You could check
> https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.
> Otherwise, you have to roll your own or buy a third-party one.
>
> On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
>
>> Hello Solrists,
>>
>> As it says in the subject line, I'm looking for a Java component that,
>> given an ISO 639-1 code or some equivalent, would return a Lucene
>> Analyzer ready to process documents in the corresponding language.
>> Solr looks like it has to contain one, only I've not been able to
>> locate it so far; can you point me to it?
>>
>> I've found org.apache.solr.analysis, and things like
>> org.apache.lucene.analysis.bg etc. in lucene/modules, with many classes
>> which I'm sure are related; however, the factory itself still eludes me.
>> I mean the Java class.method that would decide, on request, what to do
>> with all these packages to bring the requisite object into existence
>> once the language is specified.
>> Where should I look? Or was I mistaken, and Solr has nothing of the
>> kind, at least in Java?
>> Thanks in advance for your help.
>>
>> Best regards,
>> François Jurain.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
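The factory François asked about can be approximated with a simple lookup table from ISO 639-1 codes to Lucene's per-language analyzer classes. What follows is a minimal sketch, not part of Solr or Lucene: the class LanguageAnalyzers and its methods are hypothetical, the table covers only a handful of languages, and newAnalyzer assumes a recent Lucene release whose analyzers have public no-arg constructors (the 3.x analyzers of 2011 took a Version argument instead). Instantiation goes through reflection only so that the sketch compiles without the Lucene jars.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Solr or Lucene): maps an ISO 639-1 code
 * to the fully qualified name of a Lucene language-specific Analyzer and
 * instantiates it by reflection, so this file compiles without Lucene jars.
 */
final class LanguageAnalyzers {

    // Fallback used when no language-specific analyzer is mapped.
    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // A small, deliberately incomplete sample of Lucene's per-language analyzers.
    private static final Map<String, String> BY_ISO_639_1 = Map.of(
            "bg", "org.apache.lucene.analysis.bg.BulgarianAnalyzer",
            "de", "org.apache.lucene.analysis.de.GermanAnalyzer",
            "en", "org.apache.lucene.analysis.en.EnglishAnalyzer",
            "es", "org.apache.lucene.analysis.es.SpanishAnalyzer",
            "fr", "org.apache.lucene.analysis.fr.FrenchAnalyzer",
            "it", "org.apache.lucene.analysis.it.ItalianAnalyzer",
            "ru", "org.apache.lucene.analysis.ru.RussianAnalyzer");

    private LanguageAnalyzers() {}

    /** Returns the analyzer class name for an ISO 639-1 code such as "fr" or " FR ". */
    static String analyzerClassName(String isoCode) {
        String key = isoCode == null ? "" : isoCode.trim().toLowerCase(Locale.ROOT);
        return BY_ISO_639_1.getOrDefault(key, DEFAULT_ANALYZER);
    }

    /**
     * Instantiates the analyzer via its public no-arg constructor.
     * Assumes a recent Lucene release with the analyzers on the classpath;
     * cast the result to org.apache.lucene.analysis.Analyzer at the call site.
     */
    static Object newAnalyzer(String isoCode) throws ReflectiveOperationException {
        return Class.forName(analyzerClassName(isoCode))
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

With Lucene's analyzers on the classpath, a call such as (Analyzer) LanguageAnalyzers.newAnalyzer("fr") would yield a FrenchAnalyzer, and an unmapped code falls back to StandardAnalyzer. With a compile-time dependency on Lucene, a plain switch calling new FrenchAnalyzer() and friends directly would be simpler and type-safe. When the language is not known in advance, the code passed in would come from a language identification step like the one Grant points to; the lookup itself stays the same.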