It sounds like you are looking for a language identification tool. See https://issues.apache.org/jira/browse/SOLR-1979 for the beginnings of one in Solr. Otherwise, you will have to roll your own or buy a third-party one.
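If you end up rolling your own for the "language code in, Analyzer out" part, a minimal sketch could look like the following. Everything here is an assumption for illustration: `AnalyzerRegistry` and its methods are hypothetical names, not classes in Solr or Lucene, and the type parameter `A` stands in for `org.apache.lucene.analysis.Analyzer` so that the sketch compiles without Lucene on the classpath.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Solr or Lucene: maps ISO 639-1 codes to
 * suppliers of analyzers. The type parameter A is a stand-in for
 * org.apache.lucene.analysis.Analyzer.
 */
final class AnalyzerRegistry<A> {
    // Language code (lower-cased) -> supplier that builds a fresh analyzer.
    private final Map<String, Supplier<? extends A>> byLanguage = new ConcurrentHashMap<>();
    // Used when no supplier is registered for the requested code.
    private final Supplier<? extends A> fallback;

    AnalyzerRegistry(Supplier<? extends A> fallback) {
        this.fallback = fallback;
    }

    /** Registers a supplier for a language code such as "fr" or "bg". */
    AnalyzerRegistry<A> register(String iso639, Supplier<? extends A> supplier) {
        byLanguage.put(normalize(iso639), supplier);
        return this;
    }

    /** Returns an analyzer for the code, or the fallback one if none is registered. */
    A forLanguage(String iso639) {
        return byLanguage.getOrDefault(normalize(iso639), fallback).get();
    }

    // Codes are matched case-insensitively and ignoring surrounding whitespace.
    private static String normalize(String code) {
        return code.trim().toLowerCase(Locale.ROOT);
    }
}
```

With Lucene available you would instantiate it as `AnalyzerRegistry<Analyzer>` and register one supplier per language package you care about, for example one that constructs the analyzer from `org.apache.lucene.analysis.fr` under `"fr"`, with a general-purpose analyzer as the fallback. The constructor arguments of those analyzers differ between Lucene versions, so check the one you are using.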
On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:

> Hello Solrists,
>
> As it says in the subject line, I'm looking for a Java component that,
> given an ISO 639-1 code or some equivalent,
> would return a Lucene Analyzer ready to gobble documents in the corresponding
> language.
> Solr looks like it has to contain one,
> only I've not been able to locate it so far;
> can you point me to the spot?
>
> I've found org.apache.solr.analysis,
> and things like org.apache.lucene.analysis.bg etc. in lucene/modules,
> with many classes which I'm sure are related; however, the factory itself
> still eludes me.
> I mean the Java class.method that would decide, on request, what to do with all
> these packages
> to bring the requisite object into existence once the language is specified.
> Where should I look? Or was I mistaken, and Solr has nothing of the kind, at
> least in Java?
> Thanks in advance for your help.
>
> Best regards,
> François Jurain.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search