I was under the impression that Solr supports both Tika's language identifier and the one Shuyo wrote. The page at http://wiki.apache.org/solr/LanguageDetection lists them both:
<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">

Again, I'm just trying to understand why it was moved to Solr.

On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl <jan....@cominvent.com> wrote:
> Hi,
>
> Solr just reuses Tika's language identifier. But you are of course free to
> do your language detection on the Nutch side if you choose and not invoke
> the one in Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 20. apr. 2012, at 21:49, Bai Shen wrote:
>
> > I'm working on using Shuyo's work to improve the language identification of
> > our search. Apparently, it's been moved from Nutch to Solr. Is there a
> > reason for this?
> >
> > http://code.google.com/p/language-detection/issues/detail?id=34
> >
> > I would prefer to have the processing done in Nutch as that has the benefit
> > of more hardware and not interfering with Solr latency.
> >
> > Thanks.
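For context, the following is a minimal sketch of how the LangDetect factory quoted above is typically wired into an update request processor chain in solrconfig.xml. The chain name "langid" and the field names "title", "text" and "language_s" are hypothetical examples. The sketch assumes that langid.fl and langid.langField are the parameters described on the LanguageDetection wiki page. It also assumes that the langid contrib jars and the language-detection library are on Solr's classpath.

```xml
<!-- Sketch only: the chain name and all field names are hypothetical examples -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- fields whose text is passed to the language detector -->
    <str name="langid.fl">title,text</str>
    <!-- field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
  </processor>
  <!-- standard processors so the document is still logged and indexed -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With a chain like this, detection runs inside Solr at index time. To do it in Nutch instead, as Jan suggests, you would leave this chain out and send the detected language as an ordinary field.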