I was under the impression that Solr does both Tika and the language identifier
that Shuyo wrote. The page at
http://wiki.apache.org/solr/LanguageDetection lists them both:

<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
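For reference, here is a rough sketch of how the LangDetect factory might be wired into an update chain in solrconfig.xml. It is only a sketch under assumptions: the chain name "langid" and the field names title, content and language_s are made-up placeholders. langid.fl, langid.langField and langid.fallback are the usual langid parameters, but check them against the docs for your Solr version.

```xml
<!-- Sketch only: chain name and field names are hypothetical placeholders -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- fields whose text is fed to the detector -->
    <str name="langid.fl">title,content</str>
    <!-- field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
    <!-- language to use when detection gives no result -->
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Swapping the class for TikaLanguageIdentifierUpdateProcessorFactory should select Tika's identifier instead. Either way, detection runs inside Solr at index time, which is exactly the load I'd rather keep on the Nutch side.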

Again, I'm just trying to understand why it was moved to Solr.


On Fri, Apr 20, 2012 at 6:02 PM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> Solr just reuses Tika's language identifier. But you are of course free to
> do your language detection on the Nutch side if you choose and not invoke
> the one in Solr.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 20. apr. 2012, at 21:49, Bai Shen wrote:
>
> > I'm working on using Shuyo's work to improve the language identification of
> > our search. Apparently, it's been moved from Nutch to Solr. Is there a
> > reason for this?
> >
> > http://code.google.com/p/language-detection/issues/detail?id=34
> >
> > I would prefer to have the processing done in Nutch, as that has the
> > benefit of more hardware and not interfering with Solr latency.
> >
> > Thanks.
>
>
