We should add a simple filter in Solr for this. The current approach requires indexing the text.
https://github.com/kkrugler/yalder is good, and it would make a great filter: if the text is NOT English, fail the whole text.

On Fri, Jul 1, 2016 at 6:33 AM, Allison, Timothy B. <talli...@mitre.org> wrote:

> +1 to langdetect
>
> In Tika 2.0, we're going to remove our own language detection code and
> allow users to select Optimaize (fork of langdetect), MIT Lincoln Lab’s
> Text.jl library or Yalder (https://github.com/kkrugler/yalder). The
> first two are now available in Tika 1.13.
>
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Wednesday, June 22, 2016 8:27 AM
> To: solr-user@lucene.apache.org; solr-user <solr-user@lucene.apache.org>
> Subject: RE: Automatic Language Identification
>
> Hello,
>
> I recommend using the langdetect language detector, it supports many more
> languages and has much higher precision than Tika's detector.
>
> Markus

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076