+1 to langdetect.

In Tika 2.0, we're going to remove our own language detection code and allow users to select Optimaize (a fork of langdetect), MIT Lincoln Lab’s Text.jl library, or Yalder (https://github.com/kkrugler/yalder). The first two are now available in Tika 1.13.

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, June 22, 2016 8:27 AM
To: solr-user@lucene.apache.org; solr-user <solr-user@lucene.apache.org>
Subject: RE: Automatic Language Identification

Hello,

I recommend using the langdetect language detector; it supports many more languages and has much higher precision than Tika's detector.

Markus