Three basic options:

1) One generic field that handles non-whitespace languages and normalization robustly (downside: no language-specific stopwords, stemming, etc.)
2) One field per language (hope language ID works and that you don't have many multilingual docs)
3) One Solr core per language (ditto)
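
As a rough illustration of option 2, language identification can be run at index time with Solr's langid contrib, which can map each listed field to a per-language field. This is only a sketch under stated assumptions: the chain name "langid", the field names title, body and language_s, and the fallback code "general" are all made up for the example, and the langid contrib jars have to be on the classpath.

```xml
<!-- solrconfig.xml: hypothetical update chain; all names are illustrative -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- fields whose text is used for detection (assumed field names) -->
    <str name="langid.fl">title,body</str>
    <!-- field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
    <!-- rename detected fields to <field>_<lang>, e.g. body_en -->
    <bool name="langid.map">true</bool>
    <!-- language code to use when nothing is detected (assumed value) -->
    <str name="langid.fallback">general</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The schema then needs a matching field (or dynamicField) for every language suffix the detector can produce. For a language with no dedicated analyzer, that field could simply use a generic field type like the one from option 1.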
For the first option (a good starting point, no matter what), see:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/98895

> <fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.ICUTokenizerFactory"/>
>     <!-- for any non-CJK -->
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>

The last two options are well described here:
http://www.basistech.com/multilingual-search-with-solr-no-problem/

See also:
http://www.basistech.com/indexing-strategies-for-multilingual-search-with-solr-and-rosette/

-----Original Message-----
From: vidya [mailto:vidya.nade...@tcs.com]
Sent: Monday, February 01, 2016 8:35 AM
To: solr-user@lucene.apache.org
Subject: Multi-lingual search

Hi

My use case is to index and be able to query different languages in Solr which are not among the built-in languages supported by Solr. How can I implement this? My input document consists of different languages in a field. I came across the "Solr in Action" book, which covers searching content in multiple languages in chapter 14. For built-in languages I have implemented this approach. But for languages like Tamil, how do I implement it? Do I need to find filter classes for that particular language, or any specific libraries?

Please help me on this.

Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-lingual-search-tp4254398.html
Sent from the Solr - User mailing list archive at Nabble.com.