RE: Multi-lingual search

Allison, Timothy B. Tue, 02 Feb 2016 11:04:12 -0800

Three basic options:
1) one generic field that robustly handles normalization and languages that
don't separate words with whitespace (downside: no language-specific stopwords,
stemming, etc.)
2) one field per language (hope language identification works and that you
don't have many multilingual docs)
3) one Solr core per language (ditto)
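
With option 2 the language of a query is often unknown, so one common pattern is to search every per-language field at once with edismax and let each field's own query analyzer process the terms. A minimal solrconfig.xml sketch, assuming hypothetical fields `content_en`, `content_ta` and an optional catch-all `content_all` already exist in the schema:

```xml
<!-- solrconfig.xml: hypothetical handler that searches several
     per-language fields in one request. Field names are placeholders. -->
<requestHandler name="/multilingual" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- edismax builds a per-field query using each field's analyzer -->
    <str name="defType">edismax</str>
    <!-- search all language fields; the 0.5 boost is an arbitrary example -->
    <str name="qf">content_en content_ta content_all^0.5</str>
  </lst>
</requestHandler>
```

Scores from differently analyzed fields are not strictly comparable, so the boosts usually need tuning against real queries.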


For the first option (a good starting point, no matter what), see:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/98895 

>     <fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.ICUTokenizerFactory"/>
>         <!-- for any non-CJK -->
>         <filter class="solr.ICUFoldingFilterFactory"/>
>         <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
>       </analyzer>
>     </fieldType>
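
The ICU factories in that type are not part of Solr's core; they ship in the analysis-extras contrib, whose jars have to be loaded before the type can be used. A minimal sketch, assuming the stock Solr 5.x directory layout, a hypothetical source field `content`, and a hypothetical catch-all field `content_all` (adjust paths and names to your install):

```xml
<!-- solrconfig.xml: load the ICU jars from the analysis-extras contrib
     (paths assume the stock Solr 5.x layout) -->
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />

<!-- schema.xml: hypothetical catch-all field using text_all,
     fed from a hypothetical source field "content" -->
<field name="content_all" type="text_all" indexed="true" stored="false"/>
<copyField source="content" dest="content_all"/>
```

Tamil text copied into such a field gets language-independent tokenization and folding, but no Tamil-specific stemming or stopwords, which is the trade-off noted for option 1.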

The second and third options are well described here:
http://www.basistech.com/multilingual-search-with-solr-no-problem/ 

See also:
http://www.basistech.com/indexing-strategies-for-multilingual-search-with-solr-and-rosette/
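
For option 2, Solr's language-identification contrib can detect the language at index time and route text into per-language fields. A sketch, assuming the langid contrib jars are loaded, a hypothetical source field `content` that exists in the schema, and that only English and Tamil are expected; how reliably the detector recognizes Tamil should be checked on your own data:

```xml
<!-- solrconfig.xml: detect the language of "content" and copy it to
     content_<lang> (e.g. content_en, content_ta). Names are hypothetical. -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content</str>
      <!-- field that stores the detected language code -->
      <str name="langid.langField">language_s</str>
      <!-- accept only these codes; others are treated as undetected -->
      <str name="langid.whitelist">en,ta</str>
      <str name="langid.fallback">en</str>
      <!-- map content to content_<lang>, keeping the original field -->
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml: the mapped target fields must exist -->
<field name="language_s" type="string" indexed="true" stored="true"/>
<field name="content_en" type="text_en" indexed="true" stored="true"/>
<!-- no Tamil-specific analyzer assumed; reuse the generic text_all type -->
<field name="content_ta" type="text_all" indexed="true" stored="true"/>
```

Documents go through the chain when the update request passes `update.chain=langid`, or when the chain is marked `default="true"`. A field that mixes several languages still receives a single detected language, which is the multilingual-document caveat from option 2.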
 


-----Original Message-----
From: vidya [mailto:vidya.nade...@tcs.com] 
Sent: Monday, February 01, 2016 8:35 AM
To: solr-user@lucene.apache.org
Subject: Multi-lingual search

Hi

My use case is to index and query content in Solr in languages that Solr does
not support out of the box. How can I implement this?

My input documents contain text in several different languages within a single
field. I came across chapter 14 of the "Solr in Action" book, which covers
searching content in multiple languages, and I have implemented that approach
for the built-in languages. But how do I implement it for languages like Tamil?
Do I need to find filter classes for that particular language, or specific
libraries?

Please help me with this.

Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-lingual-search-tp4254398.html
Sent from the Solr - User mailing list archive at Nabble.com.
