This page is a handy reference for individual languages: http://wiki.apache.org/solr/LanguageAnalysis
But the usual approach, especially for Chinese/Japanese/Korean (CJK), is to index the content in different fields with language-specific analyzers and then spread your search across the language-specific fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords in particular give "surprising" results if you put words from different languages in the same field.

Best,
Erick

On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq <shariqn...@gmail.com> wrote:
> Hi,
> I have set up Solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in
> English, but my requirement now extends to indexing news in other languages too.
>
> This is how my schema looks:
> <field name="news" type="text" indexed="true" stored="false" required="false"/>
>
> And the "text" field type in schema.xml looks like:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> My problem is:
> Now I want to index news articles in other languages too, e.g.
> Chinese and Japanese.
> How can I modify my text field so that I can index the news in other
> languages too and make it searchable?
>
> Thanks
> Shariq
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
> Sent from the Solr - User mailing list archive at Nabble.com.
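A minimal sketch of the per-language-field approach applied to the `news` field from the question, assuming Solr 1.4, where `solr.CJKTokenizerFactory` is available. The names `text_cjk`, `news_en` and `news_cjk` are hypothetical. The `<fieldType>` belongs inside the `<types>` section of schema.xml and the `<field>` declarations inside `<fields>`.

```xml
<!-- Hypothetical CJK field type (assumes Solr 1.4). CJKTokenizerFactory
     splits runs of CJK characters into overlapping two-character tokens,
     so no English stopword list or stemmer is applied here. -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One field per language (hypothetical names). English articles keep
     the existing "text" type; Chinese/Japanese articles go to news_cjk. -->
<field name="news_en"  type="text"     indexed="true" stored="false" required="false"/>
<field name="news_cjk" type="text_cjk" indexed="true" stored="false" required="false"/>
```

At index time the client has to send each article's text to the field matching its language (this sketch assumes the language is known or detected before indexing). At query time a DisMax request such as `defType=dismax&qf=news_en news_cjk&q=...` spreads the search across both fields, and each field analyzes the query with its own analyzer. In later Solr releases `CJKTokenizerFactory` is deprecated in favor of `solr.StandardTokenizerFactory` followed by `solr.CJKBigramFilterFactory`.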