Re: how to Index and Search non-Eglish Text in solr

Erick Erickson Thu, 09 Jun 2011 06:37:43 -0700

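No. An analyzer takes exactly one tokenizer, and every filter in the chain
runs on every token, so stacking stemmers for several languages in one
fieldType won't work. You'd have to create multiple fieldTypes, one for each
language.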
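
A rough sketch of what that could look like in schema.xml, assuming one field
per language. The type names (text_en, text_nl, text_cjk), the field names
(news_en, news_nl, news_cjk) and the per-language stopword files are made up
for illustration; the tokenizer and filter factories are the same ones used
in your config. Note there is no "Chinese" Snowball stemmer, so the CJK type
relies on the CJK tokenizer alone.

```xml
<!-- Sketch only: type, field and stopword file names are hypothetical. -->

<!-- These go inside the existing <types> section. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <!-- An <analyzer> without a type attribute is used for both index and query. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_nl.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- One tokenizer per analyzer; no stemmer for CJK text. -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- These go inside the existing <fields> section. -->
<field name="news_en"  type="text_en"  indexed="true" stored="false"/>
<field name="news_nl"  type="text_nl"  indexed="true" stored="false"/>
<field name="news_cjk" type="text_cjk" indexed="true" stored="false"/>
```

Your indexing client would have to decide which field each article goes
into, and at query time you'd search across all of them, e.g. with dismax
and qf=news_en news_nl news_cjk.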

Best
Erick


On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq <shariqn...@gmail.com> wrote:
> Can I specify multiple languages in the filter tags in schema.xml, like below?
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
> <filter class="solr.SnowballPorterFilterFactory" language="Dutch" />
> <filter class="solr.SnowballPorterFilterFactory" language="English" />
> <filter class="solr.SnowballPorterFilterFactory" language="Chinese" />
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <tokenizer class="solr.CJKTokenizerFactory"/>
>
>
>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="solr.SnowballPorterFilterFactory" language="Hungarian" />
>
>
> On 8 June 2011 18:47, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> This page is a handy reference for individual languages...
>> http://wiki.apache.org/solr/LanguageAnalysis
>>
>> But the usual approach, especially for Chinese/Japanese/Korean
>> (CJK) is to index the content in different fields with language-specific
>> analyzers then spread your search across the language-specific
>> fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords
>> particularly give "surprising" results if you put words from different
>> languages in the same field.
>>
>> Best
>> Erick
>>
>> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq <shariqn...@gmail.com>
>> wrote:
>> > Hi,
>> > I had set up Solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in
>> > English, but my requirements now extend to indexing news in other languages
>> > too.
>> >
>> > This is how my schema looks :
>> > <field name="news" type="text" indexed="true" stored="false"
>> > required="false"/>
>> >
>> >
>> > And the "text" fieldType in schema.xml looks like:
>> >
>> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >    <analyzer type="index">
>> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > words="stopwords.txt" enablePositionIncrements="true"/>
>> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> > catenateAll="0" splitOnCaseChange="1"/>
>> >       <filter class="solr.LowerCaseFilterFactory"/>
>> >       <filter class="solr.SnowballPorterFilterFactory" language="English"
>> > protected="protwords.txt"/>
>> >    </analyzer>
>> >    <analyzer type="query">
>> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> > ignoreCase="true" expand="true"/>
>> >       <filter class="solr.StopFilterFactory" ignoreCase="true"
>> > words="stopwords.txt" enablePositionIncrements="true"/>
>> >       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>> > catenateAll="0" splitOnCaseChange="1"/>
>> >       <filter class="solr.LowerCaseFilterFactory"/>
>> >       <filter class="solr.SnowballPorterFilterFactory" language="English"
>> > protected="protwords.txt"/>
>> >    </analyzer>
>> > </fieldType>
>> >
>> >
>> > My problem is:
>> > Now I want to index news articles in other languages too, e.g.
>> > Chinese, Japanese.
>> > How can I modify my text field so that I can index news in other
>> > languages too and make it searchable?
>> >
>> > Thanks
>> > Shariq
>> >
>> >
>>
>
>
>
> --
> Thanks and Regards
> Mohammad Shariq
>
