Hello all,

I have gotten my DataImportHandler to index the data from my MySQL database.
While looking at the schema tool, I noticed that stopwords in several
languages are being indexed as terms. Our data is in six languages:
English, French, Spanish, Chinese, German and Italian.
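The single-field approach I was considering is sketched below (untested). It
uses one StopFilterFactory that loads a stopword file per language, and it
assumes the words attribute accepts a comma-separated list of files. The
stopwords_xx.txt names are hypothetical files I would still have to create.
Chinese is left out because the whitespace tokenizer would not split
unspaced Chinese text into words anyway.

```xml
<!-- Untested sketch: the stopwords_xx.txt file names are hypothetical. -->
<!-- Assumes "words" accepts a comma-separated list of stopword files,   -->
<!-- so every listed language's stopwords are removed from the field.   -->
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords_en.txt,stopwords_fr.txt,stopwords_es.txt,stopwords_de.txt,stopwords_it.txt"
        enablePositionIncrements="true"
        />
```

My worry with this is that a stopword in one language can be a meaningful
word in another (German "die" is an article, but "die" is a real English
word), so separate field types per language may be the better option.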

Right now I am using the basic schema configuration for English. How do I
define field types for the other languages? I have looked at the wiki page
(http://wiki.apache.org/solr/LanguageAnalysis), but I would like an example
configuration for each of the languages I need. I also need stopword lists
for these languages. So far I have this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
                ignoreCase="true" expand="false"/>
        -->

        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
      </analyzer>
</fieldType>
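For the other languages, my untested guess is that each one gets its own
field type that reuses the chain above with a different stopword file and
Snowball language. The type name text_fr and the file stopwords_fr.txt are
placeholders I made up; Spanish, German and Italian would presumably follow
the same pattern by changing the language attribute.

```xml
<!-- Untested sketch: "text_fr" and "stopwords_fr.txt" are hypothetical names. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Remove French stopwords listed one per line in stopwords_fr.txt -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_fr.txt"
            enablePositionIncrements="true"
            />
    <!-- Same word splitting as the English type above -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Snowball also ships Spanish, German and Italian stemmers -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>
```

A field would then reference it with something like
<field name="description_fr" type="text_fr" indexed="true" stored="true"/>,
where description_fr is again a made-up name.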

Thanks in advance

Greg
