Greg,

You need to get stopword lists for your 6 languages. Then you need to create new field types just like that 'text' type, one for each language. Point each one to the appropriate stopwords file and, instead of "English", specify the language that field type is for. You can either index each language in its own index or put them all in the same index, in which case you'll want fields like title_en, title_fr, etc.
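
As a rough sketch of what that could look like for French, assuming a stopword file called stopwords_fr.txt and a field type called text_fr (both names are made up for illustration, and the filter chain is trimmed down from your 'text' type):

```xml
<!-- Sketch only: "text_fr" and "stopwords_fr.txt" are assumed names, not taken from the original schema. -->
<!-- The fieldType goes in the <types> section of schema.xml. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- French stopword list instead of the default stopwords.txt -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords_fr.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer filter as the English type, with the language swapped -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>

<!-- The field goes in the <fields> section; one such field per language if they share an index. -->
<field name="title_fr" type="text_fr" indexed="true" stored="true"/>
```

The same pattern should carry over to Spanish, German and Italian by swapping the stopword file and the language value. Chinese is the exception: Snowball has no Chinese stemmer, so that field type would need a different tokenizer (solr.CJKTokenizerFactory is one option) and no Snowball filter.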

Check http://search-lucene.com/ - this multilingual stuff is a common topic.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Greg Georges <greg.geor...@biztree.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Mon, February 21, 2011 4:27:46 PM
> Subject: Question regarding indexing multiple languages, stopwords, etc.
>
> Hello all,
>
> I have gotten my DataImportHandler to index my data from my MySQL database.
> I was looking at the schema tool and noticing that stopwords in different
> languages are being indexed as terms. The 6 languages we have are English,
> French, Spanish, Chinese, German and Italian.
>
> Right now I am using the basic schema configuration for English. How do I
> define them for other languages? I have looked at the wiki page
> (http://wiki.apache.org/solr/LanguageAnalysis) but I would like to have an
> example configuration for all the languages I need. Also I need a list of
> stopwords for these languages. So far I have this
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll=" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
>   </analyzer>
>
> Thanks in advance
>
> Greg
>