Sorry if this has already been discussed, but I have spent a couple of days googling in vain.

The problem:
- documents in multiple languages (us, de, fr, es).
- the language is known (a team of editors determines the language manually, and users are asked to specify a language option when searching).

My intended approach:
- one index.
- a multiplexing token filter, a MultilingualSnowballFilterFactory, that instantiates a Snowball stemmer for the appropriate language.
- language is a facet, to get rid of cross-language ambiguities when multiple languages are mixed in the same field.

The problem is how to communicate the language to the MultilingualSnowballFilterFactory. Once the language is known, instantiating the Snowball stemmer for the right language is easy. I have attached a working version below.

My solution:
- prepend the language as the first token for the FilterFactory to pick up, e.g. "es This is a Spanish document....".
- this would mean I need to duplicate the fields: an original version for storing, and a version with the language marker prepended for indexing, e.g. description (indexed=false, stored=true) and description_i (indexed=true, stored=false).

Is there a better way?

Many thanks in advance.

Yee

MultilingualSnowballFilterFactory.java: http://lucene.472066.n3.nabble.com/file/n3235341/MultilingualSnowballFilterFactory.java
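
To make the marker-token idea above concrete, here is a minimal plain-Java sketch that uses no Lucene or Solr classes. It is not the attached MultilingualSnowballFilterFactory: the class name LanguageMarkerSketch, the method names, and the suffix-stripping functions are all hypothetical stand-ins, and the "stemmers" only strip one suffix each, so they do not behave like real Snowball stemmers. The sketch only shows the control flow: read the first token as the language code, pick the per-language stemmer, drop the marker, and stem the remaining tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class LanguageMarkerSketch {

    // Hypothetical stand-ins for per-language Snowball stemmers. Each one
    // strips a single common suffix, which is NOT real stemming.
    static final Map<String, UnaryOperator<String>> STEMMERS = Map.of(
        "us", w -> stripSuffix(w, "s"),
        "de", w -> stripSuffix(w, "en"),
        "fr", w -> stripSuffix(w, "s"),
        "es", w -> stripSuffix(w, "s"));

    // Removes the suffix only if at least three characters remain.
    static String stripSuffix(String word, String suffix) {
        boolean longEnough = word.length() > suffix.length() + 2;
        return longEnough && word.endsWith(suffix)
            ? word.substring(0, word.length() - suffix.length())
            : word;
    }

    // Treats the first whitespace-separated token as the language marker,
    // removes it, and stems the remaining lowercased tokens.
    static List<String> analyze(String markedText) {
        String[] tokens = markedText.trim().toLowerCase(Locale.ROOT).split("\\s+");
        UnaryOperator<String> stemmer = STEMMERS.get(tokens[0]);
        if (stemmer == null) {
            throw new IllegalArgumentException(
                "missing or unknown language marker: '" + tokens[0] + "'");
        }
        List<String> stemmed = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            stemmed.add(stemmer.apply(tokens[i]));
        }
        return stemmed;
    }
}
```

With these toy rules, analyze("es los documentos") yields the tokens "los" and "documento", and the "es" marker itself never reaches the index. One caveat of this approach is that a document whose real first word happens to be a known language code would be misread if the marker were ever omitted, which is why the sketch fails loudly on an unknown marker rather than guessing.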