You would have to carefully analyze the source code and tables of these two
stemmers to determine if one might incorrectly stem words in the other
language. Technically, that could be fine for indexing, but it might give
users some unexpected results for queries. There might also be cases where
the second stemmer would stem a term that was already stemmed by the first
stemmer.
You could avoid the latter issue by using the duplicate token technique. For
a single stemmer, the filter chain generally looks like this:
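<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- Emit each token twice: one copy flagged as a keyword, one not -->
<filter class="solr.KeywordRepeatFilterFactory"/>
<!-- The stemmer skips the keyword-flagged copy, so the original term survives -->
<filter class="solr.PorterStemFilterFactory"/>
<!-- Drop the extra copy when stemming left the term unchanged -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>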
For two (or more) languages:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
This would produce the stemmed term for both languages, or either language,
or neither, as the case may be.
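For context, here is a rough sketch of how that chain might sit inside a
complete field type in schema.xml. The type name "text_en_de" and the
positionIncrementGap value are only assumptions for illustration, and the
single <analyzer> element (no type attribute) assumes you want the same
chain at both index and query time:

```xml
<!-- "text_en_de" is a hypothetical name; pick whatever fits your schema -->
<fieldType name="text_en_de" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain is used for indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep an unstemmed, keyword-flagged copy of every token -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- English stemmer, then German stemmer; both skip keyword-flagged tokens -->
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    <!-- Remove a copy that ended up identical to the original term -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The unflagged copy of each token still passes through the stemmers in the
order listed, so it is worth running a few English and German sample terms
through the Analysis screen of the Solr Admin UI to see what actually comes
out.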
-- Jack Krupansky
-----Original Message-----
From: Croci Francesco Luigi (ID SWS)
Sent: Friday, March 14, 2014 8:17 AM
To: solr-user@lucene.apache.org
Subject: analyzer with multiple stem-filters for more languages
Is it possible to define an analyzer with more than one stem filter, for
multiple languages?
Something like this:
<analyzer type="index">
...
<filter class="solr.PorterStemFilterFactory"/> <!-- default for English -->
<filter class="solr.SnowballPorterFilterFactory" language="German2" />
</analyzer>
Greetings
Francesco