I've just started experimenting with the solr.KeywordMarkerFilterFactory in Solr 3.1, and I'm seeing some strange behavior. It seems that every word subsequent to a protected word is also treated as being protected.
For testing purposes, I have put the word "spelling" in my protwords.txt. If I do a test for "spelling bees" in the analyze tool, the stemmer produces "spelling bees" - nothing is stemmed. But if I do a test for "bees spelling", I get "bee spelling", the expected result with "bees" stemmed but "spelling" left unstemmed.

I have tried extended examples - in every case I tried, all of the words prior to "spelling" get stemmed, but none of the words after "spelling" get stemmed. When I turn on the verbose mode of the analyze tool, I can see that the settings of the "keyword" attribute introduced by solr.KeywordMarkerFilterFactory correspond with the stemming behavior... so I think the solr.KeywordMarkerFilterFactory component is to blame, and not anything later in the analysis chain.

Any idea what might be going wrong? Is this a known issue?

Here is my field type definition, in case it makes a difference:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

thanks,
Demian
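
P.S. In case it helps anyone reproduce this: a stripped-down field type along the following lines might show whether the problem lies in the marker/stemmer pair or in something earlier in the chain. This is only a sketch. The name "text_kwtest" and the use of solr.WhitespaceTokenizerFactory are assumptions made for illustration and are not part of my actual schema.

```xml
<!-- Hypothetical reduced field type for isolating the behavior.
     The name and the tokenizer choice are assumptions, not taken from my real schema. -->
<fieldType name="text_kwtest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Plain whitespace tokenizer instead of the ICU tokenizer -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same protected-word list as in the full field type -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Same stemmer as in the full field type -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

If "spelling bees" comes out as "spelling bee" with this type, the remaining components (ICU tokenizer, word delimiter, stop filter, ICU folding) could be added back one at a time to see which one brings the behavior back.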