In trying to understand the various options for WordDelimiterFilterFactory, I tried setting all options to 0. This seems to prevent a number of words from being output at all. In particular, "can't" and "99dxl" don't get output, nor do any words containing hyphens. Is this correct behavior?
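To make the test input explicit, here is a small Python sketch (not Solr code) that mimics only the whitespace tokenization step and reports each term's position and character offsets. The test string is reconstructed from the nine terms in the analyzer output, assuming single spaces between them, which is consistent with the reported source offsets. The name whitespace_tokens is my own, purely for illustration.

```python
import re

# Test string reconstructed from the nine terms shown in the analyzer output.
# Assumption: terms are separated by single spaces.
TEXT = "ca-55 99_3_a9 55-67 powerShot ca999x15 foo-bar can't joe's 99dxl"


def whitespace_tokens(text):
    """Yield (position, term, start, end) for each whitespace-separated term."""
    for position, match in enumerate(re.finditer(r"\S+", text), start=1):
        yield position, match.group(), match.start(), match.end()


if __name__ == "__main__":
    for position, term, start, end in whitespace_tokens(TEXT):
        print(position, term, start, end)
```

This only covers the tokenizer stage; it does not attempt to reproduce what WordDelimiterFilterFactory does with each term afterwards.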
Here is what the Solr Analyzer output:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position   1      2        3      4          5         6        7      8      9
term text       ca-55  99_3_a9  55-67  powerShot  ca999x15  foo-bar  can't  joe's  99dxl

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}

term position     1          5
term text         powerShot  joe
term type         word       word
source start,end  20,29      53,56

Here is the schema:

<fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Tom