: In trying to understand the various options for
: WordDelimiterFilterFactory, I tried setting all options to 0. This seems
: to prevent a number of words from being output at all. In particular
: "can't" and "99dxl" don't get output, nor do any words containing hyphens.
: Is this correct behavior?

For the record: there are other options you haven't set. splitOnNumerics
defaults to "1" and preserveOriginal defaults to "0". I'm guessing that if
you set splitOnNumerics="0" you'd see a lot more tokens come through, and
if you set preserveOriginal="1" you'd definitely see a lot more tokens come
through by default.

: <fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
:   <analyzer>
:     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
:     <filter class="solr.WordDelimiterFilterFactory"
:             splitOnCaseChange="0"
:             generateWordParts="0"
:             generateNumberParts="0"
:             catenateWords="0"
:             catenateNumbers="0"
:             catenateAll="0"
:     />
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>
: </fieldtype>

-Hoss
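
As an untested sketch of the suggestion above, the quoted field type could set the two remaining options explicitly. Everything except splitOnNumerics and preserveOriginal is copied from the quoted config. Whether this emits exactly the tokens you want is an assumption to check in the analysis page for your Solr version.

```xml
<fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: splitOnNumerics="0" stops splits at letter/digit
         boundaries (e.g. "99dxl"), and preserveOriginal="1" also emits
         each incoming token unmodified (e.g. "can't"). -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            preserveOriginal="1"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

Trying one option at a time makes it easier to see which of the two brings the missing tokens back.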