: In trying to understand the various options for 
: WordDelimiterFilterFactory, I tried setting all options to 0. This seems 
: to prevent a number of words from being output at all. In particular 
: "can't" and "99dxl" don't get output, nor do any words containing hyphens. 
: Is this correct behavior?

For the record: there are other options you haven't set. splitOnNumerics 
defaults to "1" and preserveOriginal defaults to "0". With every generate* 
and catenate* option at "0" and preserveOriginal at "0", the filter has 
nothing left to emit for any token it splits. I'm guessing that if you set 
splitOnNumerics="0" you'd see a lot more tokens come through, and if you 
set preserveOriginal="1" you'd definitely see a lot more tokens come 
through by default.
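
A minimal sketch of that suggestion, assuming the rest of your field type 
stays as it is. I haven't run this against your data, so treat the two 
added attributes as something to try rather than a confirmed fix:

```xml
<!-- Same filter as in the quoted config, with the two unset options made explicit. -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="0"
        splitOnNumerics="0"
        generateWordParts="0"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="1"
        />
<!-- splitOnNumerics="0": do not split at letter/digit transitions (e.g. "99dxl"). -->
<!-- preserveOriginal="1": also emit the original, unsplit token (e.g. "can't"). -->
```

The analysis page in the Solr admin UI is a quick way to check which tokens 
actually come out for inputs like "can't" and "99dxl".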

: <fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
:       <analyzer>
:           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
:           <filter class="solr.WordDelimiterFilterFactory"
:                 splitOnCaseChange="0"
:                 generateWordParts="0"
:                 generateNumberParts="0"
:                 catenateWords="0"
:                 catenateNumbers="0"
:                 catenateAll="0"
:                 />
:           <filter class="solr.LowerCaseFilterFactory"/>
:       </analyzer>
:     </fieldtype>

-Hoss
