Pretty sure what you need is called KeywordMarkerFilterFactory.

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
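
For context, here is a rough sketch of where that filter line might sit in schema.xml. The field type name text_protected, the whitespace tokenizer, and the other filters in the chain are assumptions for illustration, not taken from your schema. One caveat: this filter only marks the tokens listed in protwords.txt as keywords so that later stemming filters leave them unchanged. It runs after the tokenizer, and the standard tokenizer splits on hyphens before any filter sees the text, so the sketch assumes a whitespace tokenizer to keep e-cigarette in one piece.

```xml
<!-- Hypothetical field type; the name and the surrounding chain are assumptions -->
<fieldType name="text_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumption: whitespace tokenizer, so "e-cigarette" reaches the filters as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Marks every token listed in protwords.txt as a keyword -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Stemmers skip tokens that carry the keyword mark -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

In this sketch protwords.txt would list one term per line (e-cigarette, e-cig, i-pad), written in lowercase because the lowercase filter runs before the marker.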

On 11/5/14 17:24, Tang, Rebecca wrote:
Hi there,

For some hyphenated terms, I want them to stay as is instead of being
tokenized.  For example: e-cigarette, e-cig, I-pad.  I don't want them to be
split into e and cig or I and pad because the single letters e and I produce
too many false positive matches.

Is there a way to tell the standard tokenizer to skip tokenizing some terms?

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.t...@ucsf.edu

