: I need to tokenize my field on whitespaces, html, punctuation, apostrophe

: but if I use HTMLStripStandardTokenizerFactory it strips only html.... 
: but no apostrophes

you might consider using one of the HTML Tokenizers, and then use a 
PatternReplaceFilter ... or if you know Java, write a 
simple Tokenizer that uses the HTMLStripReader.
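
a rough sketch of the first option in schema.xml, as an illustration only: 
the fieldType name "text_html" is made up, and the pattern is just a guess 
at what you mean by punctuation (\p{Punct} in Java regexes covers ASCII 
punctuation, apostrophe included), so adjust both to taste.

    <fieldType name="text_html" class="solr.TextField">
      <analyzer>
        <!-- strips the HTML markup, then splits on whitespace -->
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
        <!-- deletes punctuation characters inside each token -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="\p{Punct}" replacement="" replace="all"/>
      </analyzer>
    </fieldType>

note that the filter only rewrites the text of each token, it does not 
split tokens: "don't" would come out as "dont", not as "don" and "t". if 
you really need a split at the apostrophe, that points back to the custom 
Tokenizer route.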

in the long run, changing the HTMLStripReader to be usable as a 
"CharFilter" so it can work with any Tokenizer is probably the way we'll 
go -- but i don't think anyone has started working on a patch for that.



-Hoss
