: I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes,
: but if I use HTMLStripStandardTokenizerFactory it strips only HTML....
: but not apostrophes.

You might consider using one of the HTML Tokenizers and then adding a
PatternReplaceFilterFactory ... or, if you know Java, write a simple
Tokenizer that uses the HTMLStripReader.

In the long run, changing the HTMLStripReader so it is usable as a
"CharFilter" that can work with any Tokenizer is probably the way we'll
go, but I don't think anyone has started working on a patch for that.

-Hoss
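The "write a simple Tokenizer" option boils down to two steps: strip the markup first, then split what remains on everything that is not a letter or digit, which covers whitespace, punctuation, and apostrophes all at once. The standalone sketch below only illustrates that logic. It does not use Lucene's Tokenizer API or Solr's HTMLStripReader. The class name HtmlAwareSplitter is hypothetical, and the regex tag-stripping is a crude stand-in that I am assuming is good enough for illustration: unlike HTMLStripReader it does not decode entities such as &amp;amp; or skip script/comment bodies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, NOT a Lucene/Solr class.
public class HtmlAwareSplitter {
    // Crude stand-in for HTMLStripReader: drops anything that looks like a tag.
    // Entities (e.g. &amp;) are NOT decoded and would leak through as text.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // A token is a maximal run of Unicode letters or digits, so whitespace,
    // punctuation and apostrophes all act as separators.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    public static List<String> tokenize(String html) {
        // Replace tags with a space so "a<br>b" still splits into "a" and "b".
        String text = TAG.matcher(html).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Don, t, panic, it, s, O, Reilly]
        System.out.println(tokenize("<p>Don't <b>panic</b>, it's O'Reilly!</p>"));
    }
}
```

Splitting on apostrophes this aggressively turns "don't" into "don" and "t". If you would rather keep such words together, the PatternReplaceFilter route (removing the apostrophe inside the token instead of splitting on it) may suit you better.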