Chris Hostetter wrote:
: I need to tokenize my field on whitespace, html, punctuation and apostrophes

: but if I use HTMLStripStandardTokenizerFactory it strips only html,
: but not apostrophes

You might consider using one of the HTML Tokenizers, and then use a PatternReplaceFilter ... or if you know Java, write a simple Tokenizer that uses the HTMLStripReader.
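
As a rough, dependency-free illustration of the first suggestion, here is a Java sketch of the chain's shape: an HTML-stripping whitespace tokenizer followed by a per-token regex replace in the spirit of PatternReplaceFilter. It does not use any Solr class. The class and method names are hypothetical, the regex tag stripper is far cruder than Solr's HTML tokenizers (it assumes no '>' inside attribute values), and dropping tokens that end up empty is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: strip tags + split on whitespace, then regex-replace inside each token. */
public class StripThenReplace {

    // Crude tag matcher; assumes simple markup with no '>' inside attributes.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Stand-in for the filter's pattern: any ASCII punctuation, apostrophe included.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    /** Tokenizer stage: replace each tag with a space, then split on whitespace. */
    static List<String> htmlStripWhitespaceTokenize(String html) {
        String text = TAG.matcher(html).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Filter stage: replace every match inside each token (like a "replace all" mode). */
    static List<String> patternReplace(List<String> tokens, Pattern p, String replacement) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String r = p.matcher(t).replaceAll(replacement);
            if (!r.isEmpty()) out.add(r); // this sketch drops tokens that become empty
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = htmlStripWhitespaceTokenize("<p>Don't <b>stop</b>, Solr!</p>");
        System.out.println(patternReplace(tokens, PUNCT, "")); // prints [Dont, stop, Solr]
    }
}
```

Note that a replace filter rewrites each token in place: `Don't` becomes `Dont`, not two tokens. If the apostrophe has to act as a separator, a tokenizer that splits on it (the second suggestion) is the closer fit.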

In the long run, changing the HTMLStripReader to be usable as a "CharFilter" so it can work with any Tokenizer is probably the way we'll go -- but I don't think anyone has started working on a patch for that.
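
The CharFilter idea amounts to removing markup at the character level, in front of the tokenizer, so the tokenizer never needs to know about HTML. A minimal sketch using plain java.io rather than any Lucene/Solr class: a hypothetical TagStrippingReader extends FilterReader and turns each tag into a single space, and an unmodified Reader-based tokenizer behind it (here one that breaks on anything that is not a letter or digit, apostrophes included) sees plain text. It is not HTMLStripReader: entities, comments and script blocks are not handled, and a real CharFilter would also need to map token offsets back to the original text (for highlighting), which this sketch does not do.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical char-level tag stripper; skip() and mark/reset are not adapted. */
public class TagStrippingReader extends FilterReader {
    private boolean inTag = false;

    public TagStrippingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (inTag) {
                if (c == '>') { inTag = false; return ' '; } // each tag collapses to one space
            } else if (c == '<') {
                inTag = true;
            } else {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Route bulk reads through read() so every char passes the tag check.
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) break;
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // tag state would not survive mark/reset
    }

    /** Any Reader-consuming tokenizer works unchanged; this one splits on non-alphanumerics. */
    static List<String> tokenize(Reader r) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                cur.append((char) c);
            } else if (cur.length() > 0) { // whitespace, punctuation, apostrophe end a token
                tokens.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new TagStrippingReader(new StringReader("<p>Don't <b>stop</b>, Solr!</p>"));
        System.out.println(tokenize(r)); // prints [Don, t, stop, Solr]
    }
}
```

Turning each tag into a space rather than dropping it keeps words on either side of a tag from being glued together, e.g. `a<br>b` yields two tokens.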

I opened an issue for that:
https://issues.apache.org/jira/browse/SOLR-1343

Koji
