Hi all,

I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes.

But if I use HTMLStripStandardTokenizerFactory, it strips only the HTML; it
does not split on apostrophes.

If I use PatternTokenizerFactory, I don't know whether I can write a pattern
that tokenizes on all of these characters (HTML, apostrophes, ...).
I can filter these characters out with a pattern such as [^0-9A-Za-z], but if
I use " " as the replacement in the filter, it breaks my text.
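
To make the goal concrete, here is a minimal sketch in plain Python (not Solr
configuration) of the behavior I am after: drop the HTML tags, then split on
runs of anything that is not a letter or digit. The tokenize helper and the
naive tag regex are hypothetical and only for illustration; the character
class is the same [^0-9A-Za-z] mentioned above.

```python
import re

# Naive HTML tag matcher (an assumption for illustration; real HTML needs a proper parser).
TAG_RE = re.compile(r"<[^>]+>")
# Runs of characters that are not ASCII letters or digits.
NON_ALNUM_RE = re.compile(r"[^0-9A-Za-z]+")

def tokenize(text):
    """Strip HTML tags, then split on runs of non-alphanumeric characters."""
    # Replace each tag with a space so words on either side stay separate.
    no_html = TAG_RE.sub(" ", text)
    # Splitting can leave empty strings at the edges; drop them.
    return [tok for tok in NON_ALNUM_RE.split(no_html) if tok]

print(tokenize("<b>dell'arte</b>, it's  fine!"))
# → ['dell', 'arte', 'it', 's', 'fine']
```

Splitting on the whole run (note the + in the pattern) rather than replacing
each character separately is what avoids empty tokens in the middle of the
text.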

Could you help me solve this problem?

Bye



      
