Re: Preserving punctuation tokens with ICUTokenizerFactory
You can actually plug in customized grammars and the like, but the simplest approach is to configure a MappingCharFilter before your tokenizer, with mappings like:

"c++" => "cplusplus"

On Tue, Apr 10, 2012 at 11:50 AM, Demian Katz wrote:
> It has been brought to my attention that ICUTokenize
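To illustrate why the mapping has to run before the tokenizer, here is a minimal Python sketch. It is not the ICU tokenizer or any Lucene/Solr API: `simple_tokenize` is a hypothetical stand-in that drops punctuation, and `apply_mappings` and `MAPPINGS` are made-up names that imitate what a mapping char filter does to the raw text. The uppercase "C++" entry is an assumption added here because this sketch matches case-sensitively.

```python
import re

# Hypothetical mapping table, modeled on the "c++" => "cplusplus" rule above.
# Both cases are listed because the replacement below is case-sensitive.
MAPPINGS = {"c++": "cplusplus", "C++": "cplusplus"}


def apply_mappings(text, mappings=MAPPINGS):
    """Rewrite mapped strings in the raw text, trying longer keys first."""
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mappings[m.group(0)], text)


def simple_tokenize(text):
    """Stand-in tokenizer: keeps runs of word characters, drops punctuation."""
    return re.findall(r"\w+", text)


title = "The C++ Programming Language"

# Tokenizing the raw text loses the "++".
without_filter = simple_tokenize(title)

# Mapping first turns "C++" into a plain word the tokenizer keeps whole.
with_filter = simple_tokenize(apply_mappings(title))

print(without_filter)
print(with_filter)
```

In this sketch the unfiltered title yields a bare "C" token, while the filtered one yields "cplusplus". The same mapping would also have to be applied to queries, so that a search for "c++" is rewritten to the same token that was indexed.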
Preserving punctuation tokens with ICUTokenizerFactory
It has been brought to my attention that ICUTokenizerFactory drops tokens like the ++ in "The C++ Programming Language." Is there any way to persuade it to preserve these types of tokens?

thanks,
Demian