Does anybody know of a tokenizer that can be configured with (multiple)
regular expressions to mark parts of the input text as keywords (kept as
single, unsplit tokens) and that behaves like StandardTokenizer
(or UAX29URLEmailTokenizer) otherwise?

Input:
Does my order 4711.0815!-somecode_and.other(stuff) arrive on friday?

Tokens:
does|my|order|4711.0815!-somecode_and.other(stuff)|arrive|on|friday
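
To make the requirement concrete, here is a rough sketch in plain Java of the behavior I have in mind. It is not a Lucene Tokenizer: the class name, the simple letter/digit word splitter standing in for StandardTokenizer, the lowercasing (standing in for a LowerCaseFilter) and the example keyword pattern are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Lucene.
class KeywordAwareTokenizerSketch {

    // Crude stand-in for StandardTokenizer: runs of letters/digits are tokens.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Splits text into tokens; anything matched by a keyword pattern
    // is emitted verbatim as one token, the rest is split into words.
    static List<String> tokenize(String text, List<Pattern> keywordPatterns) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            // Find the earliest non-empty keyword match at or after pos.
            int bestStart = -1;
            int bestEnd = -1;
            for (Pattern p : keywordPatterns) {
                Matcher m = p.matcher(text);
                if (m.find(pos) && m.end() > m.start()
                        && (bestStart < 0 || m.start() < bestStart)) {
                    bestStart = m.start();
                    bestEnd = m.end();
                }
            }
            // Tokenize the ordinary text in front of the keyword (or the rest).
            int stop = bestStart < 0 ? text.length() : bestStart;
            addWords(text.substring(pos, stop), tokens);
            if (bestStart < 0) {
                break;
            }
            // Emit the keyword unchanged and continue after it.
            tokens.add(text.substring(bestStart, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    // Adds lowercased word tokens; lowercasing mimics a LowerCaseFilter.
    private static void addWords(String segment, List<String> tokens) {
        Matcher m = WORD.matcher(segment);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        // Assumed keyword shape: "dddd.dddd!-" followed by non-whitespace.
        List<Pattern> keywords =
                Arrays.asList(Pattern.compile("\\d{4}\\.\\d{4}!-\\S+"));
        String input =
                "Does my order 4711.0815!-somecode_and.other(stuff) arrive on friday?";
        System.out.println(String.join("|", tokenize(input, keywords)));
        // prints: does|my|order|4711.0815!-somecode_and.other(stuff)|arrive|on|friday
    }
}
```

When several patterns match, the sketch picks the match that starts first, so the order of the patterns only matters for ties. A real solution would also have to report correct offsets and fit into an analyzer chain, which this sketch ignores.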


Any pointers? How would I code this?

Regards,

Kai Gülzau



