On 5/17/2013 10:26 AM, Kai Gülzau wrote:
> Is there some StandardTokenizer implementation which does not break words on
> hyphens?
>
> I think it would be more flexible to retain hyphens and use a
> WordDelimiterFilterFactory to split these tokens.
You can use the whitespace tokenizer with WordDelimiterFilter (WDF). This
Is there some StandardTokenizer implementation that does not break words on
hyphens?
I think it would be more flexible to retain hyphens and use a
WordDelimiterFilterFactory to split these tokens.
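As a rough sketch of that idea (untested, and assuming Solr 4.x factory names): a schema.xml field type that tokenizes on whitespace only, so hyphens survive, and then lets WordDelimiterFilterFactory split and catenate the parts. The field type name "text_wdf" and the attribute values are illustrative assumptions, not a recommended configuration.

```xml
<!-- Hypothetical field type; the name "text_wdf" and all attribute values are illustrative -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split on whitespace only, so "e-mail" reaches the filter as a single token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts: emit "e" and "mail"
         catenateWords:     also emit the joined form "email"
         preserveOriginal:  also keep "e-mail" itself -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- At query time only split into parts; catenation is left to the index side -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With settings like these, "e-mail" should be indexed as e, mail, email and e-mail, so a query for "email" could match that document, while "e mail" would still yield only e and mail. One trade-off to keep in mind: the whitespace tokenizer leaves punctuation attached to tokens, so the delimiter filter ends up handling that as well.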
StandardTokenizer today:
doc1: email -> email
doc2: e-mail -> e|mail
doc3: e mail -> e|mail
query1: e