
Re: StandardTokenizer vs. hyphens

2013-05-17 Thread Shawn Heisey

On 5/17/2013 10:26 AM, Kai Gülzau wrote:
> Is there some StandardTokenizer Implementation which does not break words on
> hyphens?
>
> I think it would be more flexible to retain hyphens and use a
> WordDelimiterFactory to split these tokens.

You can use the whitespace tokenizer with WDF. This

StandardTokenizer vs. hyphens

2013-05-17 Thread Kai Gülzau

Is there some StandardTokenizer Implementation which does not break words on hyphens?

I think it would be more flexible to retain hyphens and use a WordDelimiterFactory to split these tokens.

StandardTokenizer today:
doc1: email -> email
doc2: e-mail -> e|mail
doc3: e mail -> e|mail
query1: e
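
For illustration only, the doc1 to doc3 behaviour above can be mimicked with a small Python sketch. This is not Lucene code: `standard_like`, `whitespace_tokens`, and `split_on_hyphen` are hypothetical helpers, and the regex is only a rough stand-in for what StandardTokenizer really does. The point it tries to show is that a whitespace-style tokenizer keeps "e-mail" as one token, so a later filter step can still decide how to split it.

```python
import re


def standard_like(text):
    """Rough stand-in for StandardTokenizer (assumption): emit runs of
    letters/digits, so a hyphen acts as a token boundary."""
    return re.findall(r"[^\W_]+", text.lower())


def whitespace_tokens(text):
    """Split on whitespace only, so hyphens stay inside the token."""
    return text.lower().split()


def split_on_hyphen(token):
    """Tiny subset of what a word-delimiter filter might do (assumption):
    break one token into its hyphen-separated parts."""
    return [part for part in token.split("-") if part]


docs = ["email", "e-mail", "e mail"]

for doc in docs:
    kept = whitespace_tokens(doc)
    parts = [p for tok in kept for p in split_on_hyphen(tok)]
    print(doc, "| standard-like:", standard_like(doc),
          "| whitespace:", kept, "| then split:", parts)
```

With the hyphen still present after tokenization, the splitting decision moves into the filter step, which is the flexibility asked for in the question.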