Re: Question on Tokenizing email address

2010-02-11 Thread Jan Høydahl / Cominvent
My point is that I WANT the AT and DOT tokens to be indexed, to avoid these two being treated the same: foo-...@brown.fox and foo-bar.brown.fox. By using the LowerCaseFilterFactory before the replacements, you actually ensure that a search for email:at will not give a match, because the query will be lower-case
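
The ordering argument can be tried out with a rough Python imitation of such an analysis chain. This is only a sketch: it is not Solr code, the field type discussed in the thread is not shown here, and the function names, the `lowercase_first` switch, and the example addresses are all made up for illustration.

```python
import re

def analyze(text, lowercase_first=True):
    """Imitate: lowercase, replace '@' and '.' with marker words, split on punctuation.

    Hypothetical stand-in for an analysis chain, not Solr's actual behaviour.
    """
    if lowercase_first:
        text = text.lower()
    # The marker words are inserted in upper case.
    text = text.replace("@", " AT ").replace(".", " DOT ")
    if not lowercase_first:
        # Lowercasing after the replacement also lowercases the markers.
        text = text.lower()
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def matches(query, document, lowercase_first=True):
    """True if every query token occurs among the document tokens."""
    doc_tokens = set(analyze(document, lowercase_first))
    return all(t in doc_tokens for t in analyze(query, lowercase_first))

# The marker tokens keep the two (made-up) strings apart.
print(analyze("john-doe@example.com"))
print(analyze("john-doe.example.com"))

# Lowercasing first leaves the markers upper-case, so a query for "at" misses.
print(matches("at", "john-doe@example.com", lowercase_first=True))
print(matches("at", "john-doe@example.com", lowercase_first=False))
```

With lowercasing done first, the markers stay as AT and DOT in the index while a user query for "at" is lowercased and never meets them. If lowercasing ran after the replacement, the markers would become ordinary words that such a query could hit.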

Re: Question on Tokenizing email address

2010-02-09 Thread abhishes
Thank you! It works very well. I think that the field type you suggested will also index words like DOT, AT and com. To prevent these words from being indexed, I have changed the field type to

Re: Question on Tokenizing email address

2010-02-09 Thread Jan Høydahl / Cominvent
Hi, To match 1, 2, 3, 4 below you could use a field type based on TextField, with just a simple WordDelimiterFilterFactory. However, this would also match abc-def, def.alpha, xyz-com and a...@def, because all punctuation is treated the same. To avoid this, you could do some custom handling of "-", "."
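
The "all punctuation is treated the same" problem can be illustrated with a small Python approximation. It is a sketch only: a plain regex split stands in for the word-delimiter step (the real filter has many more options), and the function name and sample strings are hypothetical.

```python
import re

def split_on_punctuation(text):
    """Split on any run of non-alphanumeric characters and lowercase the parts.

    A crude stand-in for a word-delimiter step, not Solr's implementation.
    """
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

# A made-up address and a made-up non-address string.
address = split_on_punctuation("abc@def.com")
lookalike = split_on_punctuation("abc-def.com")

print(address)
print(lookalike)
# Both produce the same tokens, so a query for one also matches the other.
print(address == lookalike)
```

Because "@", "-" and "." all act as plain separators, the information about which character stood between two parts is lost, which is why some custom handling of those characters is needed.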