Hi,

To match cases 1, 2, 3 and 4 below, you could use a fieldType based on
TextField with just a simple WordDelimiterFilterFactory. However, this would
also match abc-def, def.alpha, xyz-com and a...@def, because all punctuation
is treated the same way. To avoid this, you could add some custom handling of
"-", "." and "@":

    <!-- An unstemmed text field optimized for emails -->
    <fieldType name="email" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\."
                replacement=" DOT " replace="all"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="@"
                replacement=" AT " replace="all"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="0"/>
      </analyzer>
    </fieldType>

You will see that this splits "foo....@apache.org" into the tokens "foo DOT
bar AT apache DOT org" on both the index and the query side, and thus avoids
the false matches described above.

To support the "must match" case, you could use the "lowercase" fieldType,
which gives a case-insensitive match against the whole content of the field
only, with no tokenization.
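
As a rough sketch, the "lowercase" fieldType from the Solr example schema
looks roughly like the first block here. The field name "from_exact" and the
copyField are only assumptions for illustration, not something from your
schema, so adapt them as needed:

    <!-- Whole-field, case-insensitive matching: KeywordTokenizer keeps the
         entire value as a single token, which is then lowercased -->
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Hypothetical fields: "from" for partial matches on the address,
         "from_exact" for whole-address matches -->
    <field name="from" type="email" indexed="true" stored="true"/>
    <field name="from_exact" type="lowercase" indexed="true" stored="false"/>
    <copyField source="from" dest="from_exact"/>

A query against from_exact should then only return documents where the whole
address matches, regardless of case, while queries against from can still
match the individual parts.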

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 9. feb. 2010, at 18.13, Abhishek Srivastava wrote:

> Hello Everyone,
> 
> I have a field in my solr schema which stores emails. The way I want the
> emails to be tokenized is like this.
> if the email address is abc....@alpha-xyz.com
> User should be able to search on
> 
> 1. abc....@alpha-xyz.com  (whole address)
> 2. abc
> 3. def
> 4. alpha-xyz
> 
> Which tokenizer should I use?
> 
> Also, is there a feature like "Must Match" in solr? in my schema there is
> field called "from" which contains the email address of the person who sent
> an email. For this field, I don't want any tokenization. When the user
> issues a search. The users email ID must exactly match the "for" column
> value for that document/record to be returned.
> How can I do this?
> 
> Regards,
> Abhishek
