Re: Question on Tokenizing email address

abhishes Tue, 09 Feb 2010 23:34:36 -0800

Thank you! It works very well.

I think the field type you suggested will also index words like DOT, AT,
and com.

To prevent these words from being indexed, I have changed the field type
to:

<fieldType name="email" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\."
            replacement=" DOT " replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="@"
            replacement=" AT " replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

I have added the words "dot" and "com" to the stopword file ("at" was
already there).
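
To make the intent of this chain concrete, here is a rough Python sketch of
what I expect each step to do. It is only an approximation and not Solr's
actual behaviour: it assumes the tokenizer passes each address through as a
single token, it simplifies WordDelimiterFilter to a plain split on
non-alphanumeric characters, and the STOPWORDS set and analyze_email function
are hypothetical names standing in for stopwords.txt and the analyzer.

```python
import re

# Hypothetical stand-in for stopwords.txt; only the three words discussed here.
STOPWORDS = {"dot", "at", "com"}


def analyze_email(text, stopwords=STOPWORDS):
    """Approximate the intended analyzer chain (not Solr's real implementation)."""
    result = []
    # Assumption: each whitespace-separated address reaches the filters as one token.
    for token in text.split():
        # PatternReplaceFilter steps: "." -> " DOT ", then "@" -> " AT "
        token = token.replace(".", " DOT ").replace("@", " AT ")
        # WordDelimiterFilter step, simplified: split on non-alphanumeric characters
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
        # LowerCaseFilter step
        parts = [p.lower() for p in parts]
        # StopFilter step: drop any part that appears in the stopword set
        result.extend(p for p in parts if p not in stopwords)
    return result


print(analyze_email("John.Doe@example.com"))  # ['john', 'doe', 'example']
```

Under these assumptions only the name and domain parts survive; the real
output should still be checked on the Solr analysis page.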

Is this correct?

-- 
View this message in context: 
http://old.nabble.com/Question-on-Tokenizing-email-address-tp27518673p27527033.html
Sent from the Solr - User mailing list archive at Nabble.com.
