RE: Exact matching on names?

Olson, Ron Wed, 17 Aug 2011 07:47:32 -0700

Thank you, Sujit and Rob, for your help. I took the "easy" way and created a new
field type that is identical to the standard "text" type but with the stemmer
removed. So far, it seems to work exactly as needed.


To help anyone else who comes across this issue, this is the field type I used:

<fieldType name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
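For the new type to take effect, a field has to reference it in schema.xml. A minimal sketch, assuming a field named "name" (a hypothetical name; the post does not show the actual field definition):

```xml
<!-- Hypothetical field declaration (goes inside <fields>) that uses the
     textNoStem type defined above instead of the stemmed "text" type -->
<field name="name" type="textNoStem" indexed="true" stored="true"/>
```

Documents that were already indexed keep the stemmed terms they were indexed with, so they need to be reindexed after the field's type is switched.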


-----Original Message-----
From: Sujit Pal [mailto:sujit....@comcast.net]
Sent: Tuesday, August 16, 2011 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching on names?

Hi Ron,

There was a discussion about this some time back, which I implemented
(with great success, btw) in my own code. Basically, you store both the
analyzed and non-analyzed (string type) versions of the name in the index,
then send in a query like this:

+name:clarke name_s:"clarke"^100

The name field is text, so it analyzes "clarke" down to "clark", and the
first clause matches both "clark" and "clarke". The second clause, against
the non-analyzed name_s field, boosts the entry that is exactly "clarke" up
to the top, which you then select with rows=1.
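The schema side of this approach could look like the fragment below. It is a sketch, not Sujit's actual configuration: the field names follow the query above, and populating name_s with copyField is an assumption.

```xml
<!-- Hypothetical schema.xml fragment. Field declarations go inside <fields>. -->
<!-- "name" is analyzed (tokenized, lowercased, stemmed) text -->
<field name="name"   type="text"   indexed="true" stored="true"/>
<!-- "name_s" holds the same value as a single untokenized string term -->
<field name="name_s" type="string" indexed="true" stored="false"/>

<!-- Goes after </fields>: copy the raw input of "name" into "name_s"
     at index time, before any analysis is applied -->
<copyField source="name" dest="name_s"/>
```

The query above is then sent as the q parameter together with rows=1. Keep in mind that a string field indexes the whole value as one case-sensitive term, so name_s:"clarke" only boosts documents whose entire name value is exactly "clarke".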

-sujit

On Tue, 2011-08-16 at 10:20 -0500, Olson, Ron wrote:
> Hi all-
>
> I'm missing something fundamental, yet I've been unable to find the
> definitive answer for exact name matching. I'm indexing names using the
> standard "text" field type, and my search is for the name "clarke". My
> results include "clark", which is incorrect; it needs to match "clarke"
> exactly (case insensitive).
>
> I tried textType, but that doesn't work because I believe it needs to be
> *really* exact, whereas I'm looking for things like "clark oil", "bob, frank,
> and clark", etc.
>
> Thanks for any help,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or 
> documents, is intended only for the addressee and may contain CONFIDENTIAL, 
> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
> recipient, you are hereby notified that any use, disclosure, copying or 
> distribution of this message or any of the information included in or with it 
> is  unauthorized and strictly prohibited.  If you have received this message 
> in error, please notify the sender immediately by reply e-mail and 
> permanently delete and destroy this message and its attachments, along with 
> any copies thereof. This message does not create any contractual obligation 
> on behalf of the sender or Law Bulletin Publishing Company.
> Thank you.



