Search term matching on part of a token, not the whole token

John, Phil (CSS) Tue, 05 Mar 2013 05:55:04 -0800

Hi,


I'm hitting a brick wall trying to diagnose this issue. We have a field,
configured like this:

 

<fieldType name="class" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

 

And it has Dewey Decimal Classifications fed into it, e.g.

100
100.10
100.22

etc.

 

When performing a search against the field (using the edismax parser), a
search like:

class:100

or

class:"100"

matches not only records with the exact token 100, but also records
where 100 is only part of the token, e.g. 100.10, 100.22, etc.
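
One way to see how edismax interprets these two queries is to ask Solr
for its debug output. The sketch below only builds the request URLs and
does not contact a server. The host, port and handler path are placeholder
assumptions rather than details of the setup described here, and
build_debug_query_url is a hypothetical helper name. defType and
debugQuery are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: placeholder address, replace with the real Solr select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr to report how it parsed `query`."""
    params = {
        "q": query,
        "defType": "edismax",   # the parser used in the searches above
        "debugQuery": "true",   # include the parsed query and score explanations
        "fl": "class",          # only return the field under investigation
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Print one URL per query form from the message above.
    for q in ('class:100', 'class:"100"'):
        print(build_debug_query_url(q))
```

Fetching either URL and reading the parsed query in the debug section of
the response should show which field and terms the parser actually
searched, which can then be compared with what the analysis page reports.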

 

I've checked the analysis section of the admin interface and the field
is being tokenised correctly (e.g. 100.10 is a single token), so I'm at a
loss as to why this is happening.
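
For illustration, the behaviour expected from that analyzer chain can be
approximated in a few lines of Python. This is only a rough model of
whitespace tokenisation followed by lowercasing, not Solr's actual
implementation, and the function names are hypothetical.

```python
def analyze(text):
    """Approximate the field's analyzer: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


def matching_values(query_term, field_values):
    """Return the values that contain `query_term` as a whole analyzed token."""
    term = query_term.lower()
    return [value for value in field_values if term in analyze(value)]


if __name__ == "__main__":
    values = ["100", "100.10", "100.22"]
    # "100.10" stays a single token, so it is not equal to the token "100".
    print(analyze("100.10"))
    print(matching_values("100", values))
```

Under this model only the record holding exactly 100 matches, which is
the behaviour expected from the field and not what the searches return.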

 

Does anyone have any ideas?

 

Regards,

Phil John
Technical Lead

Software services
Capita, Knights Court, Solihull Parkway, B37 7YB

Office: 0870 400 5000
Fax: 0870 400 5001
email: philj...@capita.co.uk

Part of Capita plc www.capita.co.uk

 


