Try the Solr Admin Analysis page and see how your failing examples analyze for both index and query.

Also, if you experiment with analyzer settings, be sure to FULLY reindex your documents, since a mismatch between how the documents were ORIGINALLY analyzed at index time and how queries are analyzed now can cause terms to silently fail to match. Changing an index analyzer does not trigger an automatic reindex.

Also, check that there is no delimiter character, such as a colon, immediately before a term with no intervening white space.
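
If you do want to try the whitespace tokenizer with the word delimiter filter, here is a minimal sketch of such a field type, not a tested recommendation. The name "text_code" is made up for illustration, and the attribute values assume you want each component (e.g. KWJ1112) kept whole rather than split into letters and digits. Check the result on the Analysis page, and fully reindex after changing the schema.

```xml
<!-- Hypothetical field type; the name "text_code" is an assumption, not from your schema -->
<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer, so index-time and query-time analysis stay identical -->
  <analyzer>
    <!-- Split on white space only, so "KWJ1112.MC2850" reaches the filter intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split on delimiters such as "."; splitOnNumerics="0" and
         splitOnCaseChange="0" are meant to keep "KWJ1112" as one part;
         preserveOriginal="1" also keeps the full original value -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Stop words and synonyms are left out here to keep the sketch short; add them back only if this field needs them.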

-- Jack Krupansky

-----Original Message-----
From: Sohail Aboobaker
Sent: Wednesday, November 21, 2012 8:13 AM
To: solr-user@lucene.apache.org
Subject: Inconsistent search results.

Hi,

We have 500k+ documents indexed with many fields. One of the fields is a
simple text field that is defined as the default search field, and we copy
many field values into that field.

Some values are composed of two components with a "." as separator. When we
search for the partial terms for such values, we get inconsistent results.
Following are some examples:

Value: KWJ1112.MC2850

we search on MC2850, it returns result.
we search on KWJ1112, no results.

Value: ACW9920.KL1230

we search on ACW9920, gives results.
we search on KL1230, gives results.

The results are inconsistent. For some values, a partial search on either
side of the separator gives results. For others, only a search on the last
part of the value gives results. The last part search always works.

We are using the standard tokenizer as follows:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

What should we use in order to get consistent results for both sides of
the separator? Should we be using the whitespace tokenizer with
WordDelimiterFilterFactory? Some examples would be helpful.

Thanks

Sohail
