Try the Solr Admin Analysis page and see how your failing examples are
analyzed at both index and query time.
Also, if you experiment with analyzer settings, be sure to FULLY reindex
your documents, since a difference between how the documents were ORIGINALLY
analyzed and how queries are analyzed now can cause terms to go unmatched.
Changing an index analyzer does not force an automatic reindex.
Also, check to see that there is not a delimiter character, such as a colon,
immediately before a term with no white space.
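If you do go the whitespace-plus-word-delimiter route you ask about, a field
type along these lines might be a starting point. This is only a sketch, not
something tested against your data: the name text_parts is made up, and the
attribute values are assumptions you should verify on the Analysis page.
splitOnNumerics="0" is intended to keep KWJ1112 as one token rather than
splitting it into KWJ and 1112, and preserveOriginal="1" is intended to keep
the full KWJ1112.MC2850 value searchable as well.

```xml
<!-- Hypothetical field type; the name "text_parts" is illustrative only. -->
<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer with no type attribute applies to both index and query. -->
  <analyzer>
    <!-- Split on white space only, so "KWJ1112.MC2850" reaches the filter intact. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split on delimiters such as "."; do not split at letter/digit boundaries. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```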
-- Jack Krupansky
-----Original Message-----
From: Sohail Aboobaker
Sent: Wednesday, November 21, 2012 8:13 AM
To: solr-user@lucene.apache.org
Subject: Inconsistent search results.
Hi,
We have 500k+ documents indexed with many fields. One of the fields is a
simple text field that is defined as the default search field, and we copy
many field values into that field.
Some values are composed of two components with a "." as separator. When we
search for the partial terms for such values, we get inconsistent results.
Following are some examples:
Value: KWJ1112.MC2850
we search on MC2850, it returns result.
we search on KWJ1112, no results.
Value: ACW9920.KL1230
we search on ACW9920, gives results.
we search on KL1230, gives results.
The results are inconsistent. For some values, a partial search returns
results for both components. For others, it returns results only for the
last component. Searching on the last component always works.
We are using the standard tokenizer as follows:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
What should we use to get consistent results for both components of such
values? Should we be using the whitespace tokenizer with
WordDelimiterFilterFactory? Some examples would be helpful.
Thanks
Sohail