Hi. I recently started using Solr in a project and ran into what I think
is strange matching behaviour. I would like some help understanding
what happened. I'm using Solr 3.1 with Java 1.6 on Linux.

My index consists of a set of phrases, which I'd like to match against
incoming text such as email messages or tweets. As I understand it, with OR
as the default operator, a query such as "I am looking for car insurance"
should match a document containing "car insurance". Is this correct?

Basically, I'm seeing two kinds of behaviour that I don't understand:
1. When I search for "I am looking for car insurance", no document matches,
even though a search for just "car insurance" matches many documents. Why
doesn't the OR operator work?
2. When I search for "looking for car", the matched document has the text
"Looking for discounts". Why isn't "for" dropped as part of stopword
removal?

The stopwords come from the file that is packaged with the Solr release.
The query handler is the dismax handler. I am also using the same
(packaged) schema definition for the fields, and my document fields look
like this:

 <fields>
   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them.
   -->
   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
   <field name="defaultquery" type="query_text" indexed="true" stored="false" multiValued="true"/>

   <field name="kpid" type="string" indexed="true" stored="true" required="true" />
   <field name="keywords" type="text" indexed="true" required="true" />

 </fields>

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a
      required field
   -->
 <uniqueKey>kpid</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>defaultquery</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="OR"/>

 <!-- default field for query, if explicit query is not given
   -->
 <copyField source="keywords" dest="defaultquery"/>

The query handler is configured like this:

  <requestHandler name="dismax" class="solr.SearchHandler" default="true">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="fl">
       kpid,keywords
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
     <!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl.fl">text features name</str>
     <!-- for this field, we want no fragmenting, just highlighting -->
     <str name="f.name.hl.fragsize">0</str>
     <!-- instructs Solr to return the field itself if no query terms are
          found -->
     <str name="f.name.hl.alternateField">title</str>
     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>
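
For reference, here is a rough Python sketch of how I read the mm value above
("2<-1 5<-2 6<90%" once the &lt; entities are unescaped). It follows the
documented dismax "minimum should match" rules as I understand them: a part
"n<value" only applies when the number of optional clauses is greater than n,
and the last part that applies wins. The function name min_should_match is
made up for illustration, and the clause counts assume that every query word
survives analysis (the query_text type is not shown here, so that is an
assumption).

```python
def min_should_match(optional_clauses, spec):
    """Return how many optional clauses must match under an mm spec.

    Illustrative sketch only, not Solr's actual code.
    """
    def apply(value, count):
        if value.endswith("%"):
            # Percentages are taken of the clause count, rounded toward zero.
            percent = int(value[:-1])
            amount = count * abs(percent) // 100
            # A negative percentage means "all but this share".
            return count - amount if percent < 0 else amount
        number = int(value)
        # A negative integer means "all but this many".
        return count + number if number < 0 else number

    result = optional_clauses  # default: every clause is required
    for part in spec.split():
        if "<" in part:
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                break  # this and later conditions do not apply
            result = apply(value, optional_clauses)
        else:
            result = apply(part, optional_clauses)
    return max(0, min(result, optional_clauses))


MM = "2<-1 5<-2 6<90%"
print(min_should_match(3, MM))  # "looking for car" -> 2
print(min_should_match(6, MM))  # "I am looking for car insurance" -> 4
```

If this reading is right, a six-word query needs four matching words under
this mm setting, and a three-word query needs two.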

Here's what the debug output looks like for the 2nd query, "looking for car":

<lst name="explain">
<str name="dab16663c2770bb64e7d681284af40bfb83e9db965ea48217d0b0916">

2.6861415 = (MATCH) sum of:
  2.6861415 = (MATCH) product of:
    4.029212 = (MATCH) sum of:
      2.92451 = (MATCH) weight(defaultquery:looking in 2160), product of:
        0.72399944 = queryWeight(defaultquery:looking), product of:
          8.078763 = idf(docFreq=1, maxDocs=2373)
          0.08961761 = queryNorm
        4.0393815 = (MATCH) fieldWeight(defaultquery:looking in 2160), product of:
          1.0 = tf(termFreq(defaultquery:looking)=1)
          8.078763 = idf(docFreq=1, maxDocs=2373)
          0.5 = fieldNorm(field=defaultquery, doc=2160)
      1.1047021 = (MATCH) weight(defaultquery:for in 2160), product of:
        0.44497362 = queryWeight(defaultquery:for), product of:
          4.9652476 = idf(docFreq=44, maxDocs=2373)
          0.08961761 = queryNorm
        2.4826238 = (MATCH) fieldWeight(defaultquery:for in 2160), product of:
          1.0 = tf(termFreq(defaultquery:for)=1)
          4.9652476 = idf(docFreq=44, maxDocs=2373)
          0.5 = fieldNorm(field=defaultquery, doc=2160)
    0.6666667 = coord(2/3)
</str>
</lst>
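
To make the explain output easier to follow, the arithmetic in it can be
reproduced with a few lines of Python. This is only a sketch that multiplies
the numbers printed above (idf, queryNorm, fieldNorm, tf and coord); the
helper name term_weight is made up for illustration. It shows that the score
comes from the two terms "looking" and "for", scaled by coord(2/3) because
"car" did not match.

```python
# Values copied from the explain output for doc 2160.
QUERY_NORM = 0.08961761
FIELD_NORM = 0.5


def term_weight(idf, tf=1.0):
    """weight = queryWeight * fieldWeight for a single term."""
    query_weight = idf * QUERY_NORM        # idf * queryNorm
    field_weight = tf * idf * FIELD_NORM   # tf * idf * fieldNorm
    return query_weight * field_weight


looking = term_weight(8.078763)    # defaultquery:looking
for_term = term_weight(4.9652476)  # defaultquery:for
coord = 2 / 3                      # 2 of the 3 query terms matched
score = (looking + for_term) * coord

print(round(looking, 4), round(for_term, 4), round(score, 4))
```

The result agrees with the 2.6861415 reported in the output, up to rounding.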

Please help!

Thanks so much,
Vijay
