Hello. I'm trying to understand the behaviour of edismax in Solr 3.4 when
it comes to searching fields that behave like "string" types, i.e., untokenized.
My documents describe products available in various stores. One of the
fields in my schema is the name of the merchant, and I would like to match
only the entire name in the merchant field to cut out false positives. For
example, I want "The Gap" to match in merchant, but not "gap".

To do this, I configured the field as such:

    <fieldType name="text_full_match" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

All the other fields are product descriptors such as category, product
name, etc., which use the "text_en" field type from the example schemas.

I have a merchant in the data called "Jones New York". If my query is
simply the 3 words, i.e., "q=jones+new+york", the merchant field doesn't
match. The debugQuery output shows that the query is split into separate
words, like this:
<str name="parsedquery">+((DisjunctionMaxQuery((summary:jones^2.0 |
title:jones^3.0 | merchant:jones^3.0 | cats4match:jones)~0.1)
DisjunctionMaxQuery((merchant:new^3.0)~0.1)
DisjunctionMaxQuery((summary:york^2.0 | title:york^3.0 | merchant:york^3.0
| cats4match:york)~0.1))~1) DisjunctionMaxQuery((summary:"jones ?
york"~3^5.0 | title:"jones ? york"~3^10.0 | cats4match:"jones ?
york"~3^5.0)~0.1) ()</str>
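
To illustrate what I think is going on, here is a minimal Python sketch (not Solr code; the function names are made up for illustration, and it ignores the possessive and synonym filters). It assumes the merchant field is indexed as one lowercased token, while the query is split on whitespace before each word is analyzed for the field, as the parsedquery above suggests:

```python
def keyword_analyze(value):
    """Mimic KeywordTokenizer + LowerCaseFilter: the whole value is one lowercased token."""
    return [value.lower()]


def split_query(q):
    """Mimic the query being split on whitespace before per-field analysis."""
    return q.split()


# Index side: the merchant name is stored as a single token.
indexed_terms = set(keyword_analyze("Jones New York"))

# Query side: each word is analyzed on its own, giving three separate terms.
per_word_terms = [t for w in split_query("jones new york") for t in keyword_analyze(w)]

# None of the single-word terms equals the one indexed token.
word_hits = [t for t in per_word_terms if t in indexed_terms]

# The whole string, analyzed as one unit, would equal the indexed token.
whole_hit = keyword_analyze("jones new york")[0] in indexed_terms

print(sorted(indexed_terms), per_word_terms, word_hits, whole_hit)
```

If that picture is right, merchant:jones, merchant:new and merchant:york can never match the single indexed term, which is consistent with what I see.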

My edismax handler is configured like this:
  <requestHandler name="edismax" class="solr.SearchHandler" default="true">
    <lst name="defaults">
     <str name="defType">edismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.1</float>
     <str name="fl">
       dealid,category,subcategory,merchant, merchant_id, title
     </str>
     <str name="mm">1</str>
     <str name="qf">
       cats4match^1.0 merchant^3.0 title^3.0 summary^2.0
     </str>
     <str name="pf">
       cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
     </str>
     <int name="ps">3</int>
     <str name="pf2">
       cats4match^5.0 merchant^10.0 title^10.0 summary^5.0 title_phrases^10.0 summary_phrases^5.0
     </str>
     <str name="pf3">
       cats4match^5.0 merchant^10.0 title^10.0 summary^5.0 title_phrases^10.0 summary_phrases^5.0
     </str>
     <int name="qs">3</int>
     <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>


What gives? Is it possible to query a string-like field together with
other tokenized fields in a single edismax query? Or am I missing the point
entirely, and need to do this some other way?

Thanks in advance for your help.
Vijay
