Possible issue in edismax?

Sandeep Mestry Wed, 30 Jan 2013 07:13:33 -0800

Hi All,

I'm facing an issue in relevancy calculation by dismax query parser.
The boost factor applied does not work as expected in certain cases when
the keyword is generic and by generic I mean, if the keyword is appearing
many times in the document as well as in the index.


I have parser configuration as below:

<requestHandler name="querydismax" class="solr.SearchHandler" >
        <lst name="defaults">
            <str name="defType">edismax</str>
            <str name="echoParams">explicit</str>
            <float name="tie">0.01</float>
            <str name="qf">series_title^500 title^100 description^15
contribution</str>
            <str name="pf">series_title^200</str>
            <int name="ps">0</int>
            <str name="q.alt">*:*</str>
        </lst>
</requestHandler>

As you can see above, I'd expect the documents containing the matches for
series title should rank higher than the ones in contribution.

This works well, if I type in a query like 'wonderworld' which is a less
occurring term and the series titles rank higher. But, if I type in a
keyword like 'news' which is the most common term in the index, I get hits
in contributions even though I have lots of documents having word news in
series title.

The field definition is as below:

<field name="series_title" type="text_wc" indexed="true" stored="true"
multiValued="false" />
<field name="title" type="text_wc" indexed="true" stored="true"
multiValued="false" />
<field name="description" type="text_wc" indexed="true" stored="true"
multiValued="false" />
<field name="contribution" type="text" indexed="true" stored="true"
multiValued="true" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
compressThreshold="10">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100"
>
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
splitOnNumerics="0" preserveOriginal="1" />
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"
stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
splitOnNumerics="0" preserveOriginal="1" />
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
 </fieldType>

I have tried debugging and when I use query term news, I see that matches
for contributions are ranked higher than series title. The parsed queries
look like below:
(Note that I have edited the query as in reality I have lot of fields that
are searchable and I have only mentioned the fields containing text data -
rest all contain uuids)

<str name="parsedquery">
(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord
</str>
<str name="parsedquery_toString">
+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()


Could you guide me in right direction please?

Many Thanks,
Sandeep

Possible issue in edismax?

Reply via email to