Hi All,

I'm facing an issue with relevancy calculation in the dismax query parser.
The boost factor does not work as expected in certain cases when the keyword
is generic. By generic I mean a keyword that appears many times in the
document as well as across the index.

I have the parser configured as below:

<requestHandler name="querydismax" class="solr.SearchHandler" >
        <lst name="defaults">
            <str name="defType">edismax</str>
            <str name="echoParams">explicit</str>
            <float name="tie">0.01</float>
            <str name="qf">series_title^500 title^100 description^15 contribution</str>
            <str name="pf">series_title^200</str>
            <int name="ps">0</int>
            <str name="q.alt">*:*</str>
        </lst>
</requestHandler>

As you can see above, I'd expect documents that match in series_title to
rank higher than those that match only in contribution.

This works well if I type a query like 'wonderworld', which is a less
frequently occurring term: the series titles rank higher. But if I type a
keyword like 'news', which is the most common term in the index, matches in
contribution come out on top, even though many documents have the word news
in their series title.
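For reference, here is a minimal sketch of how a keyword's frequency in the index enters the score, assuming the index uses Lucene's classic TF-IDF similarity (DefaultSimilarity, later renamed ClassicSimilarity). It simplifies one field clause's score to boost × idf² × sqrt(freq) × fieldNorm and leaves out queryNorm; the helper names and all numbers in it are hypothetical, not taken from the real index.

```python
import math

# Assumption: Lucene's classic TF-IDF similarity (DefaultSimilarity /
# ClassicSimilarity) is in use; helper names are hypothetical.

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf = 1 + ln(numDocs / (docFreq + 1)); rarer terms get a larger idf."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_clause_score(boost: float, freq: int, doc_freq: int, num_docs: int,
                       field_norm: float = 1.0) -> float:
    """Simplified score of one field clause (e.g. series_title:news^500).

    Classic scoring multiplies the query weight (boost * idf) by the field
    weight (sqrt(freq) * idf * fieldNorm), so idf appears squared.
    queryNorm is omitted because it scales every clause of one query equally.
    """
    idf = classic_idf(doc_freq, num_docs)
    return boost * idf * idf * math.sqrt(freq) * field_norm

# Hypothetical numbers: a rare term versus a term found in half the documents,
# both matched once in a field boosted by 500.
num_docs = 1_000_000
rare = field_clause_score(boost=500, freq=1, doc_freq=10, num_docs=num_docs)
common = field_clause_score(boost=500, freq=1, doc_freq=500_000, num_docs=num_docs)
print(f"rare term:   {rare:.1f}")
print(f"common term: {common:.1f}")
```

Because Lucene computes idf per field, the same keyword can carry a different weight in series_title than in contribution, depending on how many documents contain it in each field.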

The field definitions are as below:

<field name="series_title" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="title" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="contribution" type="text" indexed="true" stored="true" multiValued="true" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" compressThreshold="10">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I have tried debugging, and when I use the query term news I see that
matches in contributions are ranked higher than matches in series title. The
parsed queries look like this:
(Note that I have edited the query: in reality I have many searchable fields,
and I have only listed the ones containing text data; the rest all contain
UUIDs.)

<str name="parsedquery">
(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 |
contributions:news | series_title:news^500.0)~0.01) () () () () () () () ()
() () () () () () () () () () () () () () () () () () () ())/no_coord
</str>
<str name="parsedquery_toString">
+(description:news^15 | title:news^100.0 | contributions:news |
series_title:news^500.0)~0.01 () () () () () () () () () () () () () () ()
() () () () () () () () () () () () ()
</str>
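The `~0.01` after the disjunction in the parsed query is the `tie` value from the handler defaults. As a minimal sketch of how Lucene's DisjunctionMaxQuery combines the per-field clause scores: the highest-scoring field wins, and every other matching field adds only `tie` times its own score. The function name and the clause scores below are hypothetical.

```python
def dismax_score(clause_scores: list[float], tie: float) -> float:
    """Combine per-field clause scores as DisjunctionMaxQuery does:
    the best clause score plus tie * (sum of the other matching clause scores)."""
    if not clause_scores:
        return 0.0  # no field matched, so the disjunction does not match
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)

# Hypothetical clause scores for one document matching "news" in two fields.
scores = {"series_title": 3.0, "contributions": 2.0}
print(dismax_score(list(scores.values()), tie=0.01))
```

With a tie of 0.01, each document's score is essentially the score of its single best field clause, so the ranking across documents comes down to comparing those best clauses rather than summing all matching fields.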


Could you please point me in the right direction?

Many Thanks,
Sandeep
