Hi All, I'm facing an issue with relevancy calculation in the (e)dismax query parser. The boost factors I apply do not work as expected when the keyword is generic. By generic I mean a keyword that appears many times within a document as well as across the index.
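For reference, here is how I understand the DisjunctionMaxQuery that edismax builds to combine per-field scores. This is a sketch that assumes Lucene's classic TF-IDF similarity (DefaultSimilarity, the default before Solr 6). The notation is my own: b_f is the qf boost of field f, t is the query term, N is the number of documents, df_f(t) is the number of documents whose field f contains t, and f* is the field with the highest score for document d.

```latex
\begin{aligned}
\mathrm{score}(d) &= \max_{f} s_f(d) \;+\; \mathrm{tie} \sum_{f \neq f^{*}} s_f(d), \qquad \mathrm{tie} = 0.01 \\
s_f(d) &\propto b_f \cdot \sqrt{\mathrm{tf}_f(t,d)} \cdot \mathrm{idf}_f(t)^{2} \cdot \mathrm{norm}_f(d) \\
\mathrm{idf}_f(t) &= 1 + \ln\frac{N}{\mathrm{df}_f(t) + 1}
\end{aligned}
```

Here norm_f(d) is the length norm, roughly 1/sqrt(number of terms in field f of d). Under these assumptions the qf boost is only one factor: idf and the length norm are computed per field, so the same term can carry a very different weight in series_title than in contribution.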
I have the parser configured as below:

<requestHandler name="querydismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">series_title^500 title^100 description^15 contribution</str>
    <str name="pf">series_title^200</str>
    <int name="ps">0</int>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

As you can see above, I'd expect documents with matches in series_title to rank higher than documents with matches in contribution. This works well if I type in a query like 'wonderworld', which is a less frequent term: the series-title matches rank higher. But if I type in a keyword like 'news', which is the most common term in the index, the top hits come from contribution, even though many documents contain the word news in their series title.

The field definitions are as below:

<field name="series_title" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="title" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="description" type="text_wc" indexed="true" stored="true" multiValued="false" />
<field name="contribution" type="text" indexed="true" stored="true" multiValued="true" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" compressThreshold="10">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100" >
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I have tried debugging, and when I use the query term news, I see that matches in contributions are ranked higher than matches in series_title. The parsed queries look like below. (Note that I have edited the query: in reality I have a lot of searchable fields, and I have only listed the fields containing text data; the rest all contain UUIDs.)

<str name="parsedquery">
(+DisjunctionMaxQuery((description:news^15.0 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01) () () () () () () () () () () () () () () () () () () () () () () () () () () () ())/no_coord
</str>
<str name="parsedquery_toString">
+(description:news^15 | title:news^100.0 | contributions:news | series_title:news^500.0)~0.01 () () () () () () () () () () () () () () () () () () () () () () () () () () () ()
</str>

Could you point me in the right direction, please?

Many thanks,
Sandeep