Hi,

I have tested MoreLikeThis in Solr (a nightly build from September 8,
which includes the patch SOLR-595) and have a problem with the
mlt.qf boosting option.

I started from the standard (example) Solr configuration and modified it
as follows. The only change to solrconfig.xml is this request handler
definition:

   <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
     <lst name="defaults">
     </lst>
   </requestHandler>

In schema.xml, the following fields are declared:

   <field name="uid" type="slong" indexed="true" stored="true" required="true" />
   <field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
   <field name="content" type="text" indexed="true" stored="true" termVectors="true"/>

and the following is added/changed:

   <uniqueKey>uid</uniqueKey>
   <defaultSearchField>text</defaultSearchField>
   <copyField source="title" dest="text"/>
   <copyField source="content" dest="text"/>

I have indexed the following two docs:
<add>
<doc>
  <field name="uid">1</field>
  <field name="title">Google</field>
  <field name="content">yahoo yahoo google</field>
</doc>
<doc>
  <field name="uid">2</field>
  <field name="title">yahoo</field>
  <field name="content">yahoo google google</field>
</doc>
</add>

I can retrieve both documents with the query:

http://server:8983/solr/select?q=*:*

When trying the following MoreLikeThis query:

http://server:8983/solr/mlt?stream.body=what%20is%20yahoo&mlt.interestingTerms=details&mlt.fl=title,content&mlt.mintf=1&mlt.mindf=1&mlt.boost=true

I get the results:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="content">yahoo yahoo google</str>
<str name="title">Google</str>
<long name="uid">1</long>
</doc>
<doc>
<str name="content">yahoo google google</str>
<str name="title">yahoo</str>
<long name="uid">2</long>
</doc>
</result>
<lst name="interestingTerms">
<float name="content:yahoo">1.0</float>
</lst>
</response>

Document 1 is probably the better match, since the word yahoo appears
twice in its content field. That seems fine, although I did not expect
to see the "content:" prefix in the list of interestingTerms.
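To make the ranking and the interestingTerms easy to compare between requests, here is a minimal Python sketch that parses the response shown above. The `summarize` helper and the `RESPONSE` constant are names I made up for illustration; the XML is copied from this mail, and the sketch only reads the response, it says nothing about how Solr scores the documents.

```python
import xml.etree.ElementTree as ET

# The MLT response from this mail, copied verbatim.
RESPONSE = """<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="content">yahoo yahoo google</str>
<str name="title">Google</str>
<long name="uid">1</long>
</doc>
<doc>
<str name="content">yahoo google google</str>
<str name="title">yahoo</str>
<long name="uid">2</long>
</doc>
</result>
<lst name="interestingTerms">
<float name="content:yahoo">1.0</float>
</lst>
</response>"""


def summarize(xml_text):
    """Return (uids in response order, {interesting term: boost})."""
    root = ET.fromstring(xml_text)
    # Documents are listed in ranked order inside <result name="response">.
    uids = [
        int(doc.find("long[@name='uid']").text)
        for doc in root.findall("result[@name='response']/doc")
    ]
    # With mlt.interestingTerms=details each term is a <float> with its boost.
    terms = {
        term.get("name"): float(term.text)
        for term in root.findall("lst[@name='interestingTerms']/float")
    }
    return uids, terms


uids, terms = summarize(RESPONSE)
print(uids)
print(terms)
```

Running the same helper on the response of another request shows at a glance whether the document order or the term list changed.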

So, I would like to boost the title field so that Document 2 appears on
top. I add an mlt.qf parameter to the query:

http://server:8983/solr/mlt?stream.body=what%20is%20yahoo&mlt.interestingTerms=details&mlt.fl=title,content&mlt.mintf=1&mlt.mindf=1&mlt.boost=true&mlt.qf=title^10.0%20content^0.1

but the response is exactly the same as for the query without mlt.qf.
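For anyone who wants to reproduce this, the two requests differ only in the mlt.qf parameter. Below is a small Python sketch that builds both URLs with every value percent-encoded (including the "^" and the space in mlt.qf). The `mlt_url` helper is a hypothetical name, "server" is the placeholder host used above, and I do not know whether encoding has anything to do with the problem; this is only meant to rule it out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://server:8983/solr/mlt"  # placeholder host, as in the queries above


def mlt_url(qf=None):
    """Build the MLT request URL; qf is the optional mlt.qf value."""
    params = {
        "stream.body": "what is yahoo",
        "mlt.interestingTerms": "details",
        "mlt.fl": "title,content",
        "mlt.mintf": "1",
        "mlt.mindf": "1",
        "mlt.boost": "true",
    }
    if qf is not None:
        params["mlt.qf"] = qf
    # urlencode percent-encodes reserved characters in each value.
    return BASE + "?" + urlencode(params)


plain = mlt_url()
boosted = mlt_url("title^10.0 content^0.1")
print(plain)
print(boosted)

# Decoding the query string gives back the intended mlt.qf value.
decoded = parse_qs(urlsplit(boosted).query)
print(decoded["mlt.qf"])
```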

The problem seems to be related to the "content:" prefix in the
interestingTerms list. I would have expected to see only "yahoo",
"text:yahoo", or maybe "title:yahoo content:yahoo".

Does anyone have an idea what could be wrong?

Cheers
Clas
