Re: Understanding the MoreLikeThis Handler

Alessandro Benedetti Mon, 22 Jun 2015 09:16:08 -0700

The syntax seems fine to me.
One of the requirements for MLT is that the field(s) used for the
processing be stored (better still if termVectors are enabled).


Judging from your snippets, this is not your problem. Can you confirm that
the field you are interested in is stored (it seems so from your snippet)?

Another thing that comes to mind is the minimum term frequency and
minimum document frequency:

mlt.mintf
mlt.mindf

mlt.mintf

Minimum Term Frequency - the frequency below which terms will be ignored in
the source doc.

DEFAULT_MIN_TERM_FREQ = 2

mlt.mindf

Minimum Document Frequency - words that do not occur in at least this many
docs will be ignored.

DEFAULT_MIN_DOC_FREQ = 5

Pretty sure we found the "killer".
The defaults are preventing you from finding the proper matches, as your field
does not contain more than one occurrence of the relevant term.
By default, terms are ignored if they appear fewer than 2 times in the field
or in fewer than 5 docs in the corpus.
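
To make the effect of the two thresholds concrete, here is a minimal Python
sketch of the filtering. It is a simplified model and not the actual Lucene
code; the function name select_terms and the frequency numbers are made up
for illustration.

```python
from typing import Dict, List


def select_terms(term_freqs: Dict[str, int],
                 doc_freqs: Dict[str, int],
                 min_tf: int = 2,
                 min_df: int = 5) -> List[str]:
    """Return the terms that pass both thresholds (simplified model)."""
    kept = []
    for term, tf in term_freqs.items():
        # Ignore terms that are too rare in the source document (mlt.mintf).
        if tf < min_tf:
            continue
        # Ignore terms that appear in too few documents (mlt.mindf).
        if doc_freqs.get(term, 0) < min_df:
            continue
        kept.append(term)
    return kept


# Hypothetical numbers: the title "flirt" occurs once in the source doc
# and the term appears in 4 docs of the corpus.
source_tf = {"flirt": 1}
corpus_df = {"flirt": 4}

print(select_terms(source_tf, corpus_df))                      # defaults -> []
print(select_terms(source_tf, corpus_df, min_tf=1, min_df=1))  # -> ['flirt']
```

With the defaults no term survives, so there is nothing to build the MLT
query from; lowering both thresholds to 1 keeps the single term.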

This is my example MLT configuration. I am using it with a sample of your
data and it is working:
*Config :*

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.boost">true</str>
    <str name="mlt.qf">
      title^2.0 description^1.5 content^0.5
    </str>
    <str name="mlt.fl">title,description,content</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>

*Query :*

mlt?q=id:86C544948369405D822FA6FBE5EBD49E&wt=json&indent=true&mlt.match.include=true

*Result :*

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "match":{"numFound":1,"start":0,"docs":[
      {
        "id":"86C544948369405D822FA6FBE5EBD49E",
        "title":"flirt"}]
  },
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"86C544948364505D822FA6FBE5EBD49E",
        "title":"flirt"},
      {
        "id":"86C532948364505D822FA6FBE5EBD49E",
        "title":"flirt"},
      {
        "id":"864R2948364505D822FA6FBE5EBD49E",
        "title":"flirt"}]
  }}
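
If you prefer not to change the handler defaults, the same thresholds can be
passed as request parameters. The sketch below only builds the request URL in
Python; the host, port and core name in BASE_URL are placeholders I am
assuming, and build_mlt_url is a hypothetical helper, not part of any Solr
client.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name are placeholders, adjust to your setup.
BASE_URL = "http://localhost:8983/solr/collection1/mlt"


def build_mlt_url(doc_id: str, min_tf: int = 1, min_df: int = 1) -> str:
    """Build an MLT request URL that overrides mlt.mintf and mlt.mindf."""
    params = [
        ("q", f"id:{doc_id}"),
        ("wt", "json"),
        ("indent", "true"),
        ("mlt.match.include", "true"),
        ("mlt.mintf", str(min_tf)),
        ("mlt.mindf", str(min_df)),
    ]
    # urlencode percent-escapes the values, e.g. ':' becomes '%3A'.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_mlt_url("86C544948369405D822FA6FBE5EBD49E"))
```

Parameters sent with the request take precedence over the values in the
"defaults" section of the handler.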

Cheers

2015-06-22 16:24 GMT+01:00 Sreekant Sreedharan <sreeka...@alamy.com>:

> Also, in the documentation it says:
>
> /MoreLikeThis constructs a lucene query based on terms within a document.
> For best results, use stored TermVectors in the schema.xml for fields you
> will use for similarity.
>
> If termVectors are not stored, MoreLikeThis will generate terms from stored
> fields.
> /
>
> Does this mean that if a field is not declared as a termVector, it won't be
> recognized by the MLT handler? What specifically does 'stored fields' mean
> when you say MoreLikeThis defaults to it if termVectors are not defined?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Understanding-the-MoreLikeThis-Handler-tp4213279p4213281.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
