Re: Implementing MoreLikeThis feature

Zheng Lin Edwin Yeo Tue, 14 Jul 2015 18:23:05 -0700

Thanks for the suggestions. The results are more accurate now after I
adjusted those settings.
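
For reference, the two kinds of tweak Upayavira suggested could look roughly
like the sketch below. The numeric values are illustrative assumptions, not
tested recommendations, and "text_general" and "stopwords.txt" are assumed
names that may differ in your schema.

```xml
<!-- solrconfig.xml: MLT handler defaults (values are illustrative assumptions) -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">Objective,Summary</str>
    <!-- ignore terms that occur fewer than this many times in the source document -->
    <str name="mlt.mintf">1</str>
    <!-- ignore terms that occur in fewer than this many documents in the index -->
    <str name="mlt.mindf">2</str>
    <!-- ignore words shorter than this many characters -->
    <str name="mlt.minwl">4</str>
    <!-- cap on the number of terms in the generated query -->
    <str name="mlt.maxqt">25</str>
    <str name="mlt.interestingTerms">details</str>
  </lst>
</requestHandler>

<!-- schema.xml: stop word filter in the analyzer of the field type used by the MLT fields -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With mlt.interestingTerms set to "details", the response lists the terms that
were picked, which makes it easy to check whether stop words are still getting
through. Analyzer changes such as the stop filter only take effect for
documents indexed after the change, so a reindex is needed.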


Regards,
Edwin

On 14 July 2015 at 16:46, Upayavira <u...@odoko.co.uk> wrote:

> There are two ways to "tweak" MLT: use the parameters (such as minimum
> term frequency and so on), or use stop words when indexing.
>
> I'd suggest you try those as a means to improve quality!
>
> Upayavira
>
> On Tue, Jul 14, 2015, at 09:28 AM, Zheng Lin Edwin Yeo wrote:
> > Thanks for your advice. I've indexed more content and it's working
> > better now. The entire index is no longer returned every time.
> >
> > However, I found that longer documents tend to get a higher score than
> > shorter documents, even though the shorter documents are supposed to be
> > a better match (more similar) to the query than the longer documents.
> > Is it because words like "and", "the", etc. increase the score of the
> > longer documents?
> >
> > Is there any way to configure this so that shorter documents get a
> > higher score when they are a better match, or will simply indexing more
> > content solve this problem?
> >
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 14 July 2015 at 15:40, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > Look at your "interesting terms". If your index is too small, it will
> > > consider words like "and", "the", etc. to be "interesting" and form a
> > > part of the query, thus returning your entire index, which doesn't help.
> > >
> > > Effectively, what MLT does is attempt to pick the 25 (configurable) best
> > > terms in the source document and form a Lucene query based upon them.
> > > It takes the frequency of the terms in your index and in the document
> > > into account when scoring the terms (much like TF/IDF). For this to
> > > really work, you need a reasonable amount of content.
> > >
> > > Upayavira
> > >
> > > On Tue, Jul 14, 2015, at 07:40 AM, Zheng Lin Edwin Yeo wrote:
> > > > Hi,
> > > >
> > > > I'm using Solr 5.2.1 and I'm trying to implement the MoreLikeThis
> > > > feature in Solr.
> > > >
> > > > But the results that I've been getting from MoreLikeThis have not been
> > > > accurate so far. All the documents in the collection are returned in
> > > > the "response" section, even though the documents are not similar to
> > > > my query.
> > > >
> > > > For example, if I have 10 records in the collection, 1 will be under
> > > > the "match" section, while the other 9 will be under the "response"
> > > > section, even though only 1 or 2 of them are related to the one under
> > > > the "match" section.
> > > >
> > > > Below is my configuration in solrconfig.xml:
> > > >
> > > > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> > > >   <lst name="defaults">
> > > >     <str name="echoParams">explicit</str>
> > > >     <str name="wt">json</str>
> > > >     <str name="indent">true</str>
> > > >     <str name="defType">edismax</str>
> > > >     <str name="fl">id, score</str>
> > > >     <str name="mlt.qf">
> > > >       Objective^20.0 Summary^10.0
> > > >     </str>
> > > >
> > > >     <str name="df">Summary</str>
> > > >     <str name="mlt.fl">Objective,Summary</str>
> > > >     <str name="mlt.mintf">2</str>
> > > >     <str name="mlt.mindf">5</str>
> > > >     <str name="mlt.maxqt">10</str>
> > > >     <str name="mlt.count">10</str>
> > > >     <str name="mlt.boost">true</str>
> > > >     <str name="mlt.interestingTerms">details</str>
> > > >   </lst>
> > > > </requestHandler>
> > > >
> > > >
> > > > Regards,
> > > > Edwin
> > >
>
