Thanks for the suggestions. The results are more accurate now that I've adjusted those settings.

Regards,
Edwin

On 14 July 2015 at 16:46, Upayavira <u...@odoko.co.uk> wrote:

> There are two ways to "tweak" MLT: use the parameters (such as minimum
> term frequency), or use stop words when indexing.
>
> I'd suggest you try those as a means to improve quality!
>
> Upayavira
>
> On Tue, Jul 14, 2015, at 09:28 AM, Zheng Lin Edwin Yeo wrote:
> > Thanks for your advice. I've indexed more content and it's working
> > better now. The entire index is no longer returned every time.
> >
> > However, I found that longer documents tend to have a higher score
> > than shorter documents, even though the shorter documents are
> > supposed to be a better match (more similar) to the query than the
> > longer documents. Is it because words like "and", "the", etc. cause
> > the score of the longer documents to increase?
> >
> > Is there any way to configure this so that the shorter documents get
> > a higher score if they are a better match, or will indexing more
> > content solve this problem?
> >
> > Regards,
> > Edwin
> >
> > On 14 July 2015 at 15:40, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > Look at your "interesting terms". If your index is too small, it will
> > > consider words like "and", "the", etc. to be "interesting" and form a
> > > part of the query, thus returning your entire index, which doesn't
> > > help.
> > >
> > > Effectively, what MLT does is attempt to pick the 25 (configurable)
> > > best terms in the source document and form a Lucene query based upon
> > > them. It takes the frequency of the terms in your index and in the
> > > document into account when scoring the terms (much like TF/IDF). For
> > > this to really work, you need a reasonable amount of content.
> > >
> > > Upayavira
> > >
> > > On Tue, Jul 14, 2015, at 07:40 AM, Zheng Lin Edwin Yeo wrote:
> > > > Hi,
> > > >
> > > > I'm using Solr 5.2.1 and I'm trying to implement the MoreLikeThis
> > > > feature in Solr.
> > > >
> > > > But the results that I've been getting from MoreLikeThis have not
> > > > been accurate so far. I've been getting every document in the
> > > > collection returned in the "response" section, even though the
> > > > documents have no similar match to my query.
> > > >
> > > > For example, if I have 10 records in the collection, 1 will be
> > > > under the "match" section, while the other 9 will be under the
> > > > "response" section, even though only 1 or 2 are related to the one
> > > > under the "match" section.
> > > >
> > > > Below is my configuration in solrconfig.xml:
> > > >
> > > > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> > > >   <lst name="defaults">
> > > >     <str name="echoParams">explicit</str>
> > > >     <str name="wt">json</str>
> > > >     <str name="indent">true</str>
> > > >     <str name="defType">edismax</str>
> > > >     <str name="fl">id, score</str>
> > > >     <str name="mlt.qf">
> > > >       Objective^20.0 Summary^10.0
> > > >     </str>
> > > >
> > > >     <str name="df">Summary</str>
> > > >     <str name="mlt.fl">Objective,Summary</str>
> > > >     <str name="mlt.mintf">2</str>
> > > >     <str name="mlt.mindf">5</str>
> > > >     <str name="mlt.maxqt">10</str>
> > > >     <str name="mlt.count">10</str>
> > > >     <str name="mlt.boost">true</str>
> > > >     <str name="mlt.interestingTerms">details</str>
> > > >   </lst>
> > > > </requestHandler>
> > > >
> > > > Regards,
> > > > Edwin
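
The term selection described in the thread (pick the best-scoring terms of the source document, weighing frequency in the document against frequency in the index) can be sketched roughly as below. This is an illustrative Python sketch only, not Solr's or Lucene's actual code. The function name interesting_terms, its arguments, and the smoothed IDF formula are assumptions; min_tf, min_df and max_query_terms are hypothetical stand-ins for the roles of mlt.mintf, mlt.mindf and mlt.maxqt.

```python
import math
from collections import Counter


def interesting_terms(source_tokens, corpus, min_tf=2, min_df=5,
                      max_query_terms=25, stop_words=frozenset()):
    """Rank the terms of one document by a simple TF-IDF style score.

    source_tokens: list of tokens from the document to find matches for.
    corpus: list of token lists, one per indexed document.
    All names and the scoring formula are illustrative assumptions.
    """
    num_docs = len(corpus)

    # Document frequency: how many corpus documents contain each term.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    # Term frequency in the source document, with stop words dropped.
    term_freq = Counter(t for t in source_tokens if t not in stop_words)

    scored = []
    for term, tf in term_freq.items():
        df = doc_freq[term]
        # Skip terms that are too rare in the document or in the index.
        if tf < min_tf or df < min_df:
            continue
        # Smoothed inverse document frequency (an assumed, common form).
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((term, tf * idf))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_query_terms]
```

On a very small corpus, a word such as "the" can still rank among the top terms under this kind of scoring simply because it occurs so often in the source document. That matches the behaviour discussed above, and it is why passing a stop-word set or raising the thresholds changes which terms end up in the query.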