Common terms are not very useful for More Like This. Get rid of
terms with a low IDF; you want selective terms.

Usually, picking the top 20 or so terms by tf.idf will eliminate
the low IDF terms, but you might need to filter those out explicitly.
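
As a rough illustration of that selection step, here is a minimal Python
sketch. It is not how Solr or Ultraseek implement it; the function name
select_mlt_terms, the min_idf cutoff, and the log(N/df) form of IDF are
all assumptions made for the example.

```python
import math
from collections import Counter

def select_mlt_terms(doc_tokens, doc_freq, num_docs, max_terms=20, min_idf=1.0):
    """Pick up to max_terms selective terms from one document.

    doc_tokens: tokens of the source document
    doc_freq:   mapping of term -> number of documents containing it
    num_docs:   total number of documents in the index
    min_idf:    hypothetical cutoff; terms below it are dropped outright
    """
    tf = Counter(doc_tokens)
    scored = []
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term is not in the index, so it cannot match anything
        idf = math.log(num_docs / df)
        if idf < min_idf:
            continue  # too common to be selective, toss it explicitly
        scored.append((freq * idf, term))
    # Highest tf.idf first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]
```

The explicit min_idf check matters when a very common term has a high
enough tf in the source document to make it into the top 20 anyway.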

Phrase IDF is really, really useful for this.
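
One way to picture this: compute document frequency for word n-grams as
well as single words, so a phrase can be scored as selective even when
its words are common. The sketch below is only an illustration under
that assumption; phrase_idf and its max_len parameter are hypothetical
names, and it says nothing about how any particular engine does it.

```python
import math

def phrase_idf(docs, max_len=2):
    """Return IDF for every word n-gram (length 1..max_len).

    docs: list of documents, each a list of tokens
    """
    df = {}
    for tokens in docs:
        seen = set()  # count each n-gram once per document
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(" ".join(tokens[i:i + n]))
        for gram in seen:
            df[gram] = df.get(gram, 0) + 1
    return {gram: math.log(len(docs) / count) for gram, count in df.items()}
```

In a small collection where "new" and "york" each appear in most
documents, the phrase "new york" appears in fewer of them and so gets a
higher IDF than either word alone.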

Note: I wrote the MLT support for Ultraseek.

wunder

On 12/23/08 9:17 AM, "Eric Kilby" <ki...@stylefeeder.com> wrote:
> 
> That is correct, we see similar if not longer query times if run in a regular
> query through the admin tool using the same terms that MLT is selecting.  I
> was testing MLT to see if this was an unavoidable consequence of having
> terms that occur in a large number of documents, or whether it was some
> artifact of how I had the DisMax functionality configured on my other
> handler.
> 
> Am I correct in assuming that the answer to this is "wait for Moore's Law",
> and that I likely would have been looking at 3-5 second query times a couple
> of years ago?
> 
> 
> 
> Eric, from what I can tell from your description, it looks like this could
> indeed be caused by high frequency of some of the query terms.  This is not
> MLT component specific, and I imagine you will see similar performance if
> you just run queries with those terms from, say, the Solr admin page.
> 
> 
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
