Eric, judging from your description, this could indeed be caused by the high document frequency of some of the query terms. It is not specific to the MLT component: I imagine you will see similar performance if you simply run queries with those same terms from, say, the Solr admin page.
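One rough way to check this outside the MLT handler is to time a plain query built from the same terms. The sketch below is only an illustration under stated assumptions: the base URL (`http://localhost:8983/solr/select`), the field name, the filter value, and a unique key field called `id` are all placeholders to replace with your own, and the helper names are made up for this example. It uses only the standard `q`, `q.op`, `fq`, `rows`, `fl` and `debugQuery` request parameters.

```python
import time
import urllib.parse
import urllib.request

# Assumption: a local Solr instance with the standard select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_term_query_url(field, terms, filter_query=None, rows=10,
                         base_url=SOLR_SELECT_URL):
    """Build a select URL that ORs the given terms on a single field.

    Terms are used as-is, so any query-syntax special characters in them
    would need escaping first.
    """
    params = {
        "q": " ".join(f"{field}:{term}" for term in terms),
        "q.op": "OR",           # match documents containing any of the terms
        "rows": str(rows),
        "fl": "id",             # assumption: the unique key field is "id"
        "debugQuery": "true",   # include timing and scoring details
    }
    if filter_query:
        params["fq"] = filter_query
    return base_url + "?" + urllib.parse.urlencode(params)


def time_query(url, timeout=30):
    """Return the wall-clock seconds needed to fetch the full response."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


# Example usage (requires a running Solr; the field, terms and filter
# below are hypothetical):
# url = build_term_query_url("title", ["red", "shoes"], "status:active")
# print(url, time_query(url))
```

If a plain query on the most common terms takes about as long as the MLT request, the time is going into matching and scoring those terms rather than into anything MLT-specific. Use fresh terms for each measurement, since repeated queries may be served from Solr's caches.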
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Eric Kilby <ki...@stylefeeder.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, December 22, 2008 10:36:54 AM
> Subject: is my MoreLikeThis performance normal?
>
>
> After rebuilding my index over the weekend with termVectors enabled for the
> relevant fields, I've run some basic testing against the MoreLikeThis
> handler with these settings from the SOLR Wiki {boost=true, mindf=1,
> mintf=1}.
>
> My index contains around 20M documents, averaging under 1K of content with
> some outliers as large as 4-8K. The total index size on disk is 29G. The
> two largest fields are not involved in the searches as they're fairly
> "noisy" when it comes to similarity. There is a filter query enabled based
> on a document status field, which cuts the number of documents under
> consideration approximately in half (11M will pass it).
>
> I've got the MLT handler searching against three fields right now, each of
> which typically has 3-10 words, and when searching for 10 matches I'm seeing
> results typically in the 700ms to 1.4 seconds range on the first run for a
> given document ID. Given that our main use case involves random access, I'm
> curious as to if it's normal or not to see results in this range for a query
> like this on an index this size. The index is optimized and not being
> written to at the time of testing.
>
> I've tried limiting the fields returned to just the ID and the results are
> similar, so I don't think this is related to stored fields. I tried
> increasing the mintf, but since I'm using small fields that pretty much
> resulted in no usable terms being extracted on most documents and thus no
> results. Increasing the maxqt seems like it may help sometimes, but not all
> the time and at the cost of visibly less relevant results.
>
> Turning on the debug information, it looks like most of the time is spent on
> terms that are very common, some of which match hundreds of thousands of
> documents. Is the query time just a natural extrapolation of scoring the
> large number of documents?
>
> I appreciate any advice/evidence that anyone on here has run into. I'm new
> to the MLT functionality, and hope I can get it to work well in our system.
>
> thanks,
> Eric K.
> --
> View this message in context:
> http://www.nabble.com/is-my-MoreLikeThis-performance-normal--tp21129767p21129767.html
> Sent from the Solr - User mailing list archive at Nabble.com.