Re: is my MoreLikeThis performance normal?

Otis Gospodnetic Mon, 22 Dec 2008 19:31:38 -0800

Eric, from your description, it looks like this could indeed be caused by the
high frequency of some of the query terms.  This is not specific to the MLT
component, and I imagine you will see similar performance if you just run
queries with those terms from, say, the Solr admin page.
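
If you would rather script that check than click through the admin page, here
is a rough sketch that builds one plain single-term query URL per suspect term
against the standard select handler.  The host, handler path, field name,
terms, and filter value are all placeholders I made up, not anything from
your setup; substitute your own.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to your own Solr host and handler path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def term_query_url(field, term, filter_query=None, base=SOLR_SELECT):
    """Build a URL for a single-term query that returns only the id field."""
    params = [
        ("q", f"{field}:{term}"),  # one term at a time, no MLT involved
        ("rows", "10"),
        ("fl", "id"),              # keep stored-field loading out of the picture
        ("wt", "json"),
    ]
    if filter_query:
        # Same kind of filter as the status filter described in the question.
        params.append(("fq", filter_query))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder field and terms; use the common terms from your debug output.
    for term in ["common", "rare"]:
        print(term_query_url("title", term, filter_query="status:active"))
```

Fetch each URL and compare the QTime value in the response header.  If the
very common terms are slow on their own, the cost is in matching and scoring
those terms rather than in the MLT handler.  Terms containing query-syntax
characters would need escaping first, which this sketch does not do.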



Otis --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Eric Kilby <[email protected]>
> To: [email protected]
> Sent: Monday, December 22, 2008 10:36:54 AM
> Subject: is my MoreLikeThis performance normal?
> 
> 
> After rebuilding my index over the weekend with termVectors enabled for the
> relevant fields, I've run some basic testing against the MoreLikeThis
> handler with these settings from the Solr wiki: {boost=true, mindf=1,
> mintf=1}.
> 
> My index contains around 20M documents, averaging under 1K of content with
> some outliers as large as 4-8K.  The total index size on disk is 29G.  The
> two largest fields are not involved in the searches as they're fairly
> "noisy" when it comes to similarity.  There is a filter query enabled based
> on a document status field, which cuts the number of documents under
> consideration approximately in half (11M will pass it).  
> 
> I've got the MLT handler searching against three fields right now, each of
> which typically has 3-10 words, and when searching for 10 matches I'm seeing
> results typically in the 700ms to 1.4 seconds range on the first run for a
> given document ID.  Given that our main use case involves random access, I'm
> curious whether it's normal to see results in this range for a query like
> this on an index this size.  The index is optimized and not being
> written to at the time of testing.  
> 
> I've tried limiting the fields returned to just the ID and the results are
> similar, so I don't think this is related to stored fields.  I tried
> increasing the mintf, but since I'm using small fields that pretty much
> resulted in no usable terms being extracted on most documents and thus no
> results.  Increasing the maxqt seems like it may help sometimes, but not all
> the time and at the cost of visibly less relevant results.
> 
> Turning on the debug information, it looks like most of the time is spent on
> terms that are very common, some of which match hundreds of thousands of
> documents.  Is the query time simply a consequence of scoring such a large
> number of documents?
> 
> I appreciate any advice/evidence that anyone on here has run into.  I'm new
> to the MLT functionality, and hope I can get it to work well in our system.
> 
> thanks,
> Eric K.
> -- 
> View this message in context: 
> http://www.nabble.com/is-my-MoreLikeThis-performance-normal--tp21129767p21129767.html
> Sent from the Solr - User mailing list archive at Nabble.com.
