After rebuilding my index over the weekend with termVectors enabled for the relevant fields, I've run some basic tests against the MoreLikeThis handler using these settings from the Solr wiki: {boost=true, mindf=1, mintf=1}.
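For anyone trying to reproduce this setup, a request against the MoreLikeThis handler with these settings might look roughly like the sketch below. It assumes the handler is registered at /mlt on a local Solr instance. The base URL, the three field names, and the "status:active" filter are hypothetical placeholders, not the actual schema. The sketch only builds the request URL, which can then be opened in a browser or passed to any HTTP client. It also assumes document IDs need no escaping in Lucene query syntax.

```python
from urllib.parse import urlencode

# All of these values are assumptions for illustration; adjust them to match
# your own solrconfig.xml (handler name) and schema.xml (field names).
SOLR_BASE = "http://localhost:8983/solr"               # hypothetical Solr base URL
MLT_HANDLER = "/mlt"                                   # hypothetical MoreLikeThisHandler path
SIMILARITY_FIELDS = ["title", "keywords", "category"]  # hypothetical short fields
STATUS_FILTER = "status:active"                        # hypothetical status filter query


def build_mlt_url(doc_id, rows=10, maxqt=None, debug=False):
    """Return a MoreLikeThisHandler URL for the document with ID doc_id.

    Assumes doc_id needs no escaping in Lucene query syntax.
    """
    params = [
        ("q", f"id:{doc_id}"),                    # query that selects the source document
        ("mlt.fl", ",".join(SIMILARITY_FIELDS)),  # fields to mine for "interesting" terms
        ("mlt.boost", "true"),                    # weight each query term by its score
        ("mlt.mindf", "1"),                       # minimum document frequency per term (1 = no cutoff)
        ("mlt.mintf", "1"),                       # minimum term frequency in the source doc (1 = no cutoff)
        ("fq", STATUS_FILTER),                    # restrict candidates to documents passing the filter
        ("fl", "id"),                             # return only the ID field
        ("rows", str(rows)),                      # number of similar documents to return
    ]
    if maxqt is not None:
        params.append(("mlt.maxqt", str(maxqt)))  # cap on the number of terms in the generated query
    if debug:
        params.append(("debugQuery", "true"))     # include the parsed query and scoring explanations
    return f"{SOLR_BASE}{MLT_HANDLER}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mlt_url("12345"))
    print(build_mlt_url("12345", maxqt=10, debug=True))
```

Passing maxqt is the knob that trades the size of the generated query against relevance, and debug=True adds debugQuery=true to the request. Adding mlt.interestingTerms=details (not included above) would also list the terms MLT extracted from the source document.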
My index contains around 20M documents, averaging under 1 KB of content each, with some outliers as large as 4-8 KB. The total index size on disk is 29 GB. The two largest fields are not involved in the searches, since they're fairly "noisy" when it comes to similarity. A filter query on a document status field cuts the number of documents under consideration roughly in half (about 11M pass it).

I've got the MLT handler searching against three fields right now, each of which typically has 3-10 words. When requesting 10 matches, I'm seeing response times typically in the 700 ms to 1.4 s range on the first run for a given document ID. Since our main use case involves random access, I'm curious whether it's normal to see times in this range for a query like this on an index of this size. The index is optimized and was not being written to during testing.

I've tried limiting the returned fields to just the ID, and the results are similar, so I don't think this is related to stored fields. I tried increasing mintf, but because my fields are small, that resulted in no usable terms being extracted from most documents, and thus no results. Increasing maxqt seems to help sometimes, but not always, and at the cost of visibly less relevant results.

With debug output turned on, it looks like most of the time is spent on very common terms, some of which match hundreds of thousands of documents. Is the query time just a natural consequence of scoring that large a number of documents?

I'd appreciate any advice or evidence from anyone here who has run into this. I'm new to the MLT functionality and hope I can get it working well in our system.

thanks,
Eric K.

--
View this message in context: http://www.nabble.com/is-my-MoreLikeThis-performance-normal--tp21129767p21129767.html
Sent from the Solr - User mailing list archive at Nabble.com.