On Sep 12, 2006, at 3:41 PM, Michael Imbeault wrote:
I haven't looked at the specifics of how MoreLikeThis determines which items are similar; I'm mainly wondering about performance here. Yesterday I tried to code myself a poor man's similarity class (which was nothing more than doing a search with OR between words and sorting by score), and the performance was abysmal (well, I kind of expected it: with 1000+ word queries on a 15-million-doc collection, you don't expect miracles). At first glance I think it searches for the most 'relevant' words, am I right? What kind of performance are you getting with it?

Performance with MoreLikeThis is not an issue. It has many parameters to tune how many terms are used in the query it builds, and it pulls these terms in an extremely efficient manner from the Lucene index.
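
To make the idea concrete, here is a rough sketch in Python of picking a small set of "interesting" terms from one document instead of OR-ing all 1000+ words. This is not MoreLikeThis's actual code: the function name, the scoring formula, and the data structures are assumptions for illustration, and the parameter names are only loosely modeled on the kind of knobs MoreLikeThis exposes.

```python
import math
from collections import Counter


def top_terms(doc_tokens, doc_freq, num_docs,
              max_query_terms=25, min_term_freq=2, min_doc_freq=5):
    """Return the highest-scoring terms of one document, ranked by tf * idf.

    Hypothetical helper for illustration only.
    doc_tokens: list of tokens in the source document
    doc_freq:   dict mapping term -> number of docs in the collection containing it
    num_docs:   total number of docs in the collection
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for term, freq in term_freq.items():
        df = doc_freq.get(term, 0)
        # Skip terms that are too rare in this document or in the collection.
        if freq < min_term_freq or df < min_doc_freq:
            continue
        # Assumed idf formula; common words get a low weight.
        idf = math.log(num_docs / (df + 1)) + 1.0
        scored.append((freq * idf, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]


if __name__ == "__main__":
    tokens = ["lucene"] * 3 + ["the"] * 5 + ["index"] * 2 + ["rare"]
    freqs = {"lucene": 100, "the": 900000, "index": 5000, "rare": 2}
    print(top_terms(tokens, freqs, 1000000, max_query_terms=2))
```

The resulting query has a couple dozen terms at most, no matter how long the source document is, which is why it avoids the cost of a giant OR query.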

I'm doing some traveling soon, which is always a good time to hack on something tractable like adding MoreLikeThis to Solr. So your wish may be granted in a week :)

        Erik
