Thanks for the answer, and try to enjoy your vacation / travel! Can't
wait to be able to use MoreLikeThis from within Solr!
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Erik Hatcher wrote:
On Sep 12, 2006, at 3:41 PM, Michael Imbeault wrote:
I haven't looked at the specifics of how MoreLikeThis determines which
items are similar; I'm mainly wondering about performance here.
Yesterday I tried to code myself a poor man's similarity class (which
was nothing more than doing a search with OR between words and
sorting by score), and the performance was abysmal (well, I kind of
expected it: with 1000+ word queries on a 15 million doc collection,
you don't expect miracles). At first glance I think it searches for
the most 'relevant' words, am I right? What kind of performance are
you getting with it?
Performance with MoreLikeThis is not an issue. It has many parameters
to tune how many terms are used in the query it builds, and it pulls
these terms in an extremely efficient manner from the Lucene index.
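To illustrate the general idea of capping how many terms go into the query, here is a rough Python sketch. It is not the MoreLikeThis code: the function names, the tf-idf weighting, and the parameters max_terms, min_term_freq and min_doc_freq are assumptions made up for this example, loosely modelled on the kind of tuning knobs mentioned above. It keeps only the highest-scoring terms of a document and ORs those together, instead of ORing all 1000+ words.

```python
import math
from collections import Counter


def select_query_terms(doc_tokens, doc_freq, num_docs,
                       max_terms=25, min_term_freq=2, min_doc_freq=5):
    """Return up to max_terms terms of one document, ranked by tf * idf.

    doc_tokens: list of tokens from the source document
    doc_freq:   dict mapping term -> number of docs containing it
    num_docs:   total number of docs in the collection
    All names and defaults here are hypothetical, for illustration only.
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        # Skip terms that are too rare in the doc or in the collection.
        if tf < min_term_freq or df < min_doc_freq:
            continue
        # One common idf form; very frequent words get a low weight.
        idf = math.log(num_docs / (df + 1)) + 1.0
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


def build_or_query(terms, field="text"):
    """Join the selected terms into a simple OR query string."""
    return " OR ".join(f"{field}:{term}" for term in terms)
```

With a small max_terms, the resulting query has a handful of clauses rather than one clause per word in the document, which is where the poor man's approach spends its time.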
I'm doing some traveling soon, which is always a good time to hack on
something tractable like adding MoreLikeThis to Solr. So your wish
may be granted in a week :)
Erik