Thanks for that, Erik; it looks like a very good implementation of the
class. If you ever find time to add it to the query handlers in Solr,
I'm sure it would be wonderful for tons of users (Solr has tons of
users, right? It definitely should!).
I haven't looked at the specifics of how MoreLikeThis determines which
items are similar; I'm mainly wondering about performance here.
Yesterday I tried to code myself a poor man's similarity class (which
was nothing more than doing a search with OR between words and sorting
by score), and the performance was abysmal (well, I kind of expected it:
with 1000+ word queries on a 15-million-doc collection, you don't expect
miracles). At first glance it looks like MoreLikeThis searches for the
most 'relevant' words; am I right? What kind of performance are you
getting with it?
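
To make the question concrete, here is a rough sketch of what I mean by
searching only for the most 'relevant' words instead of OR-ing all 1000+
of them. This is only my guess at the general idea, not the actual
MoreLikeThis code: the class name, the method names, the tf * idf
weighting and the sample document frequencies are all assumptions made
up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT Lucene's MoreLikeThis.
public class TopTermsSketch {

    /**
     * Returns up to `limit` distinct terms from `docTerms`, ranked by an
     * assumed tf * idf weight (highest first, ties broken alphabetically).
     */
    public static List<String> topTerms(List<String> docTerms,
                                        Map<String, Integer> docFreq,
                                        int numDocs,
                                        int limit) {
        // Count how often each term occurs in the source document.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : docTerms) {
            tf.merge(term, 1, Integer::sum);
        }

        // Weight each distinct term: frequent in this document, rare in the index.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }

        // Sort by descending score and keep only the best `limit` terms.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(limit, terms.size())));
    }

    /** Joins the selected terms into a plain OR query string. */
    public static String buildOrQuery(List<String> terms) {
        return String.join(" OR ", terms);
    }

    public static void main(String[] args) {
        // Made-up document and index statistics, for illustration only.
        List<String> doc = Arrays.asList(
                "solr", "the", "lucene", "the", "solr", "similarity");
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 14_000_000);
        docFreq.put("solr", 2_000);
        docFreq.put("lucene", 5_000);
        docFreq.put("similarity", 90_000);

        List<String> best = topTerms(doc, docFreq, 15_000_000, 2);
        System.out.println(buildOrQuery(best));
    }
}
```

With these made-up numbers, a word like "the" that appears in nearly
every document gets a low weight and is dropped, so the query that
actually runs is much shorter than the source text.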
Thanks a lot,
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Erik Hatcher wrote:
I use MoreLikeThis in a custom request handler for Collex, for example
the three items shown at the bottom left here:
<http://svn.sourceforge.net/viewvc/patacriticism/collex/trunk/src/solr/org/nines/TermQueryRequestHandler.java?revision=391&view=markup>
I would like to get MoreLikeThis hooked into the
StandardRequestHandler just like highlighting and facets are now. One
of these days I'll carve out time to do that if no one beats me to
it. It would not be difficult to do; it would just take some time to
iron out how to parameterize it cleanly for general-purpose use.
Erik