Hi,
There is an indexed field in my Solr schema that stores one phrase per
document.
I have to implement a feature that gives users "more like this" results
based on the contents of this field.
I think Solr's built-in "More like this" feature needs too many terms to
be effective here, but maybe that is not the case.
I would like to use a custom algorithm, probably based on the Jaccard
index <http://en.wikipedia.org/wiki/Jaccard_index>.
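For reference, the Jaccard index of two phrases treated as term sets A and B is |A ∩ B| / |A ∪ B|. Below is a minimal Python sketch. It assumes a few things not taken from the actual schema: simple lowercase word tokenization, a small hypothetical stop word list, and illustrative helper names (`terms`, `jaccard`, `rank_similar`).

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical stop word list for illustration; a real list would be tuned to the field's content.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}


def terms(phrase: str) -> Set[str]:
    """Lowercase the phrase, split it into word tokens and drop stop words."""
    return {t for t in re.findall(r"\w+", phrase.lower()) if t not in STOP_WORDS}


def jaccard(a: str, b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the two phrases' term sets."""
    ta, tb = terms(a), terms(b)
    union = ta | tb
    if not union:
        return 0.0  # both phrases are empty after stop word removal
    return len(ta & tb) / len(union)


def rank_similar(phrase: str, candidates: Iterable[str], top_n: int = 10) -> List[Tuple[float, str]]:
    """Score each candidate against the phrase and return the top_n, best first."""
    scored = [(jaccard(phrase, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


print(jaccard("red wool scarf", "red cotton scarf"))  # 0.5
```

"red wool scarf" and "red cotton scarf" share 2 of 4 distinct terms, which gives 0.5. Scoring every document this way on the client would not scale, so a function like this would more likely be used to re-rank a shortlist that Solr returns.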
I see three options:
1 - create a Solr plug-in that introduces a custom "More like this"
feature. That might be tricky.
2 - the quick and dirty way: send queries crafted on the client side.
Given the phrase "term1 term2 term3 term4", the query would look
something like this:
(term1 AND term2 AND term3) OR (term1 AND term2 AND term4) OR (term1 AND
term3 AND term4) OR ...
With a good list of stop words and well-chosen thresholds for the
number of terms, the queries should not grow too long.
3 - use the built-in "More like this" feature with a stop word list and
tuned "More like this" parameters.
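Option 2 can be sketched in Python as below. This is a minimal illustration. It assumes the terms are already tokenized and stop-word filtered on the client. The field name `phrase` and the helper names are hypothetical. The number of clauses is C(n, k): that is 4 for n = 4 and k = 3, but 252 for n = 10 and k = 5, which is why the thresholds matter.

```python
import re
from itertools import combinations
from typing import Optional, Sequence

# Characters with special meaning in the standard Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape(term: str) -> str:
    """Backslash-escape query syntax characters so a term is matched literally."""
    return _SPECIAL.sub(r"\\\1", term)


def combination_query(terms: Sequence[str], k: int, field: Optional[str] = None) -> str:
    """OR together one AND-clause per k-term subset of `terms`."""
    if not 1 <= k <= len(terms):
        raise ValueError("k must be between 1 and len(terms)")
    prefix = f"{field}:" if field else ""  # optional field qualifier, e.g. "phrase:"
    clauses = (
        "(" + " AND ".join(prefix + escape(t) for t in combo) + ")"
        for combo in combinations(terms, k)
    )
    return " OR ".join(clauses)


print(combination_query(["term1", "term2", "term3", "term4"], 3))
```

"At least k of n terms" is also what the `mm` (minimum should match) parameter of the dismax/edismax query parsers expresses. For example, `defType=edismax&qf=phrase&q=term1 term2 term3 term4&mm=3` would avoid enumerating the combinations. Whether it behaves identically for this field is worth checking on the Solr version in use.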
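For option 3, the term thresholds may explain why the built-in feature seems to need many terms. In the MoreLikeThis component, `mlt.mintf` (minimum term frequency in the source document) defaults to 2, and `mlt.mindf` (minimum document frequency) defaults to 5. With one phrase per document, where each term usually occurs once, few or no terms may qualify under those defaults. The sketch below only builds the request parameters. The core URL and the `id` and `phrase` field names are assumptions. The stop word list itself would normally be configured in the field's analyzer (e.g. a StopFilterFactory) rather than passed in the request.

```python
from urllib.parse import urlencode


def mlt_params(doc_id: str, field: str = "phrase", count: int = 10, id_field: str = "id") -> str:
    """Build a query string asking Solr's MoreLikeThis component for similar documents."""
    params = {
        "q": f"{id_field}:{doc_id}",  # the source document (hypothetical id field)
        "mlt": "true",                # enable the MoreLikeThis search component
        "mlt.fl": field,              # field to take "interesting terms" from
        "mlt.mintf": 1,               # keep terms occurring only once in the source phrase
        "mlt.mindf": 1,               # keep terms found in as few as one document
        "mlt.count": count,           # similar documents returned per result
        "wt": "json",
    }
    return urlencode(params)


# Hypothetical host and core name; adjust to the actual installation.
print("http://localhost:8983/solr/mycore/select?" + mlt_params("123"))
```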
I would have time to develop a Solr plug-in, but I don't know how hard
it would be.
Thanks in advance for your advice,
Xavier S.