Hi,

There is an indexed field in my Solr schema in which one phrase is stored per document. I have to implement a feature that gives users "more like this" results based on the contents of this field. I suspect that Solr's built-in "more like this" feature needs more terms per document than a single phrase provides to be effective, though I may be wrong about that. I would like to use a custom algorithm, probably based on the Jaccard index <http://en.wikipedia.org/wiki/Jaccard_index>.
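
To make the idea concrete, here is a minimal sketch of the similarity I have in mind. It assumes naive whitespace tokenization and an optional stop word set; the function names are only illustrative and are not part of Solr.

```python
def tokenize(phrase, stop_words=frozenset()):
    """Lowercase, split on whitespace, drop stop words (an assumed, naive analysis chain)."""
    return {t for t in phrase.lower().split() if t not in stop_words}


def jaccard(phrase_a, phrase_b, stop_words=frozenset()):
    """Jaccard index of the two phrases' term sets: |A & B| / |A | B|."""
    terms_a = tokenize(phrase_a, stop_words)
    terms_b = tokenize(phrase_b, stop_words)
    union = terms_a | terms_b
    if not union:
        # Both phrases are empty after stop word removal: treat as "not similar".
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Two phrases sharing 3 of 5 distinct terms.
score = jaccard("term1 term2 term3 term4", "term1 term2 term3 term5")
```

Documents would then be ranked by this score against the phrase of the document the user is looking at.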

I see three options:

1 - Create a Solr plug-in that introduces a custom "more like this" feature. That might be tricky.

2 - The quick and dirty way: sending queries that are crafted on the client side. Given the phrase "term1 term2 term3 term4", it would be something like this: (term1 AND term2 AND term3) OR (term1 AND term2 AND term4) OR (term1 AND term3 AND term4) OR ... With a good stop word list and well-chosen thresholds for the number of terms, the queries should not become too long.

3 - Working with a stop word list and the built-in "more like this" parameters.
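
Options 2 and 3 could be sketched from the client side roughly as follows. The field name "phrase", the document query "id:42" and the helper names are hypothetical, and query-syntax escaping of special characters is left out. For option 3, mlt, mlt.fl, mlt.mintf, mlt.mindf and mlt.count are Solr's MoreLikeThis parameters; setting both minimum frequencies to 1 is only my guess at what a one-phrase field would need.

```python
from itertools import combinations
from math import comb
from urllib.parse import urlencode


def build_combination_query(phrase, field, min_match, stop_words=frozenset()):
    """Option 2: OR together every AND-group of `min_match` distinct terms."""
    terms = []
    for term in phrase.lower().split():
        if term not in stop_words and term not in terms:
            terms.append(term)  # keep first-seen order, drop duplicates
    k = min(min_match, len(terms))
    if k == 0:
        return ""
    clauses = (
        "(" + " AND ".join(f"{field}:{t}" for t in combo) + ")"
        for combo in combinations(terms, k)
    )
    return " OR ".join(clauses)


def mlt_params(doc_query, field, count=10):
    """Option 3: request parameters for the built-in MoreLikeThis component."""
    return {
        "q": doc_query,    # selects the source document (hypothetical query)
        "mlt": "true",
        "mlt.fl": field,   # field to take the "interesting" terms from
        "mlt.mintf": 1,    # assumed: each term occurs only once in a short phrase
        "mlt.mindf": 1,    # assumed: rare terms should still count
        "mlt.count": count,
    }


query = build_combination_query("term1 term2 term3 term4", "phrase", 3)
query_string = urlencode(mlt_params("id:42", "phrase"))
clauses_for_10_terms = comb(10, 7)  # number of AND-groups for 7 of 10 terms
```

The number of AND-groups in option 2 is the binomial coefficient C(n, k), so 7 out of 10 terms already gives 120 clauses. This is why the stop word list and the thresholds matter for keeping the query short.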


I would have time to develop a Solr plug-in, but I don't know how hard it would be.


Thanks in advance for your advice,


Xavier S.
