I'm going somewhere with this... be patient. :-) I had asked about this briefly at the SF meetup, but there was a lot going on.
1: Suppose you had Solr 1.4 with all the Carrot^2 DOCUMENT clustering in place, and you had built the cluster index for all your docs.

2: Then, if you had a particular cluster, and one of the docs in that cluster happened to be your search, the other documents in the cluster could be considered the results. In effect, the cluster is like the search results.

3: Now imagine you can take an arbitrary doc and find the clusters that document is in. (Some clustering engines let you do this.)

4: And then imagine that, when somebody submits a search, you quickly turn it into a document, add it to the index, redo the clusters, find the clusters this new temp doc is in, and use those as the results.

Benefits?

I'm not saying this would be practical, but would it be useful? Or, in particular, would it be more useful than the normal Solr/Lucene relevancy?

As I recall, Carrot^2 had 3 choices for clustering.

And let's assume that the searches coming in are longer than the 1.4 words average. Maybe a few sentences or something. I'm not sure a 1 word query would really benefit from this. :-)

Some clustering algorithms don't allow you to find the cluster containing a specific document, so those wouldn't work as a "search engine".

More Like This as a "cluster" search?

A similar scenario could be made for the "more like this" feature. Take a user's search text (presumably lengthy), quickly index it, then use that new temp doc as an MLT seed doc. I haven't looked deep into the code; it might be that it uses essentially the same relevancy as a query.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
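To make steps 1 to 4 above concrete, here is a toy sketch in plain Python. It does not use Solr or Carrot^2 at all: the clustering is a simple greedy, seed-based pass over bag-of-words vectors that stands in for whatever a real clustering engine would do, and every name in it (`vectorize`, `cosine`, `cluster`, `search_by_cluster`, the `__query__` id, the 0.3 threshold) is made up for illustration.

```python
import math
from collections import Counter

QUERY_ID = "__query__"  # hypothetical id for the temporary query document


def vectorize(text):
    """Lowercased bag-of-words term counts (no stemming, no stop words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold):
    """Greedy single pass: a doc joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for doc_id, vec in vectors.items():
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


def search_by_cluster(docs, query, threshold=0.3):
    """Step 4: add the query as a temp doc, redo the clusters,
    and return the other members of the cluster it landed in."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    vectors[QUERY_ID] = vectorize(query)  # temp doc goes in last
    for members in cluster(vectors, threshold):
        if QUERY_ID in members:
            return [d for d in members if d != QUERY_ID]
    return []
```

Calling `search_by_cluster(docs, "some longer query text")` on a dict of doc id to text returns the ids that share a cluster with the query, or an empty list if the query ends up alone. Note that in this toy the cluster membership test is itself a similarity score against a seed, which hints at the same question raised for MLT: the "cluster results" may not differ much from ordinary relevancy ranking.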