With regard to my second question, about More Like This, I do see:
"The MoreLikeThisHandler can also use a ContentStream to find similar
documents. It will extract the "interesting terms" from the posted text."
at http://wiki.apache.org/solr/MoreLikeThisHandler
and that it uses TF/IDF to pick those terms.
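
For anyone who wants to try it, here is a rough sketch of building such a
request. The host, port, "/mlt" handler path and the field names "manu" and
"cat" are assumptions on my part (the handler has to be registered in your
solrconfig.xml), and build_mlt_url is just a hypothetical helper name. It only
builds the URL; it does not send anything.

```python
from urllib.parse import urlencode


def build_mlt_url(base_url, text, fields, min_tf=1, min_df=1):
    """Build a MoreLikeThisHandler URL that passes free text as a content stream.

    base_url and fields are assumptions about the local setup, not defaults.
    """
    params = {
        "stream.body": text,             # the posted text to find similar docs for
        "mlt.fl": ",".join(fields),      # fields to mine for interesting terms
        "mlt.interestingTerms": "list",  # ask Solr to report the terms it picked
        "mlt.mintf": min_tf,             # minimum term frequency in the posted text
        "mlt.mindf": min_df,             # minimum document frequency in the index
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local instance and field names.
url = build_mlt_url("http://localhost:8983/solr/mlt",
                    "electronics memory", ["manu", "cat"])
print(url)
```

Setting mlt.interestingTerms=list is the useful part for this discussion,
since it shows which terms were extracted from the posted text.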

Still wondering if anybody's tried MLT or Carrot^2 clustering as a primary
search entry point.

On Tue, Aug 11, 2009 at 9:44 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:

> I'm going somewhere with this... be patient.  :-)  I had asked about this
> briefly at the SF meetup, but there was a lot going on.
>
> 1: Suppose you had Solr 1.4 with the Carrot^2 DOCUMENT clustering fully
> integrated, and you had built the cluster index for all your docs.
>
> 2: Then, if you had a particular cluster, and one of the docs in that
> cluster happened to be your search, then the other documents in the cluster
> could be considered the results.  In effect, the cluster is like the search
> results.
>
> 3: Now imagine you can take an arbitrary doc and find the clusters that
> document is in.  (Some clustering engines let you do this.)
>
> 4: And then imagine that, when somebody submits a search, you quickly turn
> it into a document, add it to the index, redo the clusters, find the
> clusters this new temp doc is in, and use that as the results.
>
> Benefits?
>
> I'm not saying this would be practical, but would it be useful?  Or, in
> particular, would it be more useful than the normal Solr/Lucene relevancy?
> As I recall Carrot^2 had 3 choices for clustering.
>
> And let's assume that the searches coming in are longer than the 1.4-word
> average.  Maybe a few sentences or something.  I'm not sure a 1-word query
> would really benefit from this.  :-)
>
> Some clustering algorithms don't allow you to find a cluster containing a
> specific document, so those wouldn't work as a "search engine".
>
> More Like This as a "cluster" search?
>
> A similar scenario could be made for the "more like this" feature.  Take a
> user's search text (presumably lengthy), quickly index it, then use that new
> temp doc as a MLT seed doc.  I haven't looked deeply into the code; it might
> be that it uses essentially the same relevancy as a query.
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>
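
To make steps 1-4 in the quoted message concrete, here is a toy sketch in
plain Python. It is NOT Carrot^2 and not how Solr does anything: the TF-IDF
weighting, the single-pass threshold clustering, the 0.1 threshold and all
function names are assumptions chosen to keep the example small. It treats the
search text as a temporary document, reclusters everything, and returns the
other members of whatever cluster the temp doc landed in.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(1 + n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.1):
    """Toy single-pass clustering: join the first cluster whose seed is close enough."""
    clusters = []  # each cluster is a list of doc indexes; the first one is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def search_by_cluster(docs, query, threshold=0.1):
    """Add the query as a temp doc, recluster, and return its cluster mates."""
    temp_id = len(docs)  # the temp doc is appended after the real docs
    clusters = cluster(tfidf_vectors(docs + [query]), threshold)
    for members in clusters:
        if temp_id in members:
            return [docs[i] for i in members if i != temp_id]
    return []


docs = [
    "solr faceted search over product catalogs",
    "baking sourdough bread at home",
    "tuning solr search relevancy for product queries",
]
results = search_by_cluster(docs, "how do I improve solr product search relevancy")
print(results)
```

Note that the results come back unranked, which is one obvious difference
from normal relevancy, and that reclustering on every query is exactly the
impractical part the original message admits to.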
