Inline...
On Aug 11, 2009, at 12:44 PM, Mark Bennett wrote:
> I'm going somewhere with this... be patient. :-) I had asked about this briefly at the SF meetup, but there was a lot going on.
> 1: Suppose you had Solr 1.4 with all the Carrot^2 DOCUMENT clustering in, and you had built the cluster index for all your docs.
> 2: Then, if you had a particular cluster, and one of the docs in that cluster happened to be your search, the other documents in the cluster could be considered the results. In effect, the cluster is like the search results.
> 3: Now imagine you can take an arbitrary doc and find the clusters that document is in (some clustering engines let you do this).
> 4: And then imagine that, when somebody submits a search, you quickly turn it into a document, add it to the index, redo the clusters, find the clusters this new temp doc is in, and use that as the results.
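If I follow, steps 1-4 boil down to a loop like the sketch below. To be clear, ClusteringEngine and Cluster here are made-up stand-ins for whatever clustering backend you'd plug in (they are not Carrot^2's actual API), so treat it as pseudocode that happens to compile.

import java.util.ArrayList;
import java.util.List;

// Hypothetical clustering backend; these interfaces are invented for the sketch.
interface Cluster {
    List<String> getDocumentIds();
}

interface ClusteringEngine {
    void addDocument(String id, String text);
    void removeDocument(String id);
    List<Cluster> recluster();   // redo the clusters over the current doc set
}

public class ClusterAsSearch {
    // Step 4: treat the incoming query as a temp doc, recluster, and return
    // the other members of whatever cluster(s) the temp doc lands in.
    public static List<String> search(ClusteringEngine engine, String queryText) {
        String tempId = "__query__";
        engine.addDocument(tempId, queryText);
        List<String> results = new ArrayList<String>();
        for (Cluster c : engine.recluster()) {
            if (c.getDocumentIds().contains(tempId)) {
                for (String id : c.getDocumentIds()) {
                    if (!id.equals(tempId)) {
                        results.add(id);
                    }
                }
            }
        }
        engine.removeDocument(tempId);   // don't leave the query in the corpus
        return results;
    }
}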
I guess I'd argue that this is already what Lucene does, except for the part about adding the query into the document set. The Lucene Query is just your arbitrary document. Really, the primary difference, as I see it, is that you want the Carrot2 scoring mechanism instead of the existing Lucene one, no? Otherwise, I don't see much benefit to actually indexing the query, other than that it could potentially be used to skew results over time as people ask the same queries over and over again.

Under a certain lens, couldn't you just argue that search is finding all the docs that cluster around your query? (I know that isn't the traditional description, but regardless, the math underneath is often very similar.)
> Benefits?
> I'm not saying this would be practical, but would it be useful? Or, in particular, would it be more useful than the normal Solr/Lucene relevancy?
> As I recall, Carrot^2 had 3 choices for clustering.
> And let's assume that the searches coming in are longer than the 1.4-word average. Maybe a few sentences or something. I'm not sure a 1-word query would really benefit from this. :-)
> Some clustering algorithms don't allow you to find a cluster containing a specific document, so those wouldn't work as a "search engine".
> More Like This as a "cluster" search?
> A similar scenario could be made for the "more like this" feature. Take a user's search text (presumably lengthy), quickly index it, then use that new temp doc as an MLT seed doc. I haven't looked deeply into the code; it might be that it uses essentially the same relevancy as a query.
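Just to make sure we're talking about the same thing, the "index it, then seed MLT with the temp doc" version would look roughly like the sketch below. This uses the contrib MoreLikeThis with Lucene 2.x-era package names and signatures (they have moved around in later versions), so treat it as a sketch rather than copy-paste code.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;
import org.apache.lucene.store.Directory;

public class MltSeedDocSketch {
    // dir already holds the corpus; queryText is the user's (lengthy) search text.
    public static TopDocs mltViaTempDoc(Directory dir, String queryText) throws Exception {
        // 1. Index the query text as a temporary document.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document tempDoc = new Document();
        tempDoc.add(new Field("body", queryText, Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(tempDoc);
        writer.close();

        // 2. Reopen and use the temp doc as the MLT seed.
        IndexReader reader = IndexReader.open(dir);
        int tempDocNum = reader.maxDoc() - 1;    // crude: assumes the temp doc is the last one
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setAnalyzer(new StandardAnalyzer()); // used to re-tokenize the stored text
        mlt.setFieldNames(new String[] { "body" });
        mlt.setMinTermFreq(1);
        mlt.setMinDocFreq(1);
        Query like = mlt.like(tempDocNum);

        // 3. The hits (minus the temp doc itself) are the "results".
        return new IndexSearcher(reader).search(like, 10);
    }
}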
Again, I don't see the benefit of indexing it. You slightly perturb the corpus statistics, but other than that, how is it different from just submitting the query and getting back the results?
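For what it's worth, the same MoreLikeThis can build that query straight from the raw text, so no temp doc (and no reindex) is needed. Again, these are 2.x-era signatures; newer versions take a field name alongside the Reader:

import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;

public class MltFromTextSketch {
    public static TopDocs mltFromText(IndexReader reader, String queryText) throws Exception {
        MoreLikeThis mlt = new MoreLikeThis(reader);
        mlt.setAnalyzer(new StandardAnalyzer());  // tokenizes the raw text
        mlt.setFieldNames(new String[] { "body" });
        mlt.setMinTermFreq(1);
        mlt.setMinDocFreq(1);
        // Build the "more like this" query directly from the user's text;
        // nothing gets added to the index.
        Query like = mlt.like(new StringReader(queryText));
        return new IndexSearcher(reader).search(like, 10);
    }
}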