It is way too slow
On Mar 11, 2012, at 12:07 PM, Pat Ferrel wrote:
I found a description here:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If it is the same four years later, it looks like Lucene is doing an
index lookup for each important term in the example doc, boosting each
term based on the term weights. My guess would be that this …
MoreLikeThis looks exactly like what I need. I would probably create a new
"like" method to take a Mahout vector and build a search? I build the vector by
starting from a doc and reweighting certain terms. The prototype just reweights words, but
I may experiment with Dirichlet clusters and rewei…
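A minimal sketch of the "like" idea described above, assuming the query vector is a plain Python dict from term to weight: keep only the highest-weighted terms and emit each as a boosted clause in Lucene's classic query-parser syntax (`term^boost`), where whitespace-separated clauses are OR'ed under the parser's default operator. The helper `vector_to_boosted_query` and its `max_terms` cutoff are hypothetical, not part of the Lucene or Mahout API, and terms containing query-syntax characters would need escaping first.

```python
from typing import Dict


def vector_to_boosted_query(weights: Dict[str, float], max_terms: int = 25) -> str:
    """Hypothetical helper: turn a term->weight map into a boosted Lucene query string.

    Mimics the MoreLikeThis idea of keeping only the most important terms
    and boosting each one by its weight.
    """
    # Drop non-positive weights, then keep the highest-weighted terms.
    # Ties are broken alphabetically so the output is deterministic.
    top = sorted(
        ((term, w) for term, w in weights.items() if w > 0),
        key=lambda tw: (-tw[1], tw[0]),
    )[:max_terms]
    # "term^boost" clauses separated by spaces; ":g" drops trailing ".0".
    return " ".join(f"{term}^{w:g}" for term, w in top)
```

For example, `vector_to_boosted_query({"mahout": 3.0, "lucene": 2.5, "vector": 1.0}, max_terms=2)` returns `"mahout^3 lucene^2.5"`. Capping the number of terms keeps the query short, which is the same trade-off MoreLikeThis makes by selecting only "important" terms.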
Maybe that's exactly it, but... given a document with n tokens of A and m tokens
of B, wouldn't a query A^n B^m find what you're looking for?
Paul
PS: I've always viewed queries as linear forms on the vector space, and I'd like
to see this written out properly in mathematical terms one day...
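One way to write Paul's remark down, as an idealized sketch: treat the query as a weight vector q over the vocabulary V and a document as a term-weight vector d (for example, raw term counts), with e_A and e_B the basis vectors for terms A and B. The boosted query A^n B^m then acts as a linear form on document vectors. Lucene's actual scoring also folds in idf, length normalization and other factors, so this is only the linear core, not Lucene's scoring formula.

```latex
\ell_q(d) = \langle q, d \rangle = \sum_{t \in V} q_t \, d_t ,
\qquad
q = n\,e_A + m\,e_B \;\Longrightarrow\; \ell_q(d) = n\,d_A + m\,d_B
```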
On 11 March 2012 at 07:
Look at the MoreLikeThis feature in Lucene. I believe it does roughly
what you describe.
On Sat, Mar 10, 2012 at 9:58 AM, Pat Ferrel wrote:
I have a case where I'd like to get documents which most closely match a
particular vector. The RowSimilarityJob of Mahout is ideal for
precalculating similarity between existing documents but in my case the
query is constructed at run time. So the UI constructs a vector to be
used as a query.
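For comparison, a brute-force baseline for the problem stated above: score every document against a query vector built at run time using cosine similarity over sparse term-weight maps, and keep the top k. This is an assumed sketch, not how RowSimilarityJob or MoreLikeThis work internally; the names `cosine` and `top_k_similar` are hypothetical. It touches every document, which is the cost an inverted index avoids by visiting only documents that share at least one query term.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Sparse vector: term -> weight (e.g. tf-idf); absent terms count as 0.
SparseVec = Dict[str, float]


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    # Iterate over the smaller vector for the dot product (cosine is symmetric).
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(
    query: SparseVec, docs: Dict[str, SparseVec], k: int = 10
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best (id, score) pairs."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    # nlargest keeps a heap of size k instead of sorting all scores.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Because the query vector is just another sparse map, the UI can reweight terms freely before calling `top_k_similar`; nothing has to be precomputed for it.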