I have a case where I'd like to get documents which most closely match a
particular vector. Mahout's RowSimilarityJob is ideal for precomputing
similarity between existing documents, but in my case the query is
constructed at run time: the UI builds a vector to be used as the
query.
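
To make the requirement concrete, here is a minimal sketch of the brute-force version: score every document vector against a query vector built at run time and keep the best few. It assumes cosine similarity as the measure and sparse vectors stored as plain dicts; the names `cosine`, `top_matches`, and the toy data are hypothetical, and nothing here uses Mahout's or Lucene's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {index: weight} dicts."""
    dot = sum(w * b.get(i, 0.0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def top_matches(query, docs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three documents over a three-term vocabulary.
docs = {
    "doc_a": {0: 1.0, 1: 2.0},
    "doc_b": {1: 1.0, 2: 3.0},
    "doc_c": {0: 3.0, 2: 1.0},
}
query = {0: 1.0, 1: 2.0}  # built at run time, e.g. by the UI

results = top_matches(query, docs, k=2)
```

This scans every document, so it only illustrates the scoring; on a large collection an inverted index is what avoids touching documents that share no terms with the query.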
g for or?
paul
PS I've always viewed queries as linear forms on the vector space and I'd like
to see this written out in proper mathematical form one day...
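
One way to write that view down, as a sketch that assumes plain dot-product scoring (the symbols $q$, $d$, $f_q$ are notation introduced here, not from the thread): a query vector $q \in \mathbb{R}^n$ defines a linear form on document vectors $d \in \mathbb{R}^n$.

```latex
f_q(d) = \langle q, d \rangle = \sum_{i=1}^{n} q_i \, d_i ,
\qquad
f_q(\alpha d_1 + \beta d_2) = \alpha f_q(d_1) + \beta f_q(d_2).

\cos(q, d) = \frac{f_q(d)}{\lVert q \rVert \, \lVert d \rVert}
```

Cosine similarity is not itself linear in $d$, because of the division by $\lVert d \rVert$; it agrees with the linear form $f_{q / \lVert q \rVert}$ only on unit-length document vectors.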
On 11 March 2012 at 07:23, Lance Norskog wrote:
Look at the MoreLikeThis feature in Lucene. I believe it does roughly
what you describe.
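
As a rough illustration of the idea behind this kind of query, the sketch below keeps the highest-weighted terms of a weighted term vector and emits them as a boosted query string in Lucene's classic query-parser syntax (`term^boost`, terms combined with the default OR). It is not MoreLikeThis's implementation; `vector_to_query` and the sample weights are hypothetical, and terms are assumed to need no escaping.

```python
def vector_to_query(weights, max_terms=3):
    """Turn a {term: weight} vector into a boosted OR-style query string.

    Only the max_terms highest-weighted terms with positive weight are kept.
    """
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:max_terms]
    return " ".join(f"{term}^{weight:.2f}" for term, weight in top if weight > 0)


# Hypothetical term weights, e.g. TF-IDF values from a run-time query vector.
sample = {"mahout": 0.9, "lucene": 0.5, "the": 0.01, "vector": 0.7}
query_string = vector_to_query(sample, max_terms=3)
```

Capping the number of terms is what keeps such a query close in cost to an ordinary keyword query, at the price of ignoring the low-weight tail of the vector.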
O
this is a little
slower than a 2-3 word query but still scalable.
Has anyone used this on a very large index?
Thanks,
Pat
On 3/11/12 10:45 AM, Pat Ferrel wrote:
MoreLikeThis looks exactly like what I need. I would probably create a
new "like" method to take a mahout vector and build a