More Like This (MLT) already is kNN. It extracts features from the document
(that is, it makes a query out of them) and runs that query against the collection.
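To make that concrete, here is a toy, self-contained approximation in Python. It assumes whitespace tokenization and a smoothed tf*idf weight. Lucene's actual MoreLikeThis uses the field's analyzer, its own term-selection thresholds, and the index's Similarity for scoring, and the function names below are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; Lucene would use the field's Analyzer).
    return text.lower().split()


def idf(term, docs):
    # Smoothed inverse document frequency over the collection (illustrative formula).
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def interesting_terms(doc_tokens, docs, max_terms=5):
    # Step 1, "extract features": keep the source doc's highest tf*idf terms.
    tf = Counter(doc_tokens)
    scored = {t: c * idf(t, docs) for t, c in tf.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:max_terms]


def more_like_this(source_id, corpus, n=3, max_terms=5):
    # corpus: dict of doc id -> raw text.
    token_sets = {i: set(tokenize(t)) for i, t in corpus.items()}
    docs = list(token_sets.values())
    query = interesting_terms(tokenize(corpus[source_id]), docs, max_terms)
    # Step 2, "run the query": score every other document by summing the idf
    # of each query term it contains (a crude OR query).
    scores = {}
    for doc_id, terms in token_sets.items():
        if doc_id == source_id:
            continue
        score = sum(idf(t, docs) for t in query if t in terms)
        if score > 0:
            scores[doc_id] = score
    # The n best-scoring documents are the approximate nearest neighbours.
    return sorted(scores, key=lambda d: (-scores[d], d))[:n]
```

With a corpus of {"a": "red wool winter sweater", "b": "blue wool winter scarf", "c": "red cotton summer shirt", "d": "stainless steel kitchen knife"}, more_like_this("a", corpus) returns ["b", "c"]: "b" shares two query terms, "c" shares one, and "d" shares none, so it never matches.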

If you want the items most similar to the current item, use MLT.
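In Solr this is mostly a matter of request parameters on the MoreLikeThis component. A minimal sketch that only builds the request URL; the base URL, the `id` field, and the field names are placeholders for your own setup, not anything from this thread:

```python
from urllib.parse import urlencode


def mlt_request_url(base_url, doc_id, fields, count=10, min_tf=1, min_df=1):
    # Builds a /select request that finds the document id:<doc_id> and turns on
    # the MoreLikeThis component for it. base_url, "id" and the field names
    # are hypothetical placeholders for your own core and schema.
    params = [
        ("q", "id:%s" % doc_id),
        ("mlt", "true"),
        ("mlt.fl", ",".join(fields)),   # fields mined for "interesting" terms
        ("mlt.mintf", str(min_tf)),     # min term frequency in the source doc
        ("mlt.mindf", str(min_df)),     # min document frequency in the index
        ("mlt.count", str(count)),      # how many similar docs to return
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

The similar documents come back in a separate `moreLikeThis` section of the response. Solr's defaults for mlt.mintf and mlt.mindf are higher than 1, which can filter out every term on short fields or small test indexes, so it is worth setting them explicitly while experimenting.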

wunder

On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote:

> Hey Saikat, thanks for your suggestion. I've looked into Mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> to compute the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with Lucene
> using Lucene's scoring function and Solr's MoreLikeThis component. The job
> you specifically mention is for item-based recommendation, which would
> require me to track the different items users have viewed. I'm looking for
> a content-based approach where I would use a distance measure to establish
> how near (how similar) items are, and have some kind of training phase to
> adjust weights.
> 
> 
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
> 
>> Why not just use Mahout to do this? There is an item similarity algorithm
>> in Mahout that does exactly this :)
>> 
>> 
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>> 
>> You can use Mahout in distributed and non-distributed mode as well.
>> 
>>> From: lcguerreroc...@gmail.com
>>> Date: Fri, 28 Jun 2013 12:16:57 -0500
>>> Subject: Content based recommender using lucene/solr
>>> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>>> 
>>> Hi,
>>> 
>>> I'm using Lucene and Solr right now in a production environment with an
>>> index of about a million docs. I'm working on a recommender that would
>>> basically list the n most similar items to the user, based on the current
>>> item they are viewing.
>>> 
>>> I've been thinking of using Solr/Lucene since I already have all docs
>>> available and I want a quick version that can be deployed while we work on
>>> a more robust recommender. How about overriding the default similarity so
>>> that it scores documents based on the Euclidean distance of normalized item
>>> attributes, and then using a MoreLikeThis component to pass in the
>>> attributes of the item for which I want to generate recommendations? I know
>>> it has its issues, like recomputing scores/normalization/weight application
>>> at query time, which could make this idea infeasible/impractical. I'm at a
>>> very preliminary stage right now with this and would love some suggestions
>>> from experienced users.
>>> 
>>> thank you,
>>> 
>>> Luis Guerrero
>> 
>> 
> 
> 
> 
> -- 
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047
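
The Euclidean-distance idea in the original question (normalized item attributes, with weights to be tuned later) is itself brute-force kNN. A minimal sketch, assuming purely numeric attributes, min-max normalization, and hand-set weights standing in for the "training phase"; the function names are hypothetical, and this runs outside Lucene rather than as a custom Similarity:

```python
import math


def min_max_normalize(items):
    # items: dict of item id -> list of numeric attributes (all the same length).
    # Rescales each attribute column to [0, 1] so no single unit dominates.
    columns = list(zip(*items.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        item_id: [(v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(vec, lows, highs)]
        for item_id, vec in items.items()
    }


def weighted_euclidean(a, b, weights):
    # Per-attribute weights are where a later training step could plug in.
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def k_nearest(item_id, items, k=2, weights=None):
    vectors = min_max_normalize(items)
    target = vectors[item_id]
    weights = weights or [1.0] * len(target)
    # Brute force: one distance per other item, then keep the k smallest.
    distances = sorted(
        (weighted_euclidean(target, vec, weights), other)
        for other, vec in vectors.items()
        if other != item_id
    )
    return [other for _, other in distances[:k]]
```

For items {"a": [10.0, 1.0], "b": [12.0, 1.0], "c": [50.0, 5.0], "d": [11.0, 4.0]}, k_nearest("a", items) returns ["b", "d"], while weights=[1.0, 0.0] (ignore the second attribute) flips it to ["d", "b"]. Because every query scans every item, this is fine for a prototype but is exactly the per-query recomputation cost the question worries about at a million documents.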

--
Walter Underwood
wun...@wunderwood.org