More Like This already is kNN. It extracts features from the document (builds a query from its most distinctive terms) and runs that query against the collection.
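To make that concrete, here is a rough sketch of the idea in plain Python. This is not Lucene's or Solr's code: the function name `more_like_this`, the whitespace tokenizer, and the toy tf-idf weighting are all assumptions made for illustration, and Lucene's real scoring differs.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical analyzer: lowercase and split on whitespace.
    return text.lower().split()


def more_like_this(doc_id, docs, k=2, max_query_terms=5):
    """Return ids of the k docs most similar to docs[doc_id].

    Toy tf-idf weighting, assumed for illustration only.
    """
    tokens = {d: Counter(tokenize(t)) for d, t in docs.items()}
    n = len(docs)
    # Document frequency: in how many docs each term occurs.
    df = Counter(term for tf in tokens.values() for term in tf)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    # Step 1: extract features, i.e. the source doc's highest tf-idf terms.
    src = tokens[doc_id]
    query = sorted(src, key=lambda t: (-src[t] * idf[t], t))[:max_query_terms]

    # Step 2: run that query against the collection, skipping the source doc.
    scores = {}
    for d, tf in tokens.items():
        if d == doc_id:
            continue
        score = sum(tf[t] * idf[t] for t in query)
        if score > 0:
            scores[d] = score

    # Step 3: the top-k hits are the k nearest neighbors.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


if __name__ == "__main__":
    docs = {
        "a": "red cotton shirt",
        "b": "red cotton dress",
        "c": "blue leather boots",
        "d": "red wool shirt",
    }
    print(more_like_this("a", docs, k=2))
```

The point of the sketch is only that "build a query from the item, take the top k hits" is a nearest-neighbor search, with the scoring function acting as the similarity measure.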
If you want the items most similar to the current item, use MLT.

wunder

On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote:

> Hey saikat, thanks for your suggestion. I've looked into mahout and other
> alternatives for computing k nearest neighbors. I would have to run a job
> and compute the k nearest neighbors and track them in the index for
> retrieval. I wanted to see if this was something I could do with lucene
> using lucene's scoring function and solr's morelikethis component. The job
> you specifically mention is for Item based recommendation which would
> require me to track the different items users have viewed. I'm looking for
> a content based approach where I would use a distance measure to establish
> how near items are (how similar) and have some kind of training phase to
> adjust weights.
>
>
> On Fri, Jun 28, 2013 at 12:42 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:
>
>> Why not just use mahout to do this, there is an item similarity algorithm
>> in mahout that does exactly this :)
>>
>>
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
>>
>> You can use mahout in distributed and non-distributed mode as well.
>>
>>> From: lcguerreroc...@gmail.com
>>> Date: Fri, 28 Jun 2013 12:16:57 -0500
>>> Subject: Content based recommender using lucene/solr
>>> To: solr-user@lucene.apache.org; java-u...@lucene.apache.org
>>>
>>> Hi,
>>>
>>> I'm using lucene and solr right now in a production environment with an
>>> index of about a million docs. I'm working on a recommender that basically
>>> would list the n most similar items to the user based on the current item
>>> he is viewing.
>>>
>>> I've been thinking of using solr/lucene since I already have all docs
>>> available and I want a quick version that can be deployed while we work on
>>> a more robust recommender. How about overriding the default similarity so
>>> that it scores documents based on the euclidean distance of normalized item
>>> attributes and then using a morelikethis component to pass in the
>>> attributes of the item for which I want to generate recommendations? I know
>>> it has its issues like recomputing scores/normalization/weight application
>>> at query time which could make this idea unfeasible/impractical. I'm at a
>>> very preliminary stage right now with this and would love some suggestions
>>> from experienced users.
>>>
>>> thank you,
>>>
>>> Luis Guerrero
>>
>
> --
> Luis Carlos Guerrero Covo
> M.S. Computer Engineering
> (57) 3183542047

--
Walter Underwood
wun...@wunderwood.org