I wrote something like this for Ultraseek. After the document was parsed and analyzed, I took the top terms (by tf.idf), ran a search with them, then added fields to the document with the categories from the results.
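
A rough sketch of that flow in Python, to make the steps concrete. This is not the Ultraseek code. The function names, the `text` field, the smoothed idf formula, and the plain majority vote are all assumptions for illustration, and the actual search call is left out.

```python
import math
from collections import Counter


def top_terms(doc_tokens, doc_freq, num_docs, k=10):
    """Return the k terms of doc_tokens with the highest tf.idf weight.

    doc_freq maps term -> number of indexed documents containing it.
    """
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        df = doc_freq.get(term, 0)
        # Smoothed idf (an assumption), so unseen terms do not divide by zero.
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scores[term] = count * idf
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_query(terms, field="text"):
    """Build an OR query over the chosen terms.

    The field name is hypothetical. Terms are assumed to be plain
    lowercase tokens; real input would need query-syntax escaping.
    """
    return "%s:(%s)" % (field, " OR ".join(terms))


def vote(neighbor_categories, k=5):
    """Rank categories by how often they appear in the k best matches.

    neighbor_categories is a list of category lists, best match first.
    """
    counts = Counter()
    for cats in neighbor_categories[:k]:
        counts.update(cats)
    return [cat for cat, _ in counts.most_common()]
```

The query string from `build_query` would go to the search engine, and the category lists of the top hits would go to `vote`. The winning categories are what get added as fields before the document is indexed.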
You might be able to use the document analysis request handler for this. Analyze the document, choose terms, do the search, modify the doc, then submit it for indexing. It would get parsed twice, but that might not be a big deal.

Warning: this could put a big load on Solr. My implementation really pounded Ultraseek. The queries are long and they don't really match what is in the caches.

wunder

On Nov 5, 2012, at 8:40 AM, Raimon Bosch wrote:

> Hi,
>
> I'm designing a K-nearest neighbors classifier for Solr. So I am taking
> information from IMDB and creating a set of documents with the description of
> each movie and the categories selected for each document.
>
> To validate whether the classification is correct I'm using cross-validation. So
> I do not include in the index the documents that I want to guess.
>
> If I want to use the MoreLikeThis algorithm, do I need to add these documents to the
> index? Will MoreLikeThis work with soft commits? Is there a solution to
> do a MoreLikeThis without adding the document to the index?
>
> Thanks,
> Raimon Bosch.