I wrote something like this for Ultraseek. After the document was parsed and 
analyzed, I took the top terms (by tf.idf) and did a search, then added fields 
with the categories.
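
A rough sketch of the term-selection step, in Python rather than anything Ultraseek-specific. The function name, the smoothed idf formula, and the inputs (an already analyzed token list plus a term-to-document-frequency map) are assumptions for illustration, not what my implementation actually did.

```python
import math
from collections import Counter


def top_terms_by_tfidf(tokens, doc_freq, num_docs, k=10):
    """Return the k terms with the highest tf.idf score for one document.

    tokens   -- analyzed terms of the document (hypothetical input)
    doc_freq -- dict mapping term -> number of indexed docs containing it
    num_docs -- total number of documents in the index
    """
    term_freq = Counter(tokens)
    scores = {}
    for term, freq in term_freq.items():
        df = doc_freq.get(term, 0)
        # Smoothed idf (an assumption) so unseen terms do not divide by zero.
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scores[term] = freq * idf
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Common words score low even when they are frequent in the document, so the query ends up dominated by the terms that distinguish it.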

You might be able to use the document analysis request handler for this. 
Analyze it, then choose terms, do the search, modify the doc, then submit it 
for indexing. It would get parsed twice, but that might not be a big deal.
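
The search and modify steps could look roughly like this. It is a sketch only: the field names ("description", "categories"), the helper names, and the simple majority vote are all assumptions, and the actual calls to Solr (analysis, search, update) are left out.

```python
from collections import Counter


def build_query(field, terms):
    """Build an OR query over the chosen terms for a single field."""
    # Quote each term so query-syntax characters are treated literally.
    quoted = [
        '"{}"'.format(t.replace("\\", "\\\\").replace('"', '\\"'))
        for t in terms
    ]
    return "{}:({})".format(field, " OR ".join(quoted))


def vote_categories(neighbors, field="categories", top_n=1):
    """Pick the most common categories among the nearest-neighbor hits."""
    votes = Counter()
    for hit in neighbors:
        votes.update(hit.get(field, []))
    # Most votes first; ties broken alphabetically for stable output.
    ranked = sorted(votes, key=lambda c: (-votes[c], c))
    return ranked[:top_n]


def add_categories(doc, neighbors, field="categories"):
    """Return a copy of doc with the voted categories added as a field."""
    updated = dict(doc)
    updated[field] = vote_categories(neighbors, field)
    return updated
```

Here `neighbors` stands for the top k documents returned by the search, each a dict with a list-valued category field. The updated document is what you would then submit for indexing.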

Warning: this could put a big load on Solr. My implementation really pounded 
Ultraseek. The queries are long, and they rarely match anything already in the 
caches.

wunder

On Nov 5, 2012, at 8:40 AM, Raimon Bosch wrote:

> Hi,
> 
> I'm designing a K-nearest neighbors classifier for Solr. I am taking
> information from IMDB and creating a set of documents with the description
> of each movie and the categories selected for each document.
> 
> To validate whether the classification is correct, I'm using
> cross-validation, so I do not index the documents that I want to classify.
> 
> If I want to use the MoreLikeThis algorithm, do I need to add these
> documents to the index? Will MoreLikeThis work with soft commits? Is there a
> way to do a MoreLikeThis without adding the document to the index?
> 
> Thanks,
> Raimon Bosch.



