I have a question about using a large number of weighted tags to compare 
documents in Solr.

I have a set of domain objects, each of which has many properties, and from 
these I'm creating documents which are added to Solr. The properties are all 
turned into tags, so each Solr document simply has a field to identify the 
object and a large number of tags describing it (about 150 tags per document 
on average). Right now the tags are bound to specific terms, but some are 
accompanied by a numeric value. Each tag will need to be weighted, since some 
of the properties are more significant for comparison than others.

Given one document, I want to be able to find similar documents by comparing 
the tags. Should I use Term Vectors and the MoreLikeThis functionality for 
this, or do Term Vectors only work with term frequency (which will be at most 
one for each tag in each document)? Or should I be looking at the DisMax query 
handler instead, in order to apply boosts to tag values?
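
To make the comparison concrete, here is a minimal sketch (plain Python, 
outside Solr) of the kind of weighted tag matching I mean. The tag names, the 
weights, and the function weighted_similarity are all made up for 
illustration, and this is not a claim about how Solr or MoreLikeThis scores 
documents; it only shows each tag counting at most once per document, scaled 
by a per-tag weight.

```python
from math import sqrt

# Hypothetical per-tag weights: more significant properties get larger values.
TAG_WEIGHTS = {"color:red": 2.0, "size:large": 1.0, "material:steel": 3.0}
DEFAULT_WEIGHT = 1.0  # assumed weight for tags without an explicit entry


def weighted_similarity(tags_a, tags_b, weights=TAG_WEIGHTS):
    """Weighted cosine similarity between two tag sets.

    Each tag is either present or absent (frequency of at most one),
    so only the per-tag weight affects the score.
    """
    def w(tag):
        return weights.get(tag, DEFAULT_WEIGHT)

    # Dot product: only tags present in both documents contribute.
    shared = sum(w(t) ** 2 for t in tags_a & tags_b)
    norm_a = sqrt(sum(w(t) ** 2 for t in tags_a))
    norm_b = sqrt(sum(w(t) ** 2 for t in tags_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return shared / (norm_a * norm_b)


doc_a = {"color:red", "size:large"}
doc_b = {"color:red", "material:steel"}
score = weighted_similarity(doc_a, doc_b)
```

With these made-up weights, sharing a heavily weighted tag raises the score 
more than sharing a lightly weighted one, which is the behaviour I'm after.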


Aidan
