: I have some documents, each has a number of tags. I'd like to
: have a query to return "similar" documents which share largest
: number of tags with a given document. For example, if I have
: doc that has 4 tags, and I'd like to return docs that also
: have these 4 tags. And if this doesn't make up a number of records,
: say, 10 records, I'd like to have some more docs that share 3

if by "tags" you mean tags in the web folksonomy sense, then assuming:
  1) your tag field doesn't contain any duplicate tags per doc
  2) you omitNorms="true" on your tags field

...a generic search on all of the tag names you are interested in should
be almost exactly what you asked for ... the one difference being that
Lucene by default weights terms that are infrequent in your index more
than terms that are frequent .. so doc A matching on 3 tags might score
higher than doc B matching on 4 tags if the tags A matches on are really
rare tags that not a lot of docs match on but the tags B matches on are
really REALLY common.
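
To make the rare-vs-common effect concrete, here is a rough Python sketch
of the arithmetic. It is not Lucene code: it only loosely models the
classic TF-IDF scoring (idf = 1 + ln(numDocs / (docFreq + 1)), squared
per matching term, times a coord factor for the fraction of query terms
matched), assuming each tag occurs once per doc and norms are omitted.
The index size, the tag names, the doc frequencies and the 7-tag query
are all made up for illustration.

```python
import math

NUM_DOCS = 10_000  # hypothetical index size


def idf(doc_freq, num_docs=NUM_DOCS):
    """Classic Lucene-style idf: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query_tags, doc_tags, doc_freq):
    """Simplified score: coord * sum(idf^2) over the matching tags.

    Assumes tf = 1 for every tag (no duplicate tags per doc) and no
    length norms (omitNorms="true"). queryNorm is skipped because it is
    the same for every doc and does not change the ordering.
    """
    matched = [t for t in query_tags if t in doc_tags]
    coord = len(matched) / len(query_tags)
    return coord * sum(idf(doc_freq[t]) ** 2 for t in matched)


# Hypothetical doc frequencies: three rare tags, four very common ones.
doc_freq = {
    "rare1": 5, "rare2": 5, "rare3": 5,
    "common1": 5000, "common2": 5000, "common3": 5000, "common4": 5000,
}
query = list(doc_freq)  # search on all seven tag names

doc_a = {"rare1", "rare2", "rare3"}                  # matches 3 tags
doc_b = {"common1", "common2", "common3", "common4"}  # matches 4 tags

score_a = score(query, doc_a, doc_freq)
score_b = score(query, doc_b, doc_freq)
print("A (3 rare tags):  ", round(score_a, 2))
print("B (4 common tags):", round(score_b, 2))
```

Under these made-up numbers A comes out well ahead of B even though it
matches fewer tags. A doc that matches every tag in the query still
scores highest, since it collects all of the same weights plus the
full coord factor.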

...not exactly what you asked about, but probably something that you'll
appreciate having once you see it in action.

-Hoss
