: I have some documents, each has a number of tags. I'd like to
: have a query to return "similar" documents which share largest
: number of tags with a given document. For example, if I have
: doc that has 4 tags, and I'd like to return docs that also
: have these 4 tags. And if this doesn't make up a number of records,
: say, 10 records, I'd like to have some more docs that share 3

If by "tags" you mean tags in the web folksonomy sense, then assuming:

  1) your tag field doesn't contain any duplicate tags per doc
  2) you set omitNorms="true" on your tags field

...a generic search on all of the tag names you are interested in should be almost exactly what you asked for.

The one difference is that Lucene by default weights terms that are infrequent in your index more than terms that are frequent, so doc A matching on 3 tags might score higher than doc B matching on 4 tags if the tags A matches on are really rare tags that not a lot of docs match on, but the tags B matches on are really REALLY common.

That's not exactly what you asked about, but probably something that you'll appreciate having once you see it in action.

-Hoss
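
A toy illustration of the rare-vs-common weighting described above, as a rough sketch rather than Lucene's actual scoring code. It assumes the classic TF-IDF form idf = 1 + ln(numDocs / (docFreq + 1)), squares it per matching term, and ignores the other factors Lucene may apply (such as coord and query normalization). The index size, tag names, and document frequencies are all made up for the example.

```python
import math

NUM_DOCS = 1000  # hypothetical index size

# Hypothetical document frequencies: how many docs carry each tag.
doc_freq = {
    "news": 900, "web": 900, "blog": 900, "misc": 900,  # very common tags
    "lucene": 2, "folksonomy": 2, "tagging": 2,         # rare tags
}

# Tags of the source document we want "similar" docs for (seven tags here,
# so that A and B can match on different subsets).
query_tags = set(doc_freq)

doc_a = {"lucene", "folksonomy", "tagging"}  # matches 3 rare tags
doc_b = {"news", "web", "blog", "misc"}      # matches 4 common tags


def idf(tag):
    """Assumed classic-style inverse document frequency."""
    return 1.0 + math.log(NUM_DOCS / (doc_freq[tag] + 1))


def overlap(doc_tags):
    """Plain count of shared tags (what the question asks for)."""
    return len(doc_tags & query_tags)


def idf_score(doc_tags):
    """Simplified score: sum of idf^2 over shared tags (tf = 1, no norms)."""
    return sum(idf(t) ** 2 for t in doc_tags & query_tags)


if __name__ == "__main__":
    for name, tags in (("A", doc_a), ("B", doc_b)):
        print(name, "shared tags:", overlap(tags),
              "idf-weighted score: %.2f" % idf_score(tags))
```

With these made-up numbers, B shares more tags than A, but A comes out ahead under the IDF-weighted score because its three tags are so rare.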