We have a distributed index consisting of several shards, and some documents may be repeated across shards. We want to remove the duplicate records from the documents returned by the shards, then re-order the results by grouping them with a clustering algorithm and reranking the documents within each cluster by the log of a particular returned field value. How should we go about this? Should we implement this logic in a custom QueryResponseWriter? Also, once duplicate records are removed, the total number of records actually returned is smaller than the number requested in the query.
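
To make the intended post-processing concrete, here is a rough sketch in plain Python of what we would like to happen once the shard responses have been collected. It is not Solr code and uses no Solr API. The function name `merge_shard_results` and the field names `id`, `cluster` and `popularity` are placeholders made up for illustration, and it assumes each document already carries a cluster label from the clustering step and that the reranking field is strictly positive.

```python
import math
from collections import OrderedDict


def merge_shard_results(shard_results, rows,
                        key="id", cluster_key="cluster", value_key="popularity"):
    """Deduplicate, group by cluster, rerank within each cluster, truncate.

    All field names are hypothetical placeholders, not Solr schema fields.
    """
    # Step 1: drop duplicates, keeping the first occurrence of each key.
    seen = set()
    unique = []
    for docs in shard_results:
        for doc in docs:
            if doc[key] in seen:
                continue
            seen.add(doc[key])
            unique.append(doc)

    # Step 2: group by cluster label, preserving first-seen cluster order.
    clusters = OrderedDict()
    for doc in unique:
        clusters.setdefault(doc[cluster_key], []).append(doc)

    # Step 3: within each cluster, sort by log of the field value, descending.
    # Assumes the value is > 0, since math.log fails otherwise.
    ordered = []
    for docs in clusters.values():
        docs.sort(key=lambda d: math.log(d[value_key]), reverse=True)
        ordered.extend(docs)

    # Step 4: return at most `rows` documents; fewer if duplicates were removed.
    return ordered[:rows]


shards = [
    [{"id": "a", "cluster": "c1", "popularity": 10},
     {"id": "b", "cluster": "c2", "popularity": 5}],
    [{"id": "a", "cluster": "c1", "popularity": 10},
     {"id": "c", "cluster": "c1", "popularity": 100}],
]
result = merge_shard_results(shards, rows=4)
print([d["id"] for d in result])  # ['c', 'a', 'b']
```

In this example four records were requested (two per shard were returned), but only three come back because "a" appears on both shards. That is the shortfall mentioned above.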
Regards, CI