We have a distributed index consisting of several shards, and some
documents may be repeated across shards. We want to remove the duplicate
records from the documents returned by the shards, then re-order the
results by grouping them with a clustering algorithm and reranking the
documents within each cluster by the log of a particular returned field
value.
How do we go about achieving this? Should we write this logic by
implementing QueryResponseWriter? Also, if we remove duplicate records, the
total number of records actually returned is less than what was asked for
in the query.
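
To make the intended post-processing concrete, here is a rough sketch in
plain Python. It is not Solr code and does not use QueryResponseWriter; it
only illustrates the three steps described above. The function name and the
field names "id", "cluster" and "popularity" are hypothetical, the cluster
label is assumed to be already attached to each document, and the field
value is assumed to be strictly positive so that its log is defined.

```python
import math
from collections import OrderedDict


def merge_shard_results(shard_results, rows,
                        key="id", cluster_key="cluster",
                        value_key="popularity"):
    """Deduplicate, group by cluster, and rerank within each cluster.

    shard_results: one list of documents (dicts) per shard.
    rows: number of documents requested by the query.
    All field names are hypothetical placeholders.
    """
    # Step 1: drop duplicates, keeping the first copy seen for each key.
    seen = set()
    unique = []
    for docs in shard_results:
        for doc in docs:
            if doc[key] not in seen:
                seen.add(doc[key])
                unique.append(doc)

    # Step 2: group by cluster label, in order of first appearance.
    clusters = OrderedDict()
    for doc in unique:
        clusters.setdefault(doc[cluster_key], []).append(doc)

    # Step 3: within each cluster, sort by log(field value), descending.
    # Assumes the value is > 0; math.log raises ValueError otherwise.
    ordered = []
    for docs in clusters.values():
        docs.sort(key=lambda d: math.log(d[value_key]), reverse=True)
        ordered.extend(docs)

    # Truncate to the requested size; after deduplication there may be
    # fewer than `rows` documents left.
    return ordered[:rows]


shard_a = [{"id": 1, "cluster": "a", "popularity": 10},
           {"id": 2, "cluster": "b", "popularity": 5}]
shard_b = [{"id": 2, "cluster": "b", "popularity": 5},
           {"id": 3, "cluster": "a", "popularity": 100}]
merged = merge_shard_results([shard_a, shard_b], rows=10)
```

In this example the two shards return four documents in total, but only
three remain after the duplicate is dropped, which is the shortfall
mentioned above.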

Regards,
CI
