Hello - your similarity should rely on numDocs instead of maxDoc; that solves the
problem. I believe it is already fixed in trunk, but I am not sure.
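To make the mechanism concrete, here is a minimal Java sketch. It assumes the
BM25-style formula idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the
document count and n is the term's document frequency. The class, the
ReplicaStats type and all numbers are hypothetical, not Solr or Lucene APIs. It
holds n fixed on both replicas, which is a simplification: Lucene's docFreq also
still counts deleted documents until their segments are merged away, so n can
drift between replicas too. Only the choice of N is varied here.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: how unmerged deletions shift a BM25-style IDF when the
 * document count N comes from maxDoc (live + deleted docs) rather than numDocs
 * (live docs only). All numbers are invented for illustration.
 */
public class IdfDriftSketch {

    /** BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double bm25Idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Made-up per-replica statistics; not a Lucene or Solr class. */
    static final class ReplicaStats {
        final String name;
        final long numDocs;     // live documents
        final long deletedDocs; // deleted but not yet merged away

        ReplicaStats(String name, long numDocs, long deletedDocs) {
            this.name = name;
            this.numDocs = numDocs;
            this.deletedDocs = deletedDocs;
        }

        /** maxDoc counts every document slot, including unmerged deletions. */
        long maxDoc() {
            return numDocs + deletedDocs;
        }
    }

    public static void main(String[] args) {
        // Assumed identical on both replicas (a simplification, see above).
        long docFreq = 5_000;
        // Same live documents, different amounts of unmerged deletions.
        ReplicaStats[] replicas = {
            new ReplicaStats("replica-a", 6_000_000, 300_000),
            new ReplicaStats("replica-b", 6_000_000, 1_200_000),
        };
        for (ReplicaStats r : replicas) {
            System.out.printf(Locale.ROOT, "%s: idf(maxDoc)=%.4f idf(numDocs)=%.4f%n",
                r.name, bm25Idf(r.maxDoc(), docFreq), bm25Idf(r.numDocs, docFreq));
        }
    }
}
```

With N = maxDoc the two replicas print different IDF values for the same term;
with N = numDocs they print the same value, which is the effect the numDocs
suggestion is aiming for.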
Markus
-----Original message-----
> From:Upayavira <upayav...@odoko.co.uk>
> Sent: Thursday 4th August 2016 13:59
> To: solr-user@lucene.apache.org
> Subject: Out of sync deletions causing differing IDF
>
> We have a system with a reasonable number of changes each day (maybe
> 60m docs, and around 1m updates per day). Using SolrCloud, the data is
> split into 10 shards, and those shards are replicated.
>
> What we are finding is that deletions are causing differing maxDoc
> values across replicas of the same shard, and that in turn causes
> significantly different IDF values between those replicas, giving
> different scores and thus different result orders depending upon which
> replica we hit.
>
> I would have expected that, because the data is being indexed
> concurrently across replicas, the pattern of deletes and merges would
> be similar across replicas, but in practice that doesn't seem to be the
> case.
>
> We could, of course, optimise the index to merge down to a single
> segment. This would clear out all deletes, but would leave us in a worse
> place for the future, as subsequent deletes would then be concentrated
> in a single large segment.
>
> Has anyone seen this sort of thing before, and does anyone have
> suggested strategies as to how to encourage IDF values into a similar
> range across replicas?
>
> Upayavira
>