Hi,

This is regarding an issue we are facing with SOLR distributed search. In our application, we maintain multiple shards on the SOLR server to manage the load. However, there is a problem with the order of the results that we return to the client during a search.
For example: currently there are two shards across which the data is randomly distributed. When I search for something, I observe that the results from one shard appear first, followed by the results from the other shard. Moreover, we order the results by applying two levels of sorting (also configurable per user):

1. Score
2. Modified Time

I investigated this scenario and found that documents coming from one shard do not necessarily have the same score as documents coming from the other shard, even if the documents are identical. I also went through the SOLR documentation and related links, and found that distributed search in SOLR currently has a limitation: inverse document frequency (IDF) calculations cannot be distributed, so TF/IDF computations are done per shard. The issue is particularly visible when there is a significant difference between the number of documents indexed in each shard (for example, the first shard has 15000 docs and the second shard has 5000).

Please review and let me know whether our findings for this scenario are correct.

Also, as per our investigation, there is ongoing work in the SOLR community to support distributed/global IDF. However, I would like to know whether there is any solution available right now to manage or control the scores of documents during distributed search, so that the results appear more relevant.

Thanks,
Rashi
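P.S. To make the per-shard IDF effect concrete, here is a minimal sketch in Python. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the actual similarity in use may differ. Only the shard sizes (15000 and 5000) come from our setup. The document frequency of 30 per shard and the function name classic_idf are made up for illustration.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """IDF as in the classic Lucene TF-IDF similarity (assumed here):
    1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Shard sizes from the scenario described above.
SHARD_A_DOCS = 15000
SHARD_B_DOCS = 5000

# Hypothetical: the search term occurs in 30 documents on each shard.
DOC_FREQ_A = 30
DOC_FREQ_B = 30

# Each shard computes IDF from its own local statistics only.
idf_a = classic_idf(SHARD_A_DOCS, DOC_FREQ_A)
idf_b = classic_idf(SHARD_B_DOCS, DOC_FREQ_B)

# What a global IDF over the combined index would be.
idf_global = classic_idf(SHARD_A_DOCS + SHARD_B_DOCS, DOC_FREQ_A + DOC_FREQ_B)

print(f"shard A idf: {idf_a:.3f}")
print(f"shard B idf: {idf_b:.3f}")
print(f"global idf:  {idf_global:.3f}")
```

Under these assumptions the term looks rarer on the larger shard, so an identical document gets a higher IDF (and hence a higher score) there than on the smaller shard. That would be consistent with one shard's results appearing ahead of the other's when sorting by score.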