Hi all, I'm trying to make sure that I understand under what circumstance a distributed search is performed against Solr and if my general understanding of what constitutes a distributed search is correct.
I have a Solr collection that was created using the Collections API with the following parameters: numShards=5 maxShardsPerNode=5 replicationFactor=4. Given that we have 4 servers this will result in 5 shards being created on each server. All documents indexed into Solr have a shard key specified as a part of their document id, such that we can use the same shard key prefix as a part of our query by specifying: shard.keys=myshardkey! My assumption was that when the search request is submitted, given that my deployment topology has all possible shards available on each server, there will be no need to call out to other servers in the cluster to fulfill the search. What I am noticing is the following: 1. Submit a search to Server 1 with the shard.keys parameter specified. (Note again that replicas for shard 1-5 are all available on the Server 1.) 2. The request is forwarded to a server other than Server 1, for example Server 3. 3. The /select request handler of Server 3 is invoked. This proceeds to execute the /select request, asking for the id and score fields for each document that matches the submittted query. I also noticed that it passes the shard.url parameter but states that distrib=false. 4. Then *another* request is executed on Server 3 for the /select request handler *again*. This time the ids returned from the previous search are passed in as the ids parameters. 5. Finally the results are passed back to the caller through the original server, Server 1. This appears to a be full blown distributed shard being performed. My expectation was that the search would be localized to the original server (Server 1 in the example used above), given that it *should* be able to deduce that the current server has a replica that can fulfill the requested search. As the very least localizing the search against the shards on Server 1 instead of going against the entire Solr cluster. My hope was that we would not have to go across the network, paying the network transport penalty, for a search that could have been fulfilled from the original Solr node, when the shard.keys param is specified. Any insight that can be provided will be greatly appreciated. Thanks all!