Hi all,

I'm trying to make sure that I understand under what circumstance a distributed 
search is performed against Solr and if my general understanding of what 
constitutes a distributed search is correct.

I have a Solr collection that was created using the Collections API with the 
following parameters: numShards=5  maxShardsPerNode=5  replicationFactor=4.  
Given that we have 4 servers this will result in 5 shards being created on each 
server. All documents indexed into Solr have a shard key specified as a part of 
their document id, such that we can use the same shard key prefix as a part of 
our query by specifying: shard.keys=myshardkey! 

My assumption was that when the search request is submitted, given that my 
deployment topology has all possible shards available on each server, there 
will be no need to call out to other servers in the cluster to fulfill the 
search. What I am noticing is the following:

1. Submit a search to Server 1 with the shard.keys parameter specified. (Note 
again that replicas for shard 1-5 are all available on the Server 1.)
2. The request is forwarded to a server other than Server 1, for example Server 
3.
3. The  /select request handler of Server 3 is invoked. This proceeds to 
execute the /select request, asking for the id and score fields for each 
document that matches the submittted query. I also noticed that it passes the 
shard.url parameter but states that distrib=false.
4. Then *another* request is executed on Server 3 for the /select request 
handler *again*. This time the ids returned from the previous search are passed 
in as the ids parameters.
5. Finally the results are passed back to the caller through the original 
server, Server 1.

This appears to a be full blown distributed shard being performed. My 
expectation was that the search would be localized to the original server 
(Server 1 in the example used above), given that it *should* be able to deduce 
that the current server has a replica that can fulfill the requested search. As 
the very least localizing the search against the shards on Server 1 instead of 
going against the entire Solr cluster.

My hope was that we would not have to go across the network, paying the network 
transport penalty, for a search that could have been fulfilled from the 
original Solr node, when the shard.keys param is specified.

Any insight that can be provided will be greatly appreciated. 

Thanks all! 

Reply via email to