Re: Shard Keys and Distributed Search

Daniel Collins Sat, 01 Jun 2013 02:10:59 -0700

Yes, it is doing a distributed search. Solr Cloud will do that by default unless you say distrib=false.

My understanding of Solr's load balancer is that it picks a random instance from the list of available instances serving each shard.

So in your example:

1. Query comes in to Server 1. Server 1 deconstructs it and works out which shards it needs to query. It then gets a list (from ZK) of all the instances in that collection which can service that shard, and the LB in Solr just picks one (at random).

2. It has picked Server 3 in your case, so the request goes there.

3. The request is still a 2-stage process (in terms of what you see in the logs): one query to get the docIds (using your query data) and then a second "query" to get the stored fields, once it has the correct list of docs. This is necessary because in a general multi-shard query, the responses have to go back to server 1 and be consolidated (not 100% sure of this area, but I believe this is true and it makes logical sense to me). So if you had a query for 10 records that needed to access 4 shards, it would ask for the "top 10" from each shard, then combine/sort them to get the overall "top 10", and then get the stored fields for those 10 (which might be 5 from shard 1, 2 from shard 2 and 3 from shard 3, nothing from shard 4, for example).
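To make the two stages concrete, here is a rough Python sketch of the flow as I described it above. It is only an illustration, not Solr's actual code: the function names, server names, doc ids and scores are all made up, and the random pick simply mirrors my understanding of the load balancer.

```python
import heapq
import random


def pick_replicas(shard_replicas, rng=random):
    """Pick one instance per shard at random (hypothetical stand-in for the LB)."""
    return {shard: rng.choice(replicas) for shard, replicas in shard_replicas.items()}


def merge_top_n(per_shard_hits, n):
    """Stage 1: merge each shard's (score, doc_id) hits into an overall top n."""
    tagged = (
        (score, doc_id, shard)
        for shard, hits in per_shard_hits.items()
        for score, doc_id in hits
    )
    # Highest scores first; only the score decides the ordering.
    return heapq.nlargest(n, tagged, key=lambda hit: hit[0])


def ids_to_fetch(merged):
    """Stage 2: group the winning doc ids by shard for the stored-field request."""
    by_shard = {}
    for _score, doc_id, shard in merged:
        by_shard.setdefault(shard, []).append(doc_id)
    return by_shard


if __name__ == "__main__":
    # Made-up topology: every shard has a replica on every server.
    replicas = {f"shard{i}": ["server1", "server2", "server3", "server4"] for i in range(1, 5)}
    print(pick_replicas(replicas))

    # Made-up per-shard results, already sorted by score within each shard.
    hits = {
        "shard1": [(0.9, "a"), (0.5, "b")],
        "shard2": [(0.8, "c")],
        "shard3": [(0.1, "d")],
        "shard4": [],
    }
    top = merge_top_n(hits, 3)
    print(ids_to_fetch(top))
```

Note that a shard can contribute nothing to the final list (shard4 here), which is why the stored fields are only fetched after the merge.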

You are right that it seems counter-intuitive from the user's perspective, but I don't think Solr Cloud currently has any logic to favour a local instance over a remote one. I guess that would be a change to CloudSolrServer? Alternatively, you can do it in your client: send a non-distributed query, so append "distrib=false&shards=localhost:8983/solr,localhost:7574/solr".
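As a sketch of the client-side option, something like the following would build such a request URL. The helper name and the "collection1" core path are assumptions for illustration only, and I have not verified how Solr treats this exact parameter combination, so check it against your own setup.

```python
from urllib.parse import urlencode


def local_query_url(base_url, query, shards, shard_key=None):
    """Build a /select URL with distrib=false and an explicit shards list.

    base_url and shards are whatever your own deployment uses; the values
    in the example below are placeholders, not a recommendation.
    """
    params = {
        "q": query,
        "distrib": "false",
        "shards": ",".join(shards),
    }
    if shard_key is not None:
        # Same shard key prefix that is used in the document ids.
        params["shard.keys"] = shard_key
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(
        local_query_url(
            "http://localhost:8983/solr/collection1",  # hypothetical core URL
            "*:*",
            ["localhost:8983/solr", "localhost:7574/solr"],
            shard_key="myshardkey!",
        )
    )
```

urlencode percent-encodes the colons, slashes and commas in the parameter values, so the result can be passed to an HTTP client as is.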

-----Original Message-----
From: Niran Fajemisin

Sent: Friday, May 31, 2013 5:00 PM
To: Solr User
Subject: Shard Keys and Distributed Search

Hi all,

I'm trying to make sure that I understand under what circumstances a distributed search is performed against Solr and if my general understanding of what constitutes a distributed search is correct.

I have a Solr collection that was created using the Collections API with the following parameters: numShards=5 maxShardsPerNode=5 replicationFactor=4. Given that we have 4 servers, this will result in 5 shards being created on each server. All documents indexed into Solr have a shard key specified as a part of their document id, such that we can use the same shard key prefix as a part of our query by specifying: shard.keys=myshardkey!

My assumption was that when the search request is submitted, given that my deployment topology has all possible shards available on each server, there will be no need to call out to other servers in the cluster to fulfill the search. What I am noticing is the following:

1. Submit a search to Server 1 with the shard.keys parameter specified. (Note again that replicas for shards 1-5 are all available on Server 1.)
2. The request is forwarded to a server other than Server 1, for example Server 3.
3. The /select request handler of Server 3 is invoked. This proceeds to execute the /select request, asking for the id and score fields for each document that matches the submitted query. I also noticed that it passes the shard.url parameter but states that distrib=false.
4. Then *another* request is executed on Server 3 for the /select request handler *again*. This time the ids returned from the previous search are passed in as the ids parameter.
5. Finally the results are passed back to the caller through the original server, Server 1.

This appears to be a full-blown distributed search being performed. My expectation was that the search would be localized to the original server (Server 1 in the example used above), given that it *should* be able to deduce that the current server has a replica that can fulfill the requested search. At the very least, I expected the search to be localized against the shards on Server 1 instead of going against the entire Solr cluster.

My hope was that we would not have to go across the network, paying the network transport penalty, for a search that could have been fulfilled from the original Solr node, when the shard.keys param is specified.


Any insight that can be provided will be greatly appreciated.

Thanks all!
