RE: Solr Cloud Query Scaling

Tim Potter Thu, 09 Jan 2014 10:33:06 -0800

Absolutely adding replicas helps you scale query load. Queries do not need to 
be routed to leaders; they can be handled by any replica in a shard. Leaders 
are only needed for handling update requests.


In general, a distributed query has two phases, driven by a controller node 
(what you called collator below). The controller is the Solr that received the 
query request from the client. In Phase 1, the controller distributes the query 
to one of the replicas for all shards and receives back the list of matching 
document IDs from each replica (only a page worth btw). 

The controller merges the results and sorts them to generate a final page of 
results to be returned to the client. In Phase 2, the controller collects all 
the fields from the documents to generate the final result set by querying the 
replicas involved in Phase 1.

The controller uses SolrJ's LBSolrServer to query the shards in Phase 1 so you 
get some basic load-balancing amongst replicas for a shard. I've not done any 
research to see how balanced that selection process is in production but I 
suspect if you have 3 replicas in a shard, then roughly 1/3 of the queries go 
to each.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

________________________________________
From: Sir Gilligan <sirgilli...@yahoo.com>
Sent: Thursday, January 09, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Solr Cloud Query Scaling

Question: Does adding replicas help with query load?

Scenario: 3 Physical Machines. 3 Shards
Query any machine, get results. Standard Solr Cloud stuff.

Update Scenario: 6 Physical Machines. 3 Shards.
M = Machine, S = Shard, -L = Leader
M1S1-L
M2S2
M3S3
M4S1
M5S2-L
M6S3-L

Incoming Query to M2S2. How will Solr Cloud (4.6.0) distribute the query?
Will M2S2 handle the query for shard 2? Or, will it send it to the
leader of S2 which is M5S2?
When the query is distributed, will it send it to the other leaders? OR,
will it send it to any shard?
Specifically:
Query sent to M2S2. Solr Cloud distributes the query. Could it possibly
send the query on to M3S3 and M4S1? Some kind of query load balance
functionality (maybe like a round robin to the shard members).
OR will M2S2 just be the collator, and send the query to the leaders?
OR something different that I have not described?

If queries do not have to be processed by leaders then we could add
three more physical machines (now total 9 machines) and handle more
query load.

Thank you.

RE: Solr Cloud Query Scaling

Reply via email to