As a follow-up question on this:

One would want to use some kind of load balancing 'above' the SolrCloud 
installation for search queries, correct?  To ensure that the initial requests 
would get distributed evenly to all nodes?

If you don't have that, and send all requests to M2S2 (as in the original 
poster's example), it would be the only node that ever acts as controller, and 
it could become a bottleneck that adding replicas won't alleviate.  Correct?

Or is there something in SolrCloud itself that evenly distributes the 
controller role, regardless of which node the query initially arrives at?

-----Original Message-----
From: Tim Potter [mailto:tim.pot...@lucidworks.com] 
Sent: Thursday, January 09, 2014 12:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Cloud Query Scaling

Absolutely adding replicas helps you scale query load. Queries do not need to 
be routed to leaders; they can be handled by any replica in a shard. Leaders 
are only needed for handling update requests.

In general, a distributed query has two phases, driven by a controller node 
(what you called collator below). The controller is the Solr that received the 
query request from the client. In Phase 1, the controller distributes the query 
to one of the replicas for all shards and receives back the list of matching 
document IDs from each replica (only a page worth btw). 

The controller merges the results and sorts them to generate a final page of 
results to be returned to the client. In Phase 2, the controller collects all 
the fields from the documents to generate the final result set by querying the 
replicas involved in Phase 1.
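
To make the two phases concrete, here is a minimal toy sketch in Python. It is 
not Solr code: the in-memory SHARDS index, the scores, and the function names 
phase1, phase2 and distributed_query are all hypothetical, and replica 
selection is left out. It only illustrates the scatter-gather pattern described 
above: collect a page of IDs from each shard, merge and sort, then fetch fields 
for the winners only.

```python
import heapq

# Hypothetical toy index: shard name -> {doc_id: (score, stored_fields)}.
SHARDS = {
    "S1": {"a": (0.9, {"title": "A"}), "b": (0.2, {"title": "B"})},
    "S2": {"c": (0.7, {"title": "C"}), "d": (0.4, {"title": "D"})},
    "S3": {"e": (0.8, {"title": "E"}), "f": (0.1, {"title": "F"})},
}


def phase1(shard, rows):
    """A replica of `shard` returns only its top `rows` (score, doc_id) pairs."""
    docs = SHARDS[shard]
    return heapq.nlargest(rows, ((score, doc_id) for doc_id, (score, _) in docs.items()))


def phase2(shard, doc_ids):
    """The replica returns the stored fields for the requested IDs."""
    return {doc_id: SHARDS[shard][doc_id][1] for doc_id in doc_ids}


def distributed_query(rows):
    """Play the controller role: scatter, merge, then fetch fields."""
    # Phase 1: gather one page of candidate IDs from every shard.
    candidates = []
    for shard in SHARDS:
        for score, doc_id in phase1(shard, rows):
            candidates.append((score, doc_id, shard))

    # Merge and sort to get the final page.
    page = heapq.nlargest(rows, candidates)

    # Phase 2: fetch stored fields only for the documents on the final page.
    ids_by_shard = {}
    for _, doc_id, shard in page:
        ids_by_shard.setdefault(shard, []).append(doc_id)
    fields = {}
    for shard, doc_ids in ids_by_shard.items():
        fields.update(phase2(shard, doc_ids))

    return [(doc_id, score, fields[doc_id]) for score, doc_id, _ in page]
```

Note that the merge and the second round of requests both run on the 
controller, which is why the node that receives the client request does extra 
work compared with the other nodes.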

The controller uses SolrJ's LBHttpSolrServer to query the shards in Phase 1 so 
you get some basic load-balancing amongst replicas for a shard. I've not done any 
research to see how balanced that selection process is in production but I 
suspect if you have 3 replicas in a shard, then roughly 1/3 of the queries go 
to each.
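
As a rough illustration of the "roughly 1/3 each" expectation, the following 
Python sketch counts how queries would spread if replicas were picked by a 
plain rotation. That is an assumption made for the example, not a statement 
about Solr's actual selection policy; the replica names and the helper 
count_round_robin are hypothetical.

```python
from collections import Counter
from itertools import cycle


def count_round_robin(replicas, n_queries):
    """Count how many of n_queries each replica gets under a strict rotation."""
    picker = cycle(replicas)  # assumed policy: simple round-robin
    return Counter(next(picker) for _ in range(n_queries))


# Hypothetical shard with three replicas.
counts = count_round_robin(["replica1", "replica2", "replica3"], 300)
```

Under this assumed policy each replica handles exactly 100 of the 300 queries. 
A real balancer that shuffles, or that skips unhealthy servers, would only 
approximate that split.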

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

________________________________________
From: Sir Gilligan <sirgilli...@yahoo.com>
Sent: Thursday, January 09, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Solr Cloud Query Scaling

Question: Does adding replicas help with query load?

Scenario: 3 Physical Machines. 3 Shards
Query any machine, get results. Standard Solr Cloud stuff.

Updated Scenario: 6 Physical Machines. 3 Shards.
M = Machine, S = Shard, -L = Leader
M1S1-L
M2S2
M3S3
M4S1
M5S2-L
M6S3-L

Incoming Query to M2S2. How will Solr Cloud (4.6.0) distribute the query?
Will M2S2 handle the query for shard 2? Or, will it send it to the leader of S2 
which is M5S2?
When the query is distributed, will it send it to the other leaders? OR, will 
it send it to any shard?
Specifically:
Query sent to M2S2. Solr Cloud distributes the query. Could it possibly send 
the query on to M3S3 and M4S1? Some kind of query load balance functionality 
(maybe like a round robin to the shard members).
OR will M2S2 just be the collator, and send the query to the leaders?
OR something different that I have not described?

If queries do not have to be processed by leaders then we could add three more 
physical machines (now total 9 machines) and handle more query load.

Thank you.
