Re: Load balancer for indexing?

Chris Hostetter Tue, 28 Apr 2015 13:55:15 -0700

: I would still use ConcurrentUpdateSolrServer as it is good for catching up
: when my indexing has fallen behind.  I know it swallows exceptions.


I feel like you are missing the point of when/why to use 
ConcurrentUpdateSolrServer, compared to your goal of "load balancing" 
updates.

The *only* feature ConcurrentUpdateSolrServer gives you over any other 
type of SolrServer is that it, internally, has a background thread which 
collects up and sends big blocks of documents to the server it points at 
behind the scenes.  Sending big blocks of documents is the antithesis of 
your stated goal to "load balance" those updates to multiple servers.

If the reason you like using ConcurrentUpdateSolrServer is because of the 
background thread, but you don't want the "big batches", you could just 
wrap a regular HttpSolrServer (pointed at your load balancer) inside of a 
helper method that uses an ExecutorService (or something else like it) to 
handle the background thread yourself.
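
A rough sketch of that kind of helper, using only java.util.concurrent. 
DocSender and BackgroundIndexer are hypothetical names made up for this 
example, not SolrJ classes; the assumption is that DocSender.send() would 
wrap a call such as HttpSolrServer.add(doc) against your load balancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for "send one document", e.g. a wrapper around HttpSolrServer.add(doc). */
interface DocSender<D> {
    void send(D doc) throws Exception;
}

/** Hands each document to a background thread pool instead of buffering big batches. */
final class BackgroundIndexer<D> {
    private final DocSender<D> sender;
    private final ExecutorService pool;
    private final List<Exception> failures = new CopyOnWriteArrayList<>();

    BackgroundIndexer(DocSender<D> sender, int threads) {
        this.sender = sender;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Queues one document and returns immediately; the send happens on a pool thread. */
    void add(D doc) {
        pool.execute(() -> {
            try {
                sender.send(doc);
            } catch (Exception e) {
                failures.add(e); // kept for the caller to inspect, not swallowed
            }
        });
    }

    /** Stops accepting work and waits for queued sends; true if all finished in time. */
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Snapshot of the exceptions thrown by failed sends so far. */
    List<Exception> failures() {
        return new ArrayList<>(failures);
    }
}
```

Each add() results in its own request, so the load balancer gets to spread 
them around, and failures are collected where you can look at them rather 
than disappearing in the background thread.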

But again, because of how SolrCloud works, the *best* way to speed up 
performance of indexing is to minimize the amount of data going over the 
wire -- when you send documents to an *arbitrary* SolrCloud node, it will 
do the right thing, and forward those documents to the correct "leader" 
for the appropriate shard of that document -- but if you use 
CloudSolrServer you can help eliminate one complete "hop" of that 
document in the network by letting your client talk directly to the 
*right* leader.

So even better than using HttpSolrServer with a load balancer (behind a 
multi-threaded executor if that's what you want) is using a 
CloudSolrServer -- it's like using a smart software load balancer that 
knows *exactly* which node to send the HTTP request to -- not because of 
CPU load, or current net connections, but because of where the data *must* 
ultimately be sent anyway.
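
To make the "extra hop" concrete, here is a toy model that counts how many 
documents need to be forwarded under each approach. Everything in it is an 
assumption for illustration: HopModel is a made-up class, each node is 
assumed to lead exactly one shard, and the shard is picked with 
String.hashCode, which is NOT how Solr actually routes documents. 
Forwarding from leaders to replicas is not modeled.

```java
/** Toy model only: shard choice uses String.hashCode, not Solr's real document routing. */
final class HopModel {
    /** Index of the node assumed to lead the shard that owns this id (one leader per node). */
    static int leaderFor(String id, int nodes) {
        return Math.floorMod(id.hashCode(), nodes);
    }

    /** Forwards needed when a round-robin balancer picks the receiving node. */
    static int forwardsViaRoundRobin(int docs, int nodes) {
        int forwards = 0;
        for (int i = 0; i < docs; i++) {
            int receiver = i % nodes; // balancer ignores where the data belongs
            if (receiver != leaderFor("doc-" + i, nodes)) {
                forwards++; // receiver has to relay the document to the leader
            }
        }
        return forwards;
    }

    /** Forwards needed when the client works out the leader and sends there directly. */
    static int forwardsViaLeaderAwareClient(int docs, int nodes) {
        int forwards = 0;
        for (int i = 0; i < docs; i++) {
            int receiver = leaderFor("doc-" + i, nodes); // client picks the leader itself
            if (receiver != leaderFor("doc-" + i, nodes)) {
                forwards++; // never reached: the receiver is always the leader
            }
        }
        return forwards;
    }
}
```

In this model the leader-aware client never needs a forward, while the 
round-robin balancer only avoids one when it happens to pick the leader by 
chance.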


-Hoss
http://www.lucidworks.com/
