On 11/1/2014 9:52 AM, Ian Rose wrote:
> Just to make sure I am thinking about this right: batching will certainly
> make a big difference in performance, but it should be more or less a
> constant factor no matter how many Solr nodes you are using, right?  Right
> now in my load tests, I'm not actually that concerned about the absolute
> performance numbers; instead I'm just trying to figure out why relative
> performance (no matter how bad it is since I am not batching) does not go
> up with more Solr nodes.  Once I get that part figured out and we are
> seeing more writes per sec when we add nodes, then I'll turn on batching in
> the client to see what kind of additional performance gain that gets us.

The basic problem I see with your methodology is that you are sending an
update request and waiting for it to complete before sending another.
No matter how big the batches are, this is an inefficient use of resources.

If you send many such requests at the same time, then they will be
handled in parallel.  Lucene (and by extension, Solr) has the thread
synchronization required to keep multiple simultaneous update requests
from stomping on each other and corrupting the index.
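
Roughly, the idea looks like this in SolrJ (just a sketch, not a drop-in test: the URL, collection name, and field names are placeholders, and the Builder API shown here is from newer SolrJ releases; with 4.x you would use HttpSolrServer instead):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexTest {
  public static void main(String[] args) throws Exception {
    final int numThreads = 8;          // concurrent indexing threads
    final int docsPerThread = 10000;   // documents each thread sends
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);

    for (int t = 0; t < numThreads; t++) {
      final int threadId = t;
      pool.submit(() -> {
        // One client per thread keeps the sketch simple; SolrJ clients
        // are also safe to share across threads.
        try (HttpSolrClient client = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/testcollection").build()) {
          for (int i = 0; i < docsPerThread; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", threadId + "-" + i);
            doc.addField("text_t", "doc " + i + " from thread " + threadId);
            client.add(doc);  // requests from all threads are in flight at once
          }
          // Commits are left to Solr's autoCommit settings in this sketch.
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }

    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}

With several threads (and batched documents inside each add, once you turn batching on), the server side stays busy instead of idling while your single client thread waits for each response.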

If you have enough CPU cores, such handling will *truly* be in
parallel; otherwise, the operating system will simply take turns giving
each thread CPU time.  This results in a pretty good facsimile of
parallel operation, but because the available CPU resources are split
between threads, it isn't as fast as true parallel operation.

Thanks,
Shawn
