Re: Solr Cloud, 100 shards, shards progressively become slower

Shawn Heisey Thu, 08 Jan 2015 07:33:18 -0800

On 1/8/2015 7:26 AM, Andrew Butkus wrote:
> Hi, we have 8 solr servers, split 4x4 across 2 data centers.
> 
> We have a collection of around ½ billion documents, split over 100 shards, 
> each is replicated 4 times on separate nodes (evenly distributed across both 
> data centers).
> 
> The problem we have is that when we use cursormark (and also when we don't 
> use cursormark the pattern below is the same but just shorter in time) the 
> time it takes to query each shard gets progressively longer when distrib=true 
> , I have tried to query shards directly (with shards=) and select my own 
> shards to query to see if it was a bandwidth bottleneck and the performance 
> is normal / fine - when using pre-defined shards.
> 
> Does anyone know why the shards become progressively slower when 
> distrib=true? Or any suggestions on how I can fix, or how to debug the 
> problem further?
> 
> I have monitored the performance of CPU and it never goes above 10% on each 
> server, so its not cpu, also the memory usage is about 4gb out of 16gb so its 
> not a memory issue either.
> 
> I have tried all shard shuffling strategies incase it was a bottleneck at a 
> server being over used but as above, the cpu never goes above 10%, and when I 
> use shards= there are never any querytime bottlenecks.


The part about memory usage is not clear.  That 4GB and 16GB could refer
to the operating system view of memory, or the view of memory within the
JVM.  I'm curious about how much total RAM each machine has, how large
the Java heap is, and what the total size of the indexes that live on
each machine is.

Even if they are individually very small, 500 million documents will
result in a very large index, so I'm guessing that you don't have enough
RAM on each server for your index size.

What can happen with a highly sharded index that is too large for
available RAM:  Index data for the initial queries gets read from the OS
disk cache, but as those queries run, the information required for the
shards that come later in the distributed query gets pushed out of the
disk cache, so Solr must actually read the disk to do those later
queries.  Disks are slow, so if the machine has to actually read from
the disk, Solr will be slow.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn

Re: Solr Cloud, 100 shards, shards progressively become slower

Reply via email to