On 8/5/2013 10:17 AM, adfel70 wrote:
I have a solr cluster of 7 shards, replicationFactor 2, running on 7 physical
machines.
Machine spec:
cpu: 16
memory: 32gb
storage is on local disks
Each machine runs 2 Solr processes, each with a 6GB JVM heap.
The cluster currently holds 330 million documents, with each process
managing around 30GB of data.
Until recently performance was fine, but after a recent indexing run that
added around 25 million docs, search performance degraded dramatically.
I'm now getting qtimes of 30 seconds and sometimes even 60 seconds for
simple queries (fieldA:value AND fieldB:value + facets + highlighting).
Any idea how I can check where the problem is?
Sounds like a "not enough RAM" scenario. It's likely that you were
sitting at a threshold for a performance problem, and the 25 million
additional documents pushed your installation over that threshold. I
think there are two possibilities:
1) Not enough Java heap, resulting in major GC pauses as it works to
free up memory for basic operation. If this is the problem, increasing
your 6GB heap and/or using facet.method=enum will help. Note that
facet.method=enum will make facet performance much more dependent on the
OS disk cache being big enough, which leads into the other problem:
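For example, facet.method=enum is just a request parameter (or a default you
set in solrconfig.xml); the host, collection, and field names below are
illustrative, not from your setup:

```
http://localhost:8983/solr/collection1/select
    ?q=fieldA:value AND fieldB:value
    &facet=true
    &facet.field=someField
    &facet.method=enum
```

The enum method walks the term dictionary and leans on the filterCache
instead of building large in-heap structures, which is why it trades heap
pressure for dependence on the OS disk cache.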
2) Not enough OS disk cache for the size of your index. You have two
processes each eating up 6GB of your 32GB RAM. If Solr is the only
thing running on these servers, then you have slightly less than 20GB of
memory for your OS disk cache. If other things are running on the
hardware, then you have even less available.
With 60GB of data (two shard replicas at 30GB each) on each server, you
want between 30GB and 60GB of RAM available for your OS disk cache,
making 64GB an ideal RAM size for your servers. The alternative is to
add servers that each have 32GB and make a new index with a larger
numShards.
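To make the arithmetic concrete, here's a rough per-server memory budget.
The 2GB OS overhead is my assumption, not a measured number; adjust it to
what's actually running on your hardware:

```python
# Rough memory budget for one server in the cluster described above.
total_ram_gb = 32
solr_heaps_gb = 2 * 6   # two Solr processes, 6GB heap each
os_overhead_gb = 2      # assumed allowance for the OS and other processes

disk_cache_gb = total_ram_gb - solr_heaps_gb - os_overhead_gb
print(f"available for OS disk cache: ~{disk_cache_gb} GB")

index_gb = 2 * 30       # two 30GB shard replicas per server
target_low, target_high = index_gb // 2, index_gb
print(f"ideal disk cache: {target_low}-{target_high} GB "
      f"for {index_gb} GB of index data")
```

With ~18-20GB of cache against 60GB of index, well under half the index can
stay in RAM, which lines up with the sudden qtime degradation once the
working set outgrew the cache.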
http://wiki.apache.org/solr/SolrPerformanceProblems
The first thing I'd try is running only one Solr process per machine.
You might need an 8GB heap instead of a 6GB heap, but that would give
you 4GB more per machine for the OS disk cache. There's no need to have
two complete containers running Solr on every machine - SolrCloud's
Collections API has a maxShardsPerNode parameter that lets it run
multiple indexes on one instance.
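A sketch of what that CREATE call might look like (the collection and config
names are placeholders; numShards and replicationFactor match the setup
described above, and maxShardsPerNode=2 lets each of the 7 machines host two
shard replicas in a single Solr process):

```
http://localhost:8983/solr/admin/collections?action=CREATE
    &name=newcollection
    &numShards=7
    &replicationFactor=2
    &maxShardsPerNode=2
```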
For any change other than just adding RAM to the hardware, it's likely
that you'll need to start over and rebuild your collection from scratch.
Thanks,
Shawn