On 8/5/2013 10:17 AM, adfel70 wrote:
I have a Solr cluster of 7 shards, replicationFactor 2, running on 7 physical
machines.
Machine spec:
cpu: 16
memory: 32GB
storage: local disks

Each machine runs 2 Solr processes, each with a 6GB JVM heap.

The cluster currently holds 330 million documents, with each process handling
around 30GB of data.

Until recently performance was fine, but after a recent indexing run that added
around 25 million docs, search performance degraded dramatically.
I'm now getting QTimes of 30 seconds, and sometimes even 60 seconds, for simple
queries (fieldA:value AND fieldB:value + facets + highlighting).

Any idea how I can check where the problem is?

Sounds like a "not enough RAM" scenario. It's likely that you were sitting at a threshold for a performance problem, and the 25 million additional documents pushed your installation over that threshold. I think there are two possibilities:

1) Not enough Java heap, resulting in major GC pauses as it works to free up memory for basic operation. If this is the problem, increasing your 6GB heap and/or using facet.method=enum will help (there's an example request below, after point 2). Note that facet.method=enum will make facet performance much more dependent on the OS disk cache being big enough, which leads into the other problem:

2) Not enough OS disk cache for the size of your index. You have two processes each eating up 6GB of your 32GB RAM. If Solr is the only thing running on these servers, then you have slightly less than 20GB of memory for your OS disk cache. If other things are running on the hardware, then you have even less available.

With 60GB of data (two shard replicas at 30GB each) on each server, you want between 30GB and 60GB of RAM available for your OS disk cache, making 64GB an ideal RAM size for your servers. The alternative is to add servers that each have 32GB and make a new index with a larger numShards.
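On the first point, facet.method=enum is just a request parameter, so it's easy to test without any config changes. Something like this (the collection name, facet field, and highlight field here are placeholders; adjust them for your setup):

  http://localhost:8983/solr/collection1/select?q=fieldA:value+AND+fieldB:value&facet=true&facet.field=category&facet.method=enum&hl=true&hl.fl=description

The enum method walks the terms in the facet field and runs a filter for each one instead of uninverting the field into the Java heap, which is why it shifts the load from the heap onto the filterCache and the OS disk cache.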

http://wiki.apache.org/solr/SolrPerformanceProblems
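As a quick check on the disk cache side, run the standard Linux 'free' command on one of the servers (assuming these are Linux boxes):

  free -g

The 'cached' column is what the kernel is currently able to use for the disk cache. With two 6GB heaps plus OS overhead on a 32GB box, that number is going to be far below the 60GB of index data on each machine, which would be consistent with the slow queries you're seeing.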

The first thing I'd try is running only one Solr process per machine. You might need an 8GB heap instead of a 6GB heap, but that would give you 4GB more per machine for the OS disk cache. There's no need to have two complete containers running Solr on every machine - SolrCloud's Collections API has a maxShardsPerNode parameter that lets it run multiple indexes on one instance.
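If you do rebuild with a single Solr process per machine, you'd start that one process with the bigger heap (with the example Jetty setup, that's something like java -Xmx8g -jar start.jar) and create the collection so that each node carries both of its shard replicas. Something like this, with the collection and config names as placeholders:

  http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=7&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconfig

With 7 nodes, numShards=7 and replicationFactor=2 works out to 14 shard replicas, so maxShardsPerNode=2 is enough for SolrCloud to place two of them on every machine.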

For any change other than just adding RAM to the hardware, it's likely that you'll need to start over and rebuild your collection from scratch.

Thanks,
Shawn
