On 6/6/2019 5:45 AM, vishal patel wrote:
One server (256GB RAM) runs the two Solr instances below, plus other
applications:
1) shard1 (80GB heap, 790GB storage, 449GB indexed data)
2) replica of shard2 (80GB heap, 895GB storage, 337GB indexed data)
The second server (256GB RAM and 1TB storage) runs the two Solr instances
below, plus other applications:
1) shard2 (80GB heap, 790GB storage, 338GB indexed data)
2) replica of shard1 (80GB heap, 895GB storage, 448GB indexed data)
An 80GB heap is ENORMOUS, and you have two of those per server. Do you
*know* that you need a heap that large? You only have 50 million
documents total, so two instances with 80GB each seems completely
unnecessary. I would think that one instance with a much smaller heap
would handle just about anything you could throw at 50 million documents.
With 160GB taken by heaps, you're leaving at most 96GB of memory (less,
since other applications run on the same machine) to cache roughly 786GB
of index per server. This is not going to work well, especially if your
index doesn't have many fields that are stored. It will cause a lot of
disk I/O.
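To make that arithmetic concrete, here is a rough sketch using only the
figures from your message. The function and variable names are mine and
purely illustrative. It also ignores the memory used by the other
applications on each box, so the real picture is worse than this.

```python
# Rough memory budget for one server, using the figures quoted above.
# All sizes are in GB. Names are illustrative, not Solr settings.

def page_cache_budget(total_ram_gb, heap_sizes_gb, index_sizes_gb):
    """Return (ram_left_gb, index_total_gb, cacheable_fraction)."""
    ram_left = total_ram_gb - sum(heap_sizes_gb)
    index_total = sum(index_sizes_gb)
    return ram_left, index_total, ram_left / index_total

# Server 1: shard1 plus the replica of shard2.
server1 = page_cache_budget(256, [80, 80], [449, 337])
# Server 2: shard2 plus the replica of shard1.
server2 = page_cache_budget(256, [80, 80], [338, 448])

for name, (left, index, frac) in (("server 1", server1), ("server 2", server2)):
    print(f"{name}: {left} GB left for OS cache, {index} GB of index, "
          f"{frac:.0%} cacheable at best")
```

On both servers that works out to 96GB of RAM for 786GB of index, so
at best about 12% of the index can sit in the OS disk cache.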
Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5
Unless you have changed the DirectoryFactory to something that's not
default, your process listing does not reflect over 700GB of index data.
If you have changed the DirectoryFactory, then I would strongly
recommend removing that part of your config and letting Solr use its
default.
Note: Each Solr instance normally uses about 40GB of heap on average. When a
replica goes down, disk I/O is high and GC pause times exceed 15 seconds. We
cannot identify from the logs the exact cause of the replica going down or
into recovery. Is it the GC pauses, the high disk I/O, a time-consuming
query, or heavy indexing?
With an 80GB heap, I'm not really surprised you're seeing GC pauses
above 15 seconds. I have seen pauses that long with a heap that's only 8GB.
GC pauses lasting that long will cause problems with SolrCloud. Nodes
going into recovery is a common result.
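The reason is that a node stopped in a GC pause cannot talk to
ZooKeeper, and if the pause outlasts the ZooKeeper session timeout, the
rest of the cluster treats the node as gone. A minimal sketch of that
check follows. The 15-second timeout and the sample pause values are
assumptions for illustration only. Substitute the zkClientTimeout your
installation actually uses and the pause times from your own GC logs.

```python
# Flag GC pauses long enough to risk a node's ZooKeeper session expiring.
# ASSUMPTION: the 15-second timeout is a placeholder. Use whatever
# zkClientTimeout your installation is actually running with.

ASSUMED_ZK_TIMEOUT_SECONDS = 15.0

def risky_pauses(pause_seconds, timeout_seconds=ASSUMED_ZK_TIMEOUT_SECONDS):
    """Return the pauses that meet or exceed the session timeout."""
    return [p for p in pause_seconds if p >= timeout_seconds]

# Hypothetical pause durations in seconds, not taken from any real log.
sample = [0.4, 2.1, 16.3, 0.9, 22.0]
print(risky_pauses(sample))
```

Any pause that shows up in that list is a candidate for having knocked
the node out of the cluster, which would then be followed by recovery.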
Thanks,
Shawn