Hello,
I have a cluster of 16 shards with 3 replicas each. The cluster indexes
nested documents.
It currently holds about 3 billion documents overall (parents and children).
Each shard has around 200 million docs and is about 250GB in size.
The cluster runs on 12 machines. Each machine has 4 SSD disks and 4 Solr
processes. Each process has a 28GB heap. Each machine has 196GB RAM.
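
To put these numbers side by side, here is a small back-of-the-envelope
sketch. It assumes that "3 replicas" means 3 copies of each shard in total,
and that whatever RAM is not taken by the Solr heaps is available to the OS
page cache (it ignores OS and JVM overhead). The names are only illustrative.

```python
# Figures are taken from the description above; names are illustrative only.
SHARDS = 16
COPIES_PER_SHARD = 3          # assumption: "3 replicas" = 3 copies in total
MACHINES = 12
SOLR_PROCESSES_PER_MACHINE = 4
HEAP_GB_PER_PROCESS = 28
RAM_GB_PER_MACHINE = 196
SHARD_SIZE_GB = 250


def cluster_summary():
    """Return rough per-machine resource figures for the cluster."""
    cores = SHARDS * COPIES_PER_SHARD
    processes = MACHINES * SOLR_PROCESSES_PER_MACHINE
    cores_per_process = cores // processes
    heap_gb = SOLR_PROCESSES_PER_MACHINE * HEAP_GB_PER_PROCESS
    # Assumption: everything outside the heaps could serve as page cache.
    non_heap_gb = RAM_GB_PER_MACHINE - heap_gb
    index_gb = SOLR_PROCESSES_PER_MACHINE * cores_per_process * SHARD_SIZE_GB
    return {
        "cores": cores,
        "processes": processes,
        "cores_per_process": cores_per_process,
        "heap_gb_per_machine": heap_gb,
        "non_heap_gb_per_machine": non_heap_gb,
        "index_gb_per_machine": index_gb,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

Under those assumptions this works out to one 250GB core per Solr process,
about 1TB of index per machine, and roughly 84GB of RAM outside the heaps.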

I perform periodic indexing throughout the day. Each indexing cycle adds
around 1.5 million docs. I keep the indexing load light: 2 processes sending
batches of 20 docs.
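
As a rough illustration of what that load looks like, here is a minimal
sketch of the batching. The helper names are hypothetical, and the cycle size
is the approximate figure given above.

```python
from itertools import islice

DOCS_PER_CYCLE = 1_500_000   # approximate, from the description above
BATCH_SIZE = 20
INDEXING_PROCESSES = 2


def batches(docs, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` docs from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def requests_per_cycle(total_docs=DOCS_PER_CYCLE, size=BATCH_SIZE):
    """Number of update requests needed for one cycle (ceiling division)."""
    return -(-total_docs // size)


if __name__ == "__main__":
    total = requests_per_cycle()
    print("requests per cycle:", total)
    print("requests per indexing process:", total // INDEXING_PROCESSES)
```

With the figures above, that is about 75,000 update requests per cycle, or
about 37,500 per indexing process.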

My use case demands that the documents from each indexing cycle become
visible only when the whole cycle finishes.

I tried various methods of using soft and hard commits:

1. Using auto hard commit with time=10secs (openSearcher=false) and an
explicit soft commit when the indexing finishes.
2. Using auto soft commit with time=10/30/60secs during the indexing.
3. Not using soft commit at all, just using auto hard commit with
time=10secs during the indexing (openSearcher=false) and an explicit hard
commit with openSearcher=true when the cycle finishes.
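
For reference, the explicit end-of-cycle commits in methods 1 and 3 can be
sketched roughly as below. This only builds the request URLs for Solr's
update handler using its `softCommit`, `commit` and `openSearcher` request
parameters. The base URL and collection name are placeholders, not my real
setup, and `commit_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, soft=False, open_searcher=True):
    """Build an update-handler URL that triggers an explicit commit.

    soft=True  -> soft commit (end of cycle in method 1)
    soft=False -> hard commit, optionally opening a new searcher (method 3)
    """
    if soft:
        params = {"softCommit": "true"}
    else:
        params = {
            "commit": "true",
            "openSearcher": "true" if open_searcher else "false",
        }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"   # placeholder host
    coll = "mycollection"                 # placeholder collection name
    print("method 1:", commit_url(base, coll, soft=True))
    print("method 3:", commit_url(base, coll, soft=False, open_searcher=True))
```

In both cases the call that makes the cycle visible opens a new searcher,
which is the point where the problems described next show up.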


With all methods I encounter pretty much the same problems:
1. Heavy GCs when a soft commit is performed (methods 1 and 2) or when a hard
commit with openSearcher=true is performed (method 3). These GCs cause heavy
latency (average latency is normally 3 secs; during the problem it is 80
secs).
2. If indexing cycles come too often, so that soft commits or hard commits
(openSearcher=true) occur at short intervals one after another (around 5-10
minutes), I start getting many OOM exceptions.


Thank you.



