Hi,
I am migrating from a master-slave setup to SolrCloud, but I'm running into
problems with indexing.
Cluster details:
8 machines with 64GB of memory each, each hosting 1 replica.
4 shards, 2 replicas of each. Heap size is 16GB.
Collection details:
Total number of docs: ~250k (but only 50k are indexed right now)
Size of collection (master-slave number, for reference): ~10GB
Our collection is fairly heavy, with some dynamic fields of high cardinality
(on the order of thousands), which is why we use such a large heap even for a
small collection.
Relevant solrconfig settings:
commit settings:
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>3600000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:1800000}</maxTime>
</autoSoftCommit>
index config:
<ramBufferSizeMB>500</ramBufferSizeMB>
<maxBufferedDocs>10000</maxBufferedDocs>
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
<int name="maxMergeAtOnce">10</int>
<int name="segmentsPerTier">10</int>
</mergePolicyFactory>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
<int name="maxMergeCount">6</int>
<int name="maxThreadCount">4</int>
</mergeScheduler>
Problem:
I set up the cloud and started indexing at the throughput of our earlier
master-slave setup, but the machines soon went into full-blown garbage
collection. That throughput is not very high: we index the whole collection
overnight, roughly ~250k documents in 6 hours, which is about 12 requests
per second.
So now I'm indexing at an extremely slow rate while trying to find the
problem: currently 1 document every 2 seconds, i.e. ~30 documents per
minute.
Observations:
1. I'm noticing extremely small segments in the segments UI. Example:
Segment _1h4:
#docs: 5
#dels: 0
size: 1,586,878 bytes
age: 2021-02-12T11:05:33.050Z
source: flush
Why is Lucene creating such small segments? My understanding is that a new
segment is flushed when the ramBufferSizeMB or maxBufferedDocs limit is
hit, or on a hard commit. None of those should produce segments this small.
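If it would help with debugging, I can enable the Lucene IndexWriter
infoStream so the log records why each flush happens. As far as I know the
relevant option (it goes inside the existing <indexConfig> block) is just:
<infoStream>true</infoStream>
but I haven't turned it on yet.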
2. The index/ directory has a large number of files. For one shard with 30k
documents & 1.5GB of data, there are ~450-550 files in this directory. I
understand that each segment is composed of a bunch of files, but even
accounting for that, the number of segments seems very large.
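For completeness: we haven't set useCompoundFile anywhere, so I'm assuming
the default of plain multi-file segments, i.e. effectively:
<useCompoundFile>false</useCompoundFile>
If a non-compound segment is roughly 10-15 files, ~500 files would mean
something like 30-50 segments for those 30k documents, which would line up
with the tiny segments in observation 1.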
Note: nothing out of the ordinary in the logs, only /update request entries.
Please help me make sense of the two observations above.