On 6/6/2019 9:00 AM, Rahul Goswami wrote:
*OP Reply:* Total 48 GB per node... I couldn't see any other software
using a lot of memory.
I am honestly not sure about the reason for the change of directory factory
to SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
started to see the shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor for the long
updates/frequent merges?
With about 24GB of RAM left to cache 1.4TB of index data, you're never going
to have good performance. Any query you run will probably read more than
24GB of data from the index, which means it cannot all come from memory;
some of it must come from disk, which is incredibly slow compared to memory.
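To put rough numbers on it (assuming little else on the box needs memory):
48 GB total minus a 24 GB heap leaves roughly 24 GB for the OS disk cache,
which is well under 2 percent of a 1.4 TB index (24 / 1400 is about 0.017).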
MMap is more efficient than "simple" filesystem access. I do not know
if you would see markedly better performance, but getting rid of the
DirectoryFactory config and letting Solr choose its default might help.
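If your solrconfig.xml has an explicit directoryFactory entry (I am assuming
it looks something like the first snippet below; I have not seen your
config), you could remove it or switch to the stock declaration, which lets
Solr fall back to NRTCachingDirectoryFactory -- and that, as far as I know,
uses MMap under the hood on 64-bit systems:

  <!-- current (assumed) -->
  <directoryFactory name="DirectoryFactory"
                    class="solr.SimpleFSDirectoryFactory"/>

  <!-- stock default from the example solrconfig.xml -->
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>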
How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
space?
*OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB of space.
Unless you're doing faceting or grouping on fields with extremely high
cardinality, which I find to be rarely useful except for data mining,
24GB of heap for 12.8 million docs seems very excessive. I was
expecting this number to be something like 500 million or more ... that
small document count must mean each document is HUGE. Can you take
steps to reduce the index size, perhaps by setting stored, indexed,
and/or docValues to "false" on some of your fields, and having your
application go to the system of record for full details on each
document? You will have to reindex after making changes like that.
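For scale, 1.4 TB across 12.8 million documents works out to roughly 110 KB
of index data per document. As a hedged illustration (the field name and
type below are made up, not taken from your schema), trimming a large field
in your schema might look like this:

  <!-- before: the full text is both indexed and stored in Solr -->
  <field name="full_body" type="text_general" indexed="true" stored="true"/>

  <!-- after: still searchable, but the stored copy is dropped; the
       application fetches full details from the system of record -->
  <field name="full_body" type="text_general" indexed="true" stored="false"/>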
Can you share the GC log that Solr writes?
*OP Reply:* Please find the GC logs and thread dumps at this location
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw
The larger GC log was unrecognized by both GCViewer and gceasy.io ... the
smaller log shows heap usage of about 10GB, but it only covers 10 minutes,
so it's not really conclusive for diagnosis. The first thing I can
suggest trying is to reduce the heap size to 12GB ... but I do not know
if that's actually going to work. Indexing might require more memory.
The idea here is to make more memory available to the OS disk cache ...
with your index size, you're probably going to need to add memory to the
system (not the heap).
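If you want to try the smaller heap and you start Solr with the standard
scripts on Windows (an assumption on my part), one place to set it is
bin\solr.in.cmd:

  REM cap the heap at 12GB so the rest of the RAM goes to the OS disk cache
  set SOLR_JAVA_MEM=-Xms12g -Xmx12g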
*OP Reply:* Another observation is that the CPU usage reaches around 70%
(through manual monitoring) when indexing starts and merges are observed.
It is well below 50% otherwise.
Indexing will increase load, and that increase is often very
significant. Adding memory to the system is your best bet for better
performance. I'd want 1TB of memory for a 1.4TB index ... but I know
that memory sizes that high are extremely expensive, and for most
servers, not even possible. 512GB or 256GB is more attainable, and
would have better performance than 48GB.
*OP Reply:* Also, should something be altered with the mergeScheduler setting?
"mergeScheduler":{
"class":"org.apache.lucene.index.ConcurrentMergeScheduler",
"maxMergeCount":2,
"maxThreadCount":2},
Do not configure maxThreadCount beyond 1 unless your data is on SSD. Going
higher will slow things down a lot on standard disks, because the disk head
must move to read/write from different locations, and head moves take time.
SSD can do I/O from any location without seek delays, so more threads would
probably help performance rather than hurt it.
Increase maxMergeCount to 6 -- at 2, large merges will probably stop
indexing entirely. With a larger number, Solr can keep indexing even
when there's a huge segment merge happening.
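Putting both suggestions together, a sketch of the indexConfig section in
solrconfig.xml (assuming spinning disks, so maxThreadCount stays at 1)
might look like this:

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <!-- let more merges queue up so a big merge doesn't stall indexing -->
      <int name="maxMergeCount">6</int>
      <!-- one merge thread for spinning disks; raise this only on SSD -->
      <int name="maxThreadCount">1</int>
    </mergeScheduler>
  </indexConfig>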
Thanks,
Shawn