On 6/6/2019 9:00 AM, Rahul Goswami wrote:
*OP Reply:* Total 48 GB per node... I couldn't see any other software
using a lot of memory.
I am honestly not sure about the reason for the change of directory factory
to SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
started to see the shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor for the long
updates/frequent merges?
With about 24GB of RAM left to cache 1.4TB of index data, you're never going
to have good performance. Any query you run will probably read more than
24GB of data from the index, which means it cannot all come from memory;
some of it must come from disk, which is incredibly slow compared to memory.
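To put rough numbers on it (assuming little else on the box needs memory):
48 GB total minus a 24 GB heap leaves roughly 24 GB for the OS disk cache,
which is well under 2 percent of a 1.4 TB index (24 / 1400 is about 0.017).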
MMap is more efficient than "simple" filesystem access. I do not know
if you would see markedly better performance, but getting rid of the
DirectoryFactory config and letting Solr choose its default might help.
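If your solrconfig.xml has an explicit directoryFactory entry (I am assuming
it looks something like the first snippet below; I have not seen your
config), you could remove it or switch to the stock declaration, which lets
Solr fall back to NRTCachingDirectoryFactory -- and that, as far as I know,
uses MMap under the hood on 64-bit systems:

  <!-- current (assumed) -->
  <directoryFactory name="DirectoryFactory"
                    class="solr.SimpleFSDirectoryFactory"/>

  <!-- stock default from the example solrconfig.xml -->
  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>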
How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
space?
*OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB of space.
Unless you're doing faceting or grouping on fields with extremely high
cardinality, which I find to be rarely useful except for data mining,
24GB of heap for 12.8 million docs seems very excessive. I was
expecting this number to be something like 500 million or more ... that
small document count must mean each document is HUGE. Can you take
steps to reduce the index size, perhaps by setting stored, indexed,
and/or docValues to "false" on some of your fields, and having your
application go to the system of record for full details on each
document? You will have to reindex after making changes like that.
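For scale, 1.4 TB across 12.8 million documents works out to roughly 110 KB
of index data per document. As a hedged illustration (the field name and
type below are made up, not taken from your schema), trimming a large field
in your schema might look like this:

  <!-- before: the full text is both indexed and stored in Solr -->
  <field name="full_body" type="text_general" indexed="true" stored="true"/>

  <!-- after: still searchable, but the stored copy is dropped; the
       application fetches full details from the system of record -->
  <field name="full_body" type="text_general" indexed="true" stored="false"/>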
Can you share the GC log that Solr writes?
*OP Reply:* Please find the GC logs and thread dumps at this location
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw
The larger GC log was unrecognized by both GCViewer and gceasy.io ... the
smaller log shows heap usage of about 10GB, but it only covers 10 minutes,
so it's not really conclusive for diagnosis. The first thing I can
suggest trying is to reduce the heap size to 12GB ... but I do not know
if that's actually going to work. Indexing might require more memory.
The idea here is to make more memory available to the OS disk cache ...
with your index size, you're probably going to need to add memory to the
system (not the heap).
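If you want to try the smaller heap and you start Solr with the standard
scripts on Windows (an assumption on my part), one place to set it is
bin\solr.in.cmd:

  REM cap the heap at 12GB so the rest of the RAM goes to the OS disk cache
  set SOLR_JAVA_MEM=-Xms12g -Xmx12g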
*OP Reply:* Another observation is that the CPU usage reaches around 70%
(through manual monitoring) when indexing starts and merges are observed.
It is well below 50% otherwise.
Indexing will increase load, and that increase is often very
significant. Adding memory to the system is your best bet for better
performance. I'd want 1TB of memory for a 1.4TB index ... but I know
that memory sizes that high are extremely expensive, and for most
servers, not even possible. 512GB or 256GB is more attainable, and
would have better performance than 48GB.
*OP Reply:* Also, should something be altered with the mergeScheduler setting?
"mergeScheduler":{
"class":"org.apache.lucene.index.ConcurrentMergeScheduler",
"maxMergeCount":2,
"maxThreadCount":2},
Do not configure maxThreadCount beyond 1 unless your data is on SSD. Going
higher will slow things down a lot on standard disks, because the disk head
must move to read/write from different locations, and head moves take time.
SSD can do I/O from any location without seek delays, so more threads would
probably help performance rather than hurt it.
Increase maxMergeCount to 6 -- at 2, large merges will probably stop
indexing entirely. With a larger number, Solr can keep indexing even
when there's a huge segment merge happening.
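Putting both suggestions together, a sketch of the indexConfig section in
solrconfig.xml (assuming spinning disks, so maxThreadCount stays at 1)
might look like this:

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <!-- let more merges queue up so a big merge doesn't stall indexing -->
      <int name="maxMergeCount">6</int>
      <!-- one merge thread for spinning disks; raise this only on SSD -->
      <int name="maxThreadCount">1</int>
    </mergeScheduler>
  </indexConfig>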
Thanks,
Shawn