On 6/6/2019 9:00 AM, Rahul Goswami wrote:
*OP Reply* : Total 48 GB per node... I couldn't see any other software
using a lot of memory.
I am honestly not sure about the reason for the change of directory factory
to SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
started to see the shared memory usage on Windows go up significantly,
intermittently freezing the system.
Could the choice of DirectoryFactory here be a factor for the long
updates/frequent merges?

With about 24GB of RAM to cache 1.4TB of index data, you're never going to have good performance. Any query you run is probably going to read more than 24GB of data from the index, which means it cannot all come from memory; some of it must come from disk, which is incredibly slow compared to memory.

MMap is more efficient than "simple" filesystem access. I do not know if you would see markedly better performance, but getting rid of the DirectoryFactory config and letting Solr choose its default might help.
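For reference, and assuming your solrconfig.xml currently has an explicit entry along these lines (the exact element in your config may differ):

  <directoryFactory name="DirectoryFactory"
                    class="solr.SimpleFSDirectoryFactory"/>

you could either delete that element entirely or switch it back to the stock default, which uses mmap on 64-bit systems:

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

The affected cores need a reload or a Solr restart for the change to take effect.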

How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
space?
*OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
numDoc) in that 1.4 TB space

Unless you're doing faceting or grouping on fields with extremely high cardinality (which I find to be rarely useful except for data mining), 24GB of heap for 12.8 million docs seems very excessive. I was expecting that number to be something like 500 million or more ... such a small document count must mean each document is HUGE. Can you take steps to reduce the index size, perhaps by setting stored, indexed, and/or docValues to "false" on some of your fields, and having your application go to the system of record for the full details of each document? You will have to reindex after making changes like that.
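As a rough illustration only (the field name here is made up, not taken from your schema), trimming a field in managed-schema might look like this, keeping it searchable while dropping the stored copy and docValues if you never return, sort, facet, or group on it:

  <!-- before: indexed, stored, and docValues all enabled -->
  <field name="big_payload" type="string" indexed="true"
         stored="true" docValues="true"/>

  <!-- after: still searchable, but no stored copy and no docValues -->
  <field name="big_payload" type="string" indexed="true"
         stored="false" docValues="false"/>

Check what each field is actually used for before flipping anything to false, and remember the full reindex afterward.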

Can you share the GC log that Solr writes?
*OP Reply:*  Please find the GC logs and thread dumps at this location
https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw

The larger GC log was not recognized by either GCViewer or gceasy.io ... the smaller log shows heap usage around 10GB, but it only covers 10 minutes, so it's not really conclusive for diagnosis. The first thing I can suggest is to reduce the heap size to 12GB ... but I do not know whether that will actually work; indexing might require more memory. The idea is to make more memory available to the OS disk cache. With your index size, though, you're probably going to need to add memory to the system (not the heap).
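Assuming you start Solr with the standard bin\solr.cmd script on Windows, the heap is set in solr.in.cmd and the change would look roughly like this (solr.in.sh on Linux uses the same variable without the "set"):

  REM solr.in.cmd -- use a 12GB heap for both -Xms and -Xmx
  set SOLR_HEAP=12g

If your installation sets SOLR_JAVA_MEM instead, adjust the -Xms/-Xmx values there rather than adding SOLR_HEAP.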

Another observation is that the CPU usage reaches around 70% (through
manual monitoring) when indexing starts and merges are observed. It is
well below 50% otherwise.

Indexing will increase load, and that increase is often very significant. Adding memory to the system is your best bet for better performance. I'd want 1TB of memory for a 1.4TB index ... but I know that memory sizes that high are extremely expensive, and for most servers, not even possible. 512GB or 256GB is more attainable, and would have better performance than 48GB.

Also, should something be altered with the mergeScheduler setting ?
"mergeScheduler":{
         "class":"org.apache.lucene.index.ConcurrentMergeScheduler",
         "maxMergeCount":2,
         "maxThreadCount":2},

Do not configure maxThreadCount beyond 1 unless your data is on SSD. On standard spinning disks, more merge threads will slow things down a lot, because the disk head must move to read/write from different locations, and head moves take time. SSDs can do I/O from any location without those pauses, so with SSD more threads would probably help performance rather than hurt it.

Increase maxMergeCount to 6 -- at 2, large merges will probably stop indexing entirely. With a larger number, Solr can keep indexing even when there's a huge segment merge happening.
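If you maintain these settings in solrconfig.xml rather than through the Config API, a sketch of the equivalent indexConfig section (assuming spinning disks, hence maxThreadCount of 1) would be:

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>
      <int name="maxThreadCount">1</int>
    </mergeScheduler>
  </indexConfig>

On SSD you could leave maxThreadCount at 2 or raise it; the maxMergeCount recommendation is the same either way.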

Thanks,
Shawn
