I completely agree with Shawn. I'd emphasize that your heap is probably that large to accommodate badly misconfigured caches.

Why it's different in 5.4 I don't quite know, but 10-12 minutes is unacceptable anyway. My guess is that you made your heaps that large as a consequence of low hit rates. If you were using bare NOW in fq clauses, that alone could explain very low hit rates that pushed you to expand the caches: NOW resolves to the current millisecond, so a filter like fq=timestamp:[NOW-1DAY TO NOW] creates a new cache entry on every request and is essentially never reused, while a rounded form like fq=timestamp:[NOW/DAY-1DAY TO NOW/DAY] can be. See: https://dzone.com/articles/solr-date-math-now-and-filter

At any rate, I _strongly_ recommend that you drop your filterCache to the default size of 512 and drop your autowarmCount to something very small, say 16. Ditto for the queryResultCache. Drop the documentCache to maybe 10,000 (autowarm is a no-op for the documentCache). Then drop your heap to something closer to 16G.
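To make that concrete, here is a minimal sketch of the kind of solrconfig.xml settings I mean. The numbers are starting points to measure against, not tuned values, and I'm assuming the same CaffeineCache class your current config already uses:

  <!-- small filterCache, minimal autowarming -->
  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="16"/>

  <!-- same idea for the queryResultCache -->
  <queryResultCache class="solr.CaffeineCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="16"/>

  <!-- autowarmCount is a no-op for the documentCache, so leave it off -->
  <documentCache class="solr.CaffeineCache"
                 size="10000"
                 initialSize="10000"/>

For the heap, assuming you set it through solr.in.sh rather than on the command line, that would be something like SOLR_HEAP="16g".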
Then test, tune, test. Do NOT assume bigger caches are the answer until you have evidence. Keep reducing your heap size until you start to see GC problems (on a test system, obviously) to find your lower limit, then add some back for production to give yourself breathing room.

Finally, see Uwe's blog: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html to get a sense of why the size on disk is not necessarily a good indicator of the heap requirements.

Best,
Erick

> On Nov 4, 2020, at 2:40 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 11/3/2020 11:46 PM, raj.yadav wrote:
>> We have two parallel systems, one is solr 8.5.2 and the other one is solr 5.4.
>> In solr_5.4, commit time with openSearcher=true is 10 to 12 minutes, while in solr_8 it's around 25 minutes.
>
> Commits on a properly configured and sized system should take a few seconds, not minutes. 10 to 12 minutes for a commit is an enormous red flag.
>
>> This is our current caching policy of solr_8:
>> <filterCache class="solr.CaffeineCache"
>>              size="32768"
>>              initialSize="6000"
>>              autowarmCount="6000"/>
>
> This is probably the culprit. Do you know how many entries the filterCache actually ends up with? What you've said with this config is "every time I open a new searcher, I'm going to execute up to 6000 queries against the new index." If each query takes one second, running 6000 of them is going to take 100 minutes. I have seen these queries take a lot longer than one second.
>
> Also, each entry in the filterCache can be enormous, depending on the number of docs in the index. Let's say that you have five million documents in your core. With five million documents, each entry in the filterCache is going to be 625000 bytes. That means you need 20GB of heap memory for a full filterCache of 32768 entries -- 20GB of memory above and beyond everything else that Solr requires. Your message doesn't say how many documents you have, it only says the index is 11GB. From that alone I can't tell how many documents you have.
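To spell that arithmetic out: a filterCache entry is essentially a bitset with one bit per document in the core, so in the worst case an entry costs about maxDoc / 8 bytes. Under Shawn's example of five million documents:

  5,000,000 docs / 8 bits per byte  =     625,000 bytes per entry
  625,000 bytes * 32,768 entries    ≈  20,480,000,000 bytes, i.e. the ~20GB above

Plug in your real document count to see what the current settings could cost at their limit.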
>
>> While debugging this we came across this page:
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-Slowcommits
>
> I wrote that wiki page.
>
>> Here, one of the reasons given for slow commits is:
>> "Heap size issues. Problems from the heap being too big will tend to be infrequent, while problems from the heap being too small will tend to happen consistently."
>> Can anyone please help me understand the above point?
>
> If your heap is a lot bigger than it needs to be, then what you'll see is slow garbage collections, but they won't happen very often. If the heap is too small, then there will be garbage collections that happen REALLY often, leaving few system resources for actually running the program. This applies to ANY Java program, not just Solr.
>
>> System config:
>> disk size: 250 GB
>> cpu: 8 vcpus, 64 GiB memory
>> Index size: 11 GB
>> JVM heap size: 30 GB
>
> That heap seems to be a lot larger than it needs to be. I have run systems with over 100GB of index, with tens of millions of documents, on an 8GB heap. My filterCache on each core had a max size of 64 with an autowarmCount of four ... and commits STILL would take 10 to 15 seconds, which I consider to be very slow. Most of that time was spent executing those four queries in order to autowarm the filterCache.
>
> What I would recommend you start with is reducing the size of the filterCache. Try a size of 128 and an autowarmCount of 8, and see what hit rate you get on the cache. Adjust from there as necessary. I would reduce the heap size for Solr as well -- your heap requirements should drop dramatically with a reduced filterCache.
>
> Thanks,
> Shawn
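A note on watching those hit rates while you tune: each core's cache statistics (including hitratio) are visible in the admin UI under Plugins / Stats, and, assuming the Metrics API is enabled (it is by default in recent versions), you can also pull them with something like:

  curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.filterCache"

Treat the exact metric keys, host, and port as assumptions that may differ in your install; the point is just to get hitratio and evictions in front of you before and after each change.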