On 9/22/2016 3:27 PM, vsolakhian wrote:
> This is not the cause of the problem though. The disk cache is
> important for queries and overall performance during optimization, but
> once it is done, everything should go back to "normal" (whatever that
> normal is). In our case it is the SOFT COMMIT (that opens a new
> Searcher) that takes 10 times longer AFTER the index was optimized and
> deleted records were removed (and index size went down to 60 GB).
It's difficult to say without hard numbers, and that is complicated by
my very limited understanding of how HDFS gets cached.

"Normal" is achieved only when the relevant data is in the disk cache,
which will most likely not be the case after an optimize, unless you
have enough caching memory for both the before and after copies of the
index to fit at the same time. Similar performance issues are likely to
occur right after a server reboot.

A soft commit opens a new searcher. When a new searcher is opened, the
*Solr* caches (which are entirely different from the disk cache) look at
their autowarmCount settings. Each cache gathers the top N queries
contained in the cache, up to the autowarmCount number, and executes
those queries against the index to create a brand new cache for the new
searcher. The new searcher is not put into place until the warming is
done, and the commit will not finish until the new searcher is online.

If the information sitting in the OS disk cache when the warming queries
run is not useful for fast queries, then those queries will be very
slow, which makes the commit take longer.

For better commit times, reduce autowarmCount on your Solr caches. This
will make it more likely that users will notice slow queries, though.

Good Solr performance with large indexes requires a LOT of memory. The
amount required is usually very surprising to admins.

Thanks,
Shawn
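P.S. For reference, autowarmCount is set per cache in the <query>
section of solrconfig.xml. A minimal sketch is below — the sizes and
autowarmCount values are illustrative assumptions, not tuning
recommendations for your index:

```xml
<!-- Hypothetical solrconfig.xml fragment. Lowering autowarmCount
     shortens the warming phase that runs inside each commit, at the
     cost of more cold-cache queries right after the new searcher
     comes online. -->
<query>
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="16"/>
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="16"/>
  <!-- documentCache cannot be autowarmed, because internal document
       IDs can change from one searcher to the next. -->
  <documentCache class="solr.LRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
</query>
```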