On 5/10/2018 9:48 AM, Shivam Omar wrote:
> I need some help understanding Solr soft commits. Since soft commits are
> about visibility and are fast in nature, they are advised for NRT use cases.

Soft commits *MIGHT* be faster than hard commits. There are situations
where the performance of a soft commit and a hard commit with
openSearcher=true will be about the same, particularly if indexing is
very heavy.
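
For reference, the two commit types are usually set up along these lines
in the updateHandler section of solrconfig.xml. Treat this as a sketch:
the interval values are placeholders I picked for illustration, not
recommendations for your index.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes index data to stable storage. With
       openSearcher=false it does not make new documents visible.
       The 60 second interval is an assumed example value. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher so recent changes become
       visible. The 5 second interval is an assumed example value. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Setting openSearcher to true on the hard commit would make it do the
same searcher-opening work as the soft commit, which is where the two
end up costing about the same.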

> I want to understand whether soft commits also honor merge policies and do
> segment merging for docs in memory. For example, if I keep the hard commit
> interval very high and allow a few million documents to be in memory by using
> soft commits with no hard commit, can that affect Solr query-time performance?

Segments in memory are very likely not eligible for merging, but I do
not actually know whether that is the case.
Using soft commits will NOT keep millions of documents in memory. Solr
uses the NRTCachingDirectoryFactory from Lucene by default, and uses it
with default values, which are far too low to accommodate millions of
documents. The Javadoc for the underlying Lucene directory class
describes what these settings control:
https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html
That page shows a directory created with values of 5 MB (maximum merge
size) and 60 MB (maximum cached size), but the defaults in Solr's factory
code (which is what Solr normally uses) are 4 MB and 48 MB. I'm pretty
sure that you can increase these values in solrconfig.xml, but really
large values are not recommended. Values large enough to accommodate
millions of documents would also require a large Java heap, likely with
no real performance advantage.
If segment sizes exceed these values, then they will not be cached in
memory. Older segments and segments that do not meet the size
requirements are flushed to disk.
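
If you do want to experiment with those limits, something like the
following in solrconfig.xml should work. This is a sketch, not something
I have tested on your version. As far as I recall, the factory reads
parameters named maxMergeSizeMB and maxCachedMB, but check the
NRTCachingDirectoryFactory source for your release to confirm. The
values shown simply restate the factory defaults mentioned above.

```xml
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory">
  <!-- Merged segments expected to be larger than this are written
       straight to disk. 4 is the factory default. -->
  <double name="maxMergeSizeMB">4</double>
  <!-- Upper bound on the total size of segments held in the RAM cache.
       48 is the factory default. -->
  <double name="maxCachedMB">48</double>
</directoryFactory>
```
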
Thanks,
Shawn