Yeah, it's a bit confusing. I made Yonik and Mark take me through the process in detail in order to write that blog; any misunderstandings are my fault of course ;)

bq: This makes me think that at the time of soft-commit, the documents in preceding update requests are already flushed (might not be on the disk yet, but JVM has handed over the responsibility to Operating system)

True. Soft commits aren't about the tlog at all, they just make docs that are already indexed visible to searchers. Soft commits don't have any effect on the segment files either.

Back to your original question:

bq: Does it mean that flush protects against JVM crash but not power failure? While fsync will protect against both scenarios.

In a word, "yes". In practice, the only time people will do an fsync (which you can specify when you commit) is when they need to guard against the remote possibility that the bits would be lost if the power went out during that very short interval, and they have a one-replica system (assuming SolrCloud), and they don't have a tlog (see below).

bq: "If the JVM crashes or there is a loss of power, changes that occurred after the last *hard commit* will be lost."

OK, there's a distinction here depending on whether the tlog is enabled or not. There's nothing at all that _requires_ the tlog. So you have two scenarios:

1> tlog not enabled. In this scenario the above is completely true. Unless and until the hard commit is performed, documents sent to the index are lost if there's a power outage or you kill Solr harshly. A hard commit will close all open segments so the state of the index is consistent. When Solr starts up it only "knows" about segments that were closed by a hard commit.

2> tlog enabled. In this scenario (and the flush/fsync discussion pertains here), upon restart the Solr node will replay from the tlog any docs received since the last hard commit, so no data successfully written to the tlog will be lost. Note that Solr doesn't "know" about the unclosed segments in this case either. But you don't care, since any docs in those segments are re-indexed from the tlog.

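If it helps to see the flush vs. fsync distinction in code, here's a minimal sketch in plain Java. To be clear, this is not Solr's tlog code: TinyLog, tempLog and sizeOf are made-up names for illustration, and the only real APIs used are the standard java.io / java.nio ones. It just shows the three places a record can be: in a JVM buffer (gone if the JVM dies), handed to the OS (survives a JVM crash), and forced to the device (should also survive a power failure, as far as the hardware honors the sync).

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical append-only log for illustration only; NOT Solr's tlog implementation. */
public class TinyLog implements AutoCloseable {
    private final FileOutputStream file;
    private final BufferedOutputStream buffer;

    public TinyLog(Path path) {
        try {
            file = new FileOutputStream(path.toFile(), true); // true = append mode
            buffer = new BufferedOutputStream(file);          // in-JVM buffer (8 KB by default)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** No sync: the record sits in JVM memory only, so a JVM crash loses it. */
    public void append(String record) {
        try {
            buffer.write((record + "\n").getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** flush: hand the buffered bytes to the OS. Survives a JVM crash, not necessarily power loss. */
    public void flush() {
        try {
            buffer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** fsync: flush, then ask the OS to force the bytes to the storage device. */
    public void fsync() {
        try {
            buffer.flush();
            file.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            buffer.close(); // close() also flushes whatever is still buffered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: create an empty temp file to log into. */
    public static Path tempLog() {
        try {
            return Files.createTempFile("tinylog", ".log");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: size of the file as the OS currently sees it. */
    public static long sizeOf(Path path) {
        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path path = tempLog();
        try (TinyLog log = new TinyLog(path)) {
            log.append("doc-1");
            System.out.println("after append: " + sizeOf(path) + " bytes"); // 0: still in the JVM buffer
            log.flush();
            System.out.println("after flush:  " + sizeOf(path) + " bytes"); // 6: the OS has it now
            log.append("doc-2");
            log.fsync();
            System.out.println("after fsync:  " + sizeOf(path) + " bytes"); // 12: forced to the device
        }
    }
}
```

The file size can't tell you whether the bytes have reached the physical disk, which is exactly why the flush case looks "done" right up until the power goes out. The cost difference is also why flush is the cheaper default and fsync is the opt-in.
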
One implication here is that if you do _not_ hard commit, your tlogs will grow without limit. That's one of the reasons you can specify openSearcher=false for hard commits: you can commit frequently, preserving your index without having to replay the tlog and without worrying about the expense of opening new searchers.

Best,
Erick

On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:

> Thanks Erick,
>
> I have read different documents in this area and I am getting confused due
> to overloaded/"reused" terms.
>
> E.g., in that lucidworks page, the flow for an indexing request is
> explained as follows. This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but JVM has handed over the responsibility to Operating
> system). (even if we don't do it as part of soft-commit)
>
> "After all the leaders have responded, the originating node replies to the
> client. At this point,
>
> *all documents have been flushed to the tlog for all the nodes in the
> cluster!"*
>
> On Mon, May 29, 2017 at 7:57 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> There's a long post here on this that might help:
>>
>> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Short form: soft commit doesn't flush tlogs, does not start a new
>> tlog, does not close segments, does not open new segments.
>>
>> Hard commit does all of these things.
>>
>> Best,
>> Erick
>>
>> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > SolrCloud document <https://wiki.apache.org/solr/NewSolrCloudDesign>
>> > mentions:
>> >
>> > "The sync can be tunable e.g. flush vs fsync by default can protect against
>> > JVM crashes but not against power failure and can be much faster "
>> >
>> > Does it mean that flush protects against JVM crash but not power failure?
>> > While fsync will protect against both scenarios.
>> >
>> >
>> > Also, this NRT help
>> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
>> > explains soft commit as:
>> > "A *soft commit* is much faster since it only makes index changes visible
>> > and does not fsync index files or write a new index descriptor. If the JVM
>> > crashes or there is a loss of power, changes that occurred after the last
>> > *hard commit* will be lost."
>> >
>> > This is little confusing, as a soft-commit will only happen after a tlog
>> > entry is flushed. Isn't it? Or doesn't tlog work differently for solrcloud
>> > and non-solrCloud configurations.
>> >
>> >
>> > Thanks
>> > Nawab
>>