Thanks Erick, that summary is very helpful.
Nawab

On Mon, May 29, 2017 at 1:39 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Yeah, it's a bit confusing. I made Yonik and Mark take me through the
> process in detail in order to write that blog, misunderstandings my
> fault of course ;)
>
> bq: This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but JVM has handed over the responsibility to Operating
> system)
>
> True. Soft commits aren't about the tlog at all, just making docs that
> are already indexed visible to searchers. Soft commits don't have any
> effect on the segment files either.
>
> Back to your original question:
>
> bq: Does it mean that flush protects against JVM crash but not power
> failure?
> While fsync will protect against both scenarios.
>
> In a word, "yes". In practice, the only time people will do an fsync
> (which you can specify when you commit) is in situations where they
> need to guard against the remote possibility that the bits would be
> lost if the power went out during that very short interval. And you
> have a one-replica system (assuming SolrCloud). And you don't have a
> tlog (see below).
>
> bq: If the JVM crashes or there is a loss of power, changes that
> occurred after the last *hard commit* will be lost."
>
> OK, there's a distinction between whether the tlog is enabled or not.
> There's nothing at all that _requires_ the tlog. So you have two
> scenarios:
>
> 1> tlog not enabled. In this scenario the above is completely true.
> Unless and until the hard commit is performed, documents sent to the
> index are lost if there's a power outage or you kill Solr harshly. A
> hard commit will close all open segments so the state of the index is
> consistent. When Solr starts up it only "knows" about segments that
> were closed by a hard commit.
>
> 2> tlog enabled.
> In this scenario, for any docs written to the tlog (and
> the flush/fsync discussion pertains here), upon restart the Solr
> node will replay the docs since the last hard commit from the tlog, so
> no data successfully written to the tlog will be lost. Note that Solr
> doesn't "know" about the unclosed segments in this case either. But
> you don't care since any docs in those segments are re-indexed from
> the tlog.
>
> One implication here is that if you do _not_ hard commit, your tlogs
> will grow without limit. Which is one of the reasons you can specify
> openSearcher=false for hard commits, so you can commit frequently,
> preserving your index without having to replay and without worrying
> about the expense of opening new searchers.
>
> Best,
> Erick
>
> On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal
> <khi...@gmail.com> wrote:
> > Thanks Erick,
> >
> > I have read different documents in this area and I am getting confused
> > due to overloaded/"reused" terms.
> >
> > E.g., in that lucidworks page, the flow for an indexing request is
> > explained as follows. This makes me think that at the time of
> > soft-commit, the documents in preceding update requests are already
> > flushed (might not be on the disk yet, but JVM has handed over the
> > responsibility to Operating system). (even if we don't do it as part
> > of soft-commit)
> >
> > "After all the leaders have responded, the originating node replies to
> > the client. At this point,
> >
> > *all documents have been flushed to the tlog for all the nodes in the
> > cluster!"*
> >
> > On Mon, May 29, 2017 at 7:57 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> There's a long post here on this that might help:
> >>
> >> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Short form: soft commit doesn't flush tlogs, does not start a new
> >> tlog, does not close segments, does not open new segments.
> >>
> >> Hard commit does all of these things.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > SolrCloud document <https://wiki.apache.org/solr/NewSolrCloudDesign>
> >> > mentions:
> >> >
> >> > "The sync can be tunable e.g. flush vs fsync by default can protect
> >> > against JVM crashes but not against power failure and can be much
> >> > faster "
> >> >
> >> > Does it mean that flush protects against JVM crash but not power
> >> > failure? While fsync will protect against both scenarios.
> >> >
> >> > Also, this NRT help
> >> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
> >> > explains soft commit as:
> >> > "A *soft commit* is much faster since it only makes index changes
> >> > visible and does not fsync index files or write a new index
> >> > descriptor. If the JVM crashes or there is a loss of power, changes
> >> > that occurred after the last *hard commit* will be lost."
> >> >
> >> > This is a little confusing, as a soft-commit will only happen after a
> >> > tlog entry is flushed. Isn't it? Or does the tlog work differently for
> >> > SolrCloud and non-SolrCloud configurations?
> >> >
> >> > Thanks
> >> > Nawab
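
For anyone reading this thread later, the flush vs. fsync distinction discussed above can be tried outside Solr with an ordinary append-only file. The following is only a minimal sketch in Python, not Solr's tlog code: `append_record`, `read_all`, and the file name `toy.tlog` are hypothetical names made up for illustration. It relies on the standard behaviour that `flush()` hands buffered bytes to the operating system, while `os.fsync()` additionally asks the OS to write them to the storage device.

```python
import os
import tempfile


def append_record(path, record, durable=False):
    """Append one record to a toy log (hypothetical helper, not Solr code)."""
    with open(path, "ab") as log:
        log.write(record)
        # flush(): move bytes from this process's buffer to the OS page cache.
        # They now survive a crash of this process, but a power loss could
        # still drop them before the OS writes them out.
        log.flush()
        if durable:
            # fsync(): ask the OS to push its cached pages for this file to
            # the storage device before returning.
            os.fsync(log.fileno())


def read_all(path):
    """Return the full contents of the toy log as bytes."""
    with open(path, "rb") as log:
        return log.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.tlog")
        append_record(path, b"doc-1\n")                # flush only
        append_record(path, b"doc-2\n", durable=True)  # flush + fsync
        print(read_all(path))
```

Both records are readable afterwards in either mode. The difference only shows up if the machine loses power between the write and the moment the OS gets around to persisting its cache, which is the "very short interval" Erick mentions.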