Thanks Erick, that summary is very helpful.

Nawab


On Mon, May 29, 2017 at 1:39 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Yeah, it's a bit confusing. I made Yonik and Mark take me through the
> process in detail in order to write that blog, misunderstandings my
> fault of course ;)
>
> bq: This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but JVM has handed over the responsibility to Operating
> system)
>
> True. Soft commits aren't about the tlog at all, just making docs that
> are already indexed visible to searchers. Soft commits don't have any
> effect on the segment files either.
>
> Back to your original question:
>
> bq: Does it mean that flush protects against JVM crash but not power
> failure?
> While fsync will protect against both scenarios.
>
> In a word, "yes". In practice, the only time people will do an fsync
> (which you can specify when you commit) is in situations where they
> need to guard against the remote possibility that the bits would be
> lost if the power went out during that very short interval. And you
> have a one-replica system (assuming SolrCloud). And you don't have a
> tlog (see below).
>
> bq: "If the JVM crashes or there is a loss of power, changes that
> occurred after the last *hard commit* will be lost."
>
> OK, there's a distinction between whether the tlog is enabled or not.
> There's nothing at all that _requires_ the tlog. So you have two
> scenarios:
>
> 1> tlog not enabled. In this scenario the above is completely true.
> Unless and until the hard commit is performed, documents sent to the
> index are lost if there's a power outage or you kill Solr harshly. A
> hard commit will close all open segments so the state of the index is
> consistent. When Solr starts up it only "knows" about segments that
> were closed by a hard commit.
>
> 2> tlog enabled. In this scenario, docs are also written to the tlog (and
> the flush/fsync discussion pertains here). Upon restart, the Solr
> node replays from the tlog any docs received since the last hard commit, so
> no data successfully written to the tlog will be lost. Note that Solr
> doesn't "know" about the unclosed segments in this case either. But
> you don't care since any docs in those segments are re-indexed from
> the tlog.
>
> One implication here is that if you do _not_ hard commit, your tlogs
> will grow without limit. Which is one of the reasons you can specify
> openSearcher=false for hard commits, so you can commit frequently,
> preserving your index without having to replay and without worrying
> about the expense of opening new searchers.
>
> Best,
> Erick
>
> On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal
> <khi...@gmail.com> wrote:
> > Thanks Erick,
> >
> > I have read different documents in this area and I am getting confused due
> > to overloaded/"reused" terms.
> >
> > E.g., in that lucidworks page, the flow for an indexing request is
> > explained as follows. This makes me think that at the time of soft-commit,
> > the documents in preceding update requests are already flushed (might not
> > be on the disk yet, but JVM has handed over the responsibility to Operating
> > system). (even if we don't do it as part of soft-commit)
> >
> > "After all the leaders have responded, the originating node replies to the
> > client. At this point,
> >
> > *all documents have been flushed to the tlog for all the nodes in the
> > cluster!"*
> >
> > On Mon, May 29, 2017 at 7:57 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> There's a long post here on this that might help:
> >>
> >> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Short form: soft commit doesn't flush tlogs, does not start a new
> >> tlog, does not close segments, does not open new segments.
> >>
> >> Hard commit does all of these things.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > SolrCloud document <https://wiki.apache.org/solr/NewSolrCloudDesign>
> >> > mentions:
> >> >
> >> > "The sync can be tunable e.g. flush vs fsync by default can protect against
> >> > JVM crashes but not against power failure and can be much faster "
> >> >
> >> > Does it mean that flush protects against JVM crash but not power failure?
> >> > While fsync will protect against both scenarios.
> >> >
> >> >
> >> > Also, this NRT help
> >> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
> >> > explains soft commit as:
> >> > "A *soft commit* is much faster since it only makes index changes visible
> >> > and does not fsync index files or write a new index descriptor. If the JVM
> >> > crashes or there is a loss of power, changes that occurred after the last
> >> > *hard commit* will be lost."
> >> >
> >> > This is a little confusing, since a soft commit will only happen after a
> >> > tlog entry is flushed, won't it? Or does the tlog work differently for
> >> > SolrCloud and non-SolrCloud configurations?
> >> >
> >> >
> >> > Thanks
> >> > Nawab
> >>
>
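
For anyone reading this thread in the archive: the flush vs. fsync distinction discussed above can be tried outside Solr with a small Python sketch. This is not Solr's tlog code, only an illustration of the two durability levels. The file name toy.tlog, the append_entry helper and its sync_level parameter are made up for this example.

```python
import os
import tempfile


def append_entry(path, line, sync_level="flush"):
    """Append one line to a toy log file.

    sync_level is a hypothetical knob for this sketch:
      "flush" - hand the bytes from the process buffer to the OS
      "fsync" - additionally ask the OS to persist them to the device
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        # After flush() the data is in the OS page cache, so it generally
        # survives a crash of this process, but not necessarily a power loss.
        f.flush()
        if sync_level == "fsync":
            # fsync asks the kernel to write the file's data to storage,
            # which is slower but also guards against power failure.
            os.fsync(f.fileno())


def read_entries(path):
    """Return all log lines, as a restart-time replay would read them."""
    with open(path, encoding="utf-8") as f:
        return [entry.rstrip("\n") for entry in f]


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "toy.tlog")
    append_entry(log_path, "add doc1", sync_level="flush")
    append_entry(log_path, "add doc2", sync_level="fsync")
    entries = read_entries(log_path)
    print(entries)
```

Both calls leave the entry readable after a clean run; the difference only shows up if the machine loses power before the OS writes its cache out, which is why fsync costs more per update.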
