Yeah, it's a bit confusing. I made Yonik and Mark take me through the
process in detail in order to write that blog post; any
misunderstandings are my fault, of course ;)

bq: This makes me think that at the time of soft-commit,
the documents in preceding update requests are already flushed (might not
be on the disk yet, but JVM has handed over the responsibility to Operating
system)

True. Soft commits aren't about the tlog at all, just making docs that
are already indexed visible to searchers. Soft commits don't have any
effect on the segment files either.

Back to your original question:

bq: Does it mean that flush protects against JVM crash but not power failure?
While fsync will protect against both scenarios.

In a word, "yes". In practice, the only time people will do an fsync
(which you can specify when you commit) is when all of the following
hold: they need to guard against the remote possibility that the bits
would be lost if the power went out during that very short interval,
they have a one-replica system (assuming SolrCloud), and they don't
have a tlog (see below).
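
To make the flush/fsync distinction concrete, here is a minimal sketch
in plain Python. It is not Solr code; the file name and the
append_record helper are made up for illustration. It only shows the
two OS-level steps being discussed.

```python
import os
import tempfile


def append_record(path, record, durable=False):
    """Append one record; optionally fsync it (hypothetical helper)."""
    with open(path, "ab") as f:
        f.write(record)
        # flush(): hands the bytes from the process's buffer to the OS.
        # They survive a crash of the process, but may still be lost
        # if the power goes out before the OS writes them to disk.
        f.flush()
        if durable:
            # fsync(): asks the OS to push its cached data to the device,
            # which is what guards against power failure. It is slower.
            os.fsync(f.fileno())


path = os.path.join(tempfile.mkdtemp(), "demo.tlog")  # made-up file name
append_record(path, b"doc1\n")                # flush only
append_record(path, b"doc2\n", durable=True)  # flush + fsync

with open(path, "rb") as f:
    contents = f.read()
```

Both records are readable afterwards either way; the difference only
shows up if the machine loses power between the write and the moment
the OS gets around to writing its cache out.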

bq:  If the JVM crashes or there is a loss of power, changes that
occurred after the last *hard
commit* will be lost."

OK, there's a distinction between whether the tlog is enabled or not.
There's nothing at all that _requires_ the tlog. So you have two
scenarios:

1> tlog not enabled. In this scenario the above is completely true.
Unless and until the hard commit is performed, documents sent to the
index are lost if there's a power outage or you kill Solr harshly. A
hard commit will close all open segments so the state of the index is
consistent. When Solr starts up it only "knows" about segments that
were closed by a hard commit.

2> tlog enabled. In this scenario, docs are written to the tlog (and
the flush/fsync discussion pertains here). Then, upon restart, the
Solr node will replay from the tlog any docs received since the last
hard commit, so no data successfully written to the tlog will be lost.
Note that Solr doesn't "know" about the unclosed segments in this case
either. But you don't care since any docs in those segments are
re-indexed from the tlog.
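
The two scenarios above can be sketched as a toy model in Python. To
be clear, this is not how Solr is implemented; ToyNode and all of its
methods are invented here, and it only tracks the bookkeeping
described above (closed segments, unclosed segments, tlog replay).

```python
class ToyNode:
    """Toy model only, NOT Solr code."""

    def __init__(self, tlog_enabled):
        self.tlog_enabled = tlog_enabled
        self.closed_segments = []  # docs in segments closed by a hard commit
        self.open_segment = []     # indexed docs whose segment is not closed
        self.tlog = []             # docs logged since the last hard commit

    def add(self, doc):
        if self.tlog_enabled:
            # Assumption: the write to the tlog succeeded.
            self.tlog.append(doc)
        self.open_segment.append(doc)

    def hard_commit(self):
        # Close the open segment and start a new tlog (simplified).
        self.closed_segments.extend(self.open_segment)
        self.open_segment = []
        self.tlog = []

    def crash_and_restart(self):
        # On startup only closed segments are "known".
        self.open_segment = []
        # Replay whatever the tlog holds (nothing if it is disabled).
        for doc in list(self.tlog):
            self.open_segment.append(doc)

    def docs(self):
        return self.closed_segments + self.open_segment


def run(tlog_enabled):
    node = ToyNode(tlog_enabled)
    node.add("a")
    node.hard_commit()
    node.add("b")          # arrives after the last hard commit
    node.crash_and_restart()
    return node.docs()


without_tlog = run(False)  # "b" is lost
with_tlog = run(True)      # "b" is replayed from the tlog
```

Without the tlog only "a" survives the restart; with it, "b" comes
back via replay even though its segment was never closed.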

One implication here is that if you do _not_ hard commit, your tlogs
will grow without limit. Which is one of the reasons you can specify
openSearcher=false for hard commits, so you can commit frequently,
preserving your index without having to replay and without worrying
about the expense of opening new searchers.
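
As a rough illustration of the point above, here is a small Python
sketch that builds the update URLs for a hard commit with
openSearcher=false and for a soft commit. The host, port, and
collection name are placeholders I made up, and commit_url is a
hypothetical helper; most installations would set this up through
autoCommit in solrconfig.xml rather than issuing commits by hand.

```python
from urllib.parse import urlencode


def commit_url(base, soft=False, open_searcher=True):
    """Build an update URL that triggers a commit (hypothetical helper)."""
    if soft:
        params = {"softCommit": "true"}
    else:
        params = {
            "commit": "true",
            # false: close segments and roll the tlog without paying
            # for a new searcher.
            "openSearcher": str(open_searcher).lower(),
        }
    return base + "/update?" + urlencode(params)


# Placeholder address and collection name, not taken from this thread.
BASE = "http://localhost:8983/solr/mycollection"

hard_url = commit_url(BASE, open_searcher=False)
soft_url = commit_url(BASE, soft=True)
```

The idea is to hard commit often with openSearcher=false to keep the
tlogs short, and let soft commits control when docs become visible.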

Best,
Erick

On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal
<khi...@gmail.com> wrote:
> Thanks Erick,
>
> I have read different documents in this area and I am getting confused due
> to overloaded/"reused" terms.
>
> E.g., in that lucidworks page, the flow for an indexing request is
> explained as follows. This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but JVM has handed over the responsibility to Operating
> system). (even if we don't do it as part of soft-commit)
>
> "After all the leaders have responded, the originating node replies to the
> client. At this point,
>
> *all documents have been flushed to the tlog for all the nodes in the
> cluster!"*
>
> On Mon, May 29, 2017 at 7:57 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> There's a long post here on this that might help:
>>
>> https://lucidworks.com/2013/08/23/understanding-
>> transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Short form: soft commit doesn't flush tlogs, does not start a new
>> tlog, does not close segments, does not open new segments.
>>
>> Hard commit does all of these things.
>>
>> Best,
>> Erick
>>
>> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > SolrCloud document <https://wiki.apache.org/solr/NewSolrCloudDesign>
>> > mentions:
>> >
>> > "The sync can be tunable e.g. flush vs fsync by default can protect
>> against
>> > JVM crashes but not against power failure and can be much faster "
>> >
>> > Does it mean that flush protects against JVM crash but not power failure?
>> > While fsync will protect against both scenarios.
>> >
>> >
>> > Also, this NRT help
>> > <https://cwiki.apache.org/confluence/display/solr/Near+
>> Real+Time+Searching>
>> > explains soft commit as:
>> > "A *soft commit* is much faster since it only makes index changes visible
>> > and does not fsync index files or write a new index descriptor. If the
>> JVM
>> > crashes or there is a loss of power, changes that occurred after the
>> last *hard
>> > commit* will be lost."
>> >
>> > This is little confusing, as a soft-commit will only happen after a tlog
>> > entry is flushed. Isn't it? Or doesn't tlog work differently for
>> solrcloud
>> > and non-solrCloud configurations.
>> >
>> >
>> > Thanks
>> > Nawab
>>
