viliam-durina commented on issue #14334:
URL: https://github.com/apache/lucene/issues/14334#issuecomment-2704000723

   TL;DR: I think this issue is still relevant to Lucene today.
   
   Explanation:
   
   Quoting [from here](https://wiki.postgresql.org/wiki/Fsync_Errors):
> Linux 4.13 and 4.15: [fsync() only reports writeback errors that occurred after you called open()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750) so our schemes for closing and opening files LRU-style and handing fsync() work off to the checkpointer process can hide write-back errors; also buffers are marked clean after errors so even if you opened the file before the failure, retrying fsync() can falsely report success and the modified buffer can be thrown away at any time due to memory pressure.
   
   The [current man page for `fsync`](https://www.man7.org/linux/man-pages/man2/fsync.2.html) says:
   > fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device
   
   And in the ERRORS section:
   > EIO    An error occurred during synchronization.  This error may
   >        relate to data written to some other file descriptor on the
   >        same file.  Since Linux 4.13, errors from write-back will
   >        be reported to all file descriptors that might have written
   >        the data which triggered the error.  Some filesystems
   >        (e.g., NFS) keep close track of which data came through
   >        which file descriptor, and give more precise reporting.
   >        Other filesystems (e.g., most local filesystems) will
   >        report errors to all file descriptors that were open on the
   >        file when the error was recorded.
   
   My understanding is this: if `fsync` succeeds, all dirty pages have been durably stored. If there were write-back errors for writes done through other file descriptors, those errors MAY (but are not required to) be reported when fsync-ing another descriptor. Lucene, however, relies on write-back errors being reported when fsync-ing a file descriptor opened *after* the failure, and that guarantee does not exist.
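   To make the unsafe pattern concrete, here is a minimal Java sketch (file name `segment.dat` is hypothetical, chosen only for illustration) of writing through one descriptor, closing it without fsync, and then fsync-ing through a descriptor opened later:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncVisibility {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("segment.dat"); // hypothetical file name

        // Descriptor A: dirty some pages, then close WITHOUT fsync.
        try (FileChannel a = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            a.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        } // if write-back fails after this close, the error can be lost

        // Descriptor B: opened later. Per the man page quoted above, an
        // earlier write-back failure is only guaranteed to be reported to
        // descriptors that were open when the error was recorded (on most
        // local filesystems), so this force() can succeed even though the
        // data written through A never reached the disk.
        try (FileChannel b = FileChannel.open(file, StandardOpenOption.WRITE)) {
            b.force(true);
        }
    }
}
```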
   
   Intuitively, for how long would the OS need to keep reporting such a failure? Indefinitely? Then once a write-back failed (e.g. due to insufficient space), you would get errors for that file forever, even after space becomes available again. The OS is free to reclaim the failed pages from the cache at any time; it does not have to keep them around for a later open-and-fsync attempt to report them. If you close a file without fsync-ing it first, you are giving up the guarantee of being told about any errors.
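   The safe pattern, then, is to fsync on the same descriptor that wrote the data, before closing it, so any write-back error is reported to us as an exception. A minimal sketch (the helper name and file are my own, for illustration):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SafeFsync {
    // Write data and fsync on the SAME descriptor before closing it,
    // so a write-back failure surfaces here as an IOException instead
    // of being silently dropped after close().
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(true); // throws IOException if write-back failed
        }
        // Only now may we treat the data as durable (the new directory
        // entry still needs its own fsync on the parent directory).
    }

    public static void main(String[] args) throws IOException {
        writeDurably(Path.of("safe.dat"), new byte[] {4, 5, 6});
    }
}
```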
   
   All in all, I think there's an issue here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

