* Theodore Ts'o:

> On Thu, Dec 26, 2024 at 01:19:34PM -0500, Michael Stone wrote:
>> Further reading: look at the auto_da_alloc option in ext4. Note that it says
>> that doing the rename without the sync is wrong, but there's now a heuristic
>> in ext4 that tries to insert an implicit sync when that anti-pattern is used
>> (because so much data got eaten when people did the wrong thing). By leaning
>> on that band-aid dpkg might get away with skipping the sync, but doing so
>> would require assuming a filesystem for which that implicit guarantee is
>> available. If you're on a different filesystem or a different kernel all
>> bets would be off. I don't know how much difference skipping the fsync's
>> makes these days if they get done implicitly.
>
> Note that it's not a sync, but rather, under certain circumstances, we
> initiate writeback --- but we don't wait for it to complete before
> allowing the close(2) or rename(2) to complete.  For close(2), we will
> initiate a writeback on a close if the file descriptor was opened
> using O_TRUNC and truncate took place to throw away the previous
> contents of the file.  For rename(2), if you rename on top of a
> previously existing file, we will initiate the writeback right away.
> This was a tradeoff between safety and performance, and this was done
> because there were an awful lot of buggy applications out there which
> didn't use fsync, and the number of application programmers greatly
> outnumbered the file system programmers.  This was a compromise that
> was discussed at a Linux Storage, File Systems, and Memory Management
> (LSF/MM) conference many years ago, and I think other file systems
> like btrfs and xfs had agreed in principle that this was a good thing
> to do --- but I can't speak to whether they actually implemented it.

As far as I know, XFS still truncates files with pending writes during
mount if the file system was not unmounted cleanly.  This means that
renaming for atomic replacement does not work reliably without fsync.
(But I'm not a file system developer.)
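
For reference, a minimal sketch of the fsync-then-rename pattern that
does not depend on any file system heuristic. The function name
replace_atomically and the ".tmp-" prefix are made up for illustration;
this is not what dpkg does, only the general idiom:

```python
import os
import tempfile


def replace_atomically(path, data):
    """Replace `path` with `data` so that a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file has to live on the same file system as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Wait until the new contents are on stable storage *before*
            # the rename makes them visible under the final name.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also flush the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that mkstemp creates the file with mode 0600, so a real tool would
still have to restore ownership and permissions before the rename.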

> So what dpkg could do is, whenever there is a file that dpkg
> would need to overwrite, to write it out to "filename.dpkg-new-$pid"
> and keep a list of all the files.  After all of the files are
> written out, call syncfs(2) --- on Linux, syncfs(2) is synchronous,
> although POSIX does not guarantee that the writes will be written
> and stable at the time that syncfs(2) returns.  But that should be
> OK, since Debian GNU/kFreeBSD is no longer a thing.  Only after
> syncfs(2) returns, do you rename all of the dpkg-new files to the
> final location on disk.
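
A rough sketch of the scheme described in the quoted paragraph, to make
the ordering concrete. Python has no syncfs wrapper in the os module, so
this goes through ctypes and assumes Linux with a libc that exports
syncfs. The names install_batch and syncfs are made up here, and the
sketch assumes all files live on the same file system:

```python
import ctypes
import os


def syncfs(fd):
    """Call Linux syncfs(2) through libc (assumption: glibc or musl on Linux)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.syncfs(fd) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def install_batch(files):
    """`files` maps final path -> new contents (bytes)."""
    suffix = ".dpkg-new-%d" % os.getpid()
    staged = []
    # Pass 1: write every new file next to its final location, no fsync.
    for final_path, data in files.items():
        tmp_path = final_path + suffix
        with open(tmp_path, "wb") as f:
            f.write(data)
        staged.append((tmp_path, final_path))
    if not staged:
        return
    # One syncfs for the whole batch. Any descriptor on the file system
    # will do; this sketch uses the directory of the first file.
    first_dir = os.path.dirname(os.path.abspath(staged[0][1]))
    fd = os.open(first_dir, os.O_RDONLY)
    try:
        syncfs(fd)
    finally:
        os.close(fd)
    # Pass 2: only after syncfs has returned, rename into place.
    for tmp_path, final_path in staged:
        os.rename(tmp_path, final_path)
```

A real implementation would need one syncfs call per distinct file
system touched by the package, and error handling that cleans up the
staged files.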

Does syncfs work for network file systems?

Maybe a more targeted approach with a first pass of sync_file_range
with SYNC_FILE_RANGE_WRITE, followed by a second pass with fsync would
work?
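
Something along these lines is what I have in mind, as an untested-in-
anger sketch. It assumes 64-bit Linux with a libc that exports
sync_file_range, calls it through ctypes, and uses a made-up function
name flush_two_pass. The flag value is the one from <linux/fs.h>:

```python
import ctypes
import os

SYNC_FILE_RANGE_WRITE = 2  # from <linux/fs.h>

# Assumption: 64-bit Linux, where off64_t arguments are passed as plain int64.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.sync_file_range.argtypes = [
    ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint,
]
_libc.sync_file_range.restype = ctypes.c_int


def flush_two_pass(paths):
    """Start writeback on every file first, then fsync each one."""
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    try:
        # Pass 1: ask the kernel to start writing dirty pages, without
        # waiting. offset=0 and nbytes=0 mean "the whole file". The return
        # value is ignored because this is only a hint; durability comes
        # from the fsync in the second pass.
        for fd in fds:
            _libc.sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE)
        # Pass 2: wait for data and metadata of each file to be stable.
        for fd in fds:
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

The idea is that by the time the second pass reaches a file, most of its
data is already in flight or written, so the individual fsync calls
should have less to wait for. The renames would follow after this
function returns. Keeping every file open at once may hit the
descriptor limit for large packages, so a real version would work in
chunks.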
