> Why would sync() do anything on tmpfs?  The s_bdi field from its
> superblock is never set to non-NULL in mm/shmem.c, so that’s not it.
> Ah, but sync_filesystems() iterates over all filesystems, not just
> those accessible from the chroot.
> 
> This sucks.  To recap:
> 
> 1. On ext4 with certain mount options, using rename() without first
>    calling fsync() to get the data on disk has an unfortunate risk of
>    clearing out a file[1].

This issue was current at the beginning of 2010, around the time Bug
#567089 was filed and discussed. It's been fixed in the kernel since
then. See http://lwn.net/Articles/322823/ and
http://lwn.net/Articles/326471/

Does it still affect the shipping Debian kernel?
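For reference, here is a Python sketch of the pattern under discussion: write a temporary file, fsync() it, then rename() it over the original. It is only an illustration and assumes a POSIX/Linux system. The function name and the `.tmp` suffix are made up for the example and are not what dpkg does.

```python
import os


def replace_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` so that a crash leaves either the old or
    the new contents on disk, never an empty file (assumes POSIX/Linux)."""
    tmp_path = path + ".tmp"  # hypothetical temp-file name, not dpkg's
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # force the data blocks to disk BEFORE rename
    os.replace(tmp_path, path)  # atomically swap the new file into place
    # fsync the containing directory so the rename itself is durable too
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Point 1 describes what happens when the os.fsync() before os.replace() is left out. The cost of that fsync() is what point 2 is about.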

> 2. On ext4 with certain mount options, using fsync() instead of sync()
>    to sync a collection of newly installed files is unacceptably
>    slow[2].

The problem here was "data=ordered". ext3 suffered from it too, since
"data=ordered" was its default. In brief, ONE fsync() call cost about
as much as ONE sync() call, because flushing one file also forced out
the other pending data on that filesystem. The solution was "don't
use data=ordered" (and Linus patched the kernel to change the default);
with that, fsync() becomes suitably fast.

The bug you cite here also dates from around April/May, when this
problem was being sorted out by the Linux kernel community.

Though this may still affect the shipping Debian kernel for
"data=ordered" mounts (I don't actually know whether they've managed
to fix data=ordered), it should no longer affect the default mount
options. Is that right?

See http://lwn.net/Articles/328363/
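One way to check the "ONE fsync() costs about as much as ONE sync()" claim on a given mount is a small timing sketch like the following. This is a hypothetical Python micro-benchmark, not a tool from the discussion. It leaves some unrelated written-but-unflushed data on the filesystem, then times an fsync() of one small file. It then leaves the same amount of fresh unflushed data again and times a full sync(). If fsync() only flushes its own file, the first number should be much smaller. Going by the discussion above, the two may come out comparable on data=ordered. Timings are noisy, because the kernel may start writeback on its own.

```python
import os
import tempfile
import time
from typing import Optional, Tuple


def _write_unflushed(directory: str, name: str, mb: int) -> None:
    # Unrelated data that is written but never fsync()ed by us.
    with open(os.path.join(directory, name), "wb") as f:
        f.write(os.urandom(mb * 1024 * 1024))


def time_fsync_vs_sync(background_mb: int = 64,
                       directory: Optional[str] = None) -> Tuple[float, float]:
    """Return (fsync_seconds, sync_seconds), each measured with
    `background_mb` MiB of unrelated unflushed data pending."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        os.sync()  # start from a mostly clean state (not timed)

        _write_unflushed(d, "background1", background_mb)
        with open(os.path.join(d, "small"), "wb") as f:
            f.write(b"x" * 4096)
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())  # may or may not drag background1 along
            fsync_seconds = time.perf_counter() - start

        _write_unflushed(d, "background2", background_mb)
        start = time.perf_counter()
        os.sync()  # flushes every mounted filesystem
        sync_seconds = time.perf_counter() - start
    return fsync_seconds, sync_seconds
```

Run it with `directory` set to a path on the ext3/ext4 mount being tested. The default temporary directory may be on tmpfs, which would tell you nothing.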

> 3. sync() obviously does way more than we want it to, since it
>    touches files and filesystems that have nothing to do with
>    dpkg’s work.
> 
> So what should we do?  Dear kernel, we will happily provide a list
> of files we want to be renamed in place.  Can you make sure they
> have the right data without _repeatedly_ incurring the penalty of
> fsync()?

Is "mount your hard drive in a way that fsync() doesn't hurt" a good
solution? I think that was the upstream kernel developers' decision on
how to handle this.

If not, maybe postponing sync() calls further is the solution. That
is, instead of syncing after every package, sync every 10 packages, or
just once at the end of an apt-get dist-upgrade.
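Here is a rough sketch of that batching idea. It also matches the "list of files we want to be renamed in place" request quoted above: stage every new file under a temporary name, flush once, and only then rename. This is a hypothetical Python illustration, not dpkg's code. install_files(), install_in_batches(), the `.tmp` suffix and the batch size of 10 are all made up for the example. It still calls sync(), so it still touches unrelated filesystems (point 3). It only pays that cost once per batch instead of once per package.

```python
import os
from typing import Dict, List, Tuple


def install_files(files: Dict[str, bytes]) -> None:
    """Write every file under a temporary name, flush them all with ONE
    sync(), and only then rename them over the old files."""
    staged: List[Tuple[str, str]] = []
    for path, data in files.items():
        tmp_path = path + ".tmp"  # hypothetical staging suffix
        with open(tmp_path, "wb") as f:
            f.write(data)
        staged.append((tmp_path, path))
    os.sync()  # one flush point for the whole batch, not one per file
    for tmp_path, path in staged:
        os.replace(tmp_path, path)  # new data is already on disk


def install_in_batches(packages: List[Dict[str, bytes]],
                       batch_size: int = 10) -> None:
    """Group `batch_size` packages into one install_files() call.
    (If two packages in a batch ship the same path, the later one wins.)"""
    for start in range(0, len(packages), batch_size):
        batch: Dict[str, bytes] = {}
        for pkg in packages[start:start + batch_size]:
            batch.update(pkg)
        install_files(batch)
```

Setting batch_size to the number of packages gives the "once at the end of an apt-get dist-upgrade" variant.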


Here is a quick benchmark of performance with and without sync(). This
test was done on ext4 in cowbuilder chroots, with all of the packages
pre-cached by apt-cacher-ng. The first run uses eatmydata, which turns
sync() and fsync() into no-ops.

# time eatmydata apt-get install --no-install-recommends openoffice.org
0 upgraded, 142 newly installed, 0 to remove and 0 not upgraded.
...
real 0m57.682s
user 0m37.030s
sys 0m7.220s

# time apt-get install --no-install-recommends openoffice.org
0 upgraded, 142 newly installed, 0 to remove and 0 not upgraded.
...
real 3m17.158s
user 0m37.186s
sys 0m11.057s
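Working out the numbers from the two runs above (wall-clock "real" times only):

```python
# Wall-clock ("real") times from the two runs above, in seconds.
with_eatmydata = 0 * 60 + 57.682  # sync()/fsync() turned into no-ops
with_sync = 3 * 60 + 17.158       # normal run
packages = 142                    # "142 newly installed"

slowdown = with_sync / with_eatmydata
overhead = with_sync - with_eatmydata
per_package = overhead / packages

print(f"slowdown: {slowdown:.2f}x")
print(f"overhead: {overhead:.1f} s total, {per_package:.2f} s per package")
```

That is a slowdown of about 3.42x: roughly 139.5 s of extra wall-clock time, or about 0.98 s per newly installed package. User time is nearly identical in both runs (37.0 s vs 37.2 s), and sys time grows by only about 4 s. So almost all of the extra time is spent waiting rather than computing.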




-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/


