18.04.2017 07:27, Chris Murphy wrote:
> On Mon, Apr 17, 2017 at 10:05 PM, Andrei Borzenkov <[email protected]>
> wrote:
>> 18.04.2017 06:50, Chris Murphy wrote:
>
>>>> What exactly "changes" mean? Write() syscall?
>>>
>>> filefrag reported entries increase, it's using FIEMAP.
>>>
>>
>> So far it sounds like btrfs allocates new extent on every write to
>> journal file. Each journal record itself is relatively small indeed.
>
> Hence why it would be better if there's no fsync so that it can
> accumulate these and do its own commit (30s default for Btrfs) and let
> them accumulate.
>

It is not related to fsync. I ran some tests. Journald does not appear
to preallocate the file or mmap the whole file (at least as far as I can
see from the source); when it appends a new record it basically does

  fallocate (fd, end_of_file, new_size)
  mmap (fd, end_of_file, new_size)
  write to new size

This results in a large number of extents, as each fallocate() ends up
in a new extent. I can easily reproduce it with a small program that
uses a similar pattern; actually, mmap is also a red herring. Just
fallocating a file in small increments gives a file consisting of an
excessively large number of extents. How exactly those extents get
distributed across the device probably depends on overall filesystem
activity.

This is different from simply writing to the end of the file, which
still results in several extents, but significantly larger ones.

BTW, you get the same pattern from direct IO. Writing a 100M file in 4K
blocks using cached writes gives me 7 extents here, with sizes between
500K and 25M. Writing the same file with direct IO results in 25600
extents (the same as growing the file in 4K steps with fallocate).

_______________________________________________
systemd-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
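
For anyone who wants to try the comparison described above, here is a minimal sketch in Python. It is not the program used for the numbers in this message. The function names, file names, and default sizes are made up for illustration, and it only approximates the journald pattern (fallocate a small range at the end of the file, then write into it) next to plain appending writes. It does not count extents itself; the idea is to run filefrag on the two resulting files.

```python
import os
import sys


def grow_with_fallocate(path, total, step):
    """Grow a file by fallocating `step` bytes at EOF, then writing into them.

    Hypothetical helper: mimics the append pattern described above,
    without the mmap step.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < total:
            chunk = min(step, total - offset)
            # Reserve space for the next record at the current end of file.
            os.posix_fallocate(fd, offset, chunk)
            # Fill the freshly reserved range.
            os.pwrite(fd, b"x" * chunk, offset)
            offset += chunk
    finally:
        os.close(fd)
    return os.path.getsize(path)


def grow_with_write(path, total, step):
    """Grow a file with plain cached writes at EOF, for comparison."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < total:
            chunk = min(step, total - written)
            os.write(fd, b"x" * chunk)
            written += chunk
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Directory on the filesystem under test (e.g. a btrfs mount point).
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    size = 100 * 1024 * 1024  # 100M, as in the test above
    step = 4096               # 4K increments
    a = os.path.join(target, "fallocate-test.bin")
    b = os.path.join(target, "write-test.bin")
    grow_with_fallocate(a, size, step)
    grow_with_write(b, size, step)
    os.sync()  # flush so that extents are actually allocated on disk
    print("now compare: filefrag", a, b)
```

The resulting extent counts will depend on the filesystem and on whatever else is writing to it at the time, so they should not be expected to match the figures above exactly.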
