On 2016-07-05 05:28, Joerg Schilling wrote:
> Andreas Dilger <adil...@dilger.ca> wrote:
>> I think in addition to fixing btrfs (because it needs to work with existing
>> tar/rsync/etc. tools) it makes sense to *also* fix the heuristics of tar
>> to handle this situation more robustly. One option is if st_blocks == 0 then
>> tar should also check if st_mtime is less than 60s in the past, and if yes
>> then it should call fsync() on the file to flush any unwritten data to disk,
>> or assume the file is not sparse and read the whole file, so that it doesn't
>> incorrectly assume that the file is sparse and skip archiving the file data.
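
For concreteness, that heuristic comes out to roughly the sketch below;
should_trust_st_blocks() is a hypothetical helper, not anything that
exists in GNU tar:

#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not GNU tar code: returns 1 if st_blocks can be
 * trusted for sparse detection, 0 if the file should just be read in
 * full.  Follows the heuristic quoted above. */
static int should_trust_st_blocks(int fd, struct stat *st)
{
    if (st->st_blocks != 0 || st->st_size == 0)
        return 1;                       /* nothing suspicious */

    if (time(NULL) - st->st_mtime < 60) {
        /* Recently written: flush any delayed allocation, re-stat. */
        if (fsync(fd) == 0 && fstat(fd, st) == 0 && st->st_blocks != 0)
            return 1;
    }
    return 0;   /* st_blocks still 0 despite data: don't assume sparse */
}
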
> A broken filesystem is a broken filesystem.
> If you try to change gtar to work around a specific problem, it may fail in
> other situations.
The problem with this is that tar is assuming things that are not
guaranteed to be true. There is absolutely nothing that says that
st_blocks has to be non-zero if there's data in the file. In fact, the
behavior BTRFS used to have of reporting st_blocks as 0 for files
inlined entirely in the metadata is perfectly correct given POSIX's
description of the field, because there _are_ no blocks allocated to
the file: the metadata block holding the inline data is effectively
part of the inode, and the inode isn't counted by st_blocks. This is
yet another example of an old interface being short-sighted; in this
case, the "interface" for sparse-file detection simply doesn't exist,
so tools are reduced to guessing from st_blocks.
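
For reference, the usual guess boils down to something like the check
below. The 512-byte unit is a Linux convention (POSIX leaves the units
of st_blocks implementation-defined), and an inline btrfs file with
st_size > 0 and st_blocks == 0 sails through it as "completely sparse":

#include <sys/stat.h>

/* The de facto heuristic, not from any particular tar implementation:
 * treat a file as sparse when fewer blocks are allocated than its
 * apparent size implies. */
static int looks_sparse(const struct stat *st)
{
    return (off_t)st->st_blocks * 512 < st->st_size;
}
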
The proper fix for this is that tar (and anything else that handles
sparse files specially) should be reading the file regardless. It has
to do that anyway for a normal sparse file to figure out where the
sparse regions are, and optimizing for a file that's completely sparse
(and therefore probably just preallocated with fallocate) is not all
that reasonable, considering that this is going to be a very rare case
in normal usage.
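
For what it's worth, finding the sparse regions without trusting
st_blocks at all is straightforward on any reasonably modern Linux:
lseek() with SEEK_DATA/SEEK_HOLE (available since 3.1). A sketch, not
tar's actual code:

#define _GNU_SOURCE             /* for SEEK_DATA / SEEK_HOLE on Linux */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Print the data regions of an open file.  On filesystems that don't
 * support hole lookup, lseek() behaves as if the whole file were one
 * data region, which is exactly the safe fallback. */
static void map_data_regions(int fd, off_t size)
{
    off_t data = 0, hole;

    while (data < size) {
        data = lseek(fd, data, SEEK_DATA);
        if (data == (off_t)-1) {
            if (errno == ENXIO)
                break;                  /* nothing but holes past here */
            return;                     /* real error */
        }
        hole = lseek(fd, data, SEEK_HOLE);
        if (hole == (off_t)-1)
            hole = size;
        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        data = hole;
    }
}

An inline btrfs file comes back from this as a single data region, so
it gets archived correctly no matter what st_blocks claims.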