On 27/09/2024 18:18, Matthew Wilcox wrote:
Package: coreutils
Version: 9.4-3.1
$ strace cp --sparse=always dd dd-sparse
[extraneous stuff skipped]
openat(AT_FDCWD, "dd", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=134248, ...}) = 0
openat(AT_FDCWD, "dd-sparse", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = 0
close(4) = 0
close(3) = 0
It makes no attempt to look for sparse regions in the file, just uses
FICLONE (which succeeds because it's on XFS).
I worked around this by copying to /tmp, which is on a different
filesystem (tmpfs):
Without looking at the source code, it seems likely that cp blindly
tries FICLONE without checking to see whether the sparse flag is set.
I suggest that setting --sparse=always should disable the FICLONE
optimisation.
I see your point; however, `cp --reflink=auto --sparse=always`
was documented as the way to make a copy taking the least
amount of space supported by the file system.
That would be a more common use case than
making a separate copy as sparse as possible.
--reflink=auto is the default since coreutils 9.0,
and it could be confusing for cp to behave differently
depending on whether --reflink=auto was explicitly
or implicitly specified (considering aliases etc.).
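
For illustration only, here is a rough Python sketch of what "try a
reflink, otherwise copy" amounts to. It is not cp's actual code: the
function name is made up, and the FICLONE value is assumed to be the
Linux definition used on common architectures such as x86-64 and arm64.

```python
import fcntl
import shutil

# Assumption: Linux FICLONE request number, _IOW(0x94, 9, int), as defined
# on common architectures such as x86-64 and arm64.
FICLONE = 0x40049409


def copy_reflink_auto(src_path, dst_path):
    """Hypothetical helper: try to clone, fall back to a plain copy.

    Returns True if the destination was cloned from the source,
    False if the data was copied through user space instead.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Ask the file system to share the source's extents with dst.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Clone not supported here (or not possible across file
            # systems): fall back to reading and writing the data.
            shutil.copyfileobj(src, dst)
            return False
```

Note that the clone path never reads the data, which is why no sparse
detection happens in the strace output above.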
In general it's difficult to reason about the combination
of factors impacting how a file is copied, e.g. the sparseness
of the file, what file system it is on, what file system the
destination is on, the attributes of the file, and whether they're
being copied or not.
The --reflink and --sparse options complicate things further.
To help, we added the --debug option to cp (and install and mv)
to explain how a file is being copied:
$ cp --debug --sparse=always dd dd-sparse
'dd' -> 'dd-sparse'
copy offload: unknown, reflink: yes, sparse detection: unknown
$ cp --debug --reflink=never --sparse=always dd dd-sparse
'dd' -> 'dd-sparse'
copy offload: avoided, reflink: no, sparse detection: zeros
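
As a rough illustration of what "sparse detection: zeros" refers to,
the Python sketch below copies a file block by block and seeks over
any block that is entirely zero instead of writing it. This is a
simplified sketch, not cp's implementation; the function name and the
4096-byte block size are assumptions chosen for the example.

```python
import os


def copy_skipping_zero_blocks(src_path, dst_path, block_size=4096):
    """Hypothetical helper: copy src to dst, leaving holes for zero blocks.

    Returns the number of bytes that were skipped rather than written.
    """
    zeros = bytes(block_size)
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zeros[:len(block)]:
                # All zeros: advance the offset without writing, so the
                # file system can leave this range unallocated.
                dst.seek(len(block), os.SEEK_CUR)
                skipped += len(block)
            else:
                dst.write(block)
        # If the file ends in a hole, extend it to the full length.
        dst.truncate(dst.tell())
    return skipped
```

Whether the skipped ranges really end up unallocated depends on the
destination file system; the copied contents are the same either way.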
thanks,
Pádraig