On 9/7/18 5:44 PM, Bruno Haible wrote:
Eric Blake wrote:
MacOS comes along, and now both queries return a
different offset than your input, but neither fails. If you optimized
by calling SEEK_DATA first, you end up treating the current offset as a
hole (data loss). And if you make both calls looking for the
POSIX-specified patterns, your logic can be thrown off
How about one of these approaches?
(a) Put some knowledge about the extent boundaries into the code.
I.e. round offset down to the previous extent boundary before the
two lseek calls?
How do you determine at runtime where an extent boundary begins, short
of making additional syscalls? If you're going to be turning one
syscall into multiple in order to work around the flaw, you might as
well try to minimize the work...
(b) Evaluate both
d = lseek (fd, offset, SEEK_DATA);
h = lseek (fd, offset, SEEK_HOLE);
and since you don't know anything about the range from offset
to min(d,h)-1, assume it's data.
Then, if d < h, you have a data block, or if h < d, you have a hole.
...and thus, this is the minimal solution anyway (i.e., merely declaring
that trying to optimize by making only one of the two calls is going to
fail because of MacOS data semantics and Solaris hole-at-EOF semantics).
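
For concreteness, (b) could be sketched roughly as below. This is only an illustration, not tested code from any project: the names region_at, classify and enum region are made up here, ENXIO from SEEK_DATA is assumed to mean "no data at or after offset", and a tie (d == h) is treated as data to stay on the safe side.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for this sketch.  */
enum region { REGION_DATA, REGION_HOLE, REGION_ERROR };

/* Decision step of (b): d and h are the results of the two lseek
   calls, with -1 meaning "that call failed".  Anything we cannot
   prove to be a hole is reported as data.  */
static enum region
classify (off_t d, off_t h)
{
  if (h < 0)
    return REGION_ERROR;        /* SEEK_HOLE failed, e.g. offset past EOF */
  if (d < 0)
    return REGION_HOLE;         /* assumed ENXIO: no data at or after offset */
  return h < d ? REGION_HOLE : REGION_DATA;
}

/* Always issue both queries; never optimize one of them away.
   Note that lseek moves the file position of fd as a side effect.  */
static enum region
region_at (int fd, off_t offset)
{
  assert (offset >= 0);

  off_t d = lseek (fd, offset, SEEK_DATA);
  if (d < 0 && errno != ENXIO)
    return REGION_ERROR;        /* real failure, not "no more data" */

  off_t h = lseek (fd, offset, SEEK_HOLE);
  return classify (d, h);
}
```

A caller would then treat anything other than REGION_HOLE at offset as data that must be read, which costs two syscalls per query but does not depend on where the filesystem happens to place its extent boundaries.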
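(c) for (int i = 1; i <= 16; i++)
{
  /* Step back by 2^i bytes, clamping at the start of the file
     (a signed off_t avoids the unsigned wrap-around).  */
  off_t step = (off_t) 1 << i;
  off_t o = offset > step ? offset - step : 0;
  d = lseek (fd, o, SEEK_DATA);
  h = lseek (fd, o, SEEK_HOLE);
  if (d < offset || h < offset)
    break;
}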
This seems like an implementation that tries to do (a) - but look at how
many syscalls it is adding instead of just going straight to (b).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org