On 2023-04-17 12:43, Pawel Jakub Dawidek wrote:
On 4/17/23 18:15, Pawel Jakub Dawidek wrote:
There were three issues that I know of after the recent OpenZFS merge:
1. Data corruption unrelated to block cloning, so it can happen even
with block cloning disabled or not in use. This was the problematic
commit:
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9
It was reverted in 63ee747febbf024be0aace61161241b53245449e.
2. Data corruption with embedded blocks when block cloning is enabled.
It can happen when compression is enabled and the block contains
between 60 and 112 bytes (this might be hard to determine). A fix
exists and is already merged to OpenZFS, but isn't in FreeBSD yet.
OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739
3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is
triggered when block cloning is enabled, the sync property is set to
disabled and copy_file_range(2) is used. An easy fix exists, but it is
not yet merged to OpenZFS and not yet in FreeBSD HEAD.
OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758
Block cloning was disabled in
46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.
As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are
fixed, as far as I can tell.
Block cloning remains disabled for now just to be on the safe side,
but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.
Don't rely on this sysctl as it will be removed in 2-3 weeks.
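For anyone who wants to check where a given machine stands, here is a
minimal Python sketch that reads the sysctl named above. It assumes
sysctl(8) is on PATH and that `sysctl -n` prints a bare integer; the
helper names parse_sysctl_flag and bclone_enabled are hypothetical and
not part of any FreeBSD tool. Since the sysctl is temporary, a result of
None is expected on kernels that no longer have it.

```python
import subprocess

# sysctl name taken from the message above; it may not exist on every kernel.
OID = "vfs.zfs.bclone_enabled"


def parse_sysctl_flag(text):
    """Interpret `sysctl -n` output: True for non-zero, False for 0,
    None if the text is not an integer."""
    try:
        return int(text.strip()) != 0
    except ValueError:
        return None


def bclone_enabled():
    """Return True/False, or None if the sysctl is missing or unreadable."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", OID],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # sysctl(8) not found, or the OID is unknown on this kernel.
        return None
    return parse_sysctl_flag(result.stdout)


if __name__ == "__main__":
    print(OID, "->", bclone_enabled())
```

This only reports the current setting; it says nothing about whether a
pool has already been upgraded or whether cloned blocks exist on it.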
Hi Pawel,
thank you for your reply and for the fixes.
I think there is a 4th issue that needs to be addressed: how do we
recover from the worst-case scenario, which is a machine with a kernel >
2a58b312b62f and ZFS root upgraded with block cloning enabled?
In particular, is it safe to turn such a machine on in the first place,
and what are the risks involved in doing so? Any potential data loss?
Would such a machine be able to fix itself by compiling a kernel, or
would compilation fail and might data be corrupted in the process?
I have two poudriere builders powered off (I am not alone in this
situation) and I need to recover them, ideally minimizing data loss. The
builders are also hosting current and are used to build kernels and
worlds for 13 and current. As of now all my production machines are
stuck on the 13 they run; I cannot update binaries or packages, and I
would like to be back online.
Whatever the recovery procedure is, it should be outlined in the
UPDATING document.
Thank you.
BR,
--
José Pérez