On 2025-05-19 11:24, Florian Bach wrote:

On 19.05.25 at 07:46, Antonio Russo wrote:

There is progress on this, but it is occurring in real time.  My suggestion
is to wait for upstream to decide they understand it, finalize the PR, merge
it, and then backport the fix to 2.3.  This last step may be painful, since
the analysis may have to be re-done on the earlier state of the code in 2.3.

The fix has just been merged into master so it seems like upstream thinks it's 
finalized enough.

As for backporting, it should be fairly easy. I took a look at the affected file, 
and that file (or at least the related code) hasn't been modified since it was 
introduced 6 years ago, so it should be fairly simple to backport to whatever 
version, even way back to 2.0.0 if necessary. Unless I'm missing something?

I am not familiar enough with the code to be confident making any statement.
If you believe you can make that judgement, I would recommend backporting it
yourself, running the ZTS and your own tests until you are satisfied it
works, deploying it onto your own machines, and then filing a PR upstream.

This is what I always do when I believe my code is ready for production:
I put it into production.

I would recommend reducing the severity of this bug to important so that
zfs is not removed from trixie.

I thought a bug with potential data loss would be correctly classified as 
"grave", but I don't know much about how Debian usually handles situations like 
these where the bug has existed for a long time already. I'm happy to downgrade this to 
important (if necessary) if the package will be removed from trixie otherwise. But right 
now, at least to me, there should still be enough time to get this fixed, right?

If upstream is satisfied with a backport to 2.3, I would think it is
absolutely ready for trixie.  The reason I am urging caution is that
this bug is very well understood at this point: a very specific
workload causes encrypted snapshots sent non-raw to become corrupted
and, as far as I can tell, does not readily cause corruption to
filesystem data outside of the snapshots.  People exposed to this
issue already have mitigations in place. (I myself rebuilt all my
pools without encryption, for example.)

On the other hand, reasoning about locks and threads in a complicated
system is very challenging.  If there is an issue in the patch, it
could cause another, possibly worse, issue.

I think the proper balance is to trust upstream's judgement---they are
indeed the experts.  They are absolutely interested in getting this fix
onto the 2.3 branch, but they are similarly motivated to avoid
introducing a new bug, or a buggy backport.

Notice that I am not saying we should wait for 2.3.3 to be released:
just for the patch to show up on zfs-2.3.3-staging (though I suspect it
will not sit unreleased on that branch for very long).  I am hopeful
that this happens relatively soon (but I have no special information
about that).

Best,
Antonio
