Public bug reported:
[Impact]
ZFS pools become completely unresponsive: in-flight I/O stalls, and kernel stack traces similar to the one below are logged:
crash> bt -s 835
PID: 835 TASK: ffff9ef78c6d2880 CPU: 1 COMMAND: "txg_quiesce"
#0 [ffffaf7242e53ce8] __schedule+648 at ffffffffbcc01248
#1 [ffffaf7242e53d90] schedule+46 at ffffffffbcc0165e
#2 [ffffaf7242e53db0] cv_wait_common+258 at ffffffffc05224a2 [spl]
#3 [ffffaf7242e53e18] __cv_wait+21 at ffffffffc0522505 [spl]
#4 [ffffaf7242e53e28] txg_quiesce+384 at ffffffffc06f3f70 [zfs]
#5 [ffffaf7242e53e78] txg_quiesce_thread+205 at ffffffffc06f40bd [zfs]
#6 [ffffaf7242e53ec0] thread_generic_wrapper+100 at ffffffffc052d314 [spl]
#7 [ffffaf7242e53ee8] kthread+214 at ffffffffbbb32ce6
#8 [ffffaf7242e53f28] ret_from_fork+70 at ffffffffbba66b76
#9 [ffffaf7242e53f50] ret_from_fork_asm+27 at ffffffffbba052ab
This typically happens when creating new files on ZFS pools whose object numbers have grown beyond 2^32. Due to a bug in the object allocation function dmu_object_alloc_impl(), object numbers above the 32-bit threshold get silently truncated, causing the function to keep trying to allocate space in dnode chunks that are already full.
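To illustrate the truncation, here is a minimal userspace sketch. The macro definitions are modeled on the illumos/OpenZFS sys/sysmacros.h ones, and the variable names are only illustrative of the 64-bit object number and the 32-bit alignment argument used in the allocator:

#include <inttypes.h>
#include <stdio.h>

/* Definitions modeled on illumos/OpenZFS sys/sysmacros.h. */
#define P2ALIGN(x, align)             ((x) & -(align))
#define P2ALIGN_TYPED(x, align, type) ((type)(x) & -(type)(align))

int
main(void)
{
    /* A 64-bit object number just past the 2^32 boundary. */
    uint64_t object = (1ULL << 32) + 12345;
    /* The alignment is held in a 32-bit unsigned variable. */
    unsigned int dnodes_per_chunk = 128;

    /*
     * -(dnodes_per_chunk) is evaluated in 32 bits (0xffffff80) and
     * zero-extended to 64 bits, so the upper 32 bits of 'object'
     * are silently masked off.
     */
    uint64_t broken = P2ALIGN(object, dnodes_per_chunk);

    /* Forcing 64-bit arithmetic preserves the upper bits. */
    uint64_t fixed = P2ALIGN_TYPED(object, dnodes_per_chunk, uint64_t);

    printf("P2ALIGN:       0x%016" PRIx64 "\n", broken); /* 0x0000000000003000 */
    printf("P2ALIGN_TYPED: 0x%016" PRIx64 "\n", fixed);  /* 0x0000000100003000 */
    return 0;
}

P2ALIGN_TYPED performs both the negation and the mask in the caller-specified type, which is why it stays correct for 64-bit object numbers.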
[Test Plan]
We've been able to consistently reproduce this on ZFS pools with a very high object count. Using the attached zfs_write_unified.py script, we can cause a pool to hang due to this bug within a couple of days. Below is a high-level summary of the test procedure (a rough sketch of an equivalent workload follows the steps):
1. Create a ZFS pool with total capacity above 2 TB (this is required so that we can reach a high enough object count):
ubuntu@wringer-wooster:~$ zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
pooltest       6.22T   660G    96K  /pooltest
pooltest/data  6.21T   660G  6.21T  /pooltest/data
2. Run zfs_write_unified.py against the test pool:
ubuntu@wringer-wooster:~# python3 zfs_write_unified.py . $(nproc)
3. Monitor pool throughput through `zpool iostat` or similar, until no new transactions get synced to disk (or until a kernel trace similar to the one above starts getting logged)
Once the pool has enough objects, the problem manifests almost immediately. The fix is easy to verify by running zfs_write_unified.py on an affected pool: with the fixed packages, `zpool iostat` keeps reporting disk activity instead of stalling.
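For reference, the sketch below is a hypothetical stand-in for the workload, not the attached zfs_write_unified.py: a few writer processes create a stream of small files under the test dataset, since every new file consumes a new object number. The directory path and process count are assumptions.

/* Hypothetical workload sketch; run until the object count grows large. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWRITERS 8                  /* e.g. one per CPU (assumption) */
#define TESTDIR  "/pooltest/data"   /* test dataset mountpoint */

static void
writer(int id)
{
    char path[256], buf[512];

    memset(buf, 'x', sizeof (buf));
    for (unsigned long i = 0; ; i++) {
        /* Each small file allocates a fresh object (dnode) in the pool. */
        snprintf(path, sizeof (path), TESTDIR "/w%d-%lu", id, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            exit(1);
        }
        if (write(fd, buf, sizeof (buf)) < 0)
            perror("write");
        close(fd);
    }
}

int
main(void)
{
    for (int i = 0; i < NWRITERS; i++) {
        if (fork() == 0)
            writer(i);
    }
    /* Writers loop forever; wait here until they are interrupted. */
    while (wait(NULL) > 0)
        ;
    return 0;
}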
[Where problems could occur]
The fix is fairly straightforward: we're replacing the P2ALIGN macro with P2ALIGN_TYPED, an equivalent that performs the alignment in an explicitly specified type and therefore doesn't truncate 64-bit values to the width of the alignment argument. This shouldn't affect existing pools, as this code is only exercised when creating new objects (files, directories, snapshots, etc.).
We should test the write path extensively after this change, to make sure there are no other hangs when using the new P2ALIGN_TYPED macro. Any potential regressions from this change would affect the object allocation path, so we would see similar kernel traces reporting that `txg_sync` or `txg_quiesce` are hung:
[179404.940783] INFO: task txg_quiesce:2203494 blocked for more than 122 seconds.
[179404.944987] Tainted: P OE 6.8.0-1020-aws #22~22.04.1-Ubuntu
[179404.949205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Other info]
This fix landed upstream in May 2024 and is included in ZFS releases starting with 2.2.5. As such, Jammy and Noble are affected, while releases starting with Oracular already carry the fix.
** Affects: zfs-linux (Ubuntu)
Importance: High
Assignee: Heitor Alves de Siqueira (halves)
Status: Fix Released
** Affects: zfs-linux (Ubuntu Jammy)
Importance: High
Assignee: Heitor Alves de Siqueira (halves)
Status: In Progress
** Affects: zfs-linux (Ubuntu Noble)
Importance: High
Assignee: Heitor Alves de Siqueira (halves)
Status: In Progress
--
https://bugs.launchpad.net/bugs/2115683
Title:
ZFS hangs when writing to pools with high object count