** Description changed:

  [Impact]

-  * Users of the Linux kernel's crypto userspace API
-    reported BUG() / kernel NULL pointer dereference
-    errors after kernel upgrades.
+  * Users of the Linux kernel's crypto userspace API
+    reported BUG() / kernel NULL pointer dereference
+    errors after kernel upgrades.

-  * The stack trace signature is an accept() syscall
-    going through af_alg_accept() and hitting errors
-    usually in one of:
-    - apparmor_sk_clone_security()
-    - apparmor_sock_graft()
-    - release_sock()
+  * The stack trace signature is an accept() syscall
+    going through af_alg_accept() and hitting errors
+    usually in one of:
+    - apparmor_sk_clone_security()
+    - apparmor_sock_graft()
+    - release_sock()

  [Fix]
-
-  * This is a regression introduced by upstream commit
-    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
-    in sk_destruct") which made its way through stable.
-
-  * The offending patch allows the critical regions
-    of af_alg_accept() and af_alg_release_parent() to
-    run concurrently; now with the "right" events on 2
-    CPUs it might drop the non-atomic reference counter
-    of the alg_sock then the sock, thus release a sock
-    that is still in use.
-  * The fix is upstream commit 34c86f4c4a7b ("crypto:
-    af_alg - fix use-after-free in af_alg_accept() due
-    to bh_lock_sock()") [1]. It changes alg_sock's ref
-    counter to atomic, which addresses the root cause.
-
+
+  * This is a regression introduced by upstream commit
+    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
+    in sk_destruct") which made its way through stable.
+
+  * The offending patch allows the critical regions
+    of af_alg_accept() and af_alg_release_parent() to
+    run concurrently; now with the "right" events on 2
+    CPUs it might drop the non-atomic reference counter
+    of the alg_sock then the sock, thus release a sock
+    that is still in use.
+
+  * The fix is upstream commit 34c86f4c4a7b ("crypto:
+    af_alg - fix use-after-free in af_alg_accept() due
+    to bh_lock_sock()") [1]. It changes alg_sock's ref
+    counter to atomic, which addresses the root cause.
+
  [Test Case]

-  * There is a synthetic test case available, which
-    uses a kprobes kernel module to synchronize the
-    concurrent CPUs on the instructions responsible
-    for the problem; and a userspace part to run it.
+  * There is a synthetic test case available, which
+    uses a kprobes kernel module to synchronize the
+    concurrent CPUs on the instructions responsible
+    for the problem; and a userspace part to run it.

-  * The organic reproducer is the Varnish Cache Plus
-    software with the Crypto vmod (which uses kernel
-    crypto userspace API) under long, very high load.
-
-  * The patch has been verified on both reproducers
-    with the 4.15 and 5.7 kernels.
-
+  * The organic reproducer is the Varnish Cache Plus
+    software with the Crypto vmod (which uses kernel
+    crypto userspace API) under long, very high load.
+
+  * The patch has been verified on both reproducers
+    with the 4.15 and 5.7 kernels.
+
   * More tests performed with 'stress-ng --af-alg'
-    with 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
+    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
     (all on same version of stress-ng, V0.11.14)
+    No regressions observed from original kernel.
     (the af-alg stressor can exercise almost all kernel
     crypto modules shipped with the kernel; so it checks
     more paths/crypto alg interfaces.)
-
+
  [Regression Potential]

-  * The fix patch does a fundamental change in how
-    alg_sock reference counters work, plus another
-    change to the 'nokey' counting. This of course
-    *has* a risk of regression.
+  * The fix patch does a fundamental change in how
+    alg_sock reference counters work, plus another
+    change to the 'nokey' counting. This of course
+    *has* a risk of regression.

-  * Regressions theoretically could manifest as use
-    after free errors (in case of undercounting) in
-    the af_alg functions or silent memory leaks (in
-    case of overcounting), but also other behaviors
-    since reference counting is key to many things.
-
-  * FWIW, this patch has been written by the crypto
-    subsystem maintainer, who certainly knows a lot
-    of the normal and corner cases, thus giving the
-    patch more credit.
-
-  * Testing with the organic reproducer ran as long
-    as 5 days, without issues, so it does look good.
+  * Regressions theoretically could manifest as use
+    after free errors (in case of undercounting) in
+    the af_alg functions or silent memory leaks (in
+    case of overcounting), but also other behaviors
+    since reference counting is key to many things.
+
+  * FWIW, this patch has been written by the crypto
+    subsystem maintainer, who certainly knows a lot
+    of the normal and corner cases, thus giving the
+    patch more credit.
+
+  * Testing with the organic reproducer ran as long
+    as 5 days, without issues, so it does look good.

  [Other Info]

-  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
-
+  * [1] Patch:
+    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+
  [Stack Trace Examples]

  Examples:

-  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
-  ...
-  RIP: 0010:apparmor_sk_clone_security+0x26/0x70
-  ...
-  Call Trace:
-   security_sk_clone+0x33/0x50
-   af_alg_accept+0x81/0x1c0 [af_alg]
-   alg_accept+0x15/0x20 [af_alg]
-   SYSC_accept4+0xff/0x210
-   SyS_accept+0x10/0x20
-   do_syscall_64+0x73/0x130
-   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
+  ...
+  RIP: 0010:apparmor_sk_clone_security+0x26/0x70
+  ...
+  Call Trace:
+   security_sk_clone+0x33/0x50
+   af_alg_accept+0x81/0x1c0 [af_alg]
+   alg_accept+0x15/0x20 [af_alg]
+   SYSC_accept4+0xff/0x210
+   SyS_accept+0x10/0x20
+   do_syscall_64+0x73/0x130
+   entry_SYSCALL_64_after_hwframe+0x3d/0xa2

-  general protection fault: 0000 [#1] SMP PTI
-  ...
-  RIP: 0010:__release_sock+0x54/0xe0
-  ...
-  Call Trace:
-   release_sock+0x30/0xa0
-   af_alg_accept+0x122/0x1c0 [af_alg]
-   alg_accept+0x15/0x20 [af_alg]
-   SYSC_accept4+0xff/0x210
-   SyS_accept+0x10/0x20
-   do_syscall_64+0x73/0x130
-   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+  general protection fault: 0000 [#1] SMP PTI
+  ...
+  RIP: 0010:__release_sock+0x54/0xe0
+  ...
+  Call Trace:
+   release_sock+0x30/0xa0
+   af_alg_accept+0x122/0x1c0 [af_alg]
+   alg_accept+0x15/0x20 [af_alg]
+   SYSC_accept4+0xff/0x210
+   SyS_accept+0x10/0x20
+   do_syscall_64+0x73/0x130
+   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1884766

Title:
  use-after-free in af_alg_accept() due to bh_lock_sock()

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in linux source package in Eoan:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Groovy:
  Confirmed

Bug description:
  [Impact]

   * Users of the Linux kernel's crypto userspace API
     reported BUG() / kernel NULL pointer dereference
     errors after kernel upgrades.

   * The stack trace signature is an accept() syscall
     going through af_alg_accept() and hitting errors
     usually in one of:
     - apparmor_sk_clone_security()
     - apparmor_sock_graft()
     - release_sock()

  [Fix]

   * This is a regression introduced by upstream commit
     37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
     in sk_destruct") which made its way through stable.

   * The offending patch allows the critical regions
     of af_alg_accept() and af_alg_release_parent() to
     run concurrently; now with the "right" events on 2
     CPUs it might drop the non-atomic reference counter
     of the alg_sock then the sock, thus release a sock
     that is still in use.

   * The fix is upstream commit 34c86f4c4a7b ("crypto:
     af_alg - fix use-after-free in af_alg_accept() due
     to bh_lock_sock()") [1]. It changes alg_sock's ref
     counter to atomic, which addresses the root cause.

  [Test Case]

   * There is a synthetic test case available, which
     uses a kprobes kernel module to synchronize the
     concurrent CPUs on the instructions responsible
     for the problem; and a userspace part to run it.

   * The organic reproducer is the Varnish Cache Plus
     software with the Crypto vmod (which uses kernel
     crypto userspace API) under long, very high load.

   * The patch has been verified on both reproducers
     with the 4.15 and 5.7 kernels.

   * More tests performed with 'stress-ng --af-alg'
     with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
     (all on same version of stress-ng, V0.11.14)
     No regressions observed from original kernel.
     (the af-alg stressor can exercise almost all kernel
     crypto modules shipped with the kernel; so it checks
     more paths/crypto alg interfaces.)

  [Regression Potential]

   * The fix patch does a fundamental change in how
     alg_sock reference counters work, plus another
     change to the 'nokey' counting. This of course
     *has* a risk of regression.

   * Regressions theoretically could manifest as use
     after free errors (in case of undercounting) in
     the af_alg functions or silent memory leaks (in
     case of overcounting), but also other behaviors
     since reference counting is key to many things.

   * FWIW, this patch has been written by the crypto
     subsystem maintainer, who certainly knows a lot
     of the normal and corner cases, thus giving the
     patch more credit.

   * Testing with the organic reproducer ran as long
     as 5 days, without issues, so it does look good.

  [Other Info]

   * [1] Patch:
     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d

  [Stack Trace Examples]

  Examples:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:apparmor_sk_clone_security+0x26/0x70
  ...
  Call Trace:
   security_sk_clone+0x33/0x50
   af_alg_accept+0x81/0x1c0 [af_alg]
   alg_accept+0x15/0x20 [af_alg]
   SYSC_accept4+0xff/0x210
   SyS_accept+0x10/0x20
   do_syscall_64+0x73/0x130
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2

  general protection fault: 0000 [#1] SMP PTI
  ...
  RIP: 0010:__release_sock+0x54/0xe0
  ...
  Call Trace:
   release_sock+0x30/0xa0
   af_alg_accept+0x122/0x1c0 [af_alg]
   alg_accept+0x15/0x20 [af_alg]
   SYSC_accept4+0xff/0x210
   SyS_accept+0x10/0x20
   do_syscall_64+0x73/0x130
   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
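
The [Fix] section attributes the use-after-free to a non-atomic
reference counter being dropped from two CPUs at once, and says the fix
makes that counter atomic. As a rough userspace illustration of that
idea only (this is not the kernel patch, and every name here, such as
struct obj, obj_get(), obj_put() and run_two_workers(), is hypothetical),
a C11 sketch with an atomic counter might look like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object standing in for a reference-counted socket. */
struct obj {
    atomic_uint refcnt;   /* atomic: get/put need no external lock */
    atomic_uint released; /* how many times the release path ran */
};

static void obj_init(struct obj *o)
{
    atomic_init(&o->refcnt, 1);   /* the creator holds the first reference */
    atomic_init(&o->released, 0);
}

static void obj_get(struct obj *o)
{
    atomic_fetch_add(&o->refcnt, 1);
}

/* Returns true if this call dropped the last reference. */
static bool obj_put(struct obj *o)
{
    /* fetch_sub returns the previous value, so 1 means we reached zero. */
    if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
        atomic_fetch_add(&o->released, 1);
        return true;
    }
    return false;
}

#define ITERATIONS 100000

static void *worker(void *arg)
{
    struct obj *o = arg;

    /* Safe to take references: the creator's reference is still held. */
    for (int i = 0; i < ITERATIONS; i++) {
        obj_get(o);
        obj_put(o);
    }
    return NULL;
}

/*
 * Runs two workers concurrently, then drops the creator's reference.
 * Returns how many times the object was released.
 */
static unsigned int run_two_workers(void)
{
    struct obj o;
    pthread_t t[2];

    obj_init(&o);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &o);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    obj_put(&o); /* final put: the only one that should hit zero */
    return atomic_load(&o.released);
}
```

Calling run_two_workers() from a small main() and checking that the
object was released exactly once is one way to exercise the sketch.
With a plain (non-atomic) unsigned int counter, concurrent
increments and decrements could be lost, so the count might hit zero
early (undercounting) or never (overcounting), which matches the two
failure modes listed under [Regression Potential].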