Public bug reported:
[Impact]
In BFB version DOCA_2.6.0_BSP_4.6.0_Ubuntu_22.04-2.20240114, container deletion
via removal of its kubelet YAML from /etc/kubelet.d sometimes fails to
complete. The process waits for the container to disappear from crictl ps, but
the container remains in Running state indefinitely. This behavior is seen with
container version 2.dev.50 and FW 32.40.0324.
The issue appears to stem from a kernel bug affecting orphaned TCP sockets
stuck in a zero-window state. These sockets are not closed and timers are not
rescheduled, leading to "forever orphan" behavior that prevents resource
cleanup.
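One rough way to watch for this symptom is the "orphan" counter on the TCP line
of /proc/net/sockstat. The sketch below is only an illustration, not part of
the original report: the helper names are hypothetical, and a count that stays
above zero is consistent with stuck orphaned sockets but does not by itself
prove this bug is the cause.

```python
def parse_tcp_orphans(sockstat_text):
    """Return the 'orphan' value from the TCP line of sockstat text, or None."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            # The line looks like: "TCP: inuse 5 orphan 0 tw 0 alloc 7 mem 1"
            fields = line.split()[1:]
            pairs = dict(zip(fields[0::2], fields[1::2]))
            if "orphan" in pairs:
                return int(pairs["orphan"])
    return None


def read_tcp_orphans(path="/proc/net/sockstat"):
    """Read the current orphaned TCP socket count from the given file."""
    with open(path) as f:
        return parse_tcp_orphans(f.read())


if __name__ == "__main__":
    print("orphaned TCP sockets:", read_tcp_orphans())
```

Sampling this value before and after the container deletion attempt may show
whether orphaned sockets are draining or lingering.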
[Fix]
Backporting the upstream commit:
bac76cf89816bff06c4ec2f3df97dc34e150a1c4 ("tcp: fix forever orphan socket
caused by tcp_abort")
This commit removes a conditional check on SOCK_DEAD in tcp_abort, allowing
proper closure of orphaned sockets and preventing indefinite stalling.
The commit cannot be cherry-picked as-is and needs a backport, because the
error handling and logging code in this kernel differs from the upstream code.
[Test Case]
Compile-tested on the BF 5.15 kernel.
Further testing includes reproducing the issue by removing the pod's YAML from
/etc/kubelet.d and monitoring container termination using crictl ps.
With the patch applied, the container should no longer remain stuck in Running
state.
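The monitoring step can be scripted along the following lines. This is a
minimal sketch, not the procedure actually used for this bug: the function
names, the timeout and polling interval, and the substring match on the
container name are all assumptions made for illustration. It relies only on
plain `crictl ps`, which lists running containers.

```python
import subprocess
import sys
import time


def list_running_containers():
    """Return the rows printed by `crictl ps`, without the header row."""
    result = subprocess.run(
        ["crictl", "ps"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[1:]


def wait_for_removal(name, timeout_s=120.0, interval_s=2.0,
                     lister=list_running_containers,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until no listed row mentions `name`; return False on timeout.

    The timeout and interval defaults are arbitrary, and matching is a plain
    substring test, so pick a name that is unique on the node.
    """
    deadline = clock() + timeout_s
    while True:
        if not any(name in row for row in lister()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Usage: run after removing the pod's YAML from /etc/kubelet.d
    gone = wait_for_removal(sys.argv[1])
    print("container removed" if gone else "container still listed (stuck?)")
    sys.exit(0 if gone else 1)
```

On an affected kernel the script would be expected to time out, and on a
patched kernel to report that the container was removed.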
[Regression Potential]
The patch targets a specific edge case in TCP socket handling, and after
backporting, it is as close as possible to the original upstream commit.
However, since the change removes a check that previously avoided closing
SOCK_DEAD sockets, there's a small risk if other kernel paths still rely on the
earlier behavior. This could theoretically lead to unexpected side effects in
force-close logic if assumptions about socket state are violated. Also, the
backport is not an exact match for the original commit, so there is a
possibility of unexpected behavior in edge cases related to socket teardown.
** Affects: ubuntu
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2114965
Title:
Ubuntu 22.04 - Container deletion hangs due to stuck orphaned TCP
socket in zero-window state