Public bug reported:

[Impact]
In BFB version DOCA_2.6.0_BSP_4.6.0_Ubuntu_22.04-2.20240114, deleting a
container by removing its kubelet YAML from /etc/kubelet.d sometimes fails to
complete. The deletion process waits for the container to disappear from the
crictl ps output, but the container remains in the Running state indefinitely.
This behavior is seen with container version 2.dev.50 and FW 32.40.0324.
The issue appears to stem from a kernel bug affecting orphaned TCP sockets
stuck in a zero-window state. These sockets are never closed and their timers
are never rescheduled, so they become "forever orphans" that prevent resource
cleanup.
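
One way to look for such sockets on an affected system is to scan
/proc/net/tcp for entries that have no owning inode and no active timer. The
sketch below is not part of the original report. It assumes that an orphaned
socket is listed with inode 0 and that a timer field of 00 means no timer is
pending; the helper name find_timerless_orphans is hypothetical.

```python
from pathlib import Path

# Subset of kernel TCP state codes as printed in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "04": "FIN_WAIT1", "05": "FIN_WAIT2",
    "06": "TIME_WAIT", "08": "CLOSE_WAIT", "09": "LAST_ACK", "0B": "CLOSING",
}


def find_timerless_orphans(table_text):
    """Return (local, remote, state) for rows with inode 0 and no active timer.

    Assumption: inode 0 marks a socket with no owning file (an orphan), and a
    "tr" value of 00 means no retransmit, probe, or other timer is pending.
    """
    hits = []
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        timer_active = fields[5].split(":")[0]  # "tr" part of "tr:tm->when"
        inode = fields[9]
        if inode == "0" and timer_active == "00":
            hits.append((local, remote, TCP_STATES.get(state.upper(), state)))
    return hits


if __name__ == "__main__":
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        p = Path(path)
        if p.exists():
            for hit in find_timerless_orphans(p.read_text()):
                print(path, *hit)
```

Addresses are printed in the raw hexadecimal form used by the kernel. A match
is only a hint that a socket may be affected, not proof of this bug.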

[Fix]
Backporting the upstream commit:
bac76cf89816bff06c4ec2f3df97dc34e150a1c4 ("tcp: fix forever orphan socket 
caused by tcp_abort")
This commit removes the conditional check on SOCK_DEAD in tcp_abort, so that
orphaned sockets are closed properly instead of stalling indefinitely.
A backport, rather than a clean cherry-pick, is needed because the error
handling and logging methods in this kernel differ from the original upstream
code.
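
The effect of the removed check can be illustrated with a small toy model.
This is a sketch in Python, not kernel code: ToySock, toy_abort and
toy_timer_fires are hypothetical names, and the model assumes, following the
upstream commit message, that the abort path purges the write queue in both
variants and that the timer handlers return early when nothing is queued.

```python
from dataclasses import dataclass, field


@dataclass
class ToySock:
    dead: bool  # stands in for the SOCK_DEAD flag (orphaned socket)
    write_queue: list = field(default_factory=lambda: ["pending-segment"])
    closed: bool = False
    timer_armed: bool = True  # stands in for the retransmit/probe timers


def toy_abort(sk, check_sock_dead):
    """Toy model of the abort path.

    check_sock_dead=True mimics the pre-fix behavior (orphans are skipped);
    check_sock_dead=False mimics the behavior after the fix.
    """
    if not (check_sock_dead and sk.dead):
        sk.closed = True  # models sending a reset and closing the socket
    sk.write_queue.clear()  # the write queue is purged in both variants


def toy_timer_fires(sk):
    """Toy model of a timer handler that returns early when nothing is queued."""
    if sk.closed or not sk.write_queue:
        sk.timer_armed = False  # early return: nothing re-arms the timer
        return
    sk.timer_armed = True  # with data queued, the timer would be re-armed


if __name__ == "__main__":
    for label, check in (("before fix", True), ("after fix", False)):
        sk = ToySock(dead=True)
        toy_abort(sk, check_sock_dead=check)
        toy_timer_fires(sk)
        print(label, "closed:", sk.closed, "timer armed:", sk.timer_armed)
```

In the pre-fix variant the orphan ends up neither closed nor covered by a
timer, which corresponds to the "forever orphan" state described above.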

[Test Case]
Compile-tested on BF 5.15.
To test further, reproduce the issue by removing the pod's YAML from
/etc/kubelet.d and monitoring container termination with crictl ps.
With the patch applied, the container should no longer remain stuck in the
Running state.
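
The monitoring step can be scripted. The following is a minimal sketch, not
the procedure used by the reporter: it polls crictl ps (with its --name filter
and -q option) until no matching running container is left or a timeout
expires. The function names, the 120-second timeout and the container name
are assumptions made for illustration.

```python
import subprocess
import time


def list_running(name):
    """Return IDs of running containers whose name matches `name`."""
    out = subprocess.run(
        ["crictl", "ps", "--name", name, "-q"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


def wait_for_removal(name, timeout_s=120.0, interval_s=2.0,
                     lister=list_running, sleep=time.sleep,
                     clock=time.monotonic):
    """Poll until no matching container is running.

    Returns True if the container disappeared before the timeout and False
    if it was still listed when the timeout expired.
    """
    deadline = clock() + timeout_s
    while True:
        if not lister(name):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # "my-container" is a placeholder; run this after removing the pod's YAML.
    gone = wait_for_removal("my-container")
    print("container removed" if gone else "container still Running")
```

On an unpatched kernel that hits the bug, the expectation is that the call
times out; with the patch applied it should return once the container is gone.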

[Regression Potential]
The patch targets a specific edge case in TCP socket handling, and the
backport stays as close as possible to the original upstream commit.
However, because the change removes a check that previously prevented
tcp_abort from closing SOCK_DEAD sockets, there is a small risk if other
kernel paths still rely on the earlier behavior. This could lead to unexpected
side effects in the force-close logic if assumptions about socket state are
violated. In addition, the backport is not an exact match for the original
commit, so unexpected behavior is possible in edge cases related to socket
teardown.

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2114965

Title:
  Ubuntu 22.04 - Container deletion hangs due to stuck orphaned TCP
  socket in zero-window state

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/2114965/+subscriptions

