On Wed, 27 May 2026 16:21:50 -0300 Ricardo Robaina wrote:
> When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks
> on the netlink socket wait queue (nlk->wait). If the wait timeout
> fully expires (timeo == 0), netlink mistakenly interprets the zeroed
> timeout as a non-blocking request on its next retry iteration. It
> then triggers netlink_overrun that drops the event and poisons the
> socket with ENOBUFS. This bypasses the audit subsystem's internal
> retry backlog and falsely returns an error to user-space:
> 
>  auditd[]: Error receiving audit netlink packet (No buffer space available)
> 
> Unlike standard netlink users, the audit subsystem has a hard
> requirement to never silently drop security records. It uses a short
> finite socket timeout (sk_sndtimeo = HZ/10) to escape a stalled
> auditd and safely requeue the message internally. However, once
> netlink_overrun() executes, the ENOBUFS state is set on the
> receiving socket, and the audit subsystem has no mechanism to
> intercept or clear this from the outside.

This provides no improvement over v2. Let's keep the discussion on the
v2 thread.
