On Mon, May 18, 2026 at 8:35 PM Jakub Kicinski <[email protected]> wrote:
> On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> > When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> > the netlink socket.
>
> Holding socket lock during slow IO sounds very wrong. One could say -
> that's abuse of the socket lock?

It's no different than any other kernel subsystem sending netlink
packets to userspace, although in some configurations the rate at which
audit sends netlink traffic is likely much higher than the majority of
netlink users. Arguably, audit probably never should have used netlink,
but that decision happened a long time ago and there were other issues
complicating the decision.

> > If the wait timeout fully expires (timeo == 0),
> > netlink mistakenly interprets the zeroed timeout as a non-blocking
> > request. It then triggers netlink_overrun that drops the event,
> > completely bypassing the audit subsystem's internal retry queue, and
> > falsely returns ENOBUFS to user-space, resulting in the following error:
> >
> >   auditd[]: Error receiving audit netlink packet (No buffer space available)
> >
> > Fix this by detecting when a blocking sender's timeout has expired
> > (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> > of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> > on the next iteration), safely free the skb and return -EAGAIN, allowing
> > the audit subsystem to gracefully enqueue the pending event into its
> > internal backlog.
>
> The socket _is_ the queue, normally.

There is a joke in there about audit and "normal", but I'll leave that
as an exercise for the reader. I will say that audit has a lot of
unique requirements regarding queue management and that dictates a lot
of the wacky stuff audit has to do with its record queue; the standard
socket buffer functionality doesn't have everything, and I wouldn't
want to ask for it to be augmented in a way that satisfies audit.

> Please explore fixing this in audit?

Ricardo, I was kinda hoping not to have to do this in audit, but I
think you can probably get away with just open-coding netlink_unicast()
in audit and then going from there ...
we might want to do some other things differently, but let's see what a
basic patch looks like before we spend a lot of time redesigning it.

--
paul-moore.com

