On Wed, Mar 04, 2026 at 01:57:31PM -0500, Demi Marie Obenour wrote:
> On 3/4/26 08:03, Christian Brauner wrote:
> > On Wed, Mar 04, 2026 at 01:53:42AM -0500, Demi Marie Obenour wrote:
> >> I noticed potentially missing input sanitization in dma_buf_set_name(),
> >> which is reachable from DMA_BUF_SET_NAME. This allows inserting a name
> >> containing a newline, which is then used to construct the contents of
> >> /proc/PID/task/TID/fdinfo/FD. This could confuse userspace programs
> >> that access this data, possibly tricking them into thinking a file
> >> descriptor is of a different type than it actually is.
> >>
> >> Other code might have similar bugs. For instance, there is code that
> >> uses a sysfs path, a driver name, or a device name from /dev. It is
> >> possible to sanitize the first, and the second and third should come
> >> from trusted sources within the kernel itself. The last area where
> >> I found a potential problem is BPF. I don't know if this can happen.
> >>
> >> I think this should be fixed by either sanitizing data on write
> >> (by limiting the allowed characters in dma_buf_set_name()), on read
> >> (by using one of the formats that escapes special characters), or both.
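
A minimal sketch of the sanitize-on-write option, written as plain C so
it can be tried in userspace. name_is_line_safe() is a hypothetical
helper, not an existing kernel function, and restricting names to
printable ASCII is only an assumption about what policy would be
acceptable here:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): accept a name only if every
 * byte is printable ASCII, so it can never inject a newline or another
 * control character into a line-oriented file such as fdinfo.
 */
static bool name_is_line_safe(const char *name, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		/* Reject control characters, DEL and non-ASCII bytes. */
		if (c < 0x20 || c > 0x7e)
			return false;
	}
	return true;
}
```

A setter along these lines would reject a name containing "\n" before it
is ever stored. Escaping on read would still be useful as a second layer.
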
> >>
> >> Is there a better way to identify that a file descriptor is of
> >> a particular type, such as an eventfd? fdinfo is subject to
> >
> > The problem is that most of the anonymous inodes share a single
> > anonymous inode so any uapi that returns information based on inode->i_op
> > is not going to be usable.
> >
> >> bugs of this type, which might happen again. readlink() reports
> >> "anon_inode:[eventfd]" and S_IFMT reports a mode of 0, but my
> >
> > That is definitely uapi by now. We've tried to change S_IFMT and it
> > breaks lsfd and other tools so we can't reasonably change it. In fact,
> > pidfds pretend to be anon_inode even though they're not simply because
> > some tools parse that out.
>
> Does Linux guarantee that anything that is not an anonymous inode
> will have (st_mode & S_IFMT) != 0?

Ignoring bugs or disk corruption, anonymous inodes should be the only
inode type that has a zero type. Everything else should have a non-zero
type, and I made the VFS splat in may_open() if that doesn't hold:

	switch (inode->i_mode & S_IFMT) {
	case S_IFLNK:
		return -ELOOP;
	case S_IFDIR:
		if (acc_mode & MAY_WRITE)
			return -EISDIR;
		if (acc_mode & MAY_EXEC)
			return -EACCES;
		break;
	case S_IFBLK:
	case S_IFCHR:
		if (!may_open_dev(path))
			return -EACCES;
		fallthrough;
	case S_IFIFO:
	case S_IFSOCK:
		if (acc_mode & MAY_EXEC)
			return -EACCES;
		flag &= ~O_TRUNC;
		break;
	case S_IFREG:
		if ((acc_mode & MAY_EXEC) && path_noexec(path))
			return -EACCES;
		break;
	default:
		VFS_BUG_ON_INODE(!IS_ANON_FILE(inode), inode);
	}

> Maybe it is time for a prctl that disables this legacy behavior?

I switched anonymous inodes internally to S_IFREG a while ago in [1]
and then masked it off for userspace. Even just the internal conversion
caused various subsystems like io_uring to lose it, which is why we
reverted it in [2].

So any next attempt needs to ensure that there are no internal and no
external regressions. And no prctl()s please. prctl() is a strong
contender for Linux' main landfill next to procfs.

Ideally we'd just look at lsfd and lsof and move them away from any type
assertions. I asked them to do that for pidfds a while ago and they
have merged a patch to that effect.

[1]: cfd86ef7e8e7 ("anon_inode: use a proper mode internally")
[2]: 1e7ab6f67824 ("anon_inode: rework assertions")