On Tue 17-02-26 14:32:25, T.J. Mercier wrote:
> On Tue, Feb 17, 2026 at 1:25 PM Amir Goldstein <[email protected]> wrote:
> > > > Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> > > > while watching the parent? Because this is not how the API works.
> > >
> > > No, only on the file being watched. The parent should only get
> > > IN_DELETE, but I read your feedback below and I'm fine with removing
> > > that part and just sending the DELETE_SELF and IN_IGNORED events.
> > >
> >
> > So if the file was being watched, some application needed to call
> > inotify_add_watch() with the user path to the cgroupfs inode
> > and inotify watch keeps a live reference to this vfs inode.
> >
> > When the cgroup is being destroyed something needs to drop
> > this vfs inode and call __destroy_inode() -> fsnotify_inode_delete()
> > which should remove the inotify watch and result in IN_IGNORED.
>
> Nothing like this exists before this patch.
>
> > IN_DELETE_SELF is a different story, because the inode does not
> > have zero i_nlink.
> >
> > I did not try to follow the code path of cgroupfs destroy when an
> > inotify watch on a cgroup file exists, but this is what I expect.
> > Please explain - what am I missing?
>
> Yes that's the problem here. The inode isn't dropped unless the watch
> is removed, and the watch isn't removed because kernfs doesn't go
> through vfs to notify about file removal. There is nothing to trigger
> dropping the watch and the associated inode reference except this
> patch calling into fsnotify_inoderemove which both sends
> IN_DELETE_SELF and calls __fsnotify_inode_delete for the IN_IGNORED
> and inode cleanup.
>
> Without this, the watch and inode persist after file deletion until
> the process exits and file descriptors are cleaned up, or until
> inotify_rm_watch gets called manually.
Hrm. I was scratching my head for a while over how it is possible that VFS
isn't involved. So let me share what I found:
Normally fsnotify_inoderemove() is called from dentry_unlink_inode() which
is called from d_delete() (name unlinked) and __dentry_kill() (last dput()).
Now it is true that kernfs doesn't bother with pruning child dentries from
its rmdir implementation. It just marks all corresponding kernfs_nodes
(inodes) as dead and that's it so d_delete() isn't called. But vfs_rmdir()
makes up for this by calling shrink_dcache_parent() on the removed
directory so the child dentries end up going through __dentry_kill(). *But*
kernfs also doesn't bother to set i_nlink of these child inodes to 0
when marking them as dead and so __dentry_kill() doesn't call
fsnotify_inoderemove(). So at this point it seems more like a kernfs bug
that child inodes aren't properly cleaned up by setting i_nlink to 0, and
I don't think we should paper over this by calling fsnotify_inoderemove()
explicitly.
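
To illustrate the chain above, here is a toy userspace model in C. It is
only a sketch: the toy_* names and the clear_nlink switch are made up for
illustration and are not kernel code. The only thing it tries to mirror is
the i_nlink check that gates fsnotify_inoderemove() on the dentry kill path.

```c
#include <stdbool.h>

/* Toy stand-in for struct inode; only the state relevant here. */
struct toy_inode {
	unsigned int i_nlink;   /* link count as seen by VFS */
	bool delete_self_sent;  /* IN_DELETE_SELF would have been queued */
	bool watch_removed;     /* mark dropped, i.e. IN_IGNORED */
};

/* Models fsnotify_inoderemove(): IN_DELETE_SELF plus mark teardown. */
static void toy_fsnotify_inoderemove(struct toy_inode *inode)
{
	inode->delete_self_sent = true;
	inode->watch_removed = true;
}

/* Models the i_nlink check on the dentry kill path. */
static void toy_dentry_unlink_inode(struct toy_inode *inode)
{
	if (!inode->i_nlink)
		toy_fsnotify_inoderemove(inode);
}

/*
 * Models removal of a kernfs child. clear_nlink is a hypothetical switch:
 * false is the behaviour described above, true is the suggested direction.
 */
static void toy_kernfs_remove(struct toy_inode *inode, bool clear_nlink)
{
	if (clear_nlink)
		inode->i_nlink = 0;
	/* Child dentry is killed via shrink_dcache_parent() either way. */
	toy_dentry_unlink_inode(inode);
}
```

With clear_nlink false the watch survives the removal, which matches what
T.J. describes. With it true, the existing VFS path does the cleanup and no
explicit fsnotify_inoderemove() call is needed.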
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR