On Wed, Jun 24, 2026 at 1:59 PM Pavel Tikhomirov <[email protected]> wrote: > > > > On 6/23/26 17:34, Amir Goldstein wrote: > > On Tue, Jun 23, 2026 at 4:55 PM Pavel Tikhomirov > > <[email protected]> wrote: > >> > >> > >> > >> On 6/23/26 15:48, Johannes Weiner wrote: > >>> On Tue, Jun 23, 2026 at 01:14:48PM +0200, Pavel Tikhomirov wrote: > >>>> The cachestat() syscall reads page cache statistics straight from the > >>>> file's f_mapping. Stackable filesystems such as overlayfs keep the data > >>>> pages in an underlying inode's mapping rather than in the overlay > >>>> inode's, so cachestat() reports all zeroes for them. > >>>> > >>>> Add a ->cachestat() file operation and route the syscall through a new > >>>> vfs_cachestat() helper that calls it when present, falling back to > >>>> file's f_mapping otherwise. This lets stackable filesystems forward the > >>>> query to the file that actually owns the page cache. No behaviour change > >>>> for regular files. > >>>> > >>>> Signed-off-by: Pavel Tikhomirov <[email protected]> > >>>> --- > >>>> Note: Memset change might be a bit tricky, I moved it to no > >>>> ->cachestat() path to avoid multiple memset on nested overlayfs, that > >>>> means that ->cachestat() is expected to be able to handle unitialized > >>>> cs. > >>>> --- > >>>> include/linux/fs.h | 10 ++++++++++ > >>>> mm/filemap.c | 43 +++++++++++++++++++++++++++++++++++-------- > >>>> 2 files changed, 45 insertions(+), 8 deletions(-) > >>>> > >>>> diff --git a/include/linux/fs.h b/include/linux/fs.h > >>>> index 6da44573ce450..966b6564707e4 100644 > >>>> --- a/include/linux/fs.h > >>>> +++ b/include/linux/fs.h > >>>> @@ -53,6 +53,8 @@ > >>>> > >>>> struct bdi_writeback; > >>>> struct bio; > >>>> +struct cachestat_range; > >>>> +struct cachestat; > >>>> struct io_comp_batch; > >>>> struct fiemap_extent_info; > >>>> struct kiocb; > >>>> @@ -1963,6 +1965,8 @@ struct file_operations { > >>>> struct file *file_out, loff_t pos_out, > >>>> loff_t len, unsigned int remap_flags); > >>>> int (*fadvise)(struct file *, loff_t, loff_t, int); > >>>> + int (*cachestat)(struct file *file, struct cachestat_range *csr, > >>>> + struct cachestat *cs); > >>> > >>> I suppose you can't just have it return the real file because of the > >>> with_ovl_creds() scope you need during access? > >> > >> Yes, AFAIU in overlay when we use realfile we should always use it > >> with_ovl_creds(), even though I don't think there is anything cred related > >> in filemap_cachestat(), I still think we should follow the common pattern > >> other overlay helpers use (similar to ovl_fadvise() and ovl_flush()). > >> > >> note: Actually some places get ovl_real_file() and use it without > >> with_ovl_creds(), e.g.: ovl_read_iter, ovl_write_iter, ovl_splice_read, > >> ovl_splice_write. But those look more of an exception than the general > >> rule. All other instances use with_ovl_creds(). > > > > Use with_ovl_creds() is a good practice to keep the mental security model, > > but it is useless if the security check (can_do_cachestat) is not in the > > vfs helper (vfs_cachestat), so please move it there. > > Totally, I feel stupid now that I missed it originally, will move it in v2. > > > > > Also it kind of makes more sense to check (flags != 0) in sys_cachestats > > before checking permissions. > > > >> > >> Also there are simingly no other file_operations which return "realfile" > >> for further processing, mostly the operation from fsops simply replaces > >> general operation with its own logic completely. > >> > >> Thanks for your review! > >> > >> ps: Hope overlay maintainers will correctly if I'm getting this wrong. > >> > > > > I don't think this is wrong per-se, except for can_do_cachestat(). > > > > Just be aware that the real file could change from one cachestat > > call to the next (i.e. due to copy up). > > I guess the copy-up between two cachestat calls is an equivalent of cache > pages > being dropped/reclaimed between the calls, right?
I guess so. > (I assume old pages will not be used anymore and will be reclaimed in due > time.) TBH as long as this overlayfs fd is kept open both real lower and real upper (post copy up) files are kept open, but the lower pages will not be accessed by overlayfs expect for the case ofMAP_SHARED mmap if this overlayfs file was mmaped before copy up, the lower inode pages will remain mapped to process address space and then cachestat will return stats which are disassociated from the pages accessed by mmap. This is documented in the `Non-standard behavior` section of overlayfs.rst and not a specific issue to cachestat. Just wanted to be aware of this. > So userspace should not be > overly confused by such a cache usage change. > Agree. Thanks, Amir.

