On 6/23/26 17:34, Amir Goldstein wrote:
> On Tue, Jun 23, 2026 at 4:55 PM Pavel Tikhomirov
> <[email protected]> wrote:
>>
>>
>>
>> On 6/23/26 15:48, Johannes Weiner wrote:
>>> On Tue, Jun 23, 2026 at 01:14:48PM +0200, Pavel Tikhomirov wrote:
>>>> The cachestat() syscall reads page cache statistics straight from the
>>>> file's f_mapping. Stackable filesystems such as overlayfs keep the data
>>>> pages in an underlying inode's mapping rather than in the overlay
>>>> inode's, so cachestat() reports all zeroes for them.
>>>>
>>>> Add a ->cachestat() file operation and route the syscall through a new
>>>> vfs_cachestat() helper that calls it when present, falling back to
>>>> file's f_mapping otherwise. This lets stackable filesystems forward the
>>>> query to the file that actually owns the page cache. No behaviour change
>>>> for regular files.
>>>>
>>>> Signed-off-by: Pavel Tikhomirov <[email protected]>
>>>> ---
>>>> Note: Memset change might be a bit tricky, I moved it to no
>>>> ->cachestat() path to avoid multiple memset on nested overlayfs, that
>>>> means that ->cachestat() is expected to be able to handle unitialized
>>>> cs.
>>>> ---
>>>>  include/linux/fs.h | 10 ++++++++++
>>>>  mm/filemap.c       | 43 +++++++++++++++++++++++++++++++++++--------
>>>>  2 files changed, 45 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index 6da44573ce450..966b6564707e4 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -53,6 +53,8 @@
>>>>
>>>>  struct bdi_writeback;
>>>>  struct bio;
>>>> +struct cachestat_range;
>>>> +struct cachestat;
>>>>  struct io_comp_batch;
>>>>  struct fiemap_extent_info;
>>>>  struct kiocb;
>>>> @@ -1963,6 +1965,8 @@ struct file_operations {
>>>>                                 struct file *file_out, loff_t pos_out,
>>>>                                 loff_t len, unsigned int remap_flags);
>>>>      int (*fadvise)(struct file *, loff_t, loff_t, int);
>>>> +    int (*cachestat)(struct file *file, struct cachestat_range *csr,
>>>> +                     struct cachestat *cs);
>>>
>>> I suppose you can't just have it return the real file because of the
>>> with_ovl_creds() scope you need during access?
>>
>> Yes, AFAIU in overlay when we use realfile we should always use it
>> with_ovl_creds(), even though I don't think there is anything cred related
>> in filemap_cachestat(), I still think we should follow the common pattern
>> other overlay helpers use (similar to ovl_fadvise() and ovl_flush()).
>>
>> note: Actually some places get ovl_real_file() and use it without
>> with_ovl_creds(), e.g.: ovl_read_iter, ovl_write_iter, ovl_splice_read,
>> ovl_splice_write. But those look more of an exception than the general
>> rule. All other instances use with_ovl_creds().
> 
> Use with_ovl_creds() is a good practice to keep the mental security model,
> but it is useless if the security check (can_do_cachestat) is not in the
> vfs helper (vfs_cachestat), so please move it there.

Totally, I feel stupid now that I missed it originally, will move it in v2.

> 
> Also it kind of makes more sense to check (flags != 0) in sys_cachestats
> before checking permissions.
> 
>>
>> Also there are simingly no other file_operations which return "realfile"
>> for further processing, mostly the operation from fsops simply replaces
>> general operation with its own logic completely.
>>
>> Thanks for your review!
>>
>> ps: Hope overlay maintainers will correctly if I'm getting this wrong.
>>
> 
> I don't think this is wrong per-se, except for can_do_cachestat().
> 
> Just be aware that the real file could change from one cachestat
> call to the next (i.e. due to copy up).

I guess the copy-up between two cachestat calls is an equivalent of cache pages
being dropped/reclaimed between the calls, right? (I assume old pages will not
be used anymore and will be reclaimed in due time.) So userspace should not be
overly confused by such a cache usage change.

> 
> Thanks,
> Amir.

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


Reply via email to