On 4/21/26 01:33, Michael S. Tsirkin wrote:
> On Mon, Apr 20, 2026 at 08:20:57PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/20/26 14:51, Michael S. Tsirkin wrote:
>>>
>>
>> Hi!
>>
>>>
>>> v2 - this is an attempt to address David Hildenbrand's comments:
>>> overloading GFP and using page->private, support for
>>> balloon deflate.
>>>
>>> I hope this one is acceptable, API wise.
>>>
>>> I also went ahead and implemented an alternative approach
>>> that David suggested:
>>> using GFP_ZERO to zero userspace pages.
>>> The issue is simple: on some architectures, one has to know the
>>> userspace fault address in order to flush the cache.
>>>
>>> So, I had to propagate the fault address everywhere.
>>
>> As I said, that might not be necessary. vma_alloc_folio() is the
>> interface we mostly care about in that regard.
>>
> 
> I'm not sure I follow what "might not be necessary". We need a fault
> address so zeroing can be effective wrt cache. Since you asked that it's
> done deep in post alloc hook, the address has to propagate all over mm.

Let's look at who ends up using user_alloc_needs_zeroing() or
folio_zero_user()

3 folio_zero_user() users are hugetlb that might get pages from another
allocator. In particular, mm/memfd.c even passes 0 as it doesn't even
have an address.

I don't think we particularly care about speeding up hugetlb zeroing at
this point when we already don't even care about optimizing for
user_alloc_needs_zeroing(). But it could be reworked later to optimize
zeroing in a similar way when actually allocating a folio from the buddy.

Now, for callers we care more about:

* vma_alloc_anon_folio_pmd() calls
  vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user()
* alloc_anon_folio() calls
  vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user()
* vma_alloc_zeroed_movable_folio() calls
  vma_alloc_zeroed_movable_folio()+user_alloc_needs_zeroing()+
  clear_user_highpage().


Other vma_alloc_folio() users neither specify __GFP_ZERO not use
folio_zero_user(), as they will be overwriting the data either way. Like
KSM when unsharing, for example.

I am saying we move "user_alloc_needs_zeroing()+folio_zero_user()" into
vma_alloc_folio(), by teaching vma_alloc_folio() to respect __GFP_ZERO.

user_alloc_needs_zeroing() will effectively go away as the buddy will
just handle that.


All of the above is what you do on gfp_zero branch already, so I think
you understood what I meant regarding this interface.


Anybody in the tree that would be using another folio_alloc() (or page
allocator) interface with __GFP_ZERO *would already be broken on other
architectures* where we would actually require folio_zero_user(), as
they would already not be using folio_zero_user().

But we don't really need the user address in many cases, like when
allocating a folio for the pagecache where we don't even have an address.

> 
> 
>>> A lot of churn, and my concern is, if we miss even one
>>> place, silent, subtle data corruption will result and only
>>> on some arches (x86 will be fine).
>>
>> Which would *already* be the case of you use folio_alloc(GFP_ZERO)
>> instead of magical vma_alloc_folio() + folio_zero_user().
>>
>> I don't really see how vma_alloc_folio_hints() -- that also consumes the
>> address -- is any better in that regard?
> 
> By itself, it is not. But the issue is propagating the address from
> there all over mm. If we miss even one place - we get a subtle cache
> corruption on non x86.

Yes. Like someone not using folio_zero_user() as of today.

[...]

> 
>>>
>>> Still, you can view that approach here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero
>>>
>>> David, if you still feel I should switch to that approach,
>>> let me know. Personally, I'd rather keep that as a separate
>>> project from this optimization.
>> I'd prefer if we extend vma_alloc_folio() to just handle GFP_ZERO for us.
> 
> Pls take a look at that tree then. What do you think of that approach?
> Better? 

I primarily wonder whether we can limit the impact in patch #1 by
focusing on the vma_alloc_folio() path only.

For example, I don't think converting all folio_alloc_mpol() users to
consume USER_ADDR_NONE at this point is really reasonable.

(a) Focus on vma_alloc_folio(), where we already have an address.

(b) To implement vma_alloc_folio() that way, we might need some internal
    interfaces that consume an address.

For example, instead of changing all callers of post_alloc_hook() to
pass USER_ADDR_NONE, can we make post_alloc_hook() a simple wrapper
around a variant that consumes an address.

So isn't there a way we can just keep the changes mostly to mm/page_alloc.c?


> 
> I also note that we need a flag for free in order to implement
> balloon deflate as you asked. Here, I reused the hints.

Yes, but on the allocation path we do have a flag: __GFP_ZERO.

-- 
Cheers,

David

Reply via email to