On 4/21/26 01:33, Michael S. Tsirkin wrote: > On Mon, Apr 20, 2026 at 08:20:57PM +0200, David Hildenbrand (Arm) wrote: >> On 4/20/26 14:51, Michael S. Tsirkin wrote: >>> >> >> Hi! >> >>> >>> v2 - this is an attempt to address David Hildenbrand's comments: >>> overloading GFP and using page->private, support for >>> balloon deflate. >>> >>> I hope this one is acceptable, API wise. >>> >>> I also went ahead and implemented an alternative approach >>> that David suggested: >>> using GFP_ZERO to zero userspace pages. >>> The issue is simple: on some architectures, one has to know the >>> userspace fault address in order to flush the cache. >>> >>> So, I had to propagate the fault address everywhere. >> >> As I said, that might not be necessary. vma_alloc_folio() is the >> interface we mostly care about in that regard. >> > > I'm not sure I follow what "might not be necessary". We need a fault > address so zeroing can be effective wrt cache. Since you asked that it's > done deep in post alloc hook, the address has to propagate all over mm.
Let's look at who ends up using user_alloc_needs_zeroing() or folio_zero_user() 3 folio_zero_user() users are hugetlb that might get pages from another allocator. In particular, mm/memfd.c even passes 0 as it doesn't even have an address. I don't think we particularly care about speeding up hugetlb zeroing at this point when we already don't even care about optimizing for user_alloc_needs_zeroing(). But it could be reworked later to optimize zeroing in a similar way when actually allocating a folio from the buddy. Now, for callers we care more about: * vma_alloc_anon_folio_pmd() calls vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user() * alloc_anon_folio() calls vma_alloc_folio()+user_alloc_needs_zeroing()+folio_zero_user() * vma_alloc_zeroed_movable_folio() calls vma_alloc_zeroed_movable_folio()+user_alloc_needs_zeroing()+ clear_user_highpage(). Other vma_alloc_folio() users neither specify __GFP_ZERO not use folio_zero_user(), as they will be overwriting the data either way. Like KSM when unsharing, for example. I am saying we move "user_alloc_needs_zeroing()+folio_zero_user()" into vma_alloc_folio(), by teaching vma_alloc_folio() to respect __GFP_ZERO. user_alloc_needs_zeroing() will effectively go away as the buddy will just handle that. All of the above is what you do on gfp_zero branch already, so I think you understood what I meant regarding this interface. Anybody in the tree that would be using another folio_alloc() (or page allocator) interface with __GFP_ZERO *would already be broken on other architectures* where we would actually require folio_zero_user(), as they would already not be using folio_zero_user(). But we don't really need the user address in many cases, like when allocating a folio for the pagecache where we don't even have an address. > > >>> A lot of churn, and my concern is, if we miss even one >>> place, silent, subtle data corruption will result and only >>> on some arches (x86 will be fine). >> >> Which would *already* be the case of you use folio_alloc(GFP_ZERO) >> instead of magical vma_alloc_folio() + folio_zero_user(). >> >> I don't really see how vma_alloc_folio_hints() -- that also consumes the >> address -- is any better in that regard? > > By itself, it is not. But the issue is propagating the address from > there all over mm. If we miss even one place - we get a subtle cache > corruption on non x86. Yes. Like someone not using folio_zero_user() as of today. [...] > >>> >>> Still, you can view that approach here: >>> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git gfp_zero >>> >>> David, if you still feel I should switch to that approach, >>> let me know. Personally, I'd rather keep that as a separate >>> project from this optimization. >> I'd prefer if we extend vma_alloc_folio() to just handle GFP_ZERO for us. > > Pls take a look at that tree then. What do you think of that approach? > Better? I primarily wonder whether we can limit the impact in patch #1 by focusing on the vma_alloc_folio() path only. For example, I don't think converting all folio_alloc_mpol() users to consume USER_ADDR_NONE at this point is really reasonable. (a) Focus on vma_alloc_folio(), where we already have an address. (b) To implement vma_alloc_folio() that way, we might need some internal interfaces that consume an address. For example, instead of changing all callers of post_alloc_hook() to pass USER_ADDR_NONE, can we make post_alloc_hook() a simple wrapper around a variant that consumes an address. So isn't there a way we can just keep the changes mostly to mm/page_alloc.c? > > I also note that we need a flag for free in order to implement > balloon deflate as you asked. Here, I reused the hints. Yes, but on the allocation path we do have a flag: __GFP_ZERO. -- Cheers, David

