On 02.03.2026 12:01, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 02, 2026 at 09:40:29AM +0100, David Hildenbrand (Arm) wrote:
>> On 3/2/26 07:36, Jürgen Groß wrote:
>>> On 01.03.26 16:04, Marek Marczykowski-Górecki wrote:
>>>> Hi,
>>>>
>>>> Some time ago I made a change to disable scrubbing pages that are
>>>> ballooned out during system boot. I'll paste the whole commit message as
>>>> it's relevant here:
>>>>
>>>>      197ecb3802c0 xen/balloon: add runtime control for scrubbing
>>>>      ballooned out pages
>>>>
>>>>      Scrubbing pages on initial balloon down can take some time,
>>>>      especially in nested virtualization case (nested EPT is slow).
>>>>      When HVM/PVH guest is started with memory= significantly lower
>>>>      than maxmem=, all the extra pages will be scrubbed before
>>>>      returning to Xen. But since most of them weren't used at all at
>>>>      that point, Xen needs to populate them first (from
>>>>      populate-on-demand pool). In nested virt case (Xen inside KVM)
>>>>      this slows down the guest boot by 15-30s with just 1.5GB needed
>>>>      to be returned to Xen.
>>>>
>>>>      Add runtime parameter to enable/disable it, to allow initially
>>>>      disabling scrubbing, then enable it back during boot (for
>>>>      example in initramfs). Such usage relies on assumption that a)
>>>>      most pages ballooned out during initial boot weren't used at
>>>>      all, and b) even if they were, very few secrets are in the
>>>>      guest at that time (before any serious userspace kicks in).
>>>>
>>>>      Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT
>>>>      (also enabled by default), controlling default value for the new
>>>>      runtime switch.
>>>>
>>>> Now, I face the same issue with init_on_free/init_on_alloc (not sure
>>>> which one applies here, probably the latter one), which several
>>>> distributions enable by default. The result is (see timestamps):
>>>>
>>>>      [2026-02-24 01:12:55] [    7.485151] xen:balloon: Waiting for
>>>>      initial ballooning down having finished.
>>>>      [2026-02-24 01:14:14] [   86.581510] xen:balloon: Initial
>>>>      ballooning down finished.
>>>>
>>>> But here the situation is a bit more complicated:
>>>> init_on_free/init_on_alloc applies to any pages, not just those for
>>>> balloon driver. I see two approaches to solve the issue:
>>>> 1. Similar to xen_scrub_pages=, add a runtime switch for
>>>>     init_on_free/init_on_alloc, then force them off during boot, and
>>>>     re-enable early in initramfs.
>>>> 2. Somehow adjust balloon driver to bypass init_on_alloc when ballooning
>>>>     a page out.
>>>>
>>>> The first approach is likely easier to implement, but also has some
>>>> drawbacks: it may result in some kernel structures that are allocated
>>>> early to remain with garbage data in uninitialized places. While it may
>>>> not matter during early boot, such structures may survive for quite some
>>>> time, and maybe attacker can use them later on to exploit some other
>>>> bug. This wasn't really a concern with xen_scrub_pages, as those pages
>>>> were immediately ballooned out.
>>>>
>>>> The second approach sounds architecturally better, and maybe
>>>> init_on_alloc could be always bypassed during balloon out? The balloon
>>>> driver can scrub the page on its own already (which is enabled by
>>>> default). That of course assumes the issue is only about init_on_alloc,
>>>> not init_on_free (or both) - which I haven't really confirmed yet...
>>>> If going this way, I see the balloon driver does basically
>>>> alloc_page(GFP_BALLOON), where GFP_BALLOON is:
>>>>
>>>>      /* When ballooning out (allocating memory to return to Xen) we
>>>>         don't really want the kernel to try too hard since that can
>>>>         trigger the oom killer. */
>>>>      #define GFP_BALLOON \
>>>>          (GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>>>>
>>>> Would that be about adding some new flag here? Or maybe there is already
>>>> one for this purpose?
>>>
>>> There doesn't seem to be a flag for that.
>>>
>>> But I think adding a new flag __GFP_NO_INIT and testing that in
>>> want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
>>> would be a sensible approach.
>>
>> People argued against such flags in the past, because it will simply get
>> abused by arbitrary drivers that want to be smart.
> 
> Could it be named differently to discourage such usage? Maybe
> __GFP_BALLOON_OUT ?
> 
>> Whatever leaves the buddy shall be zeroed out. If double-zeroing
>> happens, the latter pass could get optimized out by checking
>> something like user_alloc_needs_zeroing().
>>
>> See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
>> avoid double-zeroing.
> 
> It isn't just reducing double-zeroing to single zeroing. It's about
> avoiding zeroing such pages at all. If a domU is started with
> populate-on-demand, many (sometimes most) of its pages are populated in
> EPT.

ITYM "unpopulated in EPT"?

Jan

> The idea of PoD is to start the guest with a high static memory size
> but a low actual allocation, and fake it until the balloon driver
> kicks in and makes the domU really not use more pages than it has.
> When the balloon driver returns those pages to the hypervisor,
> normally it would just take unallocated pages one by one and make
> Linux not use them. But if _any_ zeroing happens, each page first
> needs to be mapped into the guest by the hypervisor (one trip through
> EPT), just to be removed from it a moment later...
> 
>>>> Any opinions?
>>>
>>> You are aware of the "init_on_alloc" boot parameter? So if this is fine
>>> for you, you could just use approach 1 above without any kernel patches
>>> needed.
>>
>> I don't think init_on_alloc can be enabled after boot. IIUC, 1) would
>> require a runtime switch.
> 
> Indeed.
> 

