On Wed, Sep 24, 2025 at 3:16 AM <[email protected]> wrote:
>
> Michał Cłapiński wrote:
> > On Fri, Aug 29, 2025 at 9:57 AM Mike Rapoport <[email protected]> wrote:
> > >
> > > Hi Ira,
> > >
> > > On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> > > > + Michal
> > > >
> > > > Mike Rapoport wrote:
> > > > > From: "Mike Rapoport (Microsoft)" <[email protected]>
> > > > >
> > > > > There are use cases, for example virtual machine hosts, that create
> > > > > "persistent" memory regions using memmap= option on x86 or dummy
> > > > > pmem-region device tree nodes on DT based systems.
> > > > >
> > > > > Both these options are inflexible because they create static regions
> > > > > and
> > > > > the layout of the "persistent" memory cannot be adjusted without
> > > > > reboot
> > > > > and sometimes they even require firmware update.
> > > > >
> > > > > Add a ramdax driver that allows creation of DIMM devices on top of
> > > > > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
> > > >
> > > > While I recognize this driver and the e820 driver are mutually
> > > > exclusive[1][2]. I do wonder if the use cases are the same?
> > >
> > > They are mutually exclusive in the sense that they cannot be loaded
> > > together so I had this in Kconfig in RFC posting
> > >
> > > config RAMDAX
> > >         tristate "Support persistent memory interfaces on RAM carveouts"
> > >         depends on OF || (X86 && X86_PMEM_LEGACY=n)
> > >
> > > (somehow my rebase lost Makefile and Kconfig changes :( )
> > >
> > > As Pasha said in the other thread [1] the use-cases are different. My goal
> > > is to achieve flexibility in managing carved out "PMEM" regions and
> > > Michal's patches aim to optimize boot time by autoconfiguring multiple
> > > PMEM
> > > regions in the kernel without upcalls to ndctl.
> > >
> > > > From a high level I don't like the idea of adding kernel parameters. So
> > > > if this could solve Michal's problem I'm inclined to go this direction.
> > >
> > > I think it could help with optimizing the reboot times. On the first boot
> > > the PMEM is partitioned using ndctl and then the partitioning remains
> > > there
> > > so that on subsequent reboots kernel recreates dax devices without upcalls
> > > to userspace.
> >
> > Using this patch, if I want to divide 500GB of memory into 1GB chunks,
> > the last 128kB of every chunk would be taken by the label, right?
> >
> > My patch disables labels, so we can divide the memory into 1GB chunks
> > without any losses and they all remain aligned to the 1GB boundary. I
> > think this is necessary for vmemmap dax optimization.
>
> As Mike says you would lose 128K at the end, but that indeed becomes
> losing that 1GB given alignment constraints.
>
> However, I think that could be solved by just separately vmalloc'ing the
> label space for this. Then instead of kernel parameters to sub-divide a
> region, you just have an initramfs script to do the same.
>
> Does that meet your needs?

Sorry, I'm having trouble imagining this. If I wanted 500 1GB chunks, I
would request a region of 500GB + space for the label? Or is that a label
and info-blocks? Then on each boot the kernel would check if there is an
actual label/info-blocks in that space and if yes, it would recreate my
devices (including the fsdax/devdax type)?

One of the requirements for live update is that the kexec reboot has to be
fast. My solution introduced a delay of tens of milliseconds since the
actual device creation is asynchronous. Manually dividing a region into
thousands of devices from userspace would be very slow but I would have to
do that only on the first boot, right?
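
To put rough numbers on the alignment question, here is a minimal sketch
of the arithmetic, not code from either patch series. It assumes the region
starts on a 1GB boundary and that the 128K label area sits at the very end
of the region; the function name aligned_chunks is made up for this
illustration.

```python
GIB = 1 << 30
KIB = 1 << 10


def aligned_chunks(region_size, label_size, chunk_size=GIB):
    """Count whole chunks left once label_size bytes are reserved.

    Assumption: the region starts on a chunk_size boundary and the label
    area is taken from the tail of the region.
    """
    usable = region_size - label_size
    return usable // chunk_size


# Label stored inside a 500GB region: the tail chunk is no longer whole.
in_region = aligned_chunks(500 * GIB, 128 * KIB)

# Label space allocated separately (nothing reserved inside the region).
external = aligned_chunks(500 * GIB, 0)

# Region requested as 500GB plus room for the label.
padded = aligned_chunks(500 * GIB + 128 * KIB, 128 * KIB)

print(in_region, external, padded)  # 499 500 500
```

Under these assumptions, keeping the label inside a 500GB region costs one
whole 1GB chunk, while either a separately allocated label area or a region
sized as 500GB plus the label space keeps all 500 chunks aligned.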

