On Wed, Sep 24, 2025 at 3:16 AM <[email protected]> wrote:
>
> Michał Cłapiński wrote:
> > On Fri, Aug 29, 2025 at 9:57 AM Mike Rapoport <[email protected]> wrote:
> > >
> > > Hi Ira,
> > >
> > > On Thu, Aug 28, 2025 at 07:47:31PM -0500, Ira Weiny wrote:
> > > > + Michal
> > > >
> > > > Mike Rapoport wrote:
> > > > > From: "Mike Rapoport (Microsoft)" <[email protected]>
> > > > >
> > > > > There are use cases, for example virtual machine hosts, that create
> > > > > "persistent" memory regions using memmap= option on x86 or dummy
> > > > > pmem-region device tree nodes on DT based systems.
> > > > >
> > > > > Both these options are inflexible because they create static regions
> > > > > and the layout of the "persistent" memory cannot be adjusted without
> > > > > a reboot, and sometimes they even require a firmware update.
> > > > >
> > > > > Add a ramdax driver that allows creation of DIMM devices on top of
> > > > > E820_TYPE_PRAM regions and devicetree pmem-region nodes.
> > > >
> > > > While I recognize that this driver and the e820 driver are mutually
> > > > exclusive [1][2], I do wonder whether the use cases are the same.
> > >
> > > They are mutually exclusive in the sense that they cannot be loaded
> > > together, so I had this in Kconfig in the RFC posting:
> > >
> > > config RAMDAX
> > >         tristate "Support persistent memory interfaces on RAM carveouts"
> > >         depends on OF || (X86 && X86_PMEM_LEGACY=n)
> > >
> > > (somehow my rebase lost the Makefile and Kconfig changes :( )
> > >
> > > As Pasha said in the other thread [1], the use cases are different. My
> > > goal is to achieve flexibility in managing carved-out "PMEM" regions,
> > > while Michal's patches aim to optimize boot time by autoconfiguring
> > > multiple PMEM regions in the kernel without upcalls to ndctl.
> > >
> > > > At a high level, I don't like the idea of adding kernel parameters, so
> > > > if this could solve Michal's problem, I'm inclined to go in this direction.
> > >
> > > I think it could help with optimizing reboot times. On the first boot
> > > the PMEM is partitioned using ndctl, and then the partitioning remains
> > > there so that on subsequent reboots the kernel recreates the dax devices
> > > without upcalls to userspace.
> >
> > Using this patch, if I want to divide 500GB of memory into 1GB chunks,
> > the last 128kB of every chunk would be taken by the label, right?
> >
> > My patch disables labels, so we can divide the memory into 1GB chunks
> > without any loss, and they all remain aligned to a 1GB boundary. I
> > think this is necessary for the vmemmap dax optimization.
>
> As Mike says, you would lose 128K at the end, but given the alignment
> constraints that effectively means losing that whole 1GB.
>
> However, I think that could be solved by simply vmalloc'ing the label
> space separately. Then, instead of kernel parameters to subdivide a
> region, you would just have an initramfs script do the same.
>
> Does that meet your needs?

Sorry, I'm having trouble imagining this.
If I wanted 500 1GB chunks, would I request a region of 500GB plus space
for the label? Or would that be a label and info-blocks?
Then, on each boot, the kernel would check whether there is an actual
label/info-blocks in that space and, if so, recreate my devices
(including the fsdax/devdax type)?

One of the requirements for live update is that the kexec reboot has
to be fast. My solution introduces a delay of only tens of milliseconds,
since the actual device creation is asynchronous. Manually dividing a
region into thousands of devices from userspace would be very slow, but
I would have to do that only on the first boot, right?
