Crash with CXL + TCG on 8.2: Was Re: qemu cxl memory expander shows numa_node -1

Jonathan Cameron, Thu, 01 Feb 2024 05:05:45 -0800

On Tue, 30 Jan 2024 13:50:18 +0530
Sajjan Rao <sajj...@gmail.com> wrote:


> Hi Jonathan,
> 
> The QEMU command line in the original email has been corrected back in
> August 2023 based on the subsequent responses.
> 
> My current QEMU command line reads like below. As you can see I am not
> assigning numa to the CXL memory object.
> 
> qemu-system-x86_64 \
>  -hda /var/lib/libvirt/images/CXL-Test_1.qcow2 \
>  -machine type=q35,nvdimm=on,cxl=on \
>  -accel tcg,thread=single \
>  -m 4G \
>  -smp cpus=4 \
>  -object memory-backend-ram,size=4G,id=m0 \
>  -object memory-backend-ram,size=256M,id=cxl-mem1 \
>  -object memory-backend-ram,size=256M,id=cxl-mem2 \
>  -numa node,memdev=m0,cpus=0-3,nodeid=0 \
>  -netdev user,id=net0,net=192.168.0.0/24,dhcpstart=192.168.0.9,hostfwd=tcp::2222-:22 \
>  -device virtio-net-pci,netdev=net0 \
>  -device pxb-cxl,bus_nr=2,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
>  -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
>  -device cxl-upstream,bus=cxl_rp_port0,id=us0,addr=0.0,multifunction=on, \
>  -device cxl-switch-mailbox-cci,bus=cxl_rp_port0,addr=0.2,target=us0 \
>  -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
>  -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=8 \
>  -device cxl-type3,bus=swport0,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
>  -device cxl-type3,bus=swport1,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
>  -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=512M,cxl-fmw.0.interleave-granularity=2k \
>  -D /tmp/qemu.log \
>  -nographic
> 
> Until I moved to Qemu version 8.2 recently, I was able to create
> regions and run linux native commands on CXL memory using
> #numactl --membind <cxl NUMA#> top
> 
> You had advised me to turn off KVM and use tcg since the membind
> command will run code out of CXL memory which is not supported. By
> disabling KVM the membind command worked fine.
> However with Qemu version 8.2 the same membind command results in a
> kernel hard crash.
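
For anyone reproducing this: the node number to pass to --membind depends on the setup, which is presumably why it appears as a placeholder above. A small sketch for picking it out, assuming the CXL region has already been created and onlined as system RAM, so that it shows up as a CPU-less NUMA node in the output of numactl --hardware. The helper name cpuless_nodes is made up for illustration.

```shell
#!/bin/sh
# Read `numactl --hardware` output on stdin and print the ids of NUMA nodes
# that have memory but no CPUs. Assumption: an onlined CXL region appears as
# such a CPU-less node.
cpuless_nodes() {
    # A line like "node 1 cpus:" with nothing after the colon has 3 fields;
    # "node 0 cpus: 0 1 2 3" has more and is skipped.
    awk '$1 == "node" && $3 == "cpus:" && NF == 3 { print $2 }'
}

# Intended use inside the guest:
#   numactl --hardware | cpuless_nodes
```

The number it prints is the one to use as numactl --membind=<n>.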

Just to check, kernel crashes, or qemu crashes?

I've probably replicated it, and it seems to be QEMU that is going down with a
TCG issue.

Bisection underway.

This may take a while.
Our use of TCG with what QEMU thinks of as IO memory is unusual,
so we tend to run into corners no one else cares about.

Richard, +CC on the off chance you can guess what has happened and save
me a bisection run.

x86 machine pretty much as described above

root@localhost:~/devmem2# numactl --membind=1 touch a
qemu: fatal: cpu_io_recompile: could not find TB for pc=(nil)
RAX=00d6b969c0000000 RBX=ff294696c0044440 RCX=0000000000000028 RDX=0000000000000000
RSI=0000000000000275 RDI=0000000000000000 RBP=0000000490000000 RSP=ff4f8767805d3d20
R8 =0000000000000000 R9 =ff4f8767805d3cdc R10=0000000000000000 R11=0000000000000040
R12=ff294696c0044980 R13=0000000000000000 R14=ff294696c51d0000 R15=0000000000000000
RIP=ffffffff9d270fed RFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 00000000 00000000
CS =0010 0000000000000000 ffffffff 00af9b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0000 0000000000000000 00000000 00000000
FS =0000 0000000000000000 00000000 00000000
GS =0000 ff2946973bc00000 00000000 00000000
LDT=0000 0000000000000000 00000000 00008200 DPL=0 LDT
TR =0040 fffffe37d29e7000 00004087 00008900 DPL=0 TSS64-avl
GDT=     fffffe37d29e5000 0000007f
IDT=     fffffe0000000000 00000fff
CR0=80050033 CR2=00007f2972bdc450 CR3=0000000490000000 CR4=00751ef0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00d6b969c0000000 CCD=0000000490000000 CCO=ADDQ
EFER=0000000000000d01
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
YMM00=0000000000000000 0000000000000000 3a3a3a3a3a3a3a3a 3a3a3a3a3a3a3a3a
YMM01=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM02=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM03=0000000000000000 0000000000000000 00ff0000000000ff 0000000000000000
YMM04=0000000000000000 0000000000000000 5f796c7261655f63 62696c5f5f004554
YMM05=0000000000000000 0000000000000000 0000000000000000 0000000000000060
YMM06=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM07=0000000000000000 0000000000000000 0909090909090909 0909090909090909
YMM08=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM09=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM10=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM11=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM12=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM13=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM14=0000000000000000 0000000000000000 0000000000000000 0000000000000000
YMM15=0000000000000000 0000000000000000 0000000000000000 0000000000000000
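
The fatal line at the top of that dump makes a convenient signature for scripting the bisection. A rough sketch of a predicate for git bisect run, assuming each build is booted with its console output captured to a log file. The function name classify_run and the log path in the comment are placeholders, and the step that boots the guest and runs numactl is left out.

```shell
#!/bin/sh
# Classify one QEMU run from its captured output.
# Return codes follow the `git bisect run` convention:
#   0 = good, 1 = bad, 125 = cannot test this commit (skip).
classify_run() {
    log=$1
    # Missing or empty log: the build or boot failed for some other reason,
    # so this commit tells us nothing about the regression.
    [ -s "$log" ] || return 125
    # The signature from the crash report.
    if grep -q 'qemu: fatal: cpu_io_recompile: could not find TB' "$log"; then
        return 1
    fi
    return 0
}

# Hypothetical use at the end of a bisect script:
#   classify_run /tmp/qemu-console.log; exit $?
```

The same check also answers the kernel-or-QEMU question for a given run: if the "qemu: fatal" line is in the log, it was QEMU that went down.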

Jonathan



> I wanted to check if this is a known issue with 8.2 and is there a way
> around it.
> 
> Thanks,
> Sajjan
> 
> On Fri, Jan 26, 2024 at 10:42 PM Jonathan Cameron
> <jonathan.came...@huawei.com> wrote:
> >
> > On Fri, 26 Jan 2024 10:43:43 -0500
> > Gregory Price <gregory.pr...@memverge.com> wrote:
> >  
> > > On Fri, Jan 26, 2024 at 12:39:26PM +0000, Jonathan Cameron wrote:  
> > > > On Thu, 25 Jan 2024 13:45:09 +0530
> > > > Sajjan Rao <sajj...@gmail.com> wrote:
> > > >  
> > > > > Looks like something changed in QEMU 8.2 that broke running code out
> > > > > of CXL memory with KVM disabled.
> > > > > I used "numactl --membind 2 ls" as suggested by Dimitrios earlier,
> > > > > this worked for me until I updated to the latest QEMU.
> > > > >
> > > > > Is this a known issue? Or am I missing something?  
> > > >
> > > > I'm confused on how the description below ever worked.
> > > > > Assigning the underlying memdev=cxl-mem1 to a numa node isn't going
> > > > > to correctly build the connections to the CFMWS PA range.
> > > >  
> > >
> > > I've now seen 3-4 occasions where people have done this and run into
> > > trouble (for obvious reasons).  Is there anything we can do to disallow
> > > the double-registering of a single memdev to both a numa node and a cxl
> > > device?
> > >  
> > It would be novel for us to prevent people shooting themselves
> > in the foot ;) but I guess this should be fairly easy as the
> > numa node logic prevents the same one being used multiple times, so we can
> > copy how that is done.
> >
> > This should do the trick (very lightly tested).
> > It's end of day Friday here so a formal patch can wait for next week.
> >
> >
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index f29346fae7..d4194bb757 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -827,6 +827,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "volatile memdev must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->hostvmem)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->hostvmem)));
> > +            return false;
> > +        }
> >          memory_region_set_nonvolatile(vmr, false);
> >          memory_region_set_enabled(vmr, true);
> >          host_memory_backend_set_mapped(ct3d->hostvmem, true);
> > @@ -850,6 +855,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "persistent memdev must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->hostpmem)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->hostpmem)));
> > +            return false;
> > +        }
> >          memory_region_set_nonvolatile(pmr, true);
> >          memory_region_set_enabled(pmr, true);
> >          host_memory_backend_set_mapped(ct3d->hostpmem, true);
> > @@ -880,6 +890,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
> >              error_setg(errp, "dynamic capacity must have backing device");
> >              return false;
> >          }
> > +        if (host_memory_backend_is_mapped(ct3d->dc.host_dc)) {
> > +            error_setg(errp, "memory backend %s can't be used multiple times.",
> > +               object_get_canonical_path_component(OBJECT(ct3d->dc.host_dc)));
> > +            return false;
> > +        }
> >          /* FIXME: set dc as nonvolatile for now */
> >          memory_region_set_nonvolatile(dc_mr, true);
> >          memory_region_set_enabled(dc_mr, true);
> >
> >
> >
> >
> >  
> > > ~Gregory  
> >
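
On Gregory's question about double registration: separately from the QEMU-side check in the quoted patch, the mistake can also be spotted before launch by scanning the command line. A minimal sketch, not part of QEMU. The function names opt_value and find_shared_memdevs are hypothetical, and it only looks at -numa node,memdev= and at the volatile-memdev= / persistent-memdev= properties of cxl-type3 when -numa and -device are passed as separate arguments.

```shell
#!/bin/sh
# Split a comma-separated QEMU option string ($2) and print the value of
# the key named in $1, if present.
opt_value() {
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Print each memory backend id that is referenced both by a "-numa" option
# and by a cxl-type3 device on the given QEMU command line.
find_shared_memdevs() {
    numa_ids=""
    cxl_ids=""
    prev=""
    for arg in "$@"; do
        case "$prev" in
        -numa)
            # e.g. node,memdev=m0,cpus=0-3,nodeid=0
            numa_ids="$numa_ids $(opt_value memdev "$arg")"
            ;;
        -device)
            case "$arg" in
            cxl-type3,*)
                cxl_ids="$cxl_ids $(opt_value volatile-memdev "$arg")"
                cxl_ids="$cxl_ids $(opt_value persistent-memdev "$arg")"
                ;;
            esac
            ;;
        esac
        prev=$arg
    done
    # Report every id that shows up in both lists.
    for n in $numa_ids; do
        for c in $cxl_ids; do
            if [ "$n" = "$c" ]; then
                echo "$n"
            fi
        done
    done
    return 0
}
```

Fed the options from the command line quoted earlier in this mail, it prints nothing, since m0 is only used by the NUMA node. A line such as -numa node,memdev=cxl-mem1,nodeid=1 alongside the cxl-type3 device using cxl-mem1 would make it print cxl-mem1.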
