(cc Thomas for bug reporting on kvm-unit-test...)

On Tue, Jun 24, 2025 at 04:57:21PM +0800, Zhao Liu wrote:
> Date: Tue, 24 Jun 2025 16:57:21 +0800
> From: Zhao Liu <zhao1....@intel.com>
> Subject: Re: [Regression] Re: [PULL 35/35] qom: reverse order of
>  instance_post_init calls
> 
> On Mon, Jun 23, 2025 at 09:56:14AM -0700, Dongli Zhang wrote:
> > Date: Mon, 23 Jun 2025 09:56:14 -0700
> > From: Dongli Zhang <dongli.zh...@oracle.com>
> > Subject: [Regression] Re: [PULL 35/35] qom: reverse order of
> >  instance_post_init calls
> > 
> > This commit may broken the "vendor=" configuration.
> > 
> > For instance, the hypervisor CPU vendor is AMD.
> > 
> > I am going to use "-cpu Skylake-Server,vendor=GenuineIntel".
> > 
> > 
> > Because of the commit, the vendor is still AMD.
> > 
> > [root@vm ~]# cpuid -1 -l 0x0
> > CPU:
> >    vendor_id = "AuthenticAMD"
> > 
> > 
> > If I revert this patch, the vendor because the expected Intel.
> > 
> > [root@vm ~]# cpuid -1 -l 0x0
> > CPU:
> >    vendor_id = "GenuineIntel"
> > 
> > 
> > Thank you very much!
> 
> Thank you Dongli!
> 
> (+Like)
> 
> While testing my cache model series, I also noticed the similar behavior
> for KVM. Additionally, Like Xu reported to me that this commit caused
> a failure in a KVM unit test case. Your report helped me connect these
> two issues I met (though due to my environment issues, I haven't
> confirmed yet).

Ok, now I can confirm this commit cause KUT failure:
 * On AMD platform, the "msr.flat" case fails since this case requires
   vendor=GenuineIntel (tested by Like).
 * On Intel platform, the "syscall.flat" case fails because it requires
   vendor=AuthenticAMD (tested by myself).

> The "vendor" property from cli is registered as the global property in
> x86_cpu_parse_featurestr(), and is applied to x86 CPUs in
> device_post_init().
> 
> With this commit, now KVM will override the "vendor" in
> host_cpu_instance_init() (called in x86_cpu_post_initfn()) after
> device_post_init(), regardless the previous global "vendor" property.

This is the root cause for the above failure.

> Back to this commit, I think current order of post_init  makes sense.
> Instead, the place of host_cpu_instance_init() doesn't seem quite
> right. So, I think this commit might have exposed some drawbacks in the
> previous x86 CPU initialization order:
> 
> f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> 5b8978d80426 ("i386: do not call cpudef-only models functions for max, host, 
> base")

To fix this issue, we need to initialize "vendor" property in the initfn
of max/host/named CPUs instead of current post_initfn.

This will need to split the cpu_instance_init() of x86 kvm (and maybe hvf/tcg)
into 2 hooks:
 * AccelCPUClass.cpu_instance_init() - called in x86 CPUs' initfn.
 * AccelCPUClass.cpu_instance_post_init() - called in x86 CPUs'
   post_initfn.

I just did a POC and this solution works in principle.

But there are very many more details here, including considering more
properties/considering hvf and tcg and so on. It still need to take more
time for me figuring everything out. Once that's done, I'll post a series
to fix this issue.

Thanks,
Zhao


Reply via email to