Hello,

On Wed, 12 Jun 2024, Paolo Bonzini wrote:

> I didn't do this because of RHEL9, I did it because it's silly that
> QEMU cannot use POPCNT and has to waste 2% of the L1 d-cache to
> compute the x86 parity flag (and POPCNT was introduced at the same
> time as SSE4.2).

I do not see where the 2% figure comes from: even if the 256-byte LUT
takes an extra cache line due to misalignment, 320 bytes is still less
than 1% of a 32 KB L1D.
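
For reference, the arithmetic can be checked with a small sketch. It assumes
64-byte cache lines and a 32 KB L1D, which are typical values but are not
stated in the thread; the function names are made up for illustration.

```c
/* Assumed typical values: 64-byte cache lines, 32 KB L1 data cache. */
enum { CACHE_LINE = 64, L1D_SIZE = 32 * 1024, LUT_SIZE = 256 };

/* Bytes of cache occupied by a table of `size` bytes whose first byte
 * sits `offset` bytes past a cache-line boundary: every line the table
 * touches counts in full. */
unsigned footprint(unsigned offset, unsigned size)
{
    unsigned first_line = offset / CACHE_LINE;
    unsigned last_line = (offset + size - 1) / CACHE_LINE;
    return (last_line - first_line + 1) * CACHE_LINE;
}

/* The same footprint as a percentage of the assumed L1D size. */
double l1d_percent(unsigned offset, unsigned size)
{
    return 100.0 * footprint(offset, size) / L1D_SIZE;
}
```

With these assumptions an aligned table occupies 4 lines (256 bytes, about
0.78% of L1D) and a misaligned one occupies 5 lines (320 bytes, about 0.98%).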

More importantly, the way this comment is phrased made me think that QEMU
eagerly computes PF. But the comment in target/i386/cpu.h says that all
flags are computed on demand. Considering that software pretty much never
uses PF, why would the parity table be resident in L1D? As far as I can
see, the cost is rather a cache miss and perhaps a TLB miss when PF is
computed (mostly when EFLAGS is accessed as a whole on context switches,
I think).
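
To make the two alternatives concrete, here is a minimal sketch of computing
PF from a result's low byte, once through a 256-entry table and once through
a population count. This is not QEMU's actual code: PF_MASK, parity_lut and
the function names are hypothetical, and __builtin_popcount is the GCC/Clang
builtin, which only compiles to a POPCNT instruction when the target allows it.

```c
#include <stdint.h>

#define PF_MASK 0x04u  /* PF is bit 2 of EFLAGS */

/* Hypothetical lookup table: PF_MASK if the index has an even number of
 * set bits, 0 otherwise. */
static uint8_t parity_lut[256];

void parity_lut_init(void)
{
    for (int i = 0; i < 256; i++) {
        int bits = 0;
        for (int b = 0; b < 8; b++) {
            bits += (i >> b) & 1;
        }
        parity_lut[i] = (bits & 1) ? 0 : PF_MASK;
    }
}

/* Table variant: one load, which may miss in the cache if PF is rarely used. */
uint32_t pf_from_lut(uint32_t result)
{
    return parity_lut[result & 0xff];
}

/* Popcount variant: no memory access; PF reflects only the low 8 bits. */
uint32_t pf_from_popcount(uint32_t result)
{
    return (__builtin_popcount(result & 0xff) & 1) ? 0 : PF_MASK;
}
```

Either variant only runs when PF is actually materialized, so with lazy flag
evaluation the table is touched no more often than PF is requested.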

Is there something I'm not seeing?

Thanks.
Alexander
