On Mon, Sep 21, 2020 at 02:04:23PM +0100, Andrew Cooper wrote:
> MFENCE is overly heavyweight for SMP semantics on WB memory, because it also
> orders weaker cached writes, and flushes the WC buffers.
>
> This technique was used as an optimisation in Java[1], and later adopted by
> Linux[2] where it was measured to have a 60% performance improvement in VirtIO
> benchmarks.
>
> The stack is used because it is hot in the L1 cache, and a -4 offset is used
> to avoid creating a false data dependency on live data. (For 64bit userspace,
> the offset needs to be under the red zone to avoid false dependences).
>
> Fix up the 32 bit definitions in HVMLoader and libxc to avoid a false data
> dependency.
>
> [1] https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
> [2] https://git.kernel.org/torvalds/c/450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
>
> Signed-off-by: Andrew Cooper <[email protected]>
> ---
> CC: Jan Beulich <[email protected]>
> CC: Roger Pau Monné <[email protected]>
> CC: Wei Liu <[email protected]>
> CC: Ian Jackson <[email protected]>
> ---
>  tools/firmware/hvmloader/util.h   | 2 +-
>  tools/libs/ctrl/include/xenctrl.h | 4 ++--

If this is ever needed:

Acked-by: Wei Liu <[email protected]>

I have not followed the discussion in the thread closely, but the change looks
to be following what Linux does, so I'm certainly fine with this.
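
For reference, the technique described in the quoted commit message can be
sketched roughly as below. This is only an illustration, not the definitions
from the patch: the macro name example_smp_mb, the helper
example_announce_and_check, the -132 offset for 64-bit userspace (picked so
the 4-byte access sits entirely below the 128-byte System V red zone), and the
fallback for non-x86 builds are all assumptions made for this sketch.

```c
/*
 * Hypothetical sketch of a "locked add to the stack" full barrier.
 * A LOCKed read-modify-write orders earlier loads and stores against later
 * ones for ordinary WB memory. Adding 0 leaves the target unchanged.
 */
#if defined(__x86_64__)
/* Assumption: -132 keeps the access below the 128-byte red zone. */
#define example_smp_mb() \
    __asm__ __volatile__ ( "lock addl $0, -132(%%rsp)" ::: "memory", "cc" )
#elif defined(__i386__)
/* -4 is just below the stack pointer, so no live data is touched. */
#define example_smp_mb() \
    __asm__ __volatile__ ( "lock addl $0, -4(%%esp)" ::: "memory", "cc" )
#else
/* Not x86: fall back to the compiler's generic full barrier. */
#define example_smp_mb() __sync_synchronize()
#endif

/* Hypothetical flags shared between two CPUs. */
static volatile int example_flag_a;
static volatile int example_flag_b;

/*
 * One side of a store-then-load handshake: announce ourselves, fence, then
 * check the peer. Without a full barrier, x86 may let the load pass the
 * earlier store.
 */
static int example_announce_and_check(void)
{
    example_flag_a = 1;
    example_smp_mb();
    return example_flag_b;
}
```

The "memory" clobber is what stops the compiler from reordering accesses
across the barrier; the LOCK prefix provides the ordering on the CPU.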
