On Tue, Nov 11, 2014 at 12:45:05PM +0530, Aravinda Prasad wrote:
>
>
> On Tuesday 11 November 2014 08:54 AM, David Gibson wrote:
> > On Wed, Nov 05, 2014 at 12:42:03PM +0530, Aravinda Prasad wrote:
> >> This series of patches add support for fwnmi in powerKVM guests.
> >>
> >> Currently upon machine check exception, if the address in
> >> error belongs to guest then KVM invokes guest's NMI interrupt
> >> vector 0x200.
> >>
> >> This patch series adds functionality where the guest's 0x200
> >> interrupt vector is patched such that QEMU gets control. QEMU
> >> then builds error log and reports the error to OS registered
> >> machine check handlers through RTAS space.
> >>
> >> Apart from this, the patch series also takes care of synchronization
> >> when multiple processors encounter machine check at or about the
> >> same time.
> >>
> >> The patch set was tested by simulating a machine check error in
> >> the guest.
> >>
> >> Changes in v3:
> >> - Incorporated review comments
> >> - Byte codes in patch 4/4 are now moved to
> >>   pc-bios/spapr-rtas/spapr-rtas.S as instructions.
> >> - Defined the RTAS blob in-memory layout.
> >> - FIX: save and restore cr register in the trampoline
> >>
> >> Changes in v2:
> >> - Re-based to github.com/agraf/qemu.git branch: ppc-next
> >> - Merged patches 4 and 5.
> >> - Incorporated other review comments
> >
> > So, this may not still be possible depending on whether the KVM side
> > of this is already merged, but it occurs to me that there's a simpler
> > way.
>
> The KVM part is already merged. Commit ID: 74845bc
Ok, that makes life harder, though I guess without the qemu code
merged, no-one would be using it yet, so it's still not impossible to change.
> > Rather than mucking about with having to update the hypervisor on the
> > RTAS location, then have qemu copy the code out of RTAS, patch it and
> > copy it back into the vector, you could instead do this:
>
> Though this is possible, I have a couple of comments below
>
> >
> > 1. Make KVM, instead of immediately delivering a 0x200 for a guest
> > machine check, cause a special exit to qemu.
> >
> > 2. Have the register-nmi RTAS call store the guest side MC handler
> > address in the spapr structure, but perform no actual guest code
> > patching.
> >
> > 3. Allocate the error log buffer independently from the RTAS blob,
> > so qemu always knows where it is.
>
> As per PAPR, the error log buffer should be part of the RTAS blob, and the
> guest kernel explicitly checks that the error log is inside the RTAS blob.
> This requires qemu to know the RTAS location as updated by the OS, which is
> handled in patch 2/4.
Ugh, ok. That's a pretty stupid interface requirement, even by PAPR
standards, but I guess we're stuck with it.
> > 4. When qemu gets the MC exit condition, instead of going via a
> > patched 0x200 vector, just directly set the guest register state and
> > jump straight into the guest side MC handler.
>
> PAPR mentions:
>
> "R1–7.3.14–8: Once the OS has registered for NMI notification, the
> platform firmware must intercept all System Reset Interrupts on all of
> the OS’s processors."
>
> So do we need to go via 0x200?
I don't see why. The hypervisor is already intercepting system resets
and machine checks because it's a hypervisor, and from the PAPR
guest's point of view, all it cares about is that you enter its
registered handler with the expected information available.
I don't see that the guest cares whether you bounce via a vector in
guest space or directly enter the guest-supplied handler using
hypervisor magic. Patching the guest's vector actually seems a pretty
awful hack that would only be necessary to work around limitations in
the virtualization capabilities, which I don't think we have as of POWER8.
Btw, isn't a "System Reset Interrupt" vector 0x100, not vector 0x200?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
