On 16/05/17 10:54, Jan Beulich wrote:
>>>> On 16.05.17 at 05:47, <ehem+deb...@m5p.com> wrote:
>> On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
>>>>>> On 14.05.17 at 00:36, <ehem+deb...@m5p.com> wrote:
>>>> I haven't yet done as much experimentation as Andreas Pflug has, but I
>>>> can confirm I'm also running into this bug with Xen 4.4.1.
>>>>
>>>> I've only tried Linux kernel 3.16.43, but as Dom0:
>>>>
>>>> EDAC MC: Ver: 3.0.0
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to
>>>> enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not
>>>> load.
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to
>>>> enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not
>>>> load.
>>> Afaict the driver as is simply can't work in a Xen Dom0; it needs
>>> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
>>> load (the worse alternative would be for it to load and then do the
>>> wrong thing or give you a false sense of safety of your data).
>> I'm unsure of how to evaluate the situation. Since ECC is enabled in the
>> BIOS, data should be safe whether or not the EDAC driver loads. I
>> /suspect/ the EDAC driver failing to load merely means reporting of ECC
>> errors won't happen.
> "Merely" being relative here: The missing reports mean a false feeling
> of safety, as they may be early indications of later double-bit errors.
>
>> I suspect the only paravirtualization needed is to
>> map the physical address of the soft|hard errors to which VM's memory
>> range was affected. What this affects is which VM should panic in case
>> of hard errors.
> Which in turn obviously requires hypervisor interaction. It's not really
> clear to me whether perhaps the driver would better live in the
> hypervisor in the first place for that reason.

The driver should probably live directly in Xen; it needs to program a
number of northbridge and CPU registers, including interrupt information.

For the reporting side of things, it looks like it would require vMCE to
pass on fault information to guests.

~Andrew
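
For illustration only, the address-to-VM mapping discussed above could be
sketched roughly as follows. This is a toy model in Python, not Xen code:
the `MemRange` type, the `owner_of` helper, and the example ranges are all
hypothetical, and a real hypervisor tracks ownership per page frame rather
than in a short sorted list of ranges.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemRange:
    """Hypothetical record: one contiguous machine-address range and its owner."""
    start: int   # inclusive
    end: int     # exclusive
    domain: str


def owner_of(ranges: List[MemRange], addr: int) -> Optional[str]:
    """Return the domain owning addr, or None if no range covers it.

    Assumes ranges is sorted by start address and non-overlapping.
    """
    starts = [r.start for r in ranges]
    # Index of the last range whose start is <= addr (or -1 if none).
    i = bisect_right(starts, addr) - 1
    if i >= 0 and ranges[i].start <= addr < ranges[i].end:
        return ranges[i].domain
    return None


# Made-up memory layout, purely for illustration.
RANGES = [
    MemRange(0x0000_0000, 0x4000_0000, "Dom0"),
    MemRange(0x4000_0000, 0x8000_0000, "DomU-1"),
    MemRange(0xC000_0000, 0x1_0000_0000, "DomU-2"),
]

if __name__ == "__main__":
    for addr in (0x1000, 0x4000_0000, 0x9000_0000):
        print(hex(addr), owner_of(RANGES, addr))
```

Under this model, an error address that falls in a guest's range identifies
the one VM that would need to be told about it (or panic, for a hard
error); an address with no owner would have to be handled some other way.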