On 2016-03-17 at 20:23, Henrique de Moraes Holschuh wrote:

> On Wed, 16 Mar 2016, The Wanderer wrote:
>
>> On 2016-03-16 at 11:35, Henrique de Moraes Holschuh wrote:
>>
>>> What processor is this, please?
>>
>> Core i7-990X Extreme. /proc/cpuinfo reports it as:
>>
>> cpu family: 6
>> model: 44
>> model name: Intel(R) Core(TM) i7 CPU X 990 @ 3.47 GHz
>> stepping: 2
>
> Crap. I am looking into this.

I'm afraid I don't quite understand. Is there still a problem worth
being concerned about, now that the message has stopped appearing for
me?

>> and currently (with the problem not happening) also reports
>> microcode: 0x14
>
> Lucky you, 0x14 is safe enough on non-server systems.
>
>> The problem apparently only happens with some motherboards, whose
>> BIOS or UEFI doesn't handle something correctly (I used to know
>> what, but I've forgotten the details). My motherboard is an Asus
>> Sabertooth X58,
>
> The broken IOMMU interrupt remapping on the X58/S55xx chipsets,
> maybe?

Could be. I'll try to find time tomorrow to re-do some of my previous
research and dig up what I had deduced the original claimed problem to
be.

> I'd expect any BIOS with a 0x14 microcode to have the fix to the
> above (which is to disable the broken interrupt remapping feature of
> the IOMMU), so it might have been fixed when you updated that BIOS.

The recurring messages persisted after the BIOS update (although they
seemed, at least at first, to get less frequent), so while this may
have helped, it doesn't seem to have been enough on its own.

FWIW, I think the previous microcode on my system was either 0x11 or
0x10, although I can't swear to that. (I might be mixing it up with
some of the computers at my workplace; I don't exactly check the BIOS
on this machine very often.)

It's also (at least faintly) possible that the 0x14 microcode is being
put in place after boot, despite the change to stop doing that
automatically. I did install iucode-tool and some other
microcode-related packages in my attempts to find a fix; although it
didn't seem to produce any results initially, it's not impossible that
some later package update introduced a change which got the microcode
being applied on-the-fly again.

> And I *think* our 3.14 kernel eventually got the patch that bitches
> about BIOSes that get this wrong and tries to disable it, but I am
> not sure about this, so a kernel update can certainly fix it (if
> that's indeed the root cause of the "no irq handler for vector" on
> X58/S55xx systems).

That's interesting to be aware of; thanks.

Is there anything in particular I should look for, in kernel messages,
to determine whether this is taking effect on my system?

> There was also an erratum that caused the uncore frequency multiplier
> to be stuck and locked on "max". This got fixed somewhere between
> microcode 0x10 and microcode 0x13, AFAIK...
>
> Does any of the above ring a bell?

I think it may well have been the broken interrupt remapping that was
the problem, but unfortunately, it's been long enough since I gave up
on the research that I don't have the details anymore.

--
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw