On Wed, Jan 15, 2020 at 11:28 PM Juerg Haefliger
<juerg.haefli...@canonical.com> wrote:
>
> On Thu, 16 Jan 2020 02:14:16 -0000
> dann frazier <dann.fraz...@canonical.com> wrote:
>
> > I built a kernel with the proposed patches[*] and ran a reboot/kernel
> > compile test on 4 systems. The tests survived 46 total iterations
> > (~12/system) before I interrupted. Two systems failed with "Synchronous
> > External Abort: synchronous parity or ECC error" errors.
> >
> > I've reverted the systems back to 4.15.0-70 - the kernel before the
> > cpufeature/errata patches that caused this - to see if these SEA errors
> > are a regression.
> >
> > [*] https://lists.ubuntu.com/archives/kernel-team/2020-January/106909.html
> >
>
> I've run 75 iterations of reboot/compile-kernel and encountered 3 gcc
> segmentation faults. Unfortunately, my test didn't capture the dmesg log but
> it's likely that these are due to the ECC problems we're (still?) seeing.

I've seen those on every machine so far when run long enough. Since I
believe we've clearly demonstrated that this is an unrelated failure,
I've split it out into bug 1860013 - let's track it there.

> There was also another issue during one of the reboots which is probably
> unrelated and due to a flaky BMC:

Let's track that in bug 1857073. Even if it is a flaky BMC, the IPMI
driver should handle the failure gracefully. Did you see this on host
'wright' as well?

 -dann

--
https://bugs.launchpad.net/bugs/1857074

Title:
  Cavium ThunderX CN88XX Panic : Unknown reason