Re: [Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-09 Thread Jonathan Engwall
If it is helpful, there are a few similar bugs, generally considered unreproducible. One thread calls it a bogus xcomp_bv: the kernel clobbers itself writing zeroes when that is not the state. Spectre also came up. One suggestion is to disable IBRS; according to other sources IBRS is dangerous to d

[Beowulf] C++ compilers and assembly

2018-09-09 Thread John Hearns via Beowulf
Chris Samuel's recent post reminds me. I went to a fascinating and well-delivered talk by Jason Hearne McGuiness: https://www.meetup.com/ACCULondon/events/253570550/ https://accu.org/index.php/accu_branches/accu_london Slides are here: https://github.com/acculondon/2018-September I would encourage

Re: [Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-09 Thread Christopher Samuel
On 10/09/18 11:16, Joe Landman wrote:
> If you have dumps from the crash, you could load them up in the debugger. Would be the most accurate route to determine why that was triggered.
Thanks Joe! Looking at our nodes I don't think we've got crash dumps enabled, I'll see if we can get that done

Re: [Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-09 Thread Joe Landman
I've not seen this one, but looking around a bit, I am wondering if the code path hit a denormal underflow in a SIMD instruction, and didn't have the appropriate SIMD exception mask.  See https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-
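
For readers unfamiliar with the "SIMD exception mask" mentioned above, here is a minimal sketch that decodes a raw MXCSR value into its exception masks and FTZ/DAZ bits. It is an illustration only, not something posted in the thread: the function name `decode_mxcsr` and the sample values are hypothetical, and the bit positions are the ones documented for MXCSR in the Intel SDM. It does not read the register from a live process.

```python
# Illustrative decoder for an x86 MXCSR value (hypothetical helper, not from the thread).
# Bit positions follow the Intel SDM description of MXCSR.

MXCSR_DEFAULT = 0x1F80  # reset value: all six SIMD FP exceptions masked

# Sticky status flags (bits 0-5) and their corresponding mask bits (bits 7-12).
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
MASK_BITS = {"IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12}
DAZ_BIT = 6    # denormals-are-zero (inputs)
FTZ_BIT = 15   # flush-to-zero (results)


def decode_mxcsr(value):
    """Return a dict describing which exceptions are flagged/unmasked and FTZ/DAZ state."""
    def bit(n):
        return bool((value >> n) & 1)

    return {
        "flags": sorted(name for name, n in FLAG_BITS.items() if bit(n)),
        # A cleared mask bit means the exception is delivered to the OS
        # instead of being handled silently by the hardware.
        "unmasked": sorted(name for name, n in MASK_BITS.items() if not bit(n)),
        "daz": bit(DAZ_BIT),
        "ftz": bit(FTZ_BIT),
    }


if __name__ == "__main__":
    print(decode_mxcsr(MXCSR_DEFAULT))
    # Same value with the denormal-operand mask (DM, bit 8) cleared.
    print(decode_mxcsr(MXCSR_DEFAULT & ~(1 << 8)))
```

With the default value every exception is masked, so a denormal operand only sets a status flag; clearing a mask bit such as DM is what would let the exception reach the operating system. Whether that is what happened on the nodes in question is the hypothesis under discussion here, not an established fact.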

[Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?

2018-09-09 Thread Christopher Samuel
Hi folks, We've had 2 different nodes crash over the past few days with kernel panics triggered by (what is recorded as) a "simd exception" (console messages below). In both cases the triggering application is given as the same binary, a user application built against OpenFOAM v16.06. This doesn