On 26/03/18 12:43, Robbert Eggermont wrote:
> Does this sound familiar to anyone?

Does the slurmd log report it trying to kill the auks process?

Also you might want to have a look at:

https://bugs.schedmd.com/show_bug.cgi?id=4733

to see if that bug fits what you're seeing. Basically I get a slurmstepd stuck, deadlocking internally on free_list_lock() for reasons that are yet to be understood. You'll need to use pstack or gdb to see the thread info.

The fact that auks is hanging around makes me wonder if this is a different issue, but you never know..

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
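
PS: for the pstack/gdb step, here is a rough sketch of how the thread backtraces could be collected in one go. It is only an illustration, not anything Slurm ships: the helper names (find_pids, gdb_command, dump_threads) are made up, it assumes a Linux /proc, it assumes the stuck step daemon shows up with "slurmstepd" in /proc/<pid>/comm, and it assumes gdb is installed and you have permission to attach (typically root on the compute node).

```python
import os
import subprocess
import sys


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm equals `name`."""
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited, or its comm file is not readable: skip it.
            continue
    return sorted(pids)


def gdb_command(pid):
    """Build a gdb batch invocation that prints a backtrace for every thread."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_threads(pid):
    """Attach gdb to `pid` and return its combined stdout/stderr as text."""
    result = subprocess.run(
        gdb_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Process name to look for; "slurmstepd" is an assumption, see above.
    target = sys.argv[1] if len(sys.argv) > 1 else "slurmstepd"
    for pid in find_pids(target):
        print("=== {} pid {} ===".format(target, pid))
        print(dump_threads(pid))
```

If the output shows threads sitting in free_list_lock(), that would point towards the bug above; if not, it is probably something else.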