Hi Chris,

On 26-03-18 05:04, Christopher Samuel wrote:
> Does the slurmd log report it trying to kill the auks process?

The first thing I need to do is turn up the logging verbosity.

> https://bugs.schedmd.com/show_bug.cgi?id=4733

> The fact that auks is hanging around makes me wonder if this is a
> different issue, but you never know..

It's not a 100% match but it's the closest I've found so far. I'll need to study this some more.

I left a test job hanging last night, and this morning the slurmstepd was gone, but the auks process is still there (orphaned). That is different from last night, when the nodes were drained because of a batch job failure.
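
For what it's worth, here is a rough sketch of how such orphans could be spotted on a node. It assumes an orphaned process has been re-parented to PID 1 (this may not hold if a subreaper is in use) and that the process name contains "auks". The helper name `find_orphans` and the sample listing are made up for illustration; they are not output from our nodes.

```python
def find_orphans(ps_output, name):
    """Return (pid, command) pairs whose parent is PID 1 and whose
    command contains `name`. Expects `ps -eo pid,ppid,comm` style text."""
    orphans = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines
        pid, ppid, comm = parts
        if ppid == "1" and name in comm:
            orphans.append((int(pid), comm.strip()))
    return orphans


# Hypothetical listing, for illustration only.
SAMPLE = """\
  PID  PPID COMMAND
    1     0 systemd
 1234     1 auks
 2345   999 slurmstepd
"""

print(find_orphans(SAMPLE, "auks"))
```

On a real node the input could come from `ps -eo pid,ppid,comm`, for example via `subprocess.run(["ps", "-eo", "pid,ppid,comm"], capture_output=True, text=True).stdout`.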

I'll report back when I find out more.

Robbert

--
Robbert Eggermont
Intelligent Systems Support & Data Steward | TU Delft
+31 15 27 83234 | Building 28, Floor 5, Room W660
Available Mon, Wed-Fri
