On Thursday, 2 July 2020 6:52:15 AM PDT Prentice Bisbal wrote:
> [2020-07-01T16:19:19.463] [801777.extern] _oom_event_monitor: oom-kill
> event count: 1
We get that line for pretty much every job; I don't think it reflects the OOM
killer being invoked on something in the extern step.
OOM killer
I maintain a very heterogeneous cluster (different processors, different
amounts of RAM, etc.). I have a user reporting the following problem.
He's running the same job multiple times with different input
parameters. The jobs run fine unless they land on specific nodes. He's
specifying --mem=2G.
Are you sure that the OOM killer is involved? I can get you specifics later,
but if it’s that one line about OOM events, you may see it after successful
jobs too. I just had a SLURM bug where this came up.
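One way to check whether that line really shows up for successful jobs as well is to tally it per job step across the slurmd log and compare against jobs known to have finished cleanly. The following is only a rough sketch: the regular expression assumes every such entry has the same shape as the single line quoted in this thread, and the function name tally_oom_events and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the one log line quoted in this thread:
# [timestamp] [<jobid>.<step>] _oom_event_monitor: oom-kill event count: <n>
OOM_LINE = re.compile(
    r"\[(?P<job>\d+)\.(?P<step>[^\]]+)\]\s+"
    r"_oom_event_monitor: oom-kill event count: (?P<count>\d+)"
)


def tally_oom_events(lines):
    """Map (job_id, step) to the highest oom-kill event count seen for it."""
    tally = Counter()
    for line in lines:
        match = OOM_LINE.search(line)
        if match:
            key = (match.group("job"), match.group("step"))
            tally[key] = max(tally[key], int(match.group("count")))
    return tally


# The first entry is the line quoted above; the second is a placeholder
# standing in for any unrelated log line.
sample = [
    "[2020-07-01T16:19:19.463] [801777.extern] "
    "_oom_event_monitor: oom-kill event count: 1",
    "some unrelated log line",
]

for (job, step), count in tally_oom_events(sample).items():
    print(job, step, count)
```

To run it against a real log, pass the open log file in place of sample. If the extern step of jobs that completed normally appears in the tally too, the line on its own says little about why a job failed.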