One possible data point: on the node where the job ran, two slurmstepd
processes were still running, both at 100% CPU, even after the job had ended.


--
David Chin, PhD (he/him)   Sr. SysAdmin, URCF, Drexel
dw...@drexel.edu                     215.571.4335 (o)
For URCF support: urcf-supp...@drexel.edu
https://proteusmaster.urcf.drexel.edu/urcfwiki
github:prehensilecode

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Chin, David <dw...@drexel.edu>
Sent: Monday, March 15, 2021 13:52
To: Slurm-Users List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Job ended with OUT_OF_MEMORY even though MaxRSS and MaxVMSize are under the ReqMem value

Hi, all:

I'm trying to understand why a job exited with an error condition. I think it
was actually terminated by Slurm: the job was a MATLAB script, and its output
was incomplete.

Here's sacct output:

JobID         JobName     User  Partition  NodeList  Elapsed   State       ExitCode  ReqMem  MaxRSS    MaxVMSize  AllocTRES                 AllocGRE
------------  ----------  ----  ---------  --------  --------  ----------  --------  ------  --------  ---------  ------------------------  --------
83387         ProdEmisI+  foob  def        node001   03:34:26  OUT_OF_ME+  0:125     128Gn                        billing=16,cpu=16,node=1
83387.batch   batch                        node001   03:34:26  OUT_OF_ME+  0:125     128Gn   1617705K  7880672K   cpu=16,mem=0,node=1
83387.extern  extern                       node001   03:34:26  COMPLETED   0:0       128Gn   460K      153196K    billing=16,cpu=16,node=1
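
As a rough cross-check of the figures above, here is a minimal Python sketch that converts the size strings to bytes and compares them. The helper name to_bytes is hypothetical, and the sketch assumes that the suffixes are powers of 1024 and that the trailing "n" in ReqMem marks a per-node request; it is not something sacct itself provides.

```python
# Hypothetical helper, not part of Slurm: convert a size string such as
# "1617705K" or "128Gn" to bytes. Assumes K/M/G/T are powers of 1024 and
# that a trailing "n" (per node) or "c" (per CPU) is only a marker.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> int:
    size = size.strip().rstrip("nc")  # drop the per-node / per-CPU marker
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)  # no suffix: treat the value as plain bytes


# Values copied from the 83387.batch row above.
req_mem = to_bytes("128Gn")
max_rss = to_bytes("1617705K")
max_vmsize = to_bytes("7880672K")

print(f"MaxRSS    / ReqMem = {max_rss / req_mem:.2%}")
print(f"MaxVMSize / ReqMem = {max_vmsize / req_mem:.2%}")
print("both under ReqMem:", max_rss < req_mem and max_vmsize < req_mem)
```

Under those assumptions, both recorded values come out well below the 128G request, which is what makes the OUT_OF_MEMORY state puzzling.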

Thanks in advance,
    Dave