So this issue is occurring only with job arrays.
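For reference, a plain (non-array) run of the same workload reports sane CPU times. Something along these lines reproduces the comparison (a sketch only, reconstructed from the script quoted below; the job name is arbitrary):

    #!/bin/bash
    #SBATCH --job-name=array_test_single
    #SBATCH --time=20:00
    #SBATCH -c2

    srun stress -c 2 -m 1 --vm-bytes 500M --timeout 65s

followed by the same sacct query once it finishes, e.g. sacct -j <jobid> --format=JobID,ReqCPUS,UserCPU,Elapsed.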

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 

On 12/21/18, 12:15 PM, "slurm-users on behalf of Chance Bryce Carl Nelson" 
<slurm-users-boun...@lists.schedmd.com on behalf of chance-nel...@nau.edu> 
wrote:

    Hi folks,
    
    
    calling sacct with the UserCPU field enabled seems to report CPU times far above the expected values for job array indices. The same inflated values show up in seff. For example, executing the following job script:
    ________________________________________________________
    
    
    #!/bin/bash
    #SBATCH --job-name=array_test                   
    #SBATCH --workdir=/scratch/cbn35/bigdata          
    #SBATCH --output=/scratch/cbn35/bigdata/logs/job_%A_%a.log
    #SBATCH --time=20:00  
    #SBATCH --array=1-5
    #SBATCH -c2
    
    
    srun stress -c 2 -m 1 --vm-bytes 500M --timeout 65s
    
    
    
    ________________________________________________________
    
    
    ...results in the following stats:
    ________________________________________________________
    
    
    
           JobID  ReqCPUS    UserCPU  Timelimit    Elapsed 
    ------------ -------- ---------- ---------- ---------- 
    15730924_5          2   02:30:14   00:20:00   00:01:08 
    15730924_5.+        2  00:00.004              00:01:08 
    15730924_5.+        2   00:00:00              00:01:09 
    15730924_5.0        2   02:30:14              00:01:05 
    15730924_1          2   02:30:48   00:20:00   00:01:08 
    15730924_1.+        2  00:00.013              00:01:08 
    15730924_1.+        2   00:00:00              00:01:09 
    15730924_1.0        2   02:30:48              00:01:05 
    15730924_2          2   02:15:52   00:20:00   00:01:07 
    15730924_2.+        2  00:00.007              00:01:07 
    15730924_2.+        2   00:00:00              00:01:07 
    15730924_2.0        2   02:15:52              00:01:06 
    15730924_3          2   02:30:20   00:20:00   00:01:08 
    15730924_3.+        2  00:00.010              00:01:08 
    15730924_3.+        2   00:00:00              00:01:09 
    15730924_3.0        2   02:30:20              00:01:05 
    15730924_4          2   02:30:26   00:20:00   00:01:08 
    15730924_4.+        2  00:00.006              00:01:08 
    15730924_4.+        2   00:00:00              00:01:09 
    15730924_4.0        2   02:30:25              00:01:05 
    
    
    
    ________________________________________________________
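    For reference, the table above and the seff report further down were pulled with commands roughly along these lines (the exact field list is reconstructed from the output, so treat it as approximate):
    
        sacct -j 15730924 --format=JobID,ReqCPUS,UserCPU,Timelimit,Elapsed
        seff 15730924_5
    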
    
    
    This is also reported by seff, with several errors to boot:
    ________________________________________________________
    
    
    
    Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
    Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
    Use of uninitialized value $lmem in numeric lt (<) at /usr/bin/seff line 130, <DATA> line 624.
    Job ID: 15730924
    Array Job ID: 15730924_5
    Cluster: monsoon
    User/Group: cbn35/clusterstu
    State: COMPLETED (exit code 0)
    Nodes: 1
    Cores per node: 2
    CPU Utilized: 03:19:15
    CPU Efficiency: 8790.44% of 00:02:16 core-walltime
    Job Wall-clock time: 00:01:08
    Memory Utilized: 0.00 MB (estimated maximum)
    Memory Efficiency: 0.00% of 1.95 GB (1000.00 MB/core)
    
    
    
    ________________________________________________________
    
    
    
    
    
    As far as I can tell, a two-core job with an elapsed time of around one minute should not be able to rack up over two hours of CPU time. Could this be a configuration issue, or is it a possible bug?
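    
    For what it's worth, a quick back-of-the-envelope check (assuming the stress run is fully CPU-bound):
    
        expected UserCPU per array task  ~ 2 cores x 65 s  = ~00:02:10
        reported UserCPU for 15730924_5.0                  =  02:30:14  (~9,014 s)
    
    i.e. roughly 70x too high, which lines up with the ~8790% "efficiency" seff computes against 00:02:16 of core-walltime.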
    
    
    More info is available on request, and any help is appreciated!
    
    
    
    
    
