We've attempted setting JobAcctGatherFrequency=task=0, and there is no change.
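One quick sanity check (a sketch; 'scontrol show config' reports the values the running controller is actually using) to confirm the daemons picked up the new frequency after a reconfigure:

===
scontrol show config | grep -i JobAcctGather
===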
We have the following settings:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
JobAcctGatherType=jobacct_gather/cgroup

Odd... I wonder why we don't see it help. Here is how we verify:

===
#!/bin/bash
#SBATCH --job-name=lazy                 # the name of your job
#SBATCH --output=/scratch/blah/lazy.txt # this is the file your output and errors go to
#SBATCH --time=20:00                    # max time
#SBATCH --workdir=/scratch/blah         # your work directory
#SBATCH --mem=7000                      # total mem
#SBATCH -c4                             # 4 cpus

# use 500MB of memory and 1 cpu thread
#srun stress -m 1 --vm-bytes 500M --timeout 65s

# use 500MB of memory and 3 cpu threads, 1 memory thread
srun stress -c 3 -m 1 --vm-bytes 500M --timeout 65s
===

We still have jobs with UserCPU way too high:

[cbc@head-dev ~ ]$ jobstats
JobID  JobName  ReqMem  MaxRSS  ReqCPUS  UserCPU    Timelimit  Elapsed   State      JobEff
==========================================================================================
7957   lazy     9.77G   0.0M    4        00:00:00   00:20:00   00:00:00  FAILED     -
7958   lazy     6.84G   0.0M    4        00:00.018  00:20:00   00:00:01  FAILED     -
7959   lazy     6.84G   480M    4        01:51.269  00:20:00   00:01:06  COMPLETED  18.17
7960   lazy     6.84G   499M    4        02:01.275  00:20:00   00:01:06  COMPLETED  19.53
7961   lazy     6.84G   499M    4        01:55.259  00:20:00   00:01:06  COMPLETED  18.76
7962   lazy     6.84G   499M    4        01:58.307  00:20:00   00:01:06  COMPLETED  19.15
7963   lazy     6.84G   491M    4        02:01.267  00:20:00   00:01:06  COMPLETED  19.49
7964   lazy     6.84G   499M    4        02:01.270  00:20:00   00:01:05  COMPLETED  19.73
7965   lazy     6.84G   500M    4        02:04.336  00:20:00   00:01:05  COMPLETED  20.13
7966   lazy     6.84G   468M    4        04:58:56   00:20:00   00:01:05  COMPLETED  2303.53
7967   lazy     6.84G   464M    4        04:40:39   00:20:00   00:01:05  COMPLETED  2162.87
7968   lazy     6.84G   440M    4        05:20:22   00:20:00   00:01:05  COMPLETED  2468.26
7969   lazy     6.84G   500M    4        05:14:37   00:20:00   00:01:05  COMPLETED  2424.32
7970   lazy     6.84G   278M    4        02:56:39   00:20:00   00:01:06  COMPLETED  1341.42
7971   lazy     6.84G   265M    4        02:57:18   00:20:00   00:01:06  COMPLETED  1346.28
7972   lazy     6.84G   500M    4        02:54:38   00:20:00   00:01:06  COMPLETED  1327.2
7973   lazy     6.84G   426M    4        02:29:50   00:20:00   00:01:06  COMPLETED  1138.96
==========================================================================================
Requested Memory: 06.49%
Requested Cores : 2906.81%
Time Limit      : 05.47%
========================
Efficiency Score: 972.92
========================
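As a cross-check, the same fields jobstats summarises can be pulled straight from the accounting database with sacct; a sketch, using two of the job IDs from the table above:

===
# compare a sane job against one with inflated accounting
sacct -j 7965,7966 \
      --format=JobID,JobName,ReqMem,MaxRSS,ReqCPUS,UserCPU,Elapsed,State
===

Jobs 7959-7965 report roughly two minutes of UserCPU for a 65-second run with four busy threads, which is plausible; jobs 7966-7969 claim four to five hours, which is impossible for a 65-second job and is what inflates the JobEff numbers.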
—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

On 1/9/19, 7:24 AM, "slurm-users on behalf of Paddy Doyle" <slurm-users-boun...@lists.schedmd.com on behalf of pa...@tchpc.tcd.ie> wrote:

On Wed, Jan 09, 2019 at 12:44:03PM +0100, Bjørn-Helge Mevik wrote:
> Paddy Doyle <pa...@tchpc.tcd.ie> writes:
>
> > Looking back through the mailing list, it seems that from 2015 onwards the
> > recommendation from Danny was to use 'jobacct_gather/linux' instead of
> > 'jobacct_gather/cgroup'. I didn't pick up on that properly, so we kept with
> > the cgroup version.
> >
> > Is anyone else still using jobacct_gather/cgroup and are you seeing this
> > same issue?
>
> Just a side note: In last year's SLUG, Tim recommended the following
> settings:
>
> proctrack/cgroup, task/cgroup, jobacct_gather/cgroup
>
> So the recommendation for jobacct_gather might have changed -- or Danny
> and Tim might just have different opinions. :)

Interesting... the cgroups documentation page still says the performance of
jobacct_gather/cgroup is worse than jobacct_gather/linux. Although according
to the git commits of doc/html/cgroups.shtml, that was added to the page in
Jan 2015, so yeah maybe things have changed again. :)

https://slurm.schedmd.com/cgroups.html

In that case, either set 'JobAcctGatherFrequency=task=0' or wait for the bug
to be fixed.

Paddy

--
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
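For anyone who wants to verify that documentation history themselves, a rough sketch, assuming a local clone of the Slurm sources (the GitHub mirror URL is an assumption here):

===
# inspect when the cgroups doc page changed over time
git clone https://github.com/SchedMD/slurm.git
cd slurm
git log --oneline --follow -- doc/html/cgroups.shtml
===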