On 11.08.2020 at 20:55, Richard Lefebvre wrote:
I have seen this as well. I did not bother to trace it in the code, but I would guess it's some underflow problem (as the raw usage of the account decays toward zero, the LevelFS gets ever bigger...).

Hi,

The command "sshare -l" is crashing. I traced the problem to one account. The cause seems to be an extremely large LevelFS, on the order of 4.8x10^16. I can see the value if I add the "-p" option. Is there a way to fix the account?
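The growth described above can be illustrated with a minimal Python sketch. It assumes the Fair Tree definition LevelFS = S / U from the Slurm documentation (S = the account's share among its siblings, U = its share of the siblings' usage) and a simplified exponential decay with a fixed half-life. The raw usage values, the 7-day half-life and the assumption that the siblings' usage stays constant are hypothetical; only NormShares is taken from the sshare output in this thread.

```python
import math


def level_fs(norm_shares, effective_usage):
    """LevelFS = S / U; an account with exactly zero usage gets infinity."""
    if effective_usage == 0.0:
        return math.inf
    return norm_shares / effective_usage


def decayed(usage, elapsed_days, half_life_days):
    """Simplified decay (assumption): usage halves every `half_life_days` days."""
    return usage * 0.5 ** (elapsed_days / half_life_days)


if __name__ == "__main__":
    S = 0.003724                    # NormShares from the sshare -p output
    own, siblings = 1000.0, 1.0e6   # hypothetical raw usage; siblings held constant
    for days in (0, 70, 140, 280):
        u_own = decayed(own, days, half_life_days=7.0)
        U = u_own / (u_own + siblings)  # this account's share of sibling usage
        print(f"day {days:3d}: LevelFS = {level_fs(S, U):.4e}")
    print("usage reset to 0:", level_fs(S, 0.0))  # prints inf
```

In this sketch LevelFS stays finite, however large, as long as the usage is positive, and becomes inf only when the usage is exactly 0. That is consistent with resetting the raw usage to 0 bringing LevelFS back to 'inf'.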
It can be fixed by resetting the account to 'true' zero usage (sacctmgr modify account NAME set rawusage=0). When the next fairshare (FS) recalculation kicks in, the huge LevelFS resets to 'inf' and the problem goes away.
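To spot other affected accounts before they crash sshare, one could scan the parsable output, which did not crash in this thread, for finite but very large LevelFS values. The sketch below assumes the column layout of the "sshare -l -p" output shown in this thread. The 1e12 threshold and the function name are hypothetical. It only prints the reset command suggested above instead of running it, because resetting RawUsage also changes the account's fairshare priority.

```python
import math

# Hypothetical cutoff: a finite LevelFS above this is treated as "runaway".
THRESHOLD = 1.0e12


def runaway_accounts(parsable_output, threshold=THRESHOLD):
    """Return (account, level_fs) pairs from `sshare -l -p` output whose
    LevelFS is finite but larger than `threshold`. User rows are skipped."""
    lines = [ln for ln in parsable_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    found = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        if row.get("User"):              # only look at account rows
            continue
        raw = row.get("LevelFS", "").strip()
        if not raw:
            continue
        value = float(raw)               # float() also accepts "inf"
        if math.isfinite(value) and value > threshold:
            found.append((row["Account"].strip(), value))
    return found


# Parsable output copied from this thread.
SAMPLE = (
    "Account|User|RawShares|NormShares|RawUsage|NormUsage|EffectvUsage|"
    "FairShare|LevelFS|GrpTRESMins|TRESRunMins|\n"
    "group001_cpu||650216|0.003724|0|0.000000|0.000000||"
    "48285673640776424.000000||cpu=0,mem=0,energy=0,node=0,billing=0,"
    "fs/disk=0,vmem=0,pages=0,gres/gpu=0|\n"
)

if __name__ == "__main__":
    for account, value in runaway_accounts(SAMPLE):
        # Print the reset command for review rather than executing it.
        print(f"sacctmgr modify account {account} set rawusage=0  # LevelFS={value:.4e}")
```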
Regards, Holger N.
Below are the results of the two commands, first with the "-p" option and then the one that crashes:

sshare -l -p --account=group001_cpu
Account|User|RawShares|NormShares|RawUsage|NormUsage|EffectvUsage|FairShare|LevelFS|GrpTRESMins|TRESRunMins|
group001_cpu||650216|0.003724|0|0.000000|0.000000||48285673640776424.000000||cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0|

sshare -l --account=group001_cpu
Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
*** Error in `sshare': free(): invalid next size (fast): 0x0000000000eff280 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7efd0e82a679]
/opt/software/slurm/lib64/slurm/libslurmfull.so(slurm_xfree+0x1d)[0x7efd0fcb9009]
/opt/software/slurm/lib64/slurm/libslurmfull.so(print_fields_double+0x2d6)[0x7efd0fc02a08]
sshare(process+0x51c)[0x4024c9]
sshare[0x40292c]
sshare(main+0xa2d)[0x40337f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7efd0e7cb505]
sshare[0x401da9]
======= Memory map: ========
00400000-00405000 r-xp 00000000 00:2f 51577 /opt/software/slurm/bin/sshare
00604000-00605000 r--p 00004000 00:2f 51577 /opt/software/slurm/bin/sshare
00605000-00606000 rw-p 00005000 00:2f 51577 /opt/software/slurm/bin/sshare
00ee3000-00f23000 rw-p 00000000 00:00 0 [heap]
7efd08000000-7efd08021000 rw-p 00000000 00:00 0
7efd08021000-7efd0c000000 ---p 00000000 00:00 0
7efd0d564000-7efd0d579000 r-xp 00000000 00:24 61849 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d579000-7efd0d778000 ---p 00015000 00:24 61849 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d778000-7efd0d779000 r--p 00014000 00:24 61849 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d779000-7efd0d77a000 rw-p 00015000 00:24 61849 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d77a000-7efd0d783000 r-xp 00000000 00:24 66799 /usr/lib64/libmunge.so.2.0.0
7efd0d783000-7efd0d982000 ---p 00009000 00:24 66799 /usr/lib64/libmunge.so.2.0.0
7efd0d982000-7efd0d983000 r--p 00008000 00:24 66799 /usr/lib64/libmunge.so.2.0.0
7efd0d983000-7efd0d984000 rw-p 00009000 00:24 66799 /usr/lib64/libmunge.so.2.0.0
7efd0d984000-7efd0d987000 r-xp 00000000 00:2f 51448 /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0d987000-7efd0db86000 ---p 00003000 00:2f 51448 /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db86000-7efd0db87000 r--p 00002000 00:2f 51448 /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db87000-7efd0db88000 rw-p 00003000 00:2f 51448 /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db88000-7efd0e38d000 r--s 00000000 00:24 191641 /var/lib/sss/mc/passwd
7efd0e38d000-7efd0e395000 r-xp 00000000 00:24 66184 /usr/lib64/libnss_sss.so.2
7efd0e395000-7efd0e594000 ---p 00008000 00:24 66184 /usr/lib64/libnss_sss.so.2
7efd0e594000-7efd0e595000 r--p 00007000 00:24 66184 /usr/lib64/libnss_sss.so.2
7efd0e595000-7efd0e596000 rw-p 00008000 00:24 66184 /usr/lib64/libnss_sss.so.2
7efd0e596000-7efd0e5a2000 r-xp 00000000 00:24 62229 /usr/lib64/libnss_files-2.17.so
7efd0e5a2000-7efd0e7a1000 ---p 0000c000 00:24 62229 /usr/lib64/libnss_files-2.17.so
7efd0e7a1000-7efd0e7a2000 r--p 0000b000 00:24 62229 /usr/lib64/libnss_files-2.17.so
7efd0e7a2000-7efd0e7a3000 rw-p 0000c000 00:24 62229 /usr/lib64/libnss_files-2.17.so
7efd0e7a3000-7efd0e7a9000 rw-p 00000000 00:00 0
7efd0e7a9000-7efd0e96c000 r-xp 00000000 00:24 62154 /usr/lib64/libc-2.17.so
7efd0e96c000-7efd0eb6c000 ---p 001c3000 00:24 62154 /usr/lib64/libc-2.17.so
7efd0eb6c000-7efd0eb70000 r--p 001c3000 00:24 62154 /usr/lib64/libc-2.17.so
7efd0eb70000-7efd0eb72000 rw-p 001c7000 00:24 62154 /usr/lib64/libc-2.17.so
7efd0eb72000-7efd0eb77000 rw-p 00000000 00:00 0
7efd0eb77000-7efd0eb8e000 r-xp 00000000 00:24 62349 /usr/lib64/libpthread-2.17.so
7efd0eb8e000-7efd0ed8d000 ---p 00017000 00:24 62349 /usr/lib64/libpthread-2.17.so
7efd0ed8d000-7efd0ed8e000 r--p 00016000 00:24 62349 /usr/lib64/libpthread-2.17.so
7efd0ed8e000-7efd0ed8f000 rw-p 00017000 00:24 62349 /usr/lib64/libpthread-2.17.so
7efd0ed8f000-7efd0ed93000 rw-p 00000000 00:00 0
7efd0ed93000-7efd0edb8000 r-xp 00000000 00:24 62205 /usr/lib64/libtinfo.so.5.9
7efd0edb8000-7efd0efb8000 ---p 00025000 00:24 62205 /usr/lib64/libtinfo.so.5.9
7efd0efb8000-7efd0efbc000 r--p 00025000 00:24 62205 /usr/lib64/libtinfo.so.5.9
7efd0efbc000-7efd0efbd000 rw-p 00029000 00:24 62205 /usr/lib64/libtinfo.so.5.9
7efd0efbd000-7efd0efe3000 r-xp 00000000 00:24 62147 /usr/lib64/libncurses.so.5.9
7efd0efe3000-7efd0f1e2000 ---p 00026000 00:24 62147 /usr/lib64/libncurses.so.5.9
7efd0f1e2000-7efd0f1e3000 r--p 00025000 00:24 62147 /usr/lib64/libncurses.so.5.9
7efd0f1e3000-7efd0f1e4000 rw-p 00026000 00:24 62147 /usr/lib64/libncurses.so.5.9
7efd0f1e4000-7efd0f1ec000 r-xp 00000000 00:24 62410 /usr/lib64/libhistory.so.6.2
7efd0f1ec000-7efd0f3eb000 ---p 00008000 00:24 62410 /usr/lib64/libhistory.so.6.2
7efd0f3eb000-7efd0f3ec000 r--p 00007000 00:24 62410 /usr/lib64/libhistory.so.6.2
7efd0f3ec000-7efd0f3ed000 rw-p 00008000 00:24 62410 /usr/lib64/libhistory.so.6.2
7efd0f3ed000-7efd0f429000 r-xp 00000000 00:24 62408 /usr/lib64/libreadline.so.6.2
7efd0f429000-7efd0f629000 ---p 0003c000 00:24 62408 /usr/lib64/libreadline.so.6.2
7efd0f629000-7efd0f62b000 r--p 0003c000 00:24 62408 /usr/lib64/libreadline.so.6.2
7efd0f62b000-7efd0f631000 rw-p 0003e000 00:24 62408 /usr/lib64/libreadline.so.6.2
7efd0f631000-7efd0f633000 rw-p 00000000 00:00 0
7efd0f633000-7efd0f734000 r-xp 00000000 00:24 62170 /usr/lib64/libm-2.17.so
7efd0f734000-7efd0f933000 ---p 00101000 00:24 62170 /usr/lib64/libm-2.17.so
7efd0f933000-7efd0f934000 r--p 00100000 00:24 62170 /usr/lib64/libm-2.17.so
7efd0f934000-7efd0f935000 rw-p 00101000 00:24 62170 /usr/lib64/libm-2.17.so
7efd0f935000-7efd0f937000 r-xp 00000000 00:24 62166 /usr/lib64/libdl-2.17.so
group001_cpu 650216 0.003712 0 0.000000 0.000000 4.8104e+16
Aborted
-- 
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax: +49 431 880-1523
naund...@rz.uni-kiel.de