On 11.08.2020 at 20:55, Richard Lefebvre wrote:
Hi,

The command "sshare -l" is crashing. I traced the crash to a single account. The cause seems to be an extremely large LevelFS, on the order of 4.8x10^16, which I can see if I add the "-p" option. Is there a way to fix the account?

I have seen this as well. I did not bother to trace it in the code, but I would guess it's some underflow problem: as the raw usage of the account decays toward zero, the LevelFS gets ever bigger.

It can be fixed by resetting the account to 'true' zero usage:

(sacctmgr modify account NAME set rawusage=0)

When the next fair-share (FS) recalculation kicks in, the huge LevelFS resets to 'inf' and the problem goes away.
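
To illustrate the guess above (this is not traced in the Slurm code): a minimal Python sketch, assuming LevelFS behaves roughly like normalized shares divided by effective usage, with usage decaying exponentially. The function names, the starting usage and the one-week half-life are made up for illustration; only the NormShares value is taken from the sshare output in this thread.

```python
import math


def decayed_usage(raw_usage, half_life_s, elapsed_s):
    """Exponentially decay a usage value with the given half-life (assumed model)."""
    return raw_usage * 0.5 ** (elapsed_s / half_life_s)


def level_fs(norm_shares, effective_usage):
    """Shares-to-usage ratio; an exact zero usage is reported as infinity."""
    if effective_usage == 0.0:
        return math.inf
    return norm_shares / effective_usage


NORM_SHARES = 0.003724          # NormShares from the "sshare -l -p" output
START_USAGE = 1.0               # hypothetical starting effective usage
HALF_LIFE_S = 7 * 24 * 3600     # hypothetical one-week decay half-life

if __name__ == "__main__":
    for weeks in (0, 10, 30, 60):
        usage = decayed_usage(START_USAGE, HALF_LIFE_S, weeks * HALF_LIFE_S)
        print(f"after {weeks:2d} half-lives: usage={usage:.3e} "
              f"LevelFS={level_fs(NORM_SHARES, usage):.4e}")
    # A 'true' zero takes the infinity branch instead of dividing.
    print("after reset:", level_fs(NORM_SHARES, 0.0))
```

Under these assumptions, usage that is tiny but not exactly zero gives a huge finite ratio, while an exact zero gives 'inf', which would match what I see after resetting rawusage.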

Regards,

Holger N.



Below is the output of the command with the "-p" option, followed by the command that crashes:

sshare -l -p --account=group001_cpu
Account|User|RawShares|NormShares|RawUsage|NormUsage|EffectvUsage|FairShare|LevelFS|GrpTRESMins|TRESRunMins|
group001_cpu||650216|0.003724|0|0.000000|0.000000||48285673640776424.000000||cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0,gres/gpu=0|

sshare -l --account=group001_cpu
             Account       User  RawShares  NormShares  RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS                  GrpTRESMins                    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
*** Error in `sshare': free(): invalid next size (fast): 0x0000000000eff280 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81679)[0x7efd0e82a679]
/opt/software/slurm/lib64/slurm/libslurmfull.so(slurm_xfree+0x1d)[0x7efd0fcb9009]
/opt/software/slurm/lib64/slurm/libslurmfull.so(print_fields_double+0x2d6)[0x7efd0fc02a08]
sshare(process+0x51c)[0x4024c9]
sshare[0x40292c]
sshare(main+0xa2d)[0x40337f]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7efd0e7cb505]
sshare[0x401da9]
======= Memory map: ========
00400000-00405000 r-xp 00000000 00:2f 51577              /opt/software/slurm/bin/sshare
00604000-00605000 r--p 00004000 00:2f 51577              /opt/software/slurm/bin/sshare
00605000-00606000 rw-p 00005000 00:2f 51577              /opt/software/slurm/bin/sshare
00ee3000-00f23000 rw-p 00000000 00:00 0              [heap]
7efd08000000-7efd08021000 rw-p 00000000 00:00 0
7efd08021000-7efd0c000000 ---p 00000000 00:00 0
7efd0d564000-7efd0d579000 r-xp 00000000 00:24 61849              /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d579000-7efd0d778000 ---p 00015000 00:24 61849              /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d778000-7efd0d779000 r--p 00014000 00:24 61849              /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d779000-7efd0d77a000 rw-p 00015000 00:24 61849              /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7efd0d77a000-7efd0d783000 r-xp 00000000 00:24 66799              /usr/lib64/libmunge.so.2.0.0
7efd0d783000-7efd0d982000 ---p 00009000 00:24 66799              /usr/lib64/libmunge.so.2.0.0
7efd0d982000-7efd0d983000 r--p 00008000 00:24 66799              /usr/lib64/libmunge.so.2.0.0
7efd0d983000-7efd0d984000 rw-p 00009000 00:24 66799              /usr/lib64/libmunge.so.2.0.0
7efd0d984000-7efd0d987000 r-xp 00000000 00:2f 51448              /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0d987000-7efd0db86000 ---p 00003000 00:2f 51448              /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db86000-7efd0db87000 r--p 00002000 00:2f 51448              /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db87000-7efd0db88000 rw-p 00003000 00:2f 51448              /opt/software/slurm/lib64/slurm/auth_munge.so
7efd0db88000-7efd0e38d000 r--s 00000000 00:24 191641             /var/lib/sss/mc/passwd
7efd0e38d000-7efd0e395000 r-xp 00000000 00:24 66184              /usr/lib64/libnss_sss.so.2
7efd0e395000-7efd0e594000 ---p 00008000 00:24 66184              /usr/lib64/libnss_sss.so.2
7efd0e594000-7efd0e595000 r--p 00007000 00:24 66184              /usr/lib64/libnss_sss.so.2
7efd0e595000-7efd0e596000 rw-p 00008000 00:24 66184              /usr/lib64/libnss_sss.so.2
7efd0e596000-7efd0e5a2000 r-xp 00000000 00:24 62229              /usr/lib64/libnss_files-2.17.so
7efd0e5a2000-7efd0e7a1000 ---p 0000c000 00:24 62229              /usr/lib64/libnss_files-2.17.so
7efd0e7a1000-7efd0e7a2000 r--p 0000b000 00:24 62229              /usr/lib64/libnss_files-2.17.so
7efd0e7a2000-7efd0e7a3000 rw-p 0000c000 00:24 62229              /usr/lib64/libnss_files-2.17.so
7efd0e7a3000-7efd0e7a9000 rw-p 00000000 00:00 0
7efd0e7a9000-7efd0e96c000 r-xp 00000000 00:24 62154              /usr/lib64/libc-2.17.so
7efd0e96c000-7efd0eb6c000 ---p 001c3000 00:24 62154              /usr/lib64/libc-2.17.so
7efd0eb6c000-7efd0eb70000 r--p 001c3000 00:24 62154              /usr/lib64/libc-2.17.so
7efd0eb70000-7efd0eb72000 rw-p 001c7000 00:24 62154              /usr/lib64/libc-2.17.so
7efd0eb72000-7efd0eb77000 rw-p 00000000 00:00 0
7efd0eb77000-7efd0eb8e000 r-xp 00000000 00:24 62349              /usr/lib64/libpthread-2.17.so
7efd0eb8e000-7efd0ed8d000 ---p 00017000 00:24 62349              /usr/lib64/libpthread-2.17.so
7efd0ed8d000-7efd0ed8e000 r--p 00016000 00:24 62349              /usr/lib64/libpthread-2.17.so
7efd0ed8e000-7efd0ed8f000 rw-p 00017000 00:24 62349              /usr/lib64/libpthread-2.17.so
7efd0ed8f000-7efd0ed93000 rw-p 00000000 00:00 0
7efd0ed93000-7efd0edb8000 r-xp 00000000 00:24 62205              /usr/lib64/libtinfo.so.5.9
7efd0edb8000-7efd0efb8000 ---p 00025000 00:24 62205              /usr/lib64/libtinfo.so.5.9
7efd0efb8000-7efd0efbc000 r--p 00025000 00:24 62205              /usr/lib64/libtinfo.so.5.9
7efd0efbc000-7efd0efbd000 rw-p 00029000 00:24 62205              /usr/lib64/libtinfo.so.5.9
7efd0efbd000-7efd0efe3000 r-xp 00000000 00:24 62147              /usr/lib64/libncurses.so.5.9
7efd0efe3000-7efd0f1e2000 ---p 00026000 00:24 62147              /usr/lib64/libncurses.so.5.9
7efd0f1e2000-7efd0f1e3000 r--p 00025000 00:24 62147              /usr/lib64/libncurses.so.5.9
7efd0f1e3000-7efd0f1e4000 rw-p 00026000 00:24 62147              /usr/lib64/libncurses.so.5.9
7efd0f1e4000-7efd0f1ec000 r-xp 00000000 00:24 62410              /usr/lib64/libhistory.so.6.2
7efd0f1ec000-7efd0f3eb000 ---p 00008000 00:24 62410              /usr/lib64/libhistory.so.6.2
7efd0f3eb000-7efd0f3ec000 r--p 00007000 00:24 62410              /usr/lib64/libhistory.so.6.2
7efd0f3ec000-7efd0f3ed000 rw-p 00008000 00:24 62410              /usr/lib64/libhistory.so.6.2
7efd0f3ed000-7efd0f429000 r-xp 00000000 00:24 62408              /usr/lib64/libreadline.so.6.2
7efd0f429000-7efd0f629000 ---p 0003c000 00:24 62408              /usr/lib64/libreadline.so.6.2
7efd0f629000-7efd0f62b000 r--p 0003c000 00:24 62408              /usr/lib64/libreadline.so.6.2
7efd0f62b000-7efd0f631000 rw-p 0003e000 00:24 62408              /usr/lib64/libreadline.so.6.2
7efd0f631000-7efd0f633000 rw-p 00000000 00:00 0
7efd0f633000-7efd0f734000 r-xp 00000000 00:24 62170              /usr/lib64/libm-2.17.so
7efd0f734000-7efd0f933000 ---p 00101000 00:24 62170              /usr/lib64/libm-2.17.so
7efd0f933000-7efd0f934000 r--p 00100000 00:24 62170              /usr/lib64/libm-2.17.so
7efd0f934000-7efd0f935000 rw-p 00101000 00:24 62170              /usr/lib64/libm-2.17.so
7efd0f935000-7efd0f937000 r-xp 00000000 00:24 62166              /usr/lib64/libdl-2.17.so
group001_cpu        650216    0.003712           0    0.000000  0.000000            4.8104e+16
Aborted

--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de
