[slurm-users] Pending job will be cancelled since transport endpoint is not connected

2023-03-31 Thread Chenyang Yan
Hi all, we have one cluster running Slurm version 20.11.8 on CentOS 8.2. Suddenly it shows a weird problem: *only pending jobs* are cancelled, with a "transport endpoint is not connected" error (see image https://user-images.githubusercontent.com/19144683/229037078-ca704ba8-23a4-4948-9d1a-b

[slurm-users] Job real running time contains EIO time for srun --time

2023-01-04 Thread Chenyang Yan
Hi all, I have run into a problem with job running time. My test script for job running time: ``` [root@slurmctl tmp]# cat test.sh #!/bin/bash cleanup() { local now=$(date '+%s') echo "now: $(date -d "@$now")" echo "difference(start_time-now): $((now - start_time))" } trap cleanup EXIT start_time=$(

Re: [slurm-users] Slurmrestd authentication failed: Unspecified error

2022-03-21 Thread Chenyang Yan
errorvalue" [ ] ``` So, I'm confused about slurmrestd JWT authentication. Thanks, Chenyang Yan On Mon, Mar 21, 2022 at 4:10 PM Guillaume COCHARD < guillaume.coch...@cc.in2p3.fr> wrote: > Hello, > > We had the same error and we fixed it by adding > `Environ

[slurm-users] Slurmrestd authentication failed: Unspecified error

2022-03-19 Thread Chenyang Yan
sed about JWT authentication. Q1: What is the `SLURM_JWT` environment variable used for, and is it required for JWT? Related search in the GitHub source repo: https://github.com/SchedMD/slurm/search?q=SLURM_JWT Q2: How do I use slurmrestd JWT authentication? Thanks, Chenyang Yan