Re: [slurm-users] srun: Job step aborted

2023-03-01 Thread Niccolo Tosato
_pmix.so -> ./mpi_pmix_v4.so
-rwxr-xr-x 1 root root 198952 Dec 18 00:38 mpi_pmix_v4.so

This can also be verified using the command: `srun --mpi=list`

I hope this is useful.

Niccolò

On Thursday, 16 February 2023 at 10:04:51 CET, Niccolo Tosato wrote: Hi all, I'm faci
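The reply above checks that the generic `mpi_pmix.so` plugin is a symlink to the versioned `mpi_pmix_v4.so`. The following is a minimal Python sketch of the same on-disk check. It assumes you already know your Slurm plugin directory, which is passed in as an argument because the reply's path is truncated. The helper names `pmix_plugin_target` and `versioned_pmix_plugins` are hypothetical and not part of Slurm.

```python
import os
from pathlib import Path
from typing import List, Optional


def pmix_plugin_target(plugin_dir: str, link_name: str = "mpi_pmix.so") -> Optional[str]:
    """Return the file name the generic PMIx plugin symlink points to.

    Returns None if the link is missing, is not a symlink, or is dangling.
    """
    link = Path(plugin_dir) / link_name
    if not link.is_symlink():
        return None
    target = os.readlink(link)  # raw link text, e.g. "./mpi_pmix_v4.so"
    # Relative targets are resolved against the link's own directory.
    if not (link.parent / target).exists():
        return None
    return Path(target).name


def versioned_pmix_plugins(plugin_dir: str) -> List[str]:
    """List the versioned PMIx plugins (mpi_pmix_v*.so) present in plugin_dir."""
    return sorted(p.name for p in Path(plugin_dir).glob("mpi_pmix_v*.so"))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass your Slurm plugin directory as the first argument.
    plugin_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print("mpi_pmix.so ->", pmix_plugin_target(plugin_dir))
    print("versioned plugins:", versioned_pmix_plugins(plugin_dir))
```

This only inspects files on disk. `srun --mpi=list`, as the reply suggests, shows which MPI plugin types srun itself reports.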

[slurm-users] srun: Job step aborted

2023-02-16 Thread Niccolo Tosato
Hi all, I'm facing the following issue with a DGX A100 machine: I'm able to allocate resources, but the job fails when I try to execute srun. A detailed analysis of the incident follows:

```
$ salloc -n1 -N1 -p DEBUG -w dgx001 --time=2:0:0
salloc: Granted job allocation 1278
salloc: Waiting
```