[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread mercan via slurm-users
Hi; Did you check whether you can connect to the db with your conf parameters from the head-node: mysql --user=slurm --password=slurmdbpass slurm_acct_db Also, check and stop firewall and selinux, if they are running. Last, you can stop slurmdbd, then run it in a terminal with: slurmdbd -D -vvv Regards; C. Ahmet
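A minimal sketch of the checks suggested above, assuming a RHEL-style head node with firewalld and SELinux; the database name, user, and password are the ones quoted in the message and should match your own slurmdbd.conf:
  # Test the database credentials from the head node
  mysql --user=slurm --password=slurmdbpass slurm_acct_db -e "SHOW TABLES;"
  # Temporarily rule out firewall and SELinux
  sudo systemctl stop firewalld
  sudo setenforce 0
  # Run slurmdbd in the foreground with verbose logging
  sudo systemctl stop slurmdbd
  sudo slurmdbd -D -vvv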

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread Ole Holm Nielsen via slurm-users
This might be the firewall blocking communication to slurmdbd. You may find some useful information in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/ /Ole On 29-05-2024 23:05, Radhouane Aniba via slurm-users wrote: Hi everyone I am trying to get slurmdbd
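If the firewall is indeed the culprit, a quick check along these lines may help; the port numbers below assume the Slurm and MariaDB defaults (6819 for slurmdbd, 3306 for MariaDB), not values taken from the original post:
  # List what the firewall currently allows
  sudo firewall-cmd --list-all
  # Open the default slurmdbd and MariaDB ports
  sudo firewall-cmd --permanent --add-port=6819/tcp
  sudo firewall-cmd --permanent --add-port=3306/tcp
  sudo firewall-cmd --reload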

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread aradwen--- via slurm-users
Yes, the mysql database is running. I can update and check, but I guess the update will break a couple of configs; I need to check if this is something safe to do, even though it is for my homelab, but still :)

[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread James Lam via slurm-users
1. Is your mysql database running? 2. Slurm 19.x is far obsolete and you should at least use 21.x On 30/5/2024 5:05 am, Radhouane Aniba via slurm-users wrote: Hi everyone I am trying to get slurmdbd to run on my local home server but I am really struggling. Note : am a novice slurm user my slu
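Two quick checks along those lines (the service may be named mariadb or mysql depending on the distribution):
  # Is the database server up?
  systemctl status mariadb
  # Which Slurm version is installed?
  slurmdbd -V
  sinfo --version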

[slurm-users] slurmdbd not connecting to mysql (mariadb)

2024-05-29 Thread Radhouane Aniba via slurm-users
Hi everyone, I am trying to get slurmdbd to run on my local home server but I am really struggling. Note: I am a novice slurm user. My slurmdbd always times out even though all the details in the conf file are correct. My log looks like this: [2024-05-29T20:51:30.088] Accounting storage MYSQL plugin l
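For reference, a minimal slurmdbd.conf sketch for a MariaDB backend; the host, user, password, and paths here are placeholders, not values from the original post:
  # /etc/slurm/slurmdbd.conf (must be readable only by the slurm user)
  DbdHost=localhost
  SlurmUser=slurm
  StorageType=accounting_storage/mysql
  StorageHost=localhost
  StoragePort=3306
  StorageUser=slurm
  StoragePass=slurmdbpass
  StorageLoc=slurm_acct_db
  LogFile=/var/log/slurm/slurmdbd.log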

[slurm-users] Re: Jobs showing running but not running

2024-05-29 Thread Ryan Novosielski via slurm-users
One of the other states (down or fail, from memory) should cause it to completely drop the job. -- Ryan Novosielski - novos...@rutgers.edu
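A sketch of forcing the node into one of those states with scontrol; the node name and reason string are placeholders:
  # Mark the node down so slurmctld drops or requeues its jobs
  sudo scontrol update nodename=node01 state=down reason="stuck slurmd"
  # Bring it back once slurmd responds again
  sudo scontrol update nodename=node01 state=resume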

[slurm-users] Re: Jobs showing running but not running

2024-05-29 Thread Laura Hild via slurm-users
> sudo systemctl restart slurmd # gets stuck Are you able to restart other services on this host? Anything weird in its dmesg?
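A few commands in the spirit of the suggestion above for checking whether the host itself is healthy:
  systemctl status slurmd
  journalctl -u slurmd --since "1 hour ago"
  dmesg -T | tail -50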

[slurm-users] Jobs showing running but not running

2024-05-29 Thread Sushil Mishra via slurm-users
Hi All, I'm managing a cluster with Slurm, consisting of 4 nodes. One of the compute nodes appears to be experiencing issues. While the front node's 'squeue' command indicates that jobs are running, upon connecting to the problematic node, I observe no active processes and GPUs are not being utili
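One way to cross-check the controller's view against the node itself; the node name is a placeholder:
  # What the controller thinks is running on the node
  squeue --nodelist=node01 --states=RUNNING
  scontrol show node node01
  # On the node itself: are job steps and GPUs really active?
  ps -ef | grep slurmstepd
  nvidia-smi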

[slurm-users] Re: dynamical configuration || meta configuration mgmt

2024-05-29 Thread Paul Edmon via slurm-users
Many parameters in Slurm can be changed via scontrol and sacctmgr commands without updating the conf itself. The thing is that scontrol changes are not durable across restarts; sacctmgr, though, updates the slurm database and thus will be sticky. That's at least what I would do if you are using
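To illustrate the difference described above (the partition and QOS names are placeholders): an scontrol change lives only in the running slurmctld, while an sacctmgr change is written to the accounting database:
  # Lost on the next slurmctld restart unless also put in slurm.conf
  sudo scontrol update partitionname=debug maxtime=02:00:00
  # Stored via slurmdbd, so it survives restarts
  sudo sacctmgr modify qos normal set MaxTRESPerUser=cpu=64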

[slurm-users] dynamical configuration || meta configuration mgmt

2024-05-29 Thread Heckes, Frank via slurm-users
Hello all, I’m sorry if this has been asked and answered before, but I couldn’t find anything related. Does anyone know whether a framework of sorts exists that allows changing certain SLURM configuration parameters provided some conditions in the batch system’s state are detected and of c

[slurm-users] Configuring sacct to report state=OUT_OF_MEMORY

2024-05-29 Thread Lee via slurm-users
Hello, *Background:* I am working on a small cluster that is managed by Base Command Manager v10.0 using Slurm 23.02.7 with Ubuntu 22.04.2. I have a small testing script that simply consumes memory and processors. When I run my test script, it consumes more memory than allocated by Slurm and as expe
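For context, OUT_OF_MEMORY is normally reported when memory limits are enforced through cgroups; a hedged sketch of the relevant settings and the sacct query (the job ID is a placeholder):
  # slurm.conf
  TaskPlugin=task/cgroup
  JobAcctGatherType=jobacct_gather/cgroup
  # cgroup.conf
  ConstrainRAMSpace=yes
  # After the job is killed, query its recorded state and memory usage
  sacct -j 1234 --format=JobID,JobName,State,ExitCode,MaxRSS,ReqMem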

[slurm-users] Re: sbatch problem

2024-05-29 Thread Hermann Schwärzler via slurm-users
Hi Mihai, yes, it's the same problem: when you run 'srun echo $CUDA_VISIBLE_DEVICES', the value of $CUDA_VISIBLE_DEVICES on the first of the two nodes is substituted into the line *before* srun is called. srun bash -c 'echo $CUDA_VISIBLE_DEVICES' is the way to go. BTW: the job-script I am
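A small sketch of the difference, usable inside a job script that has two nodes allocated:
  # Expanded by the batch shell on the first node, so every task prints the same value
  srun echo $CUDA_VISIBLE_DEVICES
  # Expanded by the shell started on each node, so each task prints its own value
  srun bash -c 'echo $CUDA_VISIBLE_DEVICES'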

[slurm-users] Re: sbatch problem

2024-05-29 Thread Mihai Ciubancan via slurm-users
Dear Hermann, Sorry to come back to you, but just to understand... if I run the following script:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --time=24:00:00
#SBATCH --nodes=2
#SBATCH --exclusive
#SBATCH --job-name="test_job"
#SBATCH -o stdout_%j
#SBATCH -e stderr_%j
touch test.txt
# Print