[slurm-users] Jobs showing running but not running

2024-05-29 Thread Sushil Mishra via slurm-users
Hi All, I'm managing a cluster with Slurm, consisting of 4 nodes. One of the compute nodes appears to be experiencing issues. While the front node's 'squeue' command indicates that jobs are running, upon connecting to the problematic node, I observe no active processes and GPUs are not being utili

Re: [slurm-users] slurm_persist_conn_open_without_init: failed to open persistent connection to host

2022-12-01 Thread Sushil Mishra
se; the cluster name is, I think, an abstract > name, where host names must be for real nodes that are resolvable. > > > > You may also find information in /var/log/messages or /var/log/secure, if > applicable to your Linux distro. > > > > I use Slurm with firewalld a

[slurm-users] slurm_persist_conn_open_without_init: failed to open persistent connection to host

2022-11-30 Thread Sushil Mishra
Hi all, I installed Slurm and enabled accounting on a single-node machine, i.e. the same server is both the master and the compute node. I mainly followed this page for instructions: https://southgreenplatform.github.io/trainings/hpc/slurminstallation/ After enabling accounting I am having problems in starting

Re: [slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Sushil Mishra
>> between what the node says or thinks it has (slurmd -C) and what the >> slurm.conf says it has. While there is that discrepancy and the node is >> invalid, you can't just tell it to resume. >> >> -- >> *From:* slurm-users on

[slurm-users] slurm_update error: Invalid node state specified

2022-10-11 Thread Sushil Mishra
Dear all, I am stuck with scontrol not recognizing the state keywords. I wonder if someone can point me to the possible cause of the error. I restarted slurmd a few times, and it didn't help. [sushil@fucose ~]$ sinfo PARTITION AVAIL TIMELIMIT NODES STATE NODELIST LocalQ* up infinite

Re: [slurm-users] Accounting core-hours usages

2022-10-10 Thread Sushil Mishra
>> Jörg Striewski >> >> Information Systems and Machine Learning Lab (ISMLL) >> Institute of Computer Science >> University of Hildesheim Germany >> post address: Universitätsplatz 1, D-31141 Hildesheim, Germany >> visitor address: Samelsonplatz 1, D-31141 H

[slurm-users] Accounting core-hours usages

2022-10-10 Thread Sushil Mishra
Dear all, I am pretty new to system administration and am looking for some help setting up slurmdb or MariaDB on a GPU cluster. We bought a machine, but the vendor simply installed Slurm and did not install any database for accounting. I tried installing MariaDB and then slurmdb as described in the manual bu

[slurm-users] Configuring SLURM on single node GPU cluster

2022-04-06 Thread Sushil Mishra
Dear SLURM users, I am very new to Slurm and need some help configuring it on a single-node machine. This machine has 8x Nvidia GPUs and a 96-core CPU. The vendor has set up a "LocalQ", but this somehow runs all the calculations on GPU 0. If I submit 4 independent jobs at a time, it starts ru