[slurm-users] GPU configuration

2021-12-10 Thread Giuseppe G. A. Celano
Hi, My cluster has 2 nodes: the first has 2 GPUs and the second has 1 GPU. Both nodes are in the "drained" state because "gres/gpu count reported lower than configured". Any idea why this happens? Thanks. My .conf files are: slurm.conf AccountingStorageTRES=gres/gpu GresTypes=gpu NodeName=t

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-06 Thread Giuseppe G. A. Celano
Grazie Gennaro, It's working! On Mon, Dec 6, 2021 at 9:41 AM Gennaro Oliva wrote: > Ciao Giuseppe, > > On Mon, Dec 06, 2021 at 03:46:02AM +0100, Giuseppe G. A. Celano wrote: > > sinfo: symbol lookup error: sinfo: undefined symbol: slurm_conf > > srun: symbol loo

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-05 Thread Giuseppe G. A. Celano
xfree_ptr sacct: symbol lookup error: sacct: undefined symbol: slurm_destroy_selected_step Does anyone know the reason for that? Thanks. Best, Giuseppe On Sat, Dec 4, 2021 at 5:31 PM Giuseppe G. A. Celano < giuseppegacel...@gmail.com> wrote: > Hi Gennaro, > &

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-04 Thread Giuseppe G. A. Celano
am not sure whether I should try to uninstall my previous installation and reinstall slurm-wlm... On Sat, Dec 4, 2021 at 12:38 PM Gennaro Oliva wrote: > Ciao Giuseppe, > > On Sat, Dec 04, 2021 at 02:30:40AM +0100, Giuseppe G. A. Celano wrote: > > I have installed almost all

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
ent.so", whereas > libmariadb-dev provides "libmariadb.so" > -- > *From:* slurm-users on behalf of > Giuseppe G. A. Celano > *Sent:* Saturday, 4 December 2021 11:40 > *To:* Slurm User Community List > *Subject:* Re: [slurm-users] [

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
10.4.22 On Sat, Dec 4, 2021 at 1:35 AM Brian Andrus wrote: > Which version of Mariadb are you using? > > Brian Andrus > On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote: > > After installation of libmariadb-dev, I have reinstalled the entire slurm > with ./configure + op

Re: [slurm-users] [EXT] Re: slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
normally use) > make > make install > > on your DBD server after you installed the mariadb-devel package? > > -- > *From:* slurm-users on behalf of > Giuseppe G. A. Celano > *Sent:* Saturday, 4 December 2021 10:07 > *To:* Slurm User Commun

Re: [slurm-users] slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
u can just DROP the database and restart slurmdbd. > > Brian Andrus > On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote: > > Thanks for the answer, Brian. I now added > --with-mysql_config=/etc/mysql/my.cnf, but the problem is still there and > now also slurmctld does not work, w

Re: [slurm-users] slurmdbd does not work

2021-12-03 Thread Giuseppe G. A. Celano
location with > --with-mysql when you configure/build slurm > > Brian Andrus > On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote: > > Hi everyone, > > I am having trouble getting *slurmdbd* to work. This is the error I get: > > > > > *error: Couldn't fin

[slurm-users] slurmdbd does not work

2021-12-02 Thread Giuseppe G. A. Celano
Hi everyone, I am having trouble getting *slurmdbd* to work. This is the error I get: *error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files error: cannot find accounting_storage plugin for accounting_storage/mysql error: cannot create accounting_storag

Re: [slurm-users] job_submit.lua and memory allocations

2020-01-24 Thread William G. Wichser
o_use = job_desc.min_mem_per_node end if job_desc.min_mem_per_cpu ~= nil then mem_to_use = job_desc.min_mem_per_cpu * job_desc.min_cpus end log_info("slurm_job_submit: Got total memory: %d", mem_to_use) Bill On 1/24/20 8:52 AM, William G. Wichser wrote: > Resurrecting an older thread wher
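The visible part of the Lua fragment above can be mirrored in a short Python sketch to make the precedence explicit: a per-node value is taken as is, and a per-CPU value, when present, is multiplied by the CPU count and overrides it. The first condition is cut off in the snippet, so the check on `min_mem_per_node` is an assumption, as is the function name `total_memory`. Whether unset fields really arrive as nil in a given Slurm version is not confirmed here.

```python
from typing import Optional


def total_memory(
    min_mem_per_node: Optional[int],
    min_mem_per_cpu: Optional[int],
    min_cpus: int,
) -> Optional[int]:
    """Mirror the snippet's logic; None stands in for Lua's nil."""
    mem_to_use = None
    # Assumed from the truncated first branch: use the per-node value if set.
    if min_mem_per_node is not None:
        mem_to_use = min_mem_per_node
    # As in the snippet: a per-CPU value is scaled by the CPU count
    # and replaces any per-node value.
    if min_mem_per_cpu is not None:
        mem_to_use = min_mem_per_cpu * min_cpus
    return mem_to_use
```

Calling `total_memory(None, 2000, 4)` follows the per-CPU branch, while `total_memory(4000, None, 8)` returns the per-node value unchanged.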

[slurm-users] job_submit.lua and memory allocations

2020-01-24 Thread William G. Wichser
I am resurrecting an older thread because I need to obtain the memory value of a submitted job. It turns out this is not easy with the method I'm trying to use, so I hope there is just some variable I am overlooking. The trivial case was simply to look at job_desc.pn_min_memory. And this

Re: [slurm-users] Slurm configuration, Weight Parameter

2019-11-22 Thread Goetz, Patrick G
Can't you just set the usage priority to be higher for the 2GB machines? This way, if the requested memory is less than 2GB those machines will be used first, and larger jobs skip to the higher memory machines. On 11/21/19 9:44 AM, Jim Prewett wrote: > > Hi Sistemas, > > I could be mista
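The suggestion above matches how Slurm's node Weight parameter (named in the thread subject) behaves: among nodes that satisfy a request, those with the lowest weight are allocated first. A minimal Python sketch of that selection rule follows. The node names, memory sizes, and weights are hypothetical, and the real scheduler considers far more than memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str       # hypothetical node name
    memory_gb: int  # memory available on the node
    weight: int     # scheduling weight; lower is preferred


def pick_node(nodes: List[Node], requested_gb: int) -> Optional[Node]:
    """Return the lowest-weight node that satisfies the memory request."""
    candidates = [n for n in nodes if n.memory_gb >= requested_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.weight)


# Hypothetical cluster: the small node gets the lower weight so it fills first.
nodes = [
    Node("small01", memory_gb=2, weight=10),
    Node("big01", memory_gb=64, weight=100),
]
```

With these values, a 1 GB request lands on `small01`, while a 16 GB request skips it and goes to `big01`.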

Re: [slurm-users] Execute scripts on suspend and cancel

2019-10-17 Thread Goetz, Patrick G
Are applications even aware when they've been hit by a SIGTSTP? This idea of a license being released under these circumstances just seems very unlikely. On 10/15/19 1:57 PM, Brian Andrus wrote: > It seems that there are some details that would need addressed. > > A suspend signal is nothing mo

Re: [slurm-users] How to share GPU resources? (MPS or another way?)

2019-10-08 Thread Goetz, Patrick G
On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote: > GPU is running as well as gres gpu:1. And more, the NVIDIA docs looks to > describe what I hit > (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). That seems like > the mps-server will be created to each user and the > server will be running

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Goetz, Patrick G
On 9/19/19 8:22 AM, Thomas M. Payerle wrote: > one of our clusters > is still running RHEL6, and while containers based on Ubuntu 16, > Debian 8, or RHEL7 all appear to work properly, > containers based on Ubuntu 18 or Debian 9 will die with "Kernel too > old" errors. I think the idea generally is

Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

2019-08-29 Thread Goetz, Patrick G
On 8/29/19 9:38 AM, Jarno van der Kolk wrote: > Here's an example on how to do so from the Compute Canada docs: > https://docs.computecanada.ca/wiki/GNU_Parallel#Running_on_Multiple_Nodes > [name@server ~]$ parallel --jobs 32 --sshloginfile ./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir

Re: [slurm-users] ticking time bomb? launching too many jobs in parallel

2019-08-29 Thread Goetz, Patrick G
On 8/27/19 11:47 AM, Brian Andrus wrote: > 1) If you can, either use xargs or parallel to do the forking so you can > limit the number of simultaneous submissions > Sorry if this is a naive question, but I'm not following how you would use parallel with Slurm (unless you're talking about using
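The quoted advice is about capping how many submissions run at once. As an illustration of that idea only, here is a Python sketch that uses a bounded worker pool in place of xargs or GNU parallel. `fake_submit` is a hypothetical stand-in and does not call any Slurm command.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_bounded(
    task: Callable[[str], str],
    inputs: Iterable[str],
    max_parallel: int,
) -> List[str]:
    """Run task over inputs with at most max_parallel running at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in input order, whatever the completion order.
        return list(pool.map(task, inputs))


def fake_submit(name: str) -> str:
    # Hypothetical placeholder for whatever actually submits the work.
    return f"submitted {name}"


results = run_bounded(fake_submit, ["a", "b", "c"], max_parallel=2)
```

The `max_parallel` argument plays the role that `-P` plays for xargs or `--jobs` plays for parallel: it bounds concurrency without changing the set of tasks.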

Re: [slurm-users] slurm-19.05 link error

2019-07-24 Thread Goetz, Patrick G
es/hl/c/ptExampleFL.c > /usr/share/doc/hdf5/examples/hl/c/run-hlc-ex.sh > /usr/share/doc/hdf5/examples/hl/c++/ptExampleFL.cpp > /usr/share/doc/hdf5/examples/hl/c++/run-hlc++-ex.sh > /usr/share/doc/hdf5/examples/hl/fortran/ex_ds1.f90 > /usr/share/doc/hdf5/examples/hl/fortran/

Re: [slurm-users] Problem with sbatch

2019-07-08 Thread Goetz, Patrick G
Sudo is more flexible than that; for example, you can give the slurmd user sudo access to the chown command and nothing else. On 7/8/19 11:37 AM, Daniel Torregrosa wrote: > You are right. The critical part I was missing is that chown does not > work without sudo. > > I assume this can be fix

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-21 Thread Goetz, Patrick G
There are 2 kinds of system admins: can do and can't do. You're a can do; his are can't do. On 3/21/19 10:26 AM, Prentice Bisbal wrote: > > On 3/20/19 1:58 PM, Christopher Samuel wrote: >> On 3/20/19 4:20 AM, Frava wrote: >> >>> Hi Chris, thank you for the reply. >>> The team that manages that

Re: [slurm-users] Kinda Off-Topic: data management for Slurm clusters

2019-02-26 Thread Goetz, Patrick G
But rsync -a will only help you if people are using identical or at least overlapping data sets? And you don't need rsync to prune out old files. On 2/26/19 1:53 AM, Janne Blomqvist wrote: > On 22/02/2019 18.50, Will Dennis wrote: >> Hi folks, >> >> Not directly Slurm-related, but... We have a

Re: [slurm-users] Unable to locate HDF5 compilation helper scripts 'h5cc' or 'h5pcc'.

2018-12-24 Thread G
>Date: Sun, 23 Dec 2018 19:45:08 -0800 >From: Kurt H Maier >To: Slurm User Community List >Subject: Re: [slurm-users] Unable to locate HDF5 compilation helper > scripts 'h5cc' or 'h5pcc'. >Message-ID: <20181224034508.GA56809@wopr> >Content-Type: text/plain; charset=us-ascii > >On Mon, Dec

Re: [slurm-users] About x11 support

2018-11-26 Thread Goetz, Patrick G
I'm a little confused about how this would work. For example, where does slurmctld run? And if on each submit host, why aren't the control daemons stepping all over each other? On 11/22/18 6:38 AM, Stu Midgley wrote: > indeed. > > All our workstations are submit hosts and in the queue, so peo