Hi,
My cluster has 2 nodes: the first has 2 GPUs and the second has 1 GPU.
Both nodes are in the "drained" state with the reason "gres/gpu count
reported lower than configured": any idea why this happens? Thanks.
My .conf files are:
slurm.conf
AccountingStorageTRES=gres/gpu
GresTypes=gpu
NodeName=t
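That drain reason means slurmd detected fewer GPUs on the node than the
Gres= count configured in slurm.conf, typically because gres.conf is
missing or lists the wrong device files. A minimal sketch of matching
files, assuming NVIDIA devices and hypothetical node names:

slurm.conf (identical on all nodes):
  NodeName=node01 Gres=gpu:2 State=UNKNOWN
  NodeName=node02 Gres=gpu:1 State=UNKNOWN

gres.conf on node01:
  Name=gpu File=/dev/nvidia0
  Name=gpu File=/dev/nvidia1

gres.conf on node02:
  Name=gpu File=/dev/nvidia0

Once the counts agree, the nodes can be returned to service with, e.g.,
"scontrol update NodeName=node01 State=RESUME".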
Thanks, Gennaro,
It's working!
On Mon, Dec 6, 2021 at 9:41 AM Gennaro Oliva wrote:
> Hi Giuseppe,
>
> On Mon, Dec 06, 2021 at 03:46:02AM +0100, Giuseppe G. A. Celano wrote:
> > sinfo: symbol lookup error: sinfo: undefined symbol: slurm_conf
> > srun: symbol lookup error: srun: undefined symbol: xfree_ptr
srun: symbol lookup error: srun: undefined symbol: xfree_ptr
sacct: symbol lookup error: sacct: undefined symbol:
slurm_destroy_selected_step
Does anyone know the reason for that? Thanks.
Best,
Giuseppe
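Undefined-symbol errors from sinfo/srun/sacct usually mean the commands
are resolving against a libslurm from a different Slurm version than the
one they were built with, e.g. a leftover source install under /usr/local
sitting next to distro packages. A quick diagnostic sketch (paths are
assumptions for a typical Linux system):

  $ type -a sinfo                         # every sinfo on the PATH
  $ ldd "$(which sinfo)" | grep libslurm  # which libslurm actually gets loaded
  $ sudo ldconfig                         # rebuild linker cache after removing stale libs

If ldd shows a library under a prefix you no longer use, removing that
old tree (or fixing PATH/ld.so.conf) should clear the errors.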
On Sat, Dec 4, 2021 at 5:31 PM Giuseppe G. A. Celano <
giuseppegacel...@gmail.com> wrote:
> Hi Gennaro,
>
I am not sure
whether I should try to uninstall my previous installation and reinstall
slurm-wlm...
On Sat, Dec 4, 2021 at 12:38 PM Gennaro Oliva
wrote:
> Hi Giuseppe,
>
> On Sat, Dec 04, 2021 at 02:30:40AM +0100, Giuseppe G. A. Celano wrote:
> > I have installed almost all
ent.so", whereas
> libmariadb-dev provides "libmariadb.so"
> --
> *From:* slurm-users on behalf of
> Giuseppe G. A. Celano
> *Sent:* Saturday, 4 December 2021 11:40
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] [
10.4.22
On Sat, Dec 4, 2021 at 1:35 AM Brian Andrus wrote:
> Which version of Mariadb are you using?
>
> Brian Andrus
> On 12/3/2021 4:20 PM, Giuseppe G. A. Celano wrote:
>
> After installation of libmariadb-dev, I have reinstalled the entire slurm
> with ./configure + op
normally use)
> make
> make install
>
> on your DBD server after you installed the mariadb-devel package?
>
> --
> *From:* slurm-users on behalf of
> Giuseppe G. A. Celano
> *Sent:* Saturday, 4 December 2021 10:07
> *To:* Slurm User Commun
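For reference, a hedged sketch of that rebuild on the slurmdbd host; note
that --with-mysql_config takes the directory containing the mysql_config
binary (shipped by libmariadb-dev / mariadb-devel), not the my.cnf file.
/usr/bin is an assumption for a Debian/Ubuntu layout:

  $ mysql_config --version                    # confirm the helper is installed
  $ ./configure --with-mysql_config=/usr/bin  # plus whatever options you normally use
  $ make
  $ sudo make install

If configure cannot find mysql_config, the accounting_storage/mysql
plugin is simply not built.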
You can just DROP the database and restart slurmdbd.
>
> Brian Andrus
> On 12/3/2021 6:42 AM, Giuseppe G. A. Celano wrote:
>
> Thanks for the answer, Brian. I now added
> --with-mysql_config=/etc/mysql/my.cnf, but the problem is still there and
> now also slurmctld does not work, w
location with
> --with-mysql when you configure/build slurm
>
> Brian Andrus
> On 12/2/2021 12:40 PM, Giuseppe G. A. Celano wrote:
>
> Hi everyone,
>
> I am having trouble getting *slurmdbd* to work. This is the error I get:
>
>
> *error: Couldn't fin
Hi everyone,
I am having trouble getting *slurmdbd* to work. This is the error I get:
*error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files*
*error: cannot find accounting_storage plugin for accounting_storage/mysql*
*error: cannot create accounting_storage context for accounting_storage/mysql*
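Those messages mean slurmdbd cannot find accounting_storage_mysql.so in
its plugin directory, which is what happens when Slurm was configured
without the MariaDB development files. Two quick checks, with paths that
are assumptions depending on your --prefix:

  $ ls /usr/local/lib/slurm/accounting_storage_mysql.so  # does the plugin exist at all?
  $ grep -i mysql config.log                             # in the build tree: did configure detect it?

If the .so is missing, rebuilding after installing libmariadb-dev (as
discussed above) is the fix.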
if job_desc.min_mem_per_node ~= nil then
    mem_to_use = job_desc.min_mem_per_node
end
if job_desc.min_mem_per_cpu ~= nil then
    mem_to_use = job_desc.min_mem_per_cpu * job_desc.min_cpus
end
-- slurm.log_info is the logging helper exposed to job_submit.lua
slurm.log_info("slurm_job_submit: Got total memory: %d", mem_to_use)
Bill
On 1/24/20 8:52 AM, William G. Wichser wrote:
> Resurrecting an older thread wher
Resurrecting an older thread where I need to obtain the value for memory
in a submitted job. It turns out this is not an easy case with the method
I'm trying to use, so I hope there is just some variable I am overlooking.
The trivial case was simply to look at job_desc.pn_min_memory. And this
Can't you just set the usage priority to be higher for the 2GB machines?
This way, if the requested memory is less than 2GB those machines will
be used first, and larger jobs skip to the higher memory machines.
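In slurm.conf this is usually expressed with the node Weight parameter:
Slurm allocates the lowest-weight nodes that satisfy a job first. A
sketch with hypothetical node names and sizes:

  NodeName=small[01-04] RealMemory=2048  Weight=1
  NodeName=big[01-02]   RealMemory=16384 Weight=10

Small jobs then land on the 2GB nodes first, leaving the large-memory
nodes free for jobs that actually need them.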
On 11/21/19 9:44 AM, Jim Prewett wrote:
>
> Hi Sistemas,
>
> I could be mista
Are applications even aware when they've been hit by a SIGTSTP? This
idea of a license being released under these circumstances just seems
very unlikely.
On 10/15/19 1:57 PM, Brian Andrus wrote:
> It seems that there are some details that would need to be addressed.
>
> A suspend signal is nothing mo
On 10/8/19 1:47 AM, Kota Tsuyuzaki wrote:
> GPU is running as well as gres gpu:1. What's more, the NVIDIA docs seem
> to describe what I hit
> (https://docs.nvidia.com/deploy/mps/index.html#topic_4_3). That seems like
> the mps-server will be created to each user and the
> server will be running
On 9/19/19 8:22 AM, Thomas M. Payerle wrote:
> one of our clusters
> is still running RHEL6, and while containers based on Ubuntu 16,
> Debian 8, or RHEL7 all appear to work properly,
> containers based on Ubuntu 18 or Debian 9 will die with "Kernel too
> old" errors.
I think the idea generally is
On 8/29/19 9:38 AM, Jarno van der Kolk wrote:
> Here's an example on how to do so from the Compute Canada docs:
> https://docs.computecanada.ca/wiki/GNU_Parallel#Running_on_Multiple_Nodes
>
[name@server ~]$ parallel --jobs 32 --sshloginfile
./node_list_${SLURM_JOB_ID} --env MY_VARIABLE --workdir
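The node list file used above can be generated inside the job script from
Slurm's own variables; a minimal sketch matching the file name in the
example:

  $ scontrol show hostnames "$SLURM_JOB_NODELIST" > ./node_list_${SLURM_JOB_ID}

scontrol show hostnames expands the compact nodelist (e.g. node[01-04])
into one host per line, which is the format --sshloginfile expects.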
On 8/27/19 11:47 AM, Brian Andrus wrote:
> 1) If you can, either use xargs or parallel to do the forking so you can
> limit the number of simultaneous submissions
>
Sorry if this is a naive question, but I'm not following how you would
use parallel with Slurm (unless you're talking about using
es/hl/c/ptExampleFL.c
> /usr/share/doc/hdf5/examples/hl/c/run-hlc-ex.sh
> /usr/share/doc/hdf5/examples/hl/c++/ptExampleFL.cpp
> /usr/share/doc/hdf5/examples/hl/c++/run-hlc++-ex.sh
> /usr/share/doc/hdf5/examples/hl/fortran/ex_ds1.f90
> /usr/share/doc/hdf5/examples/hl/fortran/
Sudo is more flexible than that; for example, you can give the slurmd
user sudo access to the chown command and nothing else.
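A minimal sudoers sketch of that idea, assuming the daemon runs as a user
named slurm and chown lives in /usr/bin (both assumptions; edit with
visudo):

  slurm ALL=(root) NOPASSWD: /usr/bin/chown

This lets the slurm user run only chown as root, rather than granting
blanket sudo.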
On 7/8/19 11:37 AM, Daniel Torregrosa wrote:
> You are right. The critical part I was missing is that chown does not
> work without sudo.
>
> I assume this can be fix
There are 2 kinds of system admins: can-do and can't-do. You're a can-do;
his are can't-do.
On 3/21/19 10:26 AM, Prentice Bisbal wrote:
>
> On 3/20/19 1:58 PM, Christopher Samuel wrote:
>> On 3/20/19 4:20 AM, Frava wrote:
>>
>>> Hi Chris, thank you for the reply.
>>> The team that manages that
But rsync -a will only help you if people are using identical or at
least overlapping data sets? And you don't need rsync to prune out old
files.
On 2/26/19 1:53 AM, Janne Blomqvist wrote:
> On 22/02/2019 18.50, Will Dennis wrote:
>> Hi folks,
>>
>> Not directly Slurm-related, but... We have a
>Date: Sun, 23 Dec 2018 19:45:08 -0800
>From: Kurt H Maier
>To: Slurm User Community List
>Subject: Re: [slurm-users] Unable to locate HDF5 compilation helper
> scripts 'h5cc' or 'h5pcc'.
>Message-ID: <20181224034508.GA56809@wopr>
>Content-Type: text/plain; charset=us-ascii
>
>On Mon, Dec
I'm a little confused about how this would work. For example, where
does slurmctld run? And if on each submit host, why aren't the control
daemons stepping all over each other?
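For context: slurmctld normally runs only on the host named by
SlurmctldHost in slurm.conf (ControlMachine in older versions), so making
every workstation a submit host does not multiply control daemons. The
submit hosts just need the client commands, an identical slurm.conf, and
the shared munge key. A sketch with hypothetical host names:

  SlurmctldHost=ctl01
  SlurmctldHost=ctl02   # optional backup controller

All sinfo/squeue/sbatch invocations on the workstations then talk to
ctl01 (or ctl02 on failover).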
On 11/22/18 6:38 AM, Stu Midgley wrote:
> indeed.
>
> All our workstations are submit hosts and in the queue, so peo