On 11/12/19 11:31 am, Eli V wrote:
Look for libmariadb-client. That's needed for slurmdbd on debian.
Looking at the output from building some Slurm 19.05.4 RPMs earlier
tonight, this is what I see in the output of configure:
[...]
checking for mysql_config... /usr/bin/mysql_config
MySQL 10.
Sure; they’ll need to have the appropriate part of SLURM installed and the
config file. This is similar to having just one login node per user. Typically
login nodes don’t run either daemon.
Hi,
We are trying to setup a tiny Slurm cluster to manage shared access to the
GPU server in our team. Both slurmctld and slurmd are going to run on this
GPU server. But here is a problem. On one hand, we don't want to give
developers ssh access to that box, because otherwise they might bypass
Slu
Is that logged somewhere or do I need to capture the output from the make
command to a file?
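For reference, one common way to capture it (an assumption on my part, not something stated in this thread) is to pipe the build through tee, so the output is both displayed and saved:

```shell
# Run a command with stderr merged into stdout, showing the output
# on the terminal and saving it to build.log at the same time.
build_and_log() {
    "$@" 2>&1 | tee build.log
}

# Stand-in command; in practice this would be `build_and_log make`.
build_and_log echo "configure/make output captured"
```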
-----Original Message-----
From: slurm-users On Behalf Of Kurt
H Maier
Sent: Wednesday, December 11, 2019 6:32 PM
To: Slurm User Community List
Subject: Re: [slurm-users] Need help with controller issues
You prompted me to dig even deeper into my epilog. I was trying to
access a semaphore file in the user's home directory.
It seems that when the epilog is run, the ~ is not expanded in any way.
So I can't even use ~${SLURM_JOB_USER} to access their semaphore file.
Potentially problematic for a
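Since tilde expansion is a feature of an interactive shell, one common workaround (a sketch; the semaphore filename here is hypothetical) is to look the user's home directory up in the passwd database instead:

```shell
#!/bin/bash
# Epilog sketch: ~ and ~user are not expanded in this environment,
# so resolve the job owner's home directory via the passwd database.
USER_HOME=$(getent passwd "$SLURM_JOB_USER" | cut -d: -f6)

# Hypothetical semaphore file name -- adjust to your setup.
SEMAPHORE="$USER_HOME/.job_semaphore"
if [ -f "$SEMAPHORE" ]; then
    rm -f "$SEMAPHORE"
fi
```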
On Wed, Dec 11, 2019 at 04:04:44PM -0700, Dean Schulze wrote:
> I tried again with a completely new system (virtual machine). I used the
> latest source, I used mysql instead of mariadb, and I installed all the
> client and dev libs (below). I still get the same error. It doesn't
> build the /us
Snapshot of a job_submit.lua we use to automatically route jobs to a GPU
partition if the user asks for a GPU:
https://gist.github.com/mikerenfro/92d70562f9bb3f721ad1b221a1356de5
All our users just use srun or sbatch with a default queue, and the plugin
handles it from there. There’s more de
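For context, a minimal sketch of the kind of routing logic such a job_submit.lua performs (the partition name "gpu" and the gres pattern are assumptions here; the linked gist is the authoritative version):

```lua
-- Sketch of GPU-routing logic for a job_submit.lua plugin.
-- The partition name "gpu" is an assumption; see the gist above
-- for the plugin actually in use.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- job_desc.gres carries the job's --gres request, e.g. "gpu:2"
    if job_desc.gres ~= nil and string.find(job_desc.gres, "gpu") then
        job_desc.partition = "gpu"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```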
Hi Brian,
can you maybe elaborate on how exactly you verified that your epilog
does not run when a job exceeds its walltime limit? Does it run when
jobs end normally, or when a running job is cancelled by the user?
I am asking because in our environment the epilog also runs when a job
hits
I tried again with a completely new system (virtual machine). I used the
latest source, I used mysql instead of mariadb, and I installed all the
client and dev libs (below). I still get the same error. It doesn't
build the /usr/lib/slurm/accounting_storage_mysql.so file.
Could the ./configure c
Look for libmariadb-client. That's needed for slurmdbd on debian.
On Wed, Dec 11, 2019 at 11:43 AM Dean Schulze wrote:
>
> Turns out I've already got libmariadb-dev installed:
>
> $ dpkg -l | grep maria
> ii libmariadb-dev 3.0.3-1build1
>
All,
So I have verified that the Epilog script is NOT run for any job that times
out. Even though in the documentation (
https://slurm.schedmd.com/prolog_epilog.html), it states "At job
termination"
I guess timeouts are not considered terminated??
So, is there a recommended way to have a cleanup s
We do this by looking at gres. The info is in the job_desc.gres
variable. We basically do the inverse, where we ensure someone is
asking for the gpu before allowing them to submit to a gpu partition.
-Paul Edmon-
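A sketch of that inverse check (the partition name and the user-facing message are illustrative assumptions, not the actual plugin code):

```lua
-- Reject jobs submitted to the GPU partition without a GPU request
-- (illustrative sketch; partition name "gpu" is assumed).
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == "gpu" then
        if job_desc.gres == nil or not string.find(job_desc.gres, "gpu") then
            slurm.log_user("gpu partition jobs must request a GPU, e.g. --gres=gpu:1")
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end
```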
On 12/11/2019 12:32 PM, Grigory Shamov wrote:
Hi All,
I am trying the newest
Hi All,
I am trying the newest SLURM 19.05 and its new cons_tres plugin.
Is there a way to handle its new GPU options in Lua job submit plugin?
That is, something like "detect if a job has --gpus-per-node, assign it to
a GPU partition"?
Thank you very much in advance!
--
Grigory Shamov
WestGrid
Turns out I've already got libmariadb-dev installed:
$ dpkg -l | grep maria
ii libmariadb-dev     3.0.3-1build1  amd64  MariaDB Connector/C, development files
ii libmariadb3:amd64  3.0.3-1build1  amd64
Hi,
We have a strange behaviour of Slurm after updating from 18.08.7 to 18.08.8,
for jobs using --exclusive and --mem-per-cpu.
Our nodes have 128GB of memory, 28 cores.
$ srun --mem-per-cpu=3 -n 1 --exclusive hostname
=> works in 18.08.7
=> doesn’t work in 18.08.8
In 18.08.8 :
-
These are the packages I installed prior to building slurm:
libmariadb-client-lgpl-dev
libmysqlclient-dev
mariadb-server
This installs mariadb 10.1.43 which is old.
On the Ubuntu site (https://packages.ubuntu.com/search?keywords=mariadb)
there's a package called
libmariadb-dev
Maybe this is th
Partial progress. The scientist who developed the model took a look at the
output and found that instead of one model run being run in parallel, srun
had run multiple instances of the model, one per thread, which for this
test was 110 threads.
I have a feeling this just verified the same thing that
I tried a simple thing of swapping out mpirun in the sbatch script for
srun. Nothing more, nothing less.
The model is now working on at least two nodes, I will have to test again
on more but this is progress.
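For reference, the change amounts to something like the following sketch of the sbatch script (the script body, node counts, and binary name are invented; only the mpirun-to-srun swap comes from the message):

```shell
#!/bin/bash
#SBATCH --job-name=model_run
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

# Previously: mpirun ./model
# srun uses Slurm's MPI integration to launch one task per allocated slot:
srun ./model
```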
Thanks,
Chris Woelkers
IT Specialist
National Oceanic and Atmospheric Administration
Great Lakes
Thanks all for the ideas and possibilities. I will answer all in turn.
Paul: Neither of the switches in use, Ethernet and Infiniband, have any
form of broadcast storm protection enabled.
Chris: I have passed on your question to the scientist who created
the sbatch script. I will also look into o
Hi Angelines,
I use a plugin for that - I believe this one
https://github.com/hpc2n/spank-private-tmp
which sort of does it all; your job sees an (empty) /tmp/.
(It doesn't do cleanup, I simply rely on OS cleaning up /tmp, at the
moment.)
Tina
On 05/12/2019 15:57, Angelines wrote:
> Hello,
>
I had a similar issue; please check whether the home drive, or the place
where the data should be stored, is mounted on the nodes.
On Tue, 2019-12-10 at 14:49 -0500, Chris Woelkers - NOAA Federal wrote:
> I have a 16 node HPC that is in the process of being upgraded from
> CentOS 6 to 7. All nodes are diskles