Re: [slurm-users] GPU utilization of running jobs

2022-10-19 Thread Relu Patrascu
Look up CUDA_DEVICE_ORDER and PCI_BUS_ID. On Oct 19, 2022, at 04:33, Vecerka Daniel wrote:
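A minimal sketch of what that pointer amounts to: by default CUDA enumerates devices fastest-first, which need not match the PCI bus order that nvidia-smi reports. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID before the CUDA runtime initializes makes the two orderings agree. The helper name `cuda_env_pci_order` is hypothetical and only illustrates building such an environment for a child process.

```python
import os


def cuda_env_pci_order(base_env=None):
    """Return a copy of an environment with CUDA device ordering set to PCI bus order.

    Hypothetical helper: it only prepares environment variables and does not
    touch CUDA itself. Pass the result as env= to subprocess.run(), or set the
    variable in os.environ before importing any CUDA-backed library.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


env = cuda_env_pci_order({"PATH": "/usr/bin"})
print(env["CUDA_DEVICE_ORDER"])
```

The variable has to be in place before the first CUDA call in the process; changing it afterwards has no effect on an already initialized runtime.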

Re: [slurm-users] How to avoid a feature?

2021-07-06 Thread Relu Patrascu
We have had a similar problem: even with different partitions for CPU and GPU nodes, people still submitted jobs to the GPU nodes, and we suspected they were running CPU-type jobs. It doesn't help to look for a missing --gres=gpu:x, because a user can ask for GPUs and simply not use them. We thought of get
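The snippet is cut off before it says what the poster actually tried, so the following is not their method. It is only one possible check, assuming utilization is sampled on the node with `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. The function name `idle_gpus` and the 5% threshold are hypothetical.

```python
def idle_gpus(csv_text, threshold_pct=5):
    """Return the indices of GPUs whose sampled utilization is below threshold_pct.

    csv_text is expected to look like the output of
    nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
    i.e. one "index, utilization" pair per line. Values such as "[N/A]" are
    not handled in this sketch.
    """
    idle = []
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        if int(util) < threshold_pct:
            idle.append(int(index))
    return idle


# Made-up sample output for a four-GPU node.
sample = "0, 97\n1, 0\n2, 3\n3, 88\n"
print(idle_gpus(sample))  # [1, 2]
```

A single sample is only a snapshot; a job that looks idle once may simply be between batches, so any real check would need repeated samples over the life of the job.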

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-02-03 Thread Relu Patrascu
On 2021-02-03 10:32, Brian Andrus wrote: Wow, This is getting so ridiculous that my email program has started putting this thread in junk... The spirit of Linux is to give you the tools so you can do things how you want. You have the tools, do not expect someone else to come over and plan

Re: [slurm-users] Reserve some cores per GPU

2020-10-20 Thread Relu Patrascu
d on a 'max cores per gpu' property. The node names are appended to the job desc exc_nodes property. It's not particularly elegant but it does work quite well for us. Aaron On 20 October 2020 at 18:17 BST, Relu Patrascu wrote: Hi all, We have a GPU cluster and have run into

[slurm-users] Reserve some cores per GPU

2020-10-20 Thread Relu Patrascu
Hi all, We have a GPU cluster and have run into this issue occasionally. Assume four GPUs per node; when a user requests a GPU on such a node, and all the cores, or all the RAM, the other three GPUs will be wasted for the duration of the job, as slurm has no more cores or RAM available to all
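The arithmetic behind the complaint can be sketched as follows. The node size (4 GPUs, 32 cores, 256 GB RAM) and the per-GPU reservations are hypothetical numbers chosen for illustration, and `usable_gpus` is not a Slurm function.

```python
def usable_gpus(free_gpus, free_cores, free_mem_gb, cores_per_gpu=1, mem_gb_per_gpu=1):
    """Number of free GPUs that can still be paired with enough cores and RAM.

    Every GPU job is assumed to need at least cores_per_gpu cores and
    mem_gb_per_gpu GB of RAM, so the scarcest resource sets the limit.
    """
    by_cores = free_cores // cores_per_gpu
    by_mem = free_mem_gb // mem_gb_per_gpu
    return min(free_gpus, by_cores, by_mem)


# Hypothetical node: 4 GPUs, 32 cores, 256 GB RAM.
# A job takes 1 GPU and all 32 cores: the other three GPUs are stranded.
print(usable_gpus(free_gpus=3, free_cores=0, free_mem_gb=256))  # 0

# If 8 cores and 64 GB were held back per GPU, the same job could take at most
# 8 cores and 64 GB, leaving all three remaining GPUs usable.
print(usable_gpus(free_gpus=3, free_cores=24, free_mem_gb=192,
                  cores_per_gpu=8, mem_gb_per_gpu=64))  # 3
```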

Re: [slurm-users] CUDA environment variable not being set

2020-10-08 Thread Relu Patrascu
From: slurm-users On Behalf Of Relu Patrascu Sent: Thursday, October 8, 2020 4:26 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] CUDA environment variable not

Re: [slurm-users] CUDA environment variable not being set

2020-10-08 Thread Relu Patrascu
That usually means you don't have the nvidia kernel module loaded, probably because there's no driver installed. Relu On 2020-10-08 14:57, Sajesh Singh wrote: Slurm 18.08 CentOS 7.7.1908 I have 2 M500 GPUs in a compute node which is defined in the slurm.conf and gres.conf of the cluster, b

Re: [slurm-users] Auto-select partition?

2020-10-01 Thread Relu Patrascu
Besides having a separate partition for each type of node, you can also have a partition which includes all the nodes, and use the Default=yes option in its definition. From the docs: 'Default: If this keyword is set, jobs submitted without a partition specification will utilize this p

Re: [slurm-users] Running gpu and cpu jobs on the same node

2020-09-30 Thread Relu Patrascu
If you don't use OverSubscribe then resources are not shared. The resources allocated to a job are not available to other jobs, regardless of partition. Relu On 2020-09-30 16:12, Ahmad Khalifa wrote: I have a machine with 4 rtx2080ti and a core i9. I submit jobs to it through MPI PMI2 (

Re: [slurm-users] How to contact slurm developers

2020-09-30 Thread Relu Patrascu
search Computing - MSB C630, Newark `' On Sep 30, 2020, at 11:29, Relu Patrascu <r...@cs.toronto.edu> wrote: Thanks Ryan, I'll try the bugs site. And indeed, one person in our organization has already said "let's pay for support, maybe they

Re: [slurm-users] How to contact slurm developers

2020-09-30 Thread Relu Patrascu
the State University of NJ | Ryan Novosielski - novos...@rutgers.edu | Sr. Technologist - 973/972.0922 (2x0922) | RBHS Campus | Office of Advanced Research Computing - MSB C630, Newark `' On Sep 30, 2020

[slurm-users] How to contact slurm developers

2020-09-30 Thread Relu Patrascu
Hi all, I recently posted a feature request on this mailing list and got no reply from the developers. Is there a better way to contact the slurm developers, or should we just accept that they are not interested in community feedback? Regards, Relu

Re: [slurm-users] Features request

2020-09-25 Thread Relu Patrascu
Thank you for your ideas, Diego. On 2020-09-25 02:20, Diego Zuccato wrote: On 25/09/20 00:04, Relu Patrascu wrote: 1. Allow preemption in the same QOS, all else being equal, based on job priority. You'd risk having jobs continuously preempted by jobs that have been in queue for

[slurm-users] Features request

2020-09-24 Thread Relu Patrascu
Hello all, We're mostly a GPU compute shop, and we've been happy with slurm for the last three years, but we think slurm would benefit from the following two features: 1. Allow preemption in the same QOS, all else being equal, based on job priority. 2. Job size calculation to take into acc

Re: [slurm-users] slurmdbd crashes with segmentation fault following DBD_GET_ASSOCS

2020-05-11 Thread Relu Patrascu
We've experienced the same problem on several versions of slurmdbd (18, 19) so we downgraded mysql and put a hold on the package. Hey Dustin, funny we meet here :) Relu On Tue, May 5, 2020 at 3:43 PM Dustin Lang wrote: > > I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentati

Re: [slurm-users] Preemption within same QOS

2020-03-09 Thread Relu Patrascu
We received no replies, so we solved the problem in-house by writing a simple plugin based on the qos priority plugin. On Wed, Jan 22, 2020 at 2:50 PM Relu Patrascu wrote: > We're having a bit of a problem setting up slurm to achieve this: > > 1. Two QOSs, 'high' and

[slurm-users] Preemption within same QOS

2020-01-22 Thread Relu Patrascu
We're having a bit of a problem setting up slurm to achieve this: 1. Two QOSs, 'high' and 'normal'. 2. Preemption type: requeue. 3. Any job has a guarantee of running 60 minutes before being preempted. 4. Any job submitted with --qos=high can preempt jobs with --qos=normal if no resources availabl
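The requirements visible above can be read as a simple decision rule, sketched below. This only illustrates the stated policy; it is not Slurm's preemption logic and not the plugin mentioned in the follow-up. The `Job` class and `can_preempt` function are hypothetical, and the caller is assumed to ask only when the incoming job cannot start on free resources.

```python
from dataclasses import dataclass

MIN_RUNTIME_MIN = 60  # point 3: every job is guaranteed 60 minutes of running


@dataclass
class Job:
    qos: str              # 'high' or 'normal' (point 1)
    runtime_min: int = 0  # minutes the job has been running so far


def can_preempt(incoming: Job, running: Job) -> bool:
    """Decide whether `incoming` may requeue `running` under the stated rules."""
    # The 60-minute guarantee applies to every running job.
    if running.runtime_min < MIN_RUNTIME_MIN:
        return False
    # Point 4: a 'high' job may preempt a 'normal' job.
    return incoming.qos == "high" and running.qos == "normal"


print(can_preempt(Job("high"), Job("normal", runtime_min=75)))  # True
print(can_preempt(Job("high"), Job("normal", runtime_min=30)))  # False
```

Preemption between two jobs of the same QOS is deliberately left out here, since the snippet is cut off before it says how that case should be handled.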