Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Davide Vanzo
As I said at the beginning, I have never played with MPS, so my answer is based only on what the Slurm documentation shows. Apparently MPS does not require NVML, so you can avoid setting AutoDetect and instead list the GPU resources in the gres.conf file the old-style way. That should help you to get
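A minimal sketch of what "old style" could look like. It assumes the three compute nodes mentioned later in the thread are named node001 to node003 (only node001 appears in the thread; the other two names are hypothetical), that each has one GPU at /dev/nvidia0, and that each GPU gets an MPS share count of 100 (an assumed value). The helper only prints static gres.conf lines with no AutoDetect; slurm.conf would still need GresTypes=gpu,mps and matching Gres= counts on the NodeName lines.

```python
# Hypothetical sketch: static ("old style") gres.conf lines instead of AutoDetect=nvml.
# Assumed: node names, one GPU per node at /dev/nvidia0, MPS Count=100 per GPU.

def gres_conf_lines(nodes, gpu_device="/dev/nvidia0", mps_count=100):
    """Return gres.conf lines declaring one GPU and its MPS shares for each node."""
    lines = []
    for node in nodes:
        # The physical GPU, listed explicitly rather than autodetected via NVML.
        lines.append(f"NodeName={node} Name=gpu File={gpu_device}")
        # MPS shares carved out of that same device.
        lines.append(f"NodeName={node} Name=mps Count={mps_count} File={gpu_device}")
    return lines


if __name__ == "__main__":
    for line in gres_conf_lines(["node001", "node002", "node003"]):
        print(line)
```

Running it prints two lines per node, for example `NodeName=node001 Name=gpu File=/dev/nvidia0`.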

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Robert Kudyba
> > > use yum install slurm20, here they show Slurm 19 but it's the same for 20 > > In that case you'll need to open a bug with Bright to get them to > rebuild Slurm with nvml support. They told me they officially support neither MPS nor Slurm, and to come here for support (or pay SchedMD). The

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Christopher Samuel
On 4/8/20 12:17 PM, Robert Kudyba wrote: As I wrote we use Bright Cluster on CentOS 7.7. So we just follow their instructions to use yum install slurm20, here they show Slurm 19 but it's the same for 20 In th

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Davide Vanzo
Robert, There is a pretty good consensus here that the RPMs Bright is providing do not support NVML. If you need this function and you do not want to attempt building your own RPM on a node with the Nvidia drivers installed, have you considered contacting Bright support? This would be t

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Robert Kudyba
> > > and the NVIDIA Management Library (NVML) is installed on the node and >> > was found during Slurm configuration >> >> That's the key phrase - when whoever compiled Slurm ran ./configure >> *before* compilation it was on a system without the nvidia libraries and >> headers present, so Slurm co

Re: [slurm-users] [External] Question about partition and node allocation

2020-04-08, Renata Maria Dart
Thanks Prentice, I will take a look at QOS, I'm not completely clear on it. Renata On Wed, 8 Apr 2020, Prentice Bisbal wrote: > I believe you can do this with QOS, by assigning group limits to the QOS. > > Prentice > > > On 4/8/20 1:38 PM, Renata Maria Dart wrote: >> Hi, is there a way to specif

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Christopher Samuel
Hi Robert, On 4/8/20 7:08 AM, Robert Kudyba wrote: and the NVIDIA Management Library (NVML) is installed on the node and was found during Slurm configuration That's the key phrase - when whoever compiled Slurm ran ./configure *before* compilation it was on a system without the nvidia librari

Re: [slurm-users] [External] Question about partition and node allocation

2020-04-08, Prentice Bisbal
I believe you can do this with QOS, by assigning group limits to the QOS. Prentice On 4/8/20 1:38 PM, Renata Maria Dart wrote: Hi, is there a way to specify a certain number of nodes for a partition without specifying exactly which nodes to use? For instance, if I have 100 hosts and would lik
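One way Prentice's suggestion might be sketched: a partition that spans all 100 hosts, with a partition QOS whose GrpTRES=node=20 group limit caps that partition at 20 nodes at a time, so no specific nodes have to be named. The QOS name high20 and partition name high are hypothetical, and the preemption side (PriorityTier on the partition plus a PreemptType such as preempt/partition_prio in slurm.conf) is an assumption to verify against the slurm.conf and sacctmgr documentation. The helper only builds the commands and the config line; it does not run anything.

```python
# Hypothetical sketch: a QOS with a 20-node group limit attached to a partition
# that covers every node. Names and PriorityTier value are assumptions.
import shlex


def qos_node_cap(qos_name, max_nodes, partition="high"):
    """Return (sacctmgr argument lists, slurm.conf partition line)."""
    commands = [
        # Create the QOS; -i commits immediately without the confirmation prompt.
        ["sacctmgr", "-i", "add", "qos", qos_name],
        # Group limit: all jobs running under this QOS together use at most max_nodes nodes.
        ["sacctmgr", "-i", "modify", "qos", qos_name, "set", f"GrpTRES=node={max_nodes}"],
    ]
    # The partition lists all nodes; the attached QOS enforces the node cap.
    partition_line = f"PartitionName={partition} Nodes=ALL QOS={qos_name} PriorityTier=2"
    return commands, partition_line


if __name__ == "__main__":
    cmds, line = qos_node_cap("high20", 20)
    for cmd in cmds:
        print(shlex.join(cmd))
    print(line)
```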

[slurm-users] Question about partition and node allocation

2020-04-08, Renata Maria Dart
Hi, is there a way to specify a certain number of nodes for a partition without specifying exactly which nodes to use? For instance, if I have 100 hosts and would like two partitions: a shared one that includes all 100, and a second partition that should have preemption over 20 of the 100. D

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Robert Kudyba
On Wed, Apr 8, 2020 at 9:34 AM wrote: > I believe in order to compile for nvml you'll have to compile on a system > with an Nvidia gpu installed otherwise the Nvidia driver and libraries > won't install on that system. > Yes, our 3 compute nodes have 1 V100 each. So I can run: ssh node001 Last lo

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Robert Kudyba
On Wed, Apr 8, 2020 at 10:23 AM Eric Berquist wrote: > I just ran into this issue. Specifically, SLURM looks for the NVML header > file, which comes with CUDA or DCGM, in addition to the library that comes > with the drivers. The check is at > https://github.com/SchedMD/slurm/blob/a763a008b770032

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Chris Samuel
On 8/4/20 7:20 am, Eric Berquist wrote: Once you’ve built SLURM, it’s enough to just have the GPU drivers on the nodes where SLURM will be installed. Yeah I checked that at the Slurm User Group - slurmd will try and dlopen() the required libraries and should gracefully deal with them not bei
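The graceful-fallback behaviour Chris describes can be illustrated with Python's ctypes, which calls dlopen() on Linux. This is only a sketch of the idea, not slurmd's actual code; libnvidia-ml.so.1 is assumed here as the soname of the driver-provided NVML library.

```python
# Sketch of "try to dlopen() NVML, and cope if it is absent".
# Not slurmd's implementation; the library name is an assumption.
import ctypes


def nvml_loadable(libname="libnvidia-ml.so.1"):
    """Try to load the NVML library at runtime; return False instead of crashing if missing."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # Library not found (e.g. no NVIDIA driver on this node): degrade gracefully.
        return False
    return True


if __name__ == "__main__":
    if nvml_loadable():
        print("NVML library found")
    else:
        print("NVML library not found; continuing without it")
```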

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, Eric Berquist
I just ran into this issue. Specifically, SLURM looks for the NVML header file, which comes with CUDA or DCGM, in addition to the library that comes with the drivers. The check is at https://github.com/SchedMD/slurm/blob/a763a008b7700321b51aad2e619deab00638a379/auxdir/x_ac_nvml.m4#L32. Once you
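A rough pre-build check along the lines Eric describes: before running ./configure, confirm that both the nvml.h header and the NVML library are visible on the build host. The header directories below are assumptions (typical system and CUDA locations), and the authoritative test remains the x_ac_nvml.m4 check linked in his message.

```python
# Hypothetical pre-configure check: is nvml.h present (from CUDA/DCGM) and is
# libnvidia-ml (from the driver) findable? Header directories are assumptions.
import os
from ctypes.util import find_library

HEADER_DIRS = ["/usr/include", "/usr/local/cuda/include"]


def find_nvml_header(dirs=HEADER_DIRS):
    """Return the path of the first nvml.h found in dirs, or None."""
    for d in dirs:
        path = os.path.join(d, "nvml.h")
        if os.path.isfile(path):
            return path
    return None


def nvml_build_prereqs(dirs=HEADER_DIRS):
    """Report the header path and library name found, with None for anything missing."""
    return {
        "header": find_nvml_header(dirs),
        # find_library("nvidia-ml") searches the linker paths for libnvidia-ml.
        "library": find_library("nvidia-ml"),
    }


if __name__ == "__main__":
    found = nvml_build_prereqs()
    for name, value in found.items():
        print(f"{name}: {value or 'MISSING'}")
    if not all(found.values()):
        print("configure would likely build Slurm without NVML support")
```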

Re: [slurm-users] Header lengths are longer than data received after changing SelectType & GresTypes to use MPS

2020-04-08, dean.w.schulze
I believe in order to compile for nvml you'll have to compile on a system with an Nvidia gpu installed, otherwise the Nvidia driver and libraries won't install on that system. -----Original Message----- From: slurm-users On Behalf Of Christopher Samuel Sent: Tuesday, April 7, 2020 10:08 PM To:

Re: [slurm-users] Error building rpm on CentOS 7

2020-04-08, Alfonso Núñez Slagado
Thanks guys, it took me a while to check the solutions you proposed, and both of them work. The MariaDB downgrade is a bit tricky using "rpm -e --nodeps", and the solution Ole proposed keeps the system updated to MariaDB 10.4. @Ole, thanks for the guide, it is really useful. Alfonso On 7/4/20