If you use qemu-kvm, beware: qemu-kvm doesn't allow communication between
virtual machines and the host, therefore your Slurm servers must all be
virtual machines.
On Wed, 2021-07-28 at 13:55 +1000, Sid Young wrote:
> Why not spin them up as Virtual machines... then you could build real
> (separate) cl
On Wed, 2021-06-02 at 22:11 -0700, Ahmad Khalifa wrote:
> How do I send a job to a particular GPU card using its ID
> (0,1,2, etc.)?
If your GPUs are CUDA I can't help, but if you have OpenCL GPUs then
your program can enumerate them with a call to clGetDeviceIDs() and
select a GPU by number.
It is now possible for programs to do a precise and reliable selection
of the GPU by first issuing a query to OpenCL with the
clGetDeviceInfo() function, with the param_name parameter set to
CL_DEVICE_PCI_BUS_INFO_KHR (defined by the cl_khr_pci_bus_info
extension). This extension is available starting from OpenCL 3.0.7.
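For illustration, a minimal sketch of such a query (untested; it assumes
OpenCL 3.0 headers with CL/cl_ext.h providing the cl_khr_pci_bus_info
definitions, and caps the device list at 16 for brevity):

/* Minimal sketch: list OpenCL GPUs and print the PCI address of each,
 * using the cl_khr_pci_bus_info extension, so a job can pick a device
 * by PCI bus location instead of by ordinal. Error handling trimmed. */
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <CL/cl_ext.h>   /* CL_DEVICE_PCI_BUS_INFO_KHR, cl_device_pci_bus_info_khr */
#include <stdio.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id devices[16];
    cl_uint ndev = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 16, devices, &ndev);

    for (cl_uint i = 0; i < ndev; i++) {
        cl_device_pci_bus_info_khr pci;
        cl_int err = clGetDeviceInfo(devices[i], CL_DEVICE_PCI_BUS_INFO_KHR,
                                     sizeof(pci), &pci, NULL);
        if (err == CL_SUCCESS)
            printf("GPU %u -> PCI %04x:%02x:%02x.%x\n", i,
                   pci.pci_domain, pci.pci_bus, pci.pci_device, pci.pci_function);
        else
            printf("GPU %u -> cl_khr_pci_bus_info not supported\n", i);
    }
    return 0;
}

Compile and link with -lOpenCL; the printed PCI addresses can then be
matched against the devices the scheduler actually granted to the job.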
References:
-
https://github.c
by serial number using the rocm-smi interface; this approach is much
more reliable than using device ordinals:
https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-SMI-CLI.html?highlight=showuniqueid
> -----Original Message-----
> From: slurm-users On Behalf
> Of Valerio Bellizz
> open source
> components or layers.
>
> Gareth
>
> -----Original Message-----
> From: slurm-users On Behalf
> Of Valerio Bellizzomi
> Sent: Thursday, 6 May 2021 5:21 PM
> To: slurm-users@lists.schedmd.com
> Subject: Re: [slurm-users] CUDA vs OpenCL
>
> On Wed, 2021-0
On Wed, 2021-04-28 at 10:56 +0200, Valerio Bellizzomi wrote:
> Greetings,
> I see here https://slurm.schedmd.com/gres.html#GPU_Management that
> CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs, what about OpenCL
> GPUs?
>
> Is there an OPENCL_VISIBLE_DEVICES ?
Greetings,
I see here https://slurm.schedmd.com/gres.html#GPU_Management that
CUDA_VISIBLE_DEVICES is available for NVIDIA GPUs; what about OpenCL
GPUs?
Is there an OPENCL_VISIBLE_DEVICES?
--
Valerio Bellizzomi
https://www.selroc.systems
http://www.selnet.org
On Fri, 2020-11-06 at 13:00 +0100, Diego Zuccato wrote:
> On 04/11/20 19:12, Brian Andrus wrote:
>
> > One thing you will start finding in HPC is that, by its goal,
> > hyperthreading is usually a poor fit.
> Depends on many factors, but our tests confirm it can do much good!
>
> > If you a
On Sun, 2019-06-30 at 18:15 -0700, Chris Samuel wrote:
> On Saturday, 29 June 2019 10:33:50 AM PDT Valerio Bellizzomi wrote:
>
> > no I am using the option --unbuffered to watch the output in a terminal
> > window.
>
> I don't think this is a Slurm issue, you
n to a location only accessible from the
> compute node running your job? You might be able to ssh from the submit host
> to the compute node (or maybe from your local computer to the compute node).
>
> > On Jun 29, 2019, at 10:07 AM, Valerio Bellizzomi wrote:
> >
> >
On Sat, 2019-06-29 at 07:57 -0700, Brian Andrus wrote:
> I believe you are referring to an interactive terminal window.
>
> You can do that with srun --pty bash
>
> Windows themselves are not handled by slurm at all. To have multiple
> windows is a function of your workstation. You would need mu
On Sat, 2019-06-29 at 16:48 +0200, Valerio Bellizzomi wrote:
> On Sat, 2019-06-29 at 07:36 -0700, Brian Andrus wrote:
> > A little more details of what you are trying to do would help.
> >
> > multiple srun statements with --pty options will spawn multiple
> > termin
it will create a terminal within a terminal.
>
> So, I would ask: what are you trying to do and we may be able to advise
> the best way to accomplish it.
>
> Brian Andrus
>
> On 6/29/2019 12:53 AM, Valerio Bellizzomi wrote:
> > How is it normally done?
> >
>
How is it normally done?
On Fri, 2019-06-28 at 09:39 +0200, Ole Holm Nielsen wrote:
> On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
> > On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> >> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> >>> The nodes are now commun
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
> On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> > The nodes are now communicating however when I run the command
> >
> > srun -w compute02 /bin/ls
> >
> > it remains stuck and there i
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
> The nodes are now communicating however when I run the command
>
> srun -w compute02 /bin/ls
>
> it remains stuck and there is no output on the submit machine.
>
> on the compute02 there is a Communicat
The nodes are now communicating, however when I run the command
srun -w compute02 /bin/ls
it remains stuck and there is no output on the submit machine.
On compute02 there is a Communication error and Timeout.
The network ports 6817 and 6818 are open.
> On 19-06-27 15:33, Valerio Bellizzomi wrote:
> > hello, my node has 2 gpus so I have specified gres=gpus:2 but the
> > scontrol show node displays this:
> >
> > State=IDLE+DRAIN
> > Reason=gres/gpus count too low (1 < 2)
On Thu, 2019-06-27 at 15:33 +0200, Valerio Bellizzomi wrote:
> hello, my node has 2 gpus so I have specified gres=gpus:2 but the
> scontrol show node displays this:
>
> State=IDLE+DRAIN
> Reason=gres/gpus count too low (1 < 2)
Also, the node is repeating a debug message:
deb
Hello, my node has 2 GPUs so I have specified gres=gpus:2, but
scontrol show node displays this:
State=IDLE+DRAIN
Reason=gres/gpus count too low (1 < 2)
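For what it's worth, this message usually means slurmd detected fewer
gres/gpus devices on the node than the node definition advertises. A
hypothetical sketch of matching definitions (the device paths are
examples only, not taken from this thread):

# slurm.conf (same copy on controller and nodes)
GresTypes=gpus
NodeName=compute02 Gres=gpus:2 ...

# gres.conf on the compute node -- one line per device,
# so the count matches Gres=gpus:2 above
Name=gpus File=/dev/dri/card0
Name=gpus File=/dev/dri/card1

Note that Slurm's built-in GPU GRES is normally named "gpu" rather than
"gpus"; whichever name is used must be consistent everywhere. Once the
counts match, restart slurmd and clear the drain with
scontrol update NodeName=compute02 State=RESUME.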
On Wed, 2019-06-26 at 08:23 +0200, Marcus Wagner wrote:
> Have you restarted munge on all hosts?
Now it works, thanks.
>
> On 6/25/19 4:38 PM, Valerio Bellizzomi wrote:
> > On Tue, 2019-06-25 at 16:32 +0200, Valerio Bellizzomi wrote:
> >> On Tue, 2019-06-25 at 08:48 -040
On Tue, 2019-06-25 at 16:32 +0200, Valerio Bellizzomi wrote:
> On Tue, 2019-06-25 at 08:48 -0400, Eli V wrote:
> > My first guess would be that the host is not listed as one of the two
> > controllers in the slurm.conf. Also, keep in mind munge, and thus
> > slurm is very
slurmd on the compute node refuses to
connect to the controller with this error: Protocol authentication error
>
>
> On Tue, Jun 25, 2019 at 1:50 AM Valerio Bellizzomi wrote:
> >
> > I have installed slurmctld on Debian Testing, trying to start the daemon
> > by hand:
I have installed slurmctld on Debian Testing and am trying to start the
daemon by hand:
# /usr/sbin/slurmctld -D -v -f /etc/slurm-llnl/slurm.conf
slurmctld: error: High latency for 1000 calls to gettimeofday(): 2072
microseconds
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: slu
Hello,
I have a recurring error in the log of slurmctld:
[2018-04-10T19:32:40.145] error: _unpack_ret_list: message type 24949,
record 0 of 56214
[2018-04-10T19:32:40.145] error: invalid type trying to be freed 24949
[2018-04-10T19:32:40.145] error: unpacking header
[2018-04-10T19:32:40.145] erro