Look up CUDA_DEVICE_ORDER and PCI_BUS_ID
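A minimal sketch of the setting being suggested (the nvidia-smi line is only a sanity check, not required):

    # Make the CUDA runtime enumerate GPUs in PCI bus order, so device
    # indices match what nvidia-smi and the hardware report.
    export CUDA_DEVICE_ORDER=PCI_BUS_ID
    nvidia-smi -L    # list the devices for comparison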
> On Oct 19, 2022, at 04:33, Vecerka Daniel wrote:
>
We have had a similar problem: even with separate partitions for CPU
and GPU nodes, people still submitted jobs to the GPU nodes, and we
suspected they were running CPU-type jobs. It doesn't help to look for a
missing --gres=gpu:x, because a user can request GPUs and simply not use them. We
thought of get
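One rough way to spot allocated-but-idle GPUs (a sketch only, not from the thread; it assumes nvidia-smi is present on the nodes) is to sample utilization on nodes that have GPU jobs running:

    # GPUs that are allocated but unused show long stretches of 0% utilization
    # and near-zero memory use.
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits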
On 2021-02-03 10:32, Brian Andrus wrote:
Wow,
This is getting so ridiculous that my email program has started
putting this thread in junk...
The spirit of Linux is to give you the tools so you can do things how
you want. You have the tools; do not expect someone else to come over
and plan
d on a 'max cores per gpu' property. The
node names are appended to the job description's exc_nodes field.
It's not particularly elegant, but it works quite well for us.
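For context, exc_nodes in the job description appears to be the same field that a user's --exclude option fills in; a hand-run equivalent of what the plugin does (node names invented for the example) would be:

    # keep a CPU-only job off the GPU nodes
    sbatch --exclude=gpu[01-04] cpu_job.sh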
Aaron
On 20 October 2020 at 18:17 BST, Relu Patrascu wrote:
Hi all,
We have a GPU cluster and have run into
Hi all,
We have a GPU cluster and have run into this issue occasionally. Assume
four GPUs per node; when a user requests one GPU on such a node, plus all
the cores or all the RAM, the other three GPUs are wasted for the
duration of the job, because slurm has no more cores or RAM available to
allocate.
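One partial mitigation (a sketch only; it requires select/cons_tres, and the partition name, QOS name and numbers are invented) is to give GPU jobs per-GPU defaults and cap how many CPUs a single job can take:

    # slurm.conf
    SelectType=select/cons_tres
    PartitionName=gpu Nodes=gpu[01-10] DefCpuPerGPU=8 DefMemPerGPU=48000 QOS=gpu State=UP

    # QOS attached to the partition: no single job may claim more than 16 CPUs
    sacctmgr add qos gpu
    sacctmgr modify qos gpu set MaxTRESPerJob=cpu=16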
--
-SS-
From: slurm-users On Behalf Of Relu Patrascu
Sent: Thursday, October 8, 2020 4:26 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] CUDA environment variable not
That usually means you don't have the nvidia kernel module loaded,
probably because there's no driver installed.
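A quick way to confirm that on the compute node (ordinary driver checks, nothing Slurm-specific):

    lsmod | grep nvidia    # is the kernel module loaded?
    nvidia-smi             # errors out if no working driver is installed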
Relu
On 2020-10-08 14:57, Sajesh Singh wrote:
Slurm 18.08
CentOS 7.7.1908
I have 2 M500 GPUs in a compute node which is defined in the
slurm.conf and gres.conf of the cluster, b
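For comparison, a minimal two-GPU definition looks roughly like this (node name, core count, memory and device paths are placeholders, not taken from the post):

    # slurm.conf
    GresTypes=gpu
    NodeName=node01 Gres=gpu:2 CPUs=24 RealMemory=128000 State=UNKNOWN

    # gres.conf on the node
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1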
Besides having a separate partition for each type of node, you can also
have a partition which includes all the nodes, and use the Default=yes
option in its definition.
From the docs:
'Default
If this keyword is set, jobs submitted without a partition
specification will utilize this partition.'
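In slurm.conf that looks something like the following (partition and node names invented for the example):

    PartitionName=cpu Nodes=cpu[01-20] Default=YES State=UP
    PartitionName=gpu Nodes=gpu[01-10] Default=NO State=UP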
If you don't use OverSubscribe, then resources are not shared. Whatever
resources a job is allocated are not available to other jobs,
regardless of partition.
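For reference, sharing is controlled per partition; a sketch with made-up names:

    # default behaviour: allocated resources are not shared with other jobs
    PartitionName=batch Nodes=node[01-20] OverSubscribe=NO State=UP
    # OverSubscribe=FORCE:2 would instead let up to two jobs share each resource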
Relu
On 2020-09-30 16:12, Ahmad Khalifa wrote:
I have a machine with 4 RTX 2080 Ti GPUs and a Core i9. I submit jobs to it
through MPI PMI2 (
On Sep 30, 2020, at 11:29, Relu Patrascu <r...@cs.toronto.edu> wrote:
Thanks Ryan, I'll try the bugs site. And indeed, one person in our
organization has already said "let's pay for support, maybe
they
--
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
Office of Advanced Research Computing - MSB C630, Newark
Rutgers, The State University of NJ
On Sep 30, 2020
Hi all,
I recently posted a feature request on this mailing list and got no
reply from the developers. Is there a better way to contact the slurm
developers, or should we just accept that they are not interested in
community feedback?
Regards,
Relu
Thank you for your ideas, Diego.
On 2020-09-25 02:20, Diego Zuccato wrote:
On 25/09/20 00:04, Relu Patrascu wrote:
1. Allow preemption in the same QOS, all else being equal, based on job
priority.
You'd risk having jobs continuously preempted by jobs that have been in
queue for
Hello all,
We're mostly a GPU compute shop, and we've been happy with slurm for the
last three years, but we think slurm would benefit from the following
two features:
1. Allow preemption in the same QOS, all else being equal, based on job
priority.
2. Job size calculation to take into acc
We've experienced the same problem on several versions of slurmdbd
(18, 19), so we downgraded mysql and put a hold on the package.
Hey Dustin, funny we meet here :)
Relu
On Tue, May 5, 2020 at 3:43 PM Dustin Lang wrote:
>
> I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation fault.
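For anyone wanting to do the same, the hold itself is just the distribution's package tool (commands below are illustrative; the package names depend on whether MySQL or MariaDB is installed):

    # Debian/Ubuntu
    apt-mark hold mysql-server
    # RHEL/CentOS, with the versionlock plugin
    yum versionlock add mariadb-server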
We received no replies, so we solved the problem in-house by writing a
simple plugin based on the qos priority plugin.
On Wed, Jan 22, 2020 at 2:50 PM Relu Patrascu wrote:
> We're having a bit of a problem setting up slurm to achieve this:
>
> 1. Two QOSs, 'high' and 'normal'.
We're having a bit of a problem setting up slurm to achieve this:
1. Two QOSs, 'high' and 'normal'.
2. Preemption type: requeue.
3. Any job has a guarantee of running 60 minutes before being preempted.
4. Any job submitted with --qos=high can preempt jobs with --qos=normal if
no resources are available.
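A rough sketch of the pieces involved (QOS names from the message; everything else, including the exact time format, is illustrative):

    # slurm.conf
    PreemptType=preempt/qos
    PreemptMode=REQUEUE
    PreemptExemptTime=01:00:00   # point 3: at least 60 minutes of run time before preemption

    # QOS setup: jobs in 'high' may preempt jobs in 'normal'
    sacctmgr add qos high
    sacctmgr modify qos high set preempt=normal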