Hi Erik,
Erik Bryer writes:
> Thanks for your reply. I can't find NVML in the logs going back to
> 11/22. dmesg goes back to the last boot, but has no mention of
> NVML. Regarding making one up on my own, how does Slurm know that the
> string "xyzzy" corresponds to a Tesla GPU, e.g.?
As I understand it, S
Thanks for your reply. I can't find NVML in the logs going back to 11/22. dmesg
goes back to the last boot, but has no mention of NVML. Regarding making one up
on my own, how does Slurm know that the string "xyzzy" corresponds to a Tesla GPU, e.g.?
Thanks,
Erik
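For reference on the NVML question: whether anything GPU-related shows up in slurmd.log depends on Slurm having been built against NVML and on gres.conf asking for autodetection. A minimal sketch, assuming an NVML-enabled build (the explicit Name= line is only an illustration of the hand-made alternative):
# gres.conf: let slurmd query NVML and derive the GPU type name itself
AutoDetect=nvml
# or declare the device by hand with whatever type name you choose
# Name=gpu Type=tesla File=/dev/nvidia0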
On 12/14/20 11:20 pm, Alpha Experiment wrote:
It is called using the following submission script:
#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
source testenv1/bin/activate
python3 multithread_example.py
You're not asking for a number of cores, so you'll likely only be
getting one.
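A minimal sketch of the same submission script with an explicit core request (the count of 8 is an assumption; it should match however many threads multithread_example.py actually starts):
#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # ask for 8 cores on one node for the single Python task
source testenv1/bin/activate
python3 multithread_example.py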
The times for the two runs suggest that the version run through slurm is using
only one core.
Best – Don
Don Krieger, PhD
Research Scientist
Department of Neurological Surgery
University of Pittsburgh
From: slurm-users On Behalf Of
Williams, Gareth (IM&T, Black Mountain)
Sent: Tuesday, Decemb
Hi John,
I’ll volunteer an opinion. There are circumstances where Slurm could contribute
to slower overall times for tasks, as Slurm can be configured to do pre-job
setup and post-job follow-up (prologue/epilogue). However, you are reporting
within-task timing, not overall timing, so this is beside the point.
Hey Sajesh,
Each public cloud vendor provides a standard way to create a virtual
private network in their infrastructure and connect that private network to
your existing private network for your cluster. The devil is in the
networking details.
So in that case, you can just treat it as a new rack.
You can either make them up on your own, or they get spit out by NVML
in the slurmd.log file.
On Tue, Dec 15, 2020 at 12:55 PM Erik Bryer wrote:
>
> Hi,
>
> Where do I get the gres names, e.g. "rtx2080ti", to use for my gpus in my
> node definitions in slurm.conf?
>
> Thanks,
> Erik
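As an illustrative sketch of how a made-up type name ties the pieces together (the node name, device path, and the "rtx2080ti" label here are placeholders; the label only has to match between the two files and the Gres= strings users request):
# gres.conf on the node
Name=gpu Type=rtx2080ti File=/dev/nvidia0
# slurm.conf node definition using the same type name
NodeName=gpunode01 Gres=gpu:rtx2080ti:1 CPUs=16 RealMemory=64000 State=UNKNOWN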
Brian,
Thank you for the info. Will definitely keep your recommendations handy while
putting this together.
-SS-
From: slurm-users On Behalf Of Brian
Andrus
Sent: Tuesday, December 15, 2020 3:14 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Burst to AWS cloud
I have done that for several clients.
1. Staging data is a pain. The simplest thing was to have it as part of
the job script, or have the job itself be dependent upon a separate
staging job (see the sketch after this message). Where bandwidth is an issue, we have implemented bbcp.
2. Depending on size and connectivity, you can
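As a rough illustration of the "separate staging job" approach from point 1 (the script names are made up):
# submit the staging job and capture its job id
stage_id=$(sbatch --parsable stage_data.sh)
# the compute job starts only if staging completed successfully
sbatch --dependency=afterok:${stage_id} compute_job.sh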
We are currently investigating the use of the cloud scheduling features within
an on-site Slurm installation and were wondering if anyone had any experiences
they are willing to share from trying to use this feature. In particular I am
interested to know:
https://slurm.schedmd.com/elastic_computing.
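For context, a hedged sketch of the slurm.conf pieces that page describes (the node names, sizes, and script paths below are assumptions, not a tested configuration):
# hooks that power cloud instances up and down
ResumeProgram=/usr/local/sbin/cloud_resume.sh
SuspendProgram=/usr/local/sbin/cloud_suspend.sh
ResumeTimeout=600
SuspendTime=300
# cloud nodes are defined up front but only instantiated on demand
NodeName=cloud[001-010] CPUs=8 RealMemory=30000 State=CLOUD
PartitionName=cloud Nodes=cloud[001-010] MaxTime=INFINITE State=UP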
Oh, yes! Sorry, I confused your email with Alpha Experiment's emails.
Ahmet M.
15.12.2020 21:59 tarihinde Avery Grieve yazdı:
Hi Ahmet,
Thank you for your suggestion. I assume you're talking about the
SlurmctldHost field in the slurm.conf file? If so, I've got that
variable defined
Hi;
I don't know if this is the problem, but I think setting
"ControlMachine=localhost" and not setting a hostname for the Slurm master
node are not good decisions. How can the compute nodes determine the IP
address of the Slurm master node from "localhost"? Also, I suggest not using
capital letters for any hostnames.
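A small sketch of the kind of setting being suggested (the hostname and address are placeholders):
# slurm.conf on every node: a resolvable controller hostname instead of "localhost",
# optionally with its address in parentheses
SlurmctldHost=headnode(192.168.1.10)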
I changed my .service file to write to a log. The slurm daemons are running
(manual start) on the compute nodes. I get this on startup with the service
enabled:
[2020-12-15T18:09:06.412] slurmctld version 20.11.1 started on cluster
cluster
[2020-12-15T18:09:06.539] No memory enforcing mechanism configured.
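When comparing the manual start with the systemd-managed one, the usual checks are (assuming the stock slurmctld/slurmd unit names):
systemctl status slurmctld    # is the unit active, and how did the last attempt exit
journalctl -u slurmctld -b    # everything the daemon logged since the current boot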
Hi,
Where do I get the gres names, e.g. "rtx2080ti", to use for my gpus in my node
definitions in slurm.conf?
Thanks,
Erik
Maybe a silly question, but where do you find the daemon logs or specify
their location?
~Avery Grieve
They/Them/Theirs please!
University of Michigan
On Mon, Dec 14, 2020 at 7:22 PM Alpha Experiment
wrote:
> Hi,
>
> I am trying to run slurm on Fedora 33. Upon boot the slurmd daemon is
> running
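On the log-location question above: the paths are set in slurm.conf and can be read back from a running system. A sketch (the paths shown are common defaults, not necessarily this cluster's):
# slurm.conf
SlurmctldLogFile=/var/log/slurm/slurmctld.log   # controller log
SlurmdLogFile=/var/log/slurm/slurmd.log         # per-node slurmd log
# query a running system for the values actually in effect
scontrol show config | grep -i logfile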
Hi all,
we are setting up a new test cluster to test some features for our
next HPC system. On one of the compute nodes we get these messages
in the log:
[2020-12-15T10:00:21.753] error: Munge decode failed: Invalid credential
[2020-12-15T10:00:21.753] auth/munge: _print_cred: ENCODED: Thu Jan 0
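The usual first checks for "Invalid credential" are that the munge key and the clocks agree across nodes; a sketch, assuming the default key path and a compute node reachable as node01:
# the key must be byte-identical on every node
md5sum /etc/munge/munge.key
# encode a credential locally and decode it on the other host (and vice versa)
munge -n | ssh node01 unmunge
# munge credentials are time-limited, so clocks must be in sync
date; ssh node01 date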