I was never able to figure out how to use the Perl API shipped with Slurm, so
instead I have written some Perl wrappers around some of the Slurm commands.
My wrappers for the sacctmgr and sshare commands are available on CPAN:
https://metacpan.org/release/Slurm-Sacctmgr
https://metacpan.org/rele
I have some questions about the Slurm Perl API:
- Is it still actively supported? I see it's still in the source tree in Git.
- Does anyone use it? If so, do you have a pointer to some example code?
My immediate question is, for methods that take a data structure as an input
argument, how does one define and populate that structure in Perl?
Last week I upgraded from Slurm 18.08 to Slurm 19.05. Since that time,
several users have reported to me that they can't submit jobs without
specifying a memory requirement. In a way, this is intended: my
job_submit.lua script checks to make sure that --mem or --mem-per-cpu
is specified, and rejects jobs that request neither.
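For reference, a job that requests memory explicitly passes the check; a minimal sketch (the job name, the 4G value, and the program name are just placeholders):

#!/bin/bash
#SBATCH --job-name=example          # placeholder name
#SBATCH --mem=4G                    # explicit per-node memory request
# (or use --mem-per-cpu instead of --mem to request memory per allocated CPU)
srun ./my_program                   # stand-in for the real workload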
Others might have more ideas, but anything I can think of would require a lot
of manual steps to avoid mutual interference with jobs in the other partitions
(allocating resources for a dummy job in the other partition, modifying the MPI
host list to include nodes in the other partition, etc.).
Hi Andy,
Yes, they are on the same network fabric.
Sure, creating another partition that encompasses all of the nodes of the two
or more partitions would solve the problem.
I am wondering whether there are any other ways, short of creating a new
partition?
Thanks,
Chansup
The singleton dependency seems exactly what I need!
However, does it really matter to the network if I upload five 1 GB files
sequentially or all at once? I am not too savvy on how routers operate, but
don't they already do some kind of load balancing to make sure enough
bandwidth is available?
When you say “distinct compute nodes,” are they at least on the same network
fabric?
If so, the first thing I’d try would be to create a new partition that
encompasses all of the nodes of the other two partitions.
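As a rough sketch, assuming made-up node and partition names, that could be a slurm.conf addition along these lines, propagated to all nodes and followed by an 'scontrol reconfigure' (or a slurmctld restart):

# existing partitions (illustrative names only)
PartitionName=part1 Nodes=node[01-08] State=UP
PartitionName=part2 Nodes=node[09-16] State=UP
# new partition spanning the nodes of both
PartitionName=all Nodes=node[01-16] Default=NO State=UP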
Andy
Rather than configuring it to run only one job at a time, you can use job
dependencies to make sure only one job of a particular type runs at a time. A
singleton dependency [1, 2] should work for this. From [1]:
#SBATCH --dependency=singleton --job-name=big-youtube-upload
in any job script would ensure that only one job with that name runs at a time
(per user).
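To make that concrete, a minimal sketch of an upload job (the upload script name is just a stand-in):

#!/bin/bash
#SBATCH --job-name=big-youtube-upload
#SBATCH --dependency=singleton
# Only one job with this name (per user) runs at a time; the others
# wait in the queue until the running one finishes.
./upload_to_youtube.sh /data/video.mp4   # stand-in for the real upload command

Submitting that every hour (e.g. from cron via sbatch) queues the uploads and runs them one after another.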
Hi,
I'm running Slurm version 19.05.
Is there any way to launch an MPI job on a group of distributed nodes from
two or more partitions, where each partition has distinct compute nodes?
I've looked at the heterogeneous job support, but it creates two separate
jobs.
If there is no such capability, are there any workarounds?
I have a five-node cluster of Raspberry Pis. Every hour they all have to upload
a local 1 GB file to YouTube. I want it so only one Pi can upload at a time, so
that the network doesn't get bogged down.
Can Slurm be configured to run only one job at a time? Or is there some other
way to accomplish what I'm after?
--parsable2 will print full names. You can also use -o to format your
output.
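For example, something like this (the column list and the 30-character width are just illustrative, and I believe the %width suffix works here the same way it does in sacctmgr):

# pipe-delimited output with untruncated names
sshare -A myaccount -a --parsable2
# or select and widen columns explicitly
sshare -A myaccount -a -o Account,User%30,RawShares,NormShares,RawUsage,FairShare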
-Paul Edmon-
On 3/23/2020 10:46 AM, Sysadmin CAOS wrote:
Hi,
when I run "sshare -A myaccount -a" and myaccount contains usernames
with more than 10 characters, the "sshare" output shows a "+" at the 10th
character
Hi,
when I run "sshare -A myaccount -a" and myaccount contains usernames
with more than 10 characters, the "sshare" output shows a "+" at the 10th
character, and then I can't tell which user it is. This is a big problem
for me because I have accounts in the format "student-1, student-2, etc."...
Is there any way to show the full user names?
Thanks Paul.
Holding and releasing or re-queueing the job didn't clear the
SchedNodeList value, due to the backfill mechanism. I could only clear it by
restarting slurmctld.
Sefa Arslan
On Mon, 23 Mar 2020 at 16:25, Paul Edmon wrote:
> You could try holding the job and then releasing it.
You could try holding the job and then releasing it. I've asked SchedMD
about this before, and this is the response they gave:
https://bugs.schedmd.com/show_bug.cgi?id=8069
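In case it's useful, the hold/release cycle is just (job ID is a placeholder):

scontrol hold 123456      # hold the pending job
scontrol release 123456   # release it so the scheduler re-evaluates it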
-Paul Edmon-
On 3/23/2020 8:05 AM, Sefa Arslan wrote:
Hi,
Due to a lack of resources in a partition, I updated the job
Hi,
Due to a lack of resources in a partition, I updated the job to another
partition and increased its priority to the top value. Although there are
now enough resources for the job to start, the updated jobs have not started
yet. When I looked with "scontrol show job <jobid>", I saw that the
SchedNodeList value had not been cleared.
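For reference, the steps above were essentially (job ID and partition name are placeholders):

scontrol update JobId=123456 Partition=otherpart   # move the pending job
scontrol top 123456                                # raise it to the top of my pending jobs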
What happens if you change
AccountingStorageHost=localhost
to
AccountingStorageHost=192.168.1.1
i.e. the same IP address as your slurmctld host, and then restart slurmctld?
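That is, in slurm.conf on the controller (assuming systemd for the restart):

AccountingStorageHost=192.168.1.1   # same address as the controller

followed by:

systemctl restart slurmctld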
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne,
Hi Pascal,
are the slurmdbd and slurmctld running on the same host?
Best
Marcus
On 20.03.2020 at 18:12, Pascal Klink wrote:
Hi Chris,
Thanks for the quick answer! I tried the 'sacctmgr show clusters' command,
which gave
Cluster ControlHost ControlPort RPC Share ... QOS