Re: [slurm-users] Slurm Perl API use and examples

2020-03-23 Thread Thomas M. Payerle
I was never able to figure out how to use the Perl API shipped with Slurm, but instead have written some wrappers around some of the Slurm commands for Perl. My wrappers for the sacctmgr and sshare commands are available at CPAN: https://metacpan.org/release/Slurm-Sacctmgr https://metacpan.org/rele

[slurm-users] Slurm Perl API use and examples

2020-03-23 Thread Burian, John
I have some questions about the Slurm Perl API - Is it still actively supported? I see it's still in the source in Git. - Does anyone use it? If so, do you have a pointer to some example code? My immediate question is, for methods that take a data structure as an input argument, how does one defi

[slurm-users] 19.05 not recognizing DefMemPerCPU?

2020-03-23 Thread Prentice Bisbal
Last week I upgraded from Slurm 18.08 to Slurm 19.05. Since that time, several users have reported to me that they can't submit jobs without specifying a memory requirement. In a way, this is intended - my job_submit.lua script checks to make sure that --mem or --mem-per-node is specified, and

Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread Renfro, Michael
Others might have more ideas, but anything I can think of would require a lot of manual steps to avoid mutual interference with jobs in the other partitions (allocating resources for a dummy job in the other partition, modifying the MPI host list to include nodes in the other partition, etc.).

Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi Andy, Yes, they are on the same network fabric. Sure, creating another partition that encompasses all of the nodes of the two or more partitions would solve the problem. I am wondering if there are any other ways instead of creating a new partition? Thanks, Chansup On Mon, Mar 23, 2020 at 11:

Re: [slurm-users] Can slurm be configured to only run one job at a time?

2020-03-23 Thread Faraz Hussain
The singleton dependency seems to be exactly what I need! However, does it really matter to the network if I upload five 1 GB files sequentially or all at once? I am not too savvy on how routers operate. But don't they already do some kind of load balancing to make sure enough bandwidth is availabl

Re: [slurm-users] Running an MPI job across two partitions

2020-03-23 Thread Riebs, Andy
When you say “distinct compute nodes,” are they at least on the same network fabric? If so, the first thing I’d try would be to create a new partition that encompasses all of the nodes of the other two partitions. Andy From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf

Re: [slurm-users] Can slurm be configured to only run one job at a time?

2020-03-23 Thread Renfro, Michael
Rather than configure it to only run one job at a time, you can use job dependencies to make sure only one job of a particular type runs at a time. A singleton dependency [1, 2] should work for this. From [1]: #SBATCH --dependency=singleton --job-name=big-youtube-upload in any job script would ens
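
The singleton approach from this reply could be scripted roughly as follows. This is a minimal Python sketch, not code from the thread: the helper name `singleton_sbatch_args` and the script name `upload.sh` are hypothetical, and only the `--dependency=singleton` and `--job-name` options come from the message itself.

```python
import shlex


def singleton_sbatch_args(job_name, script):
    """Build an sbatch command line that uses a singleton dependency.

    With --dependency=singleton, a job waits until earlier jobs with the
    same job name and user have finished, so giving every upload job the
    same name serializes them.
    """
    return [
        "sbatch",
        "--dependency=singleton",
        f"--job-name={job_name}",
        script,
    ]


if __name__ == "__main__":
    # "upload.sh" is a placeholder job script, assumed for illustration.
    args = singleton_sbatch_args("big-youtube-upload", "upload.sh")
    # Print the command instead of running it, so the sketch works without Slurm.
    print(shlex.join(args))
```

On a real cluster the list could be passed to `subprocess.run(args, check=True)` once per node, with every submission sharing the same job name.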

[slurm-users] Running an MPI job across two partitions

2020-03-23 Thread CB
Hi, I'm running Slurm version 19.05. Is there any way to launch an MPI job on a group of distributed nodes from two or more partitions, where each partition has distinct compute nodes? I've looked at the heterogeneous job support but it creates two separate jobs. If there is no such capability

[slurm-users] Can slurm be configured to only run one job at a time?

2020-03-23 Thread Faraz Hussain
I have a five-node cluster of Raspberry Pis. Every hour they all have to upload a local 1 GB file to YouTube. I want only one Pi to upload at a time so that the network doesn't get bogged down. Can Slurm be configured to only run one job at a time? Or perhaps some other way to accomplish wha

Re: [slurm-users] sshare with usernames too long

2020-03-23 Thread Paul Edmon
--parsable2 will print full names. You can also use -o to format your output. -Paul Edmon- On 3/23/2020 10:46 AM, Sysadmin CAOS wrote: Hi, when I run "sshare -A myaccount -a" and myaccount contains usernames with more than 10 characters, "sshare" output shows a "+" at the 10th character
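
Since `--parsable2` output is pipe-delimited with a header row, the untruncated user names can be pulled out with a few lines of Python. The sketch below is an illustration under stated assumptions: the function name `parse_parsable2` is hypothetical, and the sample text is made up to resemble `sshare` output (real output has more columns).

```python
def parse_parsable2(text):
    """Parse pipe-delimited output with a header row into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    # Pair each field with its column name from the header row.
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


# Made-up sample resembling `sshare -A myaccount -a --parsable2` output;
# the column set and values are assumptions for illustration only.
SAMPLE = """Account|User|RawShares
myaccount||1
myaccount|student-long-name-1|1
myaccount|student-long-name-2|1
"""

if __name__ == "__main__":
    for row in parse_parsable2(SAMPLE):
        if row["User"]:  # skip the account-level row, which has no user
            print(row["User"])
```

In practice the text would come from running the command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.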

[slurm-users] sshare with usernames too long

2020-03-23 Thread Sysadmin CAOS
Hi, when I run "sshare -A myaccount -a" and myaccount contains usernames with more than 10 characters, "sshare" output shows a "+" at the 10th character and then I can't tell which user it is. This is a big problem for me because I have accounts in the format "student-1, student-2, etc"... Is th

Re: [slurm-users] reseting SchedNodeList

2020-03-23 Thread Sefa Arslan
Thanks Paul. Holding and releasing or re-queueing the job didn't clear the SchedNodeList value, due to the backfilling mechanism. I could clear it only by restarting slurmctld. Sefa Arslan On Mon, 23 Mar 2020 at 16:25, Paul Edmon wrote: > You could try holding the job and then releasing i

Re: [slurm-users] reseting SchedNodeList

2020-03-23 Thread Paul Edmon
You could try holding the job and then releasing it. I've inquired of SchedMD about this before and this is the response they gave: https://bugs.schedmd.com/show_bug.cgi?id=8069 -Paul Edmon- On 3/23/2020 8:05 AM, Sefa Arslan wrote: Hi, Due to a lack of resources in a partition, I updated the job

[slurm-users] reseting SchedNodeList

2020-03-23 Thread Sefa Arslan
Hi, Due to a lack of resources in a partition, I moved the job to another partition and increased its priority to the top value. Although there are enough resources for the job to be started, the updated jobs have not started yet. When I looked using "scontrol check jobid", I saw the SchedNodeList value is no

Re: [slurm-users] Accounting Information from slurmdbd does not reach slurmctld

2020-03-23 Thread Sean Crosby
What happens if you change AccountingStorageHost=localhost to AccountingStorageHost=192.168.1.1, i.e. the same IP address as your ctl, and restart the ctld? Sean -- Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead Research Computing Services | Business Services The University of Melbourne,

Re: [slurm-users] Accounting Information from slurmdbd does not reach slurmctld

2020-03-23 Thread Marcus Wagner
Hi Pascal, are the slurmdbd and slurmctld running on the same host? Best Marcus On 20.03.2020 at 18:12, Pascal Klink wrote: Hi Chris, Thanks for the quick answer! I tried the 'sacctmgr show clusters' command, which gave Cluster ControlHost ControlPort RPC Share ... QOS