[slurm-users] Re: [INTERNET] Re: question on sbatch --prefer

2024-02-10 Thread Brian Andrus via slurm-users
I imagine you could create a reservation for the node and then when you 
are completely done, remove the reservation.


Each helper could then target the reservation for the job.
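A rough sketch of that idea (node, user, and script names here are only placeholders):

    # create a reservation pinned to the node the main job landed on
    scontrol create reservation reservationname=helper_res users=alice \
        nodes=node001 starttime=now duration=UNLIMITED flags=ignore_jobs
    # helpers target the reservation
    sbatch --reservation=helper_res helper.sh
    # when completely done
    scontrol delete reservationname=helper_res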

Brian Andrus

On 2/9/2024 5:52 PM, Alan Stange via slurm-users wrote:

Chip,

Thank you for your prompt response.  We could do that, but the helper is
optional, and at times might involve additional helpers depending on
the inputs to the problem being solved, and we don't know a priori how
many helpers might be needed.

Alan

On 2/9/24 10:59, Chip Seraphine wrote:

Normally I'd address this by having an sbatch script allocate enough resources 
for both jobs (specifying one node), and then kick off the helper as a separate 
step (assuming I am understanding your issue correctly).
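As a sketch of that approach (program names are placeholders, not from the original post):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    # one allocation sized for both, with the helper launched as a separate step
    srun --ntasks=1 --exact ./main_task &
    srun --ntasks=1 --exact ./helper &
    wait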


On 2/9/24, 9:57 AM, "Alan Stange via slurm-users" <slurm-users@lists.schedmd.com> wrote:



Hello all,


I'm somewhat new to Slurm, but long time user of other batch systems.
Assume we have a simple cluster of uniform racks of systems with no
special resources, and our jobs are all single cpu tasks.


Let's say I have a long-running job in the cluster, which needs to spawn
a helper process into the cluster. We have a strong preference for this
helper to run on the same cluster node as the original job, but if that
node is already scheduled full, then we want this new task to be
scheduled on another system without any delay.


The problem I have is that --nodelist doesn't solve this, and, as
far as I can tell, there's no option with --prefer to specify a node
name as a resource, without creating a gres for every hostname in the
cluster.


It seems like what I'm trying to do should be achievable, but having
read through the documentation and searched the archives of this list,
I'm not seeing a solution.


I'm hoping someone here has some experience with this and can point me
in the right direction.


Sincerely,


Alan


--
slurm-users mailing list -- slurm-users@lists.schedmd.com 

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com 








--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-27 Thread Brian Andrus via slurm-users

Josef,

For us, we put a load balancer in front of the login nodes with session
affinity enabled. This makes them land on the same backend node each time.


Also, for interactive X sessions, users start a desktop session on the 
node and then use vnc to connect there. This accommodates disconnection 
for any reason even for X-based apps.


Personally, I don't care much for interactive sessions in HPC, but there 
is a large body that only knows how to do things that way, so it is there.


Brian Andrus


On 2/26/2024 12:27 AM, Josef Dvoracek via slurm-users wrote:
What is the recommended way to run a longer interactive job on your
systems?


Our how-to has users start screen on a front-end node and run srun
with bash/zsh inside it, but that indeed creates a dependency between
the login node (with the screen session) and the compute node job.
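For reference, that how-to boils down to something like this (partition name and time limit are placeholders):

    # on a login node
    screen -S work
    srun --partition=interactive --time=8:00:00 --pty bash
    # after a disconnect, back on the *same* login node:
    screen -r work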


On systems with multiple front-ends, users need to remember the login
node where they have their screen session.


Is anybody using something more advanced that is still understandable
by a casual HPC user?


(I know about Open OnDemand, but the use of a native console often has
certain benefits.)


cheers

josef







--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-28 Thread Brian Andrus via slurm-users

Magnus,

That is a feature of the load balancer. Most of them have that these days.

Brian Andrus

On 2/28/2024 12:10 AM, Hagdorn, Magnus Karl Moritz via slurm-users wrote:

On Tue, 2024-02-27 at 08:21 -0800, Brian Andrus via slurm-users wrote:

for us, we put a load balancer in front of the login nodes with
session
affinity enabled. This makes them land on the same backend node each
time.

Hi Brian,
that sounds interesting - how did you implement session affinity?
cheers
magnus




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-28 Thread Brian Andrus via slurm-users

Most of my stuff is in the cloud, so I use their load balancing services.

HAProxy does have sticky sessions, which you can enable based on IP so 
it works with other protocols: 2 Ways to Enable Sticky Sessions in 
HAProxy (Guide) 
<https://www.haproxy.com/blog/enable-sticky-sessions-in-haproxy>
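As a rough sketch (addresses and names below are placeholders, not our actual setup), source-IP stickiness for SSH in HAProxy looks something like:

    frontend ssh_in
        bind *:22
        mode tcp
        default_backend login_nodes

    backend login_nodes
        mode tcp
        balance source
        stick-table type ip size 200k expire 8h
        stick on src
        server login01 10.0.0.11:22 check
        server login02 10.0.0.12:22 check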


Brian Andrus

On 2/28/2024 12:54 PM, Dan Healy wrote:

Are most of us using HAProxy or something else?

On Wed, Feb 28, 2024 at 3:38 PM Brian Andrus via slurm-users 
 wrote:


Magnus,

That is a feature of the load balancer. Most of them have that
these days.

Brian Andrus

On 2/28/2024 12:10 AM, Hagdorn, Magnus Karl Moritz via slurm-users
wrote:
> On Tue, 2024-02-27 at 08:21 -0800, Brian Andrus via slurm-users
wrote:
>> for us, we put a load balancer in front of the login nodes with
>> session
>> affinity enabled. This makes them land on the same backend node
each
>> time.
> Hi Brian,
> that sounds interesting - how did you implement session affinity?
> cheers
> magnus
>
>

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com



--
Thanks,

Daniel Healy
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Is SWAP memory mandatory for SLURM

2024-03-04 Thread Brian Andrus via slurm-users

Joseph,

You will likely get many perspectives on this. I disable swap completely 
on our compute nodes. I can be draconian that way. For the workflow 
supported, this works and is a good thing.

Other workflows may benefit from swap.

Brian Andrus

On 3/3/2024 11:04 PM, John Joseph via slurm-users wrote:

Dear All,
Good morning
I do have a 4 node SLURM instance up and running.
I'd like to know: if I disable swap, will it affect Slurm performance?
Is swap a mandatory requirement? Each of my nodes has plenty of RAM; if
the physical RAM is sufficient, is there any need for swap?

thanks
Joseph John


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Slurm billback and sreport

2024-03-04 Thread Brian Andrus via slurm-users

Chip,

I use 'sacct' rather than sreport and get individual job data. That is 
ingested into a db and PowerBI, which can then aggregate as needed.


sreport is pretty general and likely not the best for accurate 
chargeback data.
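For example, something along these lines (dates and fields are illustrative) gives one line per allocation with the CPU time it actually consumed, which can then be aggregated however you like:

    sacct -a -X -S 2024-02-01 -E 2024-03-01 -P \
        --format=User,Account,JobID,AllocCPUS,ElapsedRaw,CPUTimeRAW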


Brian Andrus

On 3/4/2024 6:09 AM, Chip Seraphine via slurm-users wrote:

Hello,

I am attempting to implement a billback model and finding  myself stymied by 
the way that sreport handles job arrays.   Basically, when a user submits a 
large array, their usage includes time that jobs in the back of the array spend 
waiting their turn.  (My #1 user in “sreport user topusage” shows more “used” 
cpu*minutes than the cluster physically _has_ during that interval.)   However, 
jobs that are idle pending resources are simply regarded as pending; as a 
result, a “polite” user who submits an array of 1000 jobs running N at a time 
is penalized over a user who just dumps 1000 loose jobs into the queue. This
incentivizes my users to do exactly what I do not want!

Has anyone tried to bill their users based on the results of sreport?  If so, 
how did you work around this problem?  What did you use to determine the # of 
CPU*Minutes that a user actually allocated on during a given interval?

--

Chip Seraphine
Grid Operations
For support please use help-grid in email or slack.



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: We're Live! Check out the new SchedMD.com now!

2024-03-13 Thread Brian Andrus via slurm-users

Wow, snazzy!

Looks very good. My compliments.

Brian Andrus

On 3/12/2024 11:24 AM, Victoria Hobson via slurm-users wrote:

Our website has gone through some much needed change and we'd love for
you to explore it!

The new SchedMD.com is equipped with the latest information about
Slurm, your favorite workload manager, and details about SchedMD
services, support, and training offerings.

Toggle through our Industries pages
(https://www.schedmd.com/slurm-industries/) to learn more about how
Slurm can service your specific site needs. Why Slurm?
(https://www.schedmd.com/slurm/why-slurm/) gives you all the basics
around our market-leading scheduler and SchedMD Services
(https://www.schedmd.com/slurm-support/our-services/) addresses all
the ways we can help you optimize your site.

These new web pages also feature access to our Documentation Site, Bug
Site, and Installation Guide. Browse our Events tab to see where we'll
be when, and be sure to register for our Slurm User Group (SLUG) in
Oslo, Norway this fall!
(https://www.schedmd.com/about-schedmd/events/)

SchedMD.com, your one stop shop for all things Slurm. Check it out now!

--
Victoria Hobson
SchedMD LLC
Vice President of Marketing



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: controller backup slurmctld error while takeover

2024-03-25 Thread Brian Andrus via slurm-users

Miriam,

You need to ensure the SlurmSaveState directory is the same for both.
And by 'the same', I mean all contents are exactly the same.

This is usually achieved by using a shared drive or replication.
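A minimal sketch of the relevant slurm.conf pieces, assuming a share mounted at /shared on both controllers (hostnames and paths are placeholders):

    SlurmctldHost=ctl-primary
    SlurmctldHost=ctl-backup
    StateSaveLocation=/shared/slurm/statesave
    SlurmctldTimeout=10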

Brian Andrus

On 3/25/2024 8:11 AM, Miriam Olmi via slurm-users wrote:

Dear all,

I am having trouble finalizing the configuration of the backup 
controller for my slurm cluster.


In principle, if no job is running everything seems fine: both the 
slurmctld services on the
primary and the backup controller are running and if I stop the 
service on the primary controller
after 10s more or less (SlurmctldTimeout = 10 sec) the backup 
controller takes over.


Also, if I run the sinfo or squeue command during the 10s of
inactivity, the shell stays pending but recovers perfectly after the
time needed by the backup controller to take control, and it
works the same when the primary controller is back.


Unfortunately, if I try to do the same test while a job is running 
there are two different

behaviors depending on the initial scenario.

1st scenario:
Both the primary and backup controller are fine. I launch a batch 
script and I verify the script
is running with sinfo and squeue. While the script is still running I 
stop the service on the

primary controller with success but at this point everything gets crazy:

on the backup controller in the slurmctld service log I find the 
following errors:


slurmctld: error: Invalid RPC received REQUEST_JOB_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_PARTITION_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_JOB_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_PARTITION_INFO while in 
standby mode

slurmctld: error: slurm_accept_msg_conn poll: Bad address
slurmctld: error: slurm_accept_msg_conn poll: Bad address

and the commands sinfo and squeue are Unable to contact slurm 
controller (connect failure).


2nd scenario:
the primary controller is stopped and I launch a batch job while the 
backup controller
is the only one working. While the job is running, I restart the 
slurmctld service on the primary
controller. In this case the primary controller takes over 
immediately: everything is smooth

and safe and the sinfo and squeue commands continue to work perfectly.

What might be the problem?

Many thanks in advance!

Miriam



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: controller backup slurmctld error while takeover

2024-03-25 Thread Brian Andrus via slurm-users

Quick correction: it is StateSaveLocation, not SlurmSaveState.

Brian Andrus

On 3/25/2024 8:11 AM, Miriam Olmi via slurm-users wrote:

Dear all,

I am having trouble finalizing the configuration of the backup 
controller for my slurm cluster.


In principle, if no job is running everything seems fine: both the 
slurmctld services on the
primary and the backup controller are running and if I stop the 
service on the primary controller
after 10s more or less (SlurmctldTimeout = 10 sec) the backup 
controller takes over.


Also, if I run the sinfo or squeue command during the 10s of
inactivity, the shell stays pending but recovers perfectly after the
time needed by the backup controller to take control, and it
works the same when the primary controller is back.


Unfortunately, if I try to do the same test while a job is running 
there are two different

behaviors depending on the initial scenario.

1st scenario:
Both the primary and backup controller are fine. I launch a batch 
script and I verify the script
is running with sinfo and squeue. While the script is still running I 
stop the service on the

primary controller with success but at this point everything gets crazy:

on the backup controller in the slurmctld service log I find the 
following errors:


slurmctld: error: Invalid RPC received REQUEST_JOB_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_PARTITION_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_JOB_INFO while in 
standby mode
slurmctld: error: Invalid RPC received REQUEST_PARTITION_INFO while in 
standby mode

slurmctld: error: slurm_accept_msg_conn poll: Bad address
slurmctld: error: slurm_accept_msg_conn poll: Bad address

and the commands sinfo and squeue are Unable to contact slurm 
controller (connect failure).


2nd scenario:
the primary controller is stopped and I launch a batch job while the 
backup controller
is the only one working. While the job is running, I restart the 
slurmctld service on the primary
controller. In this case the primary controller takes over 
immediately: everything is smooth

and safe and the sinfo and squeue commands continue to work perfectly.

What might be the problem?

Many thanks in advance!

Miriam



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: controller backup slurmctld error while takeover

2024-03-25 Thread Brian Andrus via slurm-users
I would hazard to guess that the DNS is not working fully from or for 
the nodes themselves.


Validate that you can ping the nodes by name from the backup controller. 
Also verify they are named what the dns says they are.  And validate you 
can ping the backup controller from the nodes by the name it has in the 
slurm.conf file.


Also, a quick way to do the failover check is to run (from the backup 
controller): scontrol takeover


Brian Andrus

On 3/25/2024 1:39 PM, Miriam Olmi wrote:

Hi Brian,

Thanks for replying.

In my first message I forgot to specify that the primary and the 
backup controller have a shared filesystem mounted.


The StateSaveLocation points to a directory on the shared
filesystem, so both the primary and the backup controller are really
reading/writing the very same files.


Any other ideas?

Thanks again,
Miriam


On 25 March 2024 at 19:23:23 CET, Brian Andrus via slurm-users wrote:


Quick correction: it is StateSaveLocation, not SlurmSaveState.
Brian Andrus

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Elastic Computing: Is it possible to incentivize grouping power_up calls?

2024-04-08 Thread Brian Andrus via slurm-users

Xaver,

You may want to look at the ResumeRate option in slurm.conf:

   ResumeRate
   The rate at which nodes in power save mode are returned to normal
   operation by ResumeProgram. The value is a number of nodes per
   minute and it can be used to prevent power surges if a large number
   of nodes in power save mode are assigned work at the same time (e.g.
   a large job starts). A value of zero results in no limits being
   imposed. The default value is 300 nodes per minute.
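A minimal slurm.conf sketch of the related power-save knobs (values and script paths are only illustrative):

    SuspendTime=600                                  # power down after 10 idle minutes
    SuspendProgram=/usr/local/sbin/suspend_nodes.sh
    ResumeProgram=/usr/local/sbin/resume_nodes.sh
    ResumeTimeout=900
    ResumeRate=100                                   # at most 100 node power-ups per minute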

I have all our nodes in the cloud and they power down/deallocate when 
idle for a bit. I do not use ansible to start them and use the cli 
interface directly, so the only cpu usage is by that command. I do plan 
on having ansible run from the node to do any hot-fix/updates from the 
base image or changes. By running it from the node, it would alleviate 
any cpu spikes on the slurm head node.


Just a possible path to look at.

Brian Andrus

On 4/8/2024 6:10 AM, Xaver Stiensmeier via slurm-users wrote:

Dear slurm user list,

we make use of elastic cloud computing, i.e. node instances are created
on demand and are destroyed when they are not used for a certain amount
of time. Created instances are set up via Ansible. If more than one
instance is requested at the exact same time, Slurm will pass those into
the resume script together and one Ansible call will handle all those
instances.

However, more often than not workflows will request multiple instances
within the same second, but not at the exact same time. This leads to
multiple resume script calls and therefore to multiple Ansible calls.
This will lead to less clear log files, greater CPU consumption by the
multiple running Ansible calls and so on.

What I am looking for is an option to force Slurm to wait a certain
amount and then perform a single resume call for all instances within
that time frame (let's say 1 second).

Is this somehow possible?

Best regards,
Xaver


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Upgrading nodes

2024-04-10 Thread Brian Andrus via slurm-users
Yes. You can build the EL8 RPMs on EL9. Look at 'mock' to do so. I did
something similar when I still had to support EL7.
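A rough sketch of the mock route (the Slurm version and chroot config name are only illustrative; 'mock --list-chroots' shows what your install provides):

    dnf install -y mock rpm-build
    rpmbuild -ts slurm-23.11.7.tar.bz2       # build the source RPM from the tarball
    mock -r alma+epel-8-x86_64 --rebuild ~/rpmbuild/SRPMS/slurm-*.src.rpm
    # the EL8 binary RPMs end up under /var/lib/mock/alma+epel-8-x86_64/result/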


Fairly generic plan, the devil is in the details and verifying each 
step, but those are the basic bases you need to touch.


Brian Andrus


On 4/10/2024 1:48 PM, Steve Berg via slurm-users wrote:
I just finished migrating a few dozen blade servers from torque to 
slurm.  They're all running Alma 8 currently with the slurm that is 
available from epel.  I do want to get it all upgraded to running Alma 
9 and the current version of slurm.  Got one system set up as the 
slurmctld system running Alma 9.  I grabbed the tar ball and built 
RPMs for 9.x.  Got a few questions about the best path to proceed.


Can I use the Alma 9 system to build rpms for Alma 8?  I'm sure I can 
rig up an 8 system to build rpms on but thought I'd see if there was a 
way to do it on the one 9 system.


My plan will be to get the rpms built for 8 and 9, update the 
slurmctld system to the latest version of slurm, then update all the 
nodes to the current slurmd version.  Once that's done I should be 
able to reinstall individual nodes to Alma 9 and the same version of 
slurmd.


Am I missing anything in that sequence?  I'm fairly confident that the 
users aren't running any code that will notice the difference between 
a node running 8 or 9, that should be transparent to them.





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Slurm.conf and workers

2024-04-15 Thread Brian Andrus via slurm-users

Xaver,

If you look at your slurmctld log, you likely end up seeing messages 
about each node's slurm.conf not being the same as that on the master.


So, yes, it can work temporarily, but unless some very specific
settings are in place, issues will arise. In the state you are in now,
you will want to sync the config across all nodes and then run
'scontrol reconfigure'.
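Something as simple as this covers the sync step, assuming passwordless ssh (node names are placeholders):

    for h in node0{1..4}; do
        scp /etc/slurm/slurm.conf "$h":/etc/slurm/slurm.conf
    done
    scontrol reconfigure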


You may want to look into configless if you can set DNS entries and your 
config is basically monolithic or all parts are in /etc/slurm/


Brian Andrus

On 4/15/2024 2:55 AM, Xaver Stiensmeier via slurm-users wrote:

Dear slurm-user list,

as far as I understood it, the slurm.conf needs to be present on the
master and on the workers (at the default location, if no other path is
set via SLURM_CONF). However, I noticed that when adding a partition
only in the master's slurm.conf, all workers were able to "correctly"
show the added partition when calling sinfo on them.

Is the stored slurm.conf on every instance just a fallback for when the
connection is down, or what is its purpose? The documentation only says:
"This file should be consistent across all nodes in the cluster."

Best regards,
Xaver




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Integrating Slurm with WekaIO

2024-04-19 Thread Brian Andrus via slurm-users
This is because you have no slurm.conf in /etc/slurm, so it is trying
'configless', which queries DNS to find out where to get the config. It
is failing because you do not have DNS configured to tell the nodes
where to ask for the config.


Simple solution: put a copy of slurm.conf in /etc/slurm/ on the node(s).
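For the configless route mentioned in the error messages, the DNS record slurmd is looking for is an SRV entry along these lines (domain and host names are placeholders):

    _slurmctld._tcp.cluster.example.com. 3600 IN SRV 10 0 6817 headnode.cluster.example.com.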

Brian Andrus

On 4/19/2024 9:56 AM, Jeffrey Layton via slurm-users wrote:

Good afternoon,

I'm working on a cluster of NVIDIA DGX A100's that is using BCM 10 
(Base Command Manager which is based on Bright Cluster Manager). I ran 
into an error and only just learned that Slurm and Weka don't get 
along (presumably because Weka pins their client threads to cores). I 
read through their documentation here: 
https://docs.weka.io/best-practice-guides/weka-and-slurm-integration#heading-h.4d34og8


I thought I set everything correctly, but when I try to restart the
slurm server I get the following:


Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error: 
resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error: 
fetch_config: DNS SRV lookup failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error: 
_establish_configuration: failed to load configs
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error: slurmd 
initialization failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error: 
resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error: fetch_config: 
DNS SRV lookup failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error: 
_establish_configuration: failed to load configs
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error: slurmd 
initialization failed
Apr 19 05:29:39 bcm10-headnode systemd[1]: slurmd.service: Main 
process exited, code=exited, status=1/FAILURE
Apr 19 05:29:39 bcm10-headnode systemd[1]: slurmd.service: Failed with 
result 'exit-code'.


Has anyone encountered this?

I read this is usually associated with configless Slurm, but I don't 
know how Slurm is built in BCM. slurm.conf is located in 
/cm/shared/apps/slurm/var/etc/slurm and this is what I edited. The 
environment variables for Slurm are set correctly so it points to this 
slurm.conf file.


One thing that I did not do was tell Slurm which cores Weka was using.
I can't seem to figure out the syntax for this. Can someone share the
changes they made to slurm.conf?


Thanks!

Jeff


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Integrating Slurm with WekaIO

2024-04-19 Thread Brian Andrus via slurm-users
I would double-check where you are setting SLURM_CONF then. It is acting 
as if it is not set (typo maybe?)


It should be in /etc/default/slurmd (but could be /etc/sysconfig/slurmd).

Also check what the final, actual command being run to start it is. If 
anyone has changed the .service file or added an override file, that 
will affect things.


Brian Andrus


On 4/19/2024 10:15 AM, Jeffrey Layton wrote:
I like it, however, it was working before without a slurm.conf in 
/etc/slurm.


Plus the environment variable SLURM_CONF is pointing to the correct 
slurm.conf file (the one in /cm/...). Wouldn't Slurm pick up that one?


Thanks!

Jeff


On Fri, Apr 19, 2024 at 1:11 PM Brian Andrus via slurm-users 
 wrote:


This is because you have no slurm.conf in /etc/slurm, so it is
trying 'configless', which queries DNS to find out where to get the
config. It is failing because you do not have DNS configured to
tell nodes where to ask about the config.

Simple solution: put a copy of slurm.conf in /etc/slurm/ on the
node(s).

Brian Andrus

On 4/19/2024 9:56 AM, Jeffrey Layton via slurm-users wrote:

Good afternoon,

I'm working on a cluster of NVIDIA DGX A100's that is using BCM
10 (Base Command Manager which is based on Bright Cluster
Manager). I ran into an error and only just learned that Slurm
and Weka don't get along (presumably because Weka pins their
client threads to cores). I read through their documentation
here:

https://docs.weka.io/best-practice-guides/weka-and-slurm-integration#heading-h.4d34og8

I thought I set everything correctly, but when I try to restart
the slurm server I get the following:

Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error:
resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error:
fetch_config: DNS SRV lookup failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error:
_establish_configuration: failed to load configs
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: slurmd: error:
slurmd initialization failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error:
resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error:
fetch_config: DNS SRV lookup failed
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error:
_establish_configuration: failed to load configs
Apr 19 05:29:39 bcm10-headnode slurmd[3992058]: error: slurmd
initialization failed
Apr 19 05:29:39 bcm10-headnode systemd[1]: slurmd.service: Main
process exited, code=exited, status=1/FAILURE
Apr 19 05:29:39 bcm10-headnode systemd[1]: slurmd.service: Failed
with result 'exit-code'.

Has anyone encountered this?

I read this is usually associated with configless Slurm, but I
don't know how Slurm is built in BCM. slurm.conf is located in
/cm/shared/apps/slurm/var/etc/slurm and this is what I edited.
The environment variables for Slurm are set correctly so it
points to this slurm.conf file.

One thing that I did not do was tell Slurm which cores Weka was
using. I can't seem to figure out the syntax for this. Can someone
share the changes they made to slurm.conf?

Thanks!

Jeff




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Submitting from an untrusted node

2024-05-14 Thread Brian Andrus via slurm-users

Rike,

Assuming the data, scripts and other dependencies are already on the 
cluster, you could just ssh and execute the sbatch command in a single 
shot: ssh submitnode sbatch some_script.sh


It will ask for a password if appropriate and could use ssh keys to 
bypass that need.
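For the key-based route, a one-time setup like this (host and path are placeholders) makes it painless:

    ssh-keygen -t ed25519
    ssh-copy-id submitnode
    ssh submitnode sbatch /shared/jobs/some_script.sh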


Brian Andrus

On 5/14/2024 5:10 AM, Rike-Benjamin Schuppner via slurm-users wrote:

Hi,

If I understand it correctly, the MUNGE and SACK authentication modules 
naturally require that no-one can get access to the key. This means that we 
should not use our normal workstations to which our users have physical access 
to run any jobs, nor could our users use the workstations to submit jobs to the 
compute nodes. They would have to ssh to a specific submit node and only then 
could they schedule their jobs.

Is there an elegant way to enable job submission from any computer (possibly 
requiring that users type their password for the submit node – or to their ssh 
key – at some point)? (All computers/users use the same LDAP server for logins.)

Best
/rike




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Brian Andrus via slurm-users
Not that I recommend it much, but you can build them for each 
environment and install the ones needed in each.


A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without for those nodes and with for the 
ones that have them.


Generally, so long as versions are compatible, they can work together. 
You will need to be aware of differences for jobs and configs, but it is 
possible.
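If the nodes are Debian/Ubuntu, recent Slurm tarballs (23.11 and later) ship native packaging, so a per-release build is roughly this (version is illustrative; requires the devscripts package):

    tar xaf slurm-23.11.7.tar.bz2 && cd slurm-23.11.7
    sudo apt-get build-dep -y ./      # install build dependencies from debian/control
    debuild -b -uc -us                # the slurm-smd-*.deb packages land in the parent dir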


Brian Andrus

On 5/22/2024 12:45 AM, Arnuld via slurm-users wrote:
We have several nodes, most of which run different Linux
distributions (distros for short). The controller has a different
distro as well. The only thing the controller and all the nodes have in
common is that they are all x86_64.


I can install Slurm using the package manager on all the machines, but
this will not work because the controller will have a different version
of Slurm than the nodes (21.08 vs 23.11).


If I build from source then I see two solutions:
 - build a deb package
 - build a custom package (./configure, make, make install)

Building a Debian package on the controller and then distributing the
binaries to the nodes won't work either, because those binaries will
look for the shared libraries they were built against, and those don't
exist on the nodes.


So the only solution I have is to build a static binary using a custom 
package. Am I correct or is there another solution here?




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-23 Thread Brian Andrus via slurm-users
I would guess you either install GPU drivers on the non-GPU nodes or
build Slurm without GPU support for that to work, due to package
dependencies.


Both viable options. I have done installs where we just don't compile 
GPU support in and that is left to the users to manage.


Brian Andrus

On 5/23/2024 6:16 AM, Christopher Samuel via slurm-users wrote:

On 5/22/24 3:33 pm, Brian Andrus via slurm-users wrote:


A simple example is when you have nodes with and without GPUs.
You can build slurmd packages without for those nodes and with for 
the ones that have them.


FWIW we have both GPU and non-GPU nodes but we use the same RPMs we 
build on both (they all boot the same SLES15 OS image though).




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: slurmdbd archive format

2024-05-28 Thread Brian Andrus via slurm-users
Instead of using the archive files, couldn't you query the db directly 
for the info you need?


I would recommend sacct/sreport if those can get the info you need.

Brian Andrus

On 5/28/2024 9:59 AM, O'Neal, Doug (NIH/NCI) [C] via slurm-users wrote:


My organization needs to access historic job information records for 
metric reporting and resource forecasting. slurmdbd is archiving only 
the job information, which should be sufficient for our numbers, but 
is using the default archive script. In retrospect, this data should 
have been migrated to a secondary MariaDB instance, but that train has 
passed.



The format of the archive files is not well documented. Does anyone 
have a program (python/C/whatever) that will read a job_table_archive 
file and decode it into a parsable structure?


Douglas O’Neal, Ph.D. (contractor)

Manager, HPC Systems Administration Group, ITOG

Frederick National Laboratory for Cancer Research

Leidos Biomedical Research, Inc.

Phone: 301-228-4656

Email: Douglas.O’n...@nih.gov 




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: slurmdbd archive format

2024-05-28 Thread Brian Andrus via slurm-users

Oh, to address the passed train:

Restore the archive data with "sacctmgr archive load", then you can do
as you need. Set up your other MariaDB instance, dump the current
slurmdbd database and restore/import it there, then load your archive.

From man sacctmgr:

    archive {dump|load}
        Write database information to a flat file or load information
        that has previously been written to a file.

Brian Andrus
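A hedged sketch of that load-and-query flow (file name and dates are placeholders):

    sacctmgr archive load file=/var/spool/slurm/archive/job_table_archive_2023
    sacct -a -X -S 2023-01-01 -E 2023-12-31 --format=JobID,User,Account,AllocCPUS,Elapsed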


On 5/28/2024 11:38 AM, Brian Andrus wrote:


Instead of using the archive files, couldn't you query the db directly 
for the info you need?


I would recommend sacct/sreport if those can get the info you need.

Brian Andrus

On 5/28/2024 9:59 AM, O'Neal, Doug (NIH/NCI) [C] via slurm-users wrote:


My organization needs to access historic job information records for 
metric reporting and resource forecasting. slurmdbd is archiving only 
the job information, which should be sufficient for our numbers, but 
is using the default archive script. In retrospect, this data should 
have been migrated to a secondary MariaDB instance, but that train 
has passed.



The format of the archive files is not well documented. Does anyone 
have a program (python/C/whatever) that will read a job_table_archive 
file and decode it into a parsable structure?


Douglas O’Neal, Ph.D. (contractor)

Manager, HPC Systems Administration Group, ITOG

Frederick National Laboratory for Cancer Research

Leidos Biomedical Research, Inc.

Phone: 301-228-4656

Email: Douglas.O’n...@nih.gov 




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: slurmdbd not connecting to mysql (mariadb)

2024-05-30 Thread Brian Andrus via slurm-users

That SIGTERM message means something is telling slurmdbd to quit.

Check your cron jobs, maintenance scripts, etc. Slurmdbd is being told
to shut down. If you are running in the foreground, a ^C does that. If
you run a kill or killall on it, you will get that same message.
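A few places to look on a systemd-based host (paths are the usual defaults):

    journalctl -u slurmdbd.service --since "2 hours ago"   # who/what issued the stop
    systemctl status slurmdbd.service                      # last exit status and reason
    grep -ri slurmdbd /etc/cron* /etc/logrotate.d          # scheduled restarts or rotations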


Brian Andrus

On 5/30/2024 6:53 AM, Radhouane Aniba via slurm-users wrote:
Yes, I can connect to my database using mysql --user=slurm
--password=slurmdbpass slurm_acct_db, and after checking on the
firewall question, there is no firewall blocking mysql.


Also, here is the output of slurmdbd -D -vvv (note I can only run this
as sudo):


sudo slurmdbd -D -vvv
slurmdbd: debug: Log file re-opened
slurmdbd: debug: Munge authentication plugin loaded
slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: debug2: innodb_buffer_pool_size: 134217728
slurmdbd: debug2: innodb_log_file_size: 50331648
slurmdbd: debug2: innodb_lock_wait_timeout: 50
slurmdbd: error: Database settings not recommended values: 
innodb_buffer_pool_size innodb_lock_wait_timeout

slurmdbd: Accounting storage MYSQL plugin loaded
slurmdbd: debug2: ArchiveDir = /tmp
slurmdbd: debug2: ArchiveScript = (null)
slurmdbd: debug2: AuthAltTypes = (null)
slurmdbd: debug2: AuthInfo = (null)
slurmdbd: debug2: AuthType = auth/munge
slurmdbd: debug2: CommitDelay = 0
slurmdbd: debug2: DbdAddr = localhost
slurmdbd: debug2: DbdBackupHost = (null)
slurmdbd: debug2: DbdHost = hannibal-hn
slurmdbd: debug2: DbdPort = 7032
slurmdbd: debug2: DebugFlags = (null)
slurmdbd: debug2: DebugLevel = 6
slurmdbd: debug2: DebugLevelSyslog = 10
slurmdbd: debug2: DefaultQOS = (null)
slurmdbd: debug2: LogFile = /var/log/slurmdbd.log
slurmdbd: debug2: MessageTimeout = 100
slurmdbd: debug2: Parameters = (null)
slurmdbd: debug2: PidFile = /run/slurmdbd.pid
slurmdbd: debug2: PluginDir = /usr/lib/x86_64-linux-gnu/slurm-wlm
slurmdbd: debug2: PrivateData = none
slurmdbd: debug2: PurgeEventAfter = 1 months*
slurmdbd: debug2: PurgeJobAfter = 12 months*
slurmdbd: debug2: PurgeResvAfter = 1 months*
slurmdbd: debug2: PurgeStepAfter = 1 months
slurmdbd: debug2: PurgeSuspendAfter = 1 months
slurmdbd: debug2: PurgeTXNAfter = 12 months
slurmdbd: debug2: PurgeUsageAfter = 24 months
slurmdbd: debug2: SlurmUser = root(0)
slurmdbd: debug2: StorageBackupHost = (null)
slurmdbd: debug2: StorageHost = localhost
slurmdbd: debug2: StorageLoc = slurm_acct_db
slurmdbd: debug2: StoragePort = 3306
slurmdbd: debug2: StorageType = accounting_storage/mysql
slurmdbd: debug2: StorageUser = slurm
slurmdbd: debug2: TCPTimeout = 2
slurmdbd: debug2: TrackWCKey = 0
slurmdbd: debug2: TrackSlurmctldDown= 0
slurmdbd: debug2: acct_storage_p_get_connection: request new connection 1
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: slurmdbd version 19.05.5 started
slurmdbd: debug2: running rollup at Thu May 30 13:50:08 2024
slurmdbd: debug2: Everything rolled up


It goes like this for some time and then it crashes with this message

slurmdbd: Terminate signal (SIGINT or SIGTERM) received
slurmdbd: debug: rpc_mgr shutting down


On Thu, May 30, 2024 at 8:18 AM mercan  
wrote:


Did you try to connect database using mysql command?

mysql --user=slurm --password=slurmdbpass slurm_acct_db

C. Ahmet Mercan

On 30.05.2024 14:48, Radhouane Aniba via slurm-users wrote:

Thank you Ahmet,
I don't have a firewall active.
And because slurmdbd cannot connect to the database, I am not able
to get it activated through systemctl. I will share the output of
slurmdbd -D -vvv shortly, but overall it keeps saying it is trying to
connect to the db, then retries a couple of times and crashes.

R.




On Thu, May 30, 2024 at 2:51 AM mercan
 wrote:

Hi;

Did you check can you connect db with your conf parameters
from head-node:

mysql --user=slurm --password=slurmdbpass slurm_acct_db

Also, check and stop firewall and selinux, if they are running.

Last, you can stop slurmdbd, then run run terminal with:

slurmdbd -D -vvv

Regards;

C. Ahmet Mercan

On 30.05.2024 00:05, Radhouane Aniba via slurm-users wrote:

Hi everyone
I am trying to get slurmdbd to run on my local home server
but I am really struggling.
Note : am a novice slurm user
my slurmdbd always times out even though all the details in
the conf file are correct

My log looks like this

[2024-05-29T20:51:30.088] Accounting storage MYSQL plugin
loaded
[2024-05-29T20:51:30.088] debug2: ArchiveDir = /tmp
[2024-05-29T20:51:30.088] debug2: ArchiveScript = (null)
[2024-05-29T20:51:30.088] debug2: AuthAltTypes = (null)
[2024-05-29T20:51:30.088] debug2: AuthInfo = (null)
[2024-05-29T20:51:30.088] debug2: AuthType = auth/munge
[2024-05-29T20:51:30.088] debug2: CommitDelay = 0
  

[slurm-users] Re: Can Not Use A Single GPU for Multiple Jobs

2024-06-20 Thread Brian Andrus via slurm-users

Well, if I am reading this right, it makes sense.

Every job will need at least 1 core just to run and if there are only 4 
cores on the machine, one would expect a max of 4 jobs to run.


Brian Andrus

On 6/20/2024 5:24 AM, Arnuld via slurm-users wrote:
I have a machine with a quad-core CPU and an Nvidia GPU with 3500+
cores.  I want to run around 10 jobs in parallel on the GPU (mostly
CUDA-based jobs).


PROBLEM: Each job asks for only 100 shards (and usually runs for a
minute or so), so I should be able to run 3500/100 = 35 jobs in
parallel, but Slurm runs only 4 jobs in parallel, keeping the rest in
the queue.


I have this in slurm.conf and gres.conf:

# GPU
GresTypes=gpu,shard
# COMPUTE NODES
PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500 
CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 
RealMemory=64255 State=UNKNOWN

--
Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
Name=shard Count=3500  File=/dev/nvidia0





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: How can I tell the OS that was used to build SLURM?

2024-06-20 Thread Brian Andrus via slurm-users

Carl,

You cannot tell from the binary alone.
It looks like you just did an apt-get install slurm or such under 
Ubuntu. Would that be right?


You may be able to look at the package and see info about the build 
environment.
Generally, it is best to build slurm yourself for the environment it is 
going to run in. Because there are multiple possible dependencies/uses, 
this is best.
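On a Debian/Ubuntu box, the package metadata is the quickest clue (package names may differ slightly):

    dpkg -s slurmd | grep -i -E 'version|maintainer'
    apt-cache policy slurmd                  # which repository/release it came from
    ldd $(which slurmd) | grep 'not found'   # any shared libraries it cannot resolve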


Brian Andrus

On 6/20/2024 1:38 PM, Carl Ponder via slurm-users wrote:


We're seeing SLURM misbehaving on one of our clusters, which runs
Ubuntu 22.04.
Among other problems, we see an error message regarding a missing
library version that would have shipped on Ubuntu 20.04, not 22.04.
It's not clear whether the library is being called from a SLURM
component, OpenMPI, or something else.


I'm able to read some of the SLURM configuration settings, but is
there a way to tell what OS was used to build it?
I imagine that the build log would show some system-detection steps
that would expose that, but I don't know if it's bundled with the rest
of the SLURM install somewhere.





--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Using sharding

2024-07-04 Thread Brian Andrus via slurm-users
To help dig into it, can you paste the full output of scontrol show node 
compute01 while the job is pending? Also 'sinfo' would be good.


It is basically telling you there aren't enough resources in the 
partition to run the job. Often this is because all the nodes are in use 
at that moment.


Brian Andrus

On 7/4/2024 8:43 AM, Ricardo Cruz via slurm-users wrote:

Greetings,

There are not many questions regarding GPU sharding here, and I am
unsure if I am using it correctly... I have configured it according to
the instructions (https://slurm.schedmd.com/gres.html), and it seems
to be configured properly:


$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=48 CPUEfctv=128 CPUTot=128 CPULoad=10.95
   AvailableFeatures=(null)
   ActiveFeatures=(null)
*   Gres=gpu:8,shard:32
*
   [truncated]

When running with gres:gpu everything works perfectly:

$ /usr/bin/srun --gres=gpu:2 ls
srun: job 192 queued and waiting for resources
srun: job 192 has been allocated resources
(...)

However, when using sharding, it just stays waiting indefinitely:

$ /usr/bin/srun --gres=shard:2 ls
srun: job 193 queued and waiting for resources

The reason it gives for pending is just "Resources":

$ scontrol show job 193
JobId=193 JobName=ls
   UserId=rpcruz(1000) GroupId=rpcruz(1000) MCS_label=N/A
   Priority=1 Nice=0 Account=account QOS=normal
*   JobState=PENDING Reason=Resources Dependency=(null)
*   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
   SubmitTime=2024-06-28T05:36:51 EligibleTime=2024-06-28T05:36:51
   AccrueTime=2024-06-28T05:36:51
   StartTime=2024-06-29T18:13:22 EndTime=2024-07-01T18:13:22 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-06-28T05:37:20 
Scheduler=Backfill:*

   Partition=partition AllocNode:Sid=localhost:47757
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=1031887M,node=1,billing=1
   AllocTRES=(null)
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=ls
   WorkDir=/home/rpcruz
   Power=
*   TresPerNode=gres/shard:2**
*

Again, I think I have configured it properly - it shows up correctly 
in scontrol (as shown above).

Our setup is pretty simple - I just added shard to /etc/slurm/slurm.conf:
GresTypes=gpu,shard
NodeName=compute01 Gres=gpu:8,shard:32 [truncated]
Our /etc/slurm/gres.conf is also straight-forward: (it works fine for 
--gres=gpu:1)

Name=gpu File=/dev/nvidia[0-7]
Name=shard Count=32


Maybe I am just running srun improperly? Shouldn't it just be srun 
--gres=shard:2 to allocate half of a GPU? (since I am using 32 shards 
for the 8 gpus, so it's 4 shards per gpu)


Thank you very much for your attention,
--
Ricardo Cruz - https://rpmcruz.github.io


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Using sharding

2024-07-04 Thread Brian Andrus via slurm-users

Just a thought.

Try specifying some memory. It looks like the running jobs do that, and by
default, if not specified, the request is "all the memory on the node", so the
job can't start because some of it is taken.
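In other words, a quick test would be something like (the memory figure is arbitrary):

    srun --gres=shard:2 --mem=8G ls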


Brian Andrus

On 7/4/2024 9:54 AM, Ricardo Cruz wrote:

Dear Brian,

Currently, we have 5 GPUs available (out of 8).

rpcruz@atlas:~$ /usr/bin/srun --gres=shard:2 ls
srun: job 515 queued and waiting for resources

The job shows as PD in squeue.
scontrol says that 5 GPUs are allocated out of 8...

rpcruz@atlas:~$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=80 CPUEfctv=128 CPUTot=128 CPULoad=65.38
   AvailableFeatures=(null)
   ActiveFeatures=(null)
*   Gres=gpu:8,shard:32
*   NodeAddr=compute01 NodeHostName=compute01 Version=23.11.4
   OS=Linux 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 
10:49:14 UTC 2024

   RealMemory=1031887 AllocMem=644925 FreeMem=701146 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=partition
   BootTime=2024-07-02T14:08:37 SlurmdStartTime=2024-07-02T14:08:51
   LastBusyTime=2024-07-03T12:02:11 ResumeAfterTime=None
*   CfgTRES=cpu=128,mem=1031887M,billing=128,gres/gpu=8
   AllocTRES=cpu=80,mem=644925M,gres/gpu=5
*   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a

rpcruz@atlas:~$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
partition*    up 5-00:00:00      1    mix compute01


The output is the same, independent of whether "srun --gres=shard:2" 
is pending or not.
I wonder if the problem is that CfgTRES is not showing gres/shard ... 
it sounds like it should, right?


The complete last part of my /etc/slurm/slurm.conf (which is of course 
the same in the login and compute node):


# COMPUTE NODES
GresTypes=gpu,shard
NodeName=compute01 Gres=gpu:8,shard:32 CPUs=128 RealMemory=1031887 
Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=partition Nodes=ALL Default=YES MaxTime=5-00:00:00 
State=UP DefCpuPerGPU=16 DefMemPerGPU=128985


And in the compute node /etc/slurm/gres.conf is:
Name=gpu File=/dev/nvidia[0-7]
Name=shard Count=32


Thank you!
--
Ricardo Cruz - https://rpmcruz.github.io
<https://rpmcruz.github.io/>


Brian Andrus via slurm-users  escreveu 
(quinta, 4/07/2024 à(s) 17:16):


To help dig into it, can you paste the full output of scontrol
show node compute01 while the job is pending? Also 'sinfo' would
be good.

It is basically telling you there aren't enough resources in the
partition to run the job. Often this is because all the nodes are
in use at that moment.

Brian Andrus

On 7/4/2024 8:43 AM, Ricardo Cruz via slurm-users wrote:

Greetings,

There are not many questions regarding GPU sharding here, and I
am unsure if I am using it correctly... I have configured it
according to the instructions
<https://slurm.schedmd.com/gres.html>, and it seems to be
configured properly:

$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=48 CPUEfctv=128 CPUTot=128 CPULoad=10.95
   AvailableFeatures=(null)
   ActiveFeatures=(null)
*   Gres=gpu:8,shard:32
*
   [truncated]

When running with gres:gpu everything works perfectly:

$ /usr/bin/srun --gres=gpu:2 ls
srun: job 192 queued and waiting for resources
srun: job 192 has been allocated resources
(...)

However, when using sharding, it just stays waiting indefinitely:

$ /usr/bin/srun --gres=shard:2 ls
srun: job 193 queued and waiting for resources

The reason it gives for pending is just "Resources":

$ scontrol show job 193
JobId=193 JobName=ls
   UserId=rpcruz(1000) GroupId=rpcruz(1000) MCS_label=N/A
   Priority=1 Nice=0 Account=account QOS=normal
*   JobState=PENDING Reason=Resources Dependency=(null)
*   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
   SubmitTime=2024-06-28T05:36:51 EligibleTime=2024-06-28T05:36:51
   AccrueTime=2024-06-28T05:36:51
   StartTime=2024-06-29T18:13:22 EndTime=2024-07-01T18:13:22
Deadline=N/A
   SuspendTime=None SecsPreSuspend=0
LastSchedEval=2024-06-28T05:37:20 Scheduler=Backfill:*
   Partition=partition AllocNode:Sid=localhost:47757
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=1031887M,node=1,billing=1
   AllocTRES=(null)
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=ls
   WorkDir=/home/rpcruz
   Power=
*   TresPerNode=gres/shard:2**
*


[slurm-users] Re: Nodes TRES double what is requested

2024-07-10 Thread Brian Andrus via slurm-users

Jack,

To make sure things are set right, run 'slurmd -C' on the node and use 
that output in your config.


It can also give you insight as to what is being seen on the node versus 
what you may expect.
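For example (the output below is only illustrative of the format, not from your node):

    $ slurmd -C
    NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257500
    UpTime=12-04:31:07

Copy the NodeName line (minus UpTime) into slurm.conf so the definition matches what the node reports.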


Brian Andrus

On 7/10/2024 1:25 AM, jack.mellor--- via slurm-users wrote:

Hi,

We are running slurm 23.02.6. Our nodes have hyperthreading disabled and we
have slurm.conf set to CPUs=32 for each node (each node has 2 processors with
16 cores each). When we allocate a job, such as salloc -n 32, it will allocate
a whole node, but sinfo shows double the allocation in TRES=64. It also shows
in sinfo that the node has 4294967264 idle CPUs.

Not sure if it's a known bug, or an issue with our config? I have tried various
things, like setting the sockets/boards in slurm.conf.

Thanks
Jack



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: SLURM noob administrator question

2024-07-11 Thread Brian Andrus via slurm-users
You probably want to look at scontrol show node and scontrol show job 
for that node and the jobs on it.


Compare those and you may find someone requested most or all of the
resources but is not using them properly. Look at the job itself to see what it
is trying to do.
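For example, something like this (node name taken from the gnodes output below; the job id is a placeholder):

    scontrol show node seskscpn309 | grep -E 'CPUAlloc|CPULoad|AllocTRES'
    squeue -w seskscpn309 -o '%i %u %C %m %T'
    scontrol show job <jobid>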


Brian Andrus

On 7/11/2024 7:48 AM, Cutts, Tim via slurm-users wrote:


Still learning about SLURM, so please forgive me if I ask a naïve question

I like to use Anders Halager’s gnodes command to visualise the state 
of our nodes.  I’ve noticed lately that we fairly often see things 
like this (apologies for line wrap):


[gnodes output, mangled by line wrapping: seskscpn301 and seskscpn317 appear
fully loaded, while seskscpn309 shows a few overloaded cores, the remaining
cores unused, and plenty of free memory]


Now, you can see in this that nodes 301 and 317 are more or less fully 
loaded.  This is great.  But 309 is in an interesting state.  Four 
overloaded cores, and all other cores unused, and plenty of RAM available.


And yet SLURM is not scheduling any more work to that node.  Right now 
there are more than 2000 jobs pending, many of which could run on that 
node.  But SLURM is not scheduling them, and I don’t know why.


One thing I’ve seen cause this is a job trying to use more CPUs than 
it has been allocated.  The cgroup stops this being a real problem of 
course, but it does cause the load average to go high.  Is this what’s 
causing SLURM to stop sending anything to the node?  Is there a 
configuration change that might help in this situation?


Thanks in advance,

Tim

--

*Tim Cutts*

Scientific Computing Platform Lead

AstraZeneca








-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: CLOUD nodes with unknown IP addresses

2024-07-19 Thread Brian Andrus via slurm-users

Martin,

In a nutshell, when slurmd starts on the node, it reports the node's address to 
slurmctld. That is the "registration" event mentioned in the docs.


Brian Andrus

On 7/19/2024 5:44 AM, Martin Lee via slurm-users wrote:

I've read the following in the slurm power saving docs:
https://slurm.schedmd.com/power_save.html


*cloud_dns*

By default, Slurm expects that the network addresses for cloud
nodes won't be known until creation of the node and that Slurm
will be notified of the node's address upon registration. Since
Slurm communications rely on the node configuration found in the
slurm.conf, Slurm will tell the client command, after waiting for
all nodes to boot, each node's IP address. However, in
environments where the nodes are in DNS, this step can be avoided
by configuring this option.



I am creating the nodes on demand and don't know the IP ahead of
the instance start, so cloud_dns is not set.

I'm confused specifically by "Slurm will be notified of the node's
address upon registration." Who/what is expected to do this? If it
is expected to be performed by the ResumeProgram, does it need to
be done before slurmd starts on the node? Is it OK if the node
does it after slurmd has started with something like:

scontrol update nodename=$(hostname -s) nodeaddr=$(hostname -I)
nodehostname=$(hostname)
scontrol reconfigure

Thank you,

Martin

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Background tasks in Slurm scripts?

2024-07-26 Thread Brian Andrus via slurm-users
Generally speaking, when the batch script exits, Slurm will clean up (i.e. 
kill) any stray processes.

So, I would expect that executable to be killed at cleanup.
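
A minimal sketch of the difference (script and program names are hypothetical):

    #!/bin/bash
    #SBATCH --ntasks=2

    ./helper_task &    # hypothetical background helper
    ./main_work        # hypothetical foreground work
    wait               # without this, the script exits when main_work finishes
                       # and Slurm's cleanup will kill helper_task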

Brian Andrus

On 7/26/2024 2:45 AM, Steffen Grunewald via slurm-users wrote:

On Fri, 2024-07-26 at 10:42:45 +0300, Slurm users wrote:

Good Morning;

This is not a slurm issue. This is a default shell script feature. If you
want to wait to finish until all background processes, you should use wait
command after all.

Thank you - I already knew this in principle, and I also know that a login
shell will complain at an attempt to exit when there are leftover background
jobs. I was wondering though how Slurm's task control would react... Got to
try myself, I guess...

Best, S



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: LRMS error: (-1) Job missing from SLURM."

2024-08-06 Thread Brian Andrus via slurm-users

Felix,

Finished jobs roll off the list shown in squeue, so that may be no 
surprise (depending on settings). If there was a power failure that 
caused the nodes to restart, it could also be that the job had not been 
written to slurmdbd, making it unavailable to sacct as well.


Your logs look to be from a front-end system that interfaces with slurm 
and does not seem to show the actual slurm jobid, unless those are the 
274398, 274399, and 274400 numbers. If so, you could look in the 
slurmctld logs for the jobs to see what may have happened.
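
If those are the Slurm job ids, something like this would show whether slurmdbd ever recorded them (the log path is just an example; use whatever SlurmctldLogFile points at):

    sacct -j 274398,274399,274400 --format=JobID,JobName,State,Start,End,ExitCode
    grep -E '274398|274399|274400' /var/log/slurm/slurmctld.log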


Brian Andrus

On 8/6/2024 5:57 AM, Felix via slurm-users wrote:

Hello

at site RO-14-ITIM, after a power failure I get the following problem

2024-08-06 15:53:04 Finished - job id: 
c9INDmclYv5ngvuSSqSAreymYz3jwmOETUEmV71LDmABFKDm7KNpMn, unix user: 
1900:1900, name: "org.nordugrid.ARC-CE-result-ops", owner: 
"/dc=eu/dc=egi/c=hr/o=robots/o=srce/cn=robot:argo-...@cro-ngi.hr", 
lrms: SLURM, queue: debug, lrmsid: 274399, failure: "LRMS error: (-1) 
Job missing from SLURM."
2024-08-06 15:53:04 Finished - job id: 
tjJNDmclYv5ngvuSSqSAreymYz3jwmOETUEmd71LDmABFKDmePf7To, unix user: 
1900:1900, name: "org.nordugrid.ARC-CE-result-ops", owner: 
"/dc=eu/dc=egi/c=hr/o=robots/o=srce/cn=robot:argo-...@cro-ngi.hr", 
lrms: SLURM, queue: debug, lrmsid: 274400, failure: "LRMS error: (-1) 
Job missing from SLURM."
2024-08-06 15:53:04 Finished - job id: 
kiJNDmclYv5ngvuSSqSAreymYz3jwmOETUEml71LDmABFKDmCmwifm, unix user: 
1900:1900, name: "org.nordugrid.ARC-CE-result-ops", owner: 
"/dc=eu/dc=egi/c=hr/o=robots/o=srce/cn=robot:argo-...@cro-ngi.hr", 
lrms: SLURM, queue: debug, lrmsid: 274398, failure: "LRMS error: (-1) 
Job missing from SLURM."


The jobs can not be seen in sinfo or squeue

Any indication on how/where to look up the problem?

Thank you

Felix



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Find out submit host of past job?

2024-08-07 Thread Brian Andrus via slurm-users
If you need it, you could add something to either the prolog or epilog to store 
the info somewhere.


I do that for the job scripts themselves and keep the past two weeks backed 
up so we can debug if/when there is an issue.
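
A rough epilog sketch of that idea (the log path is a placeholder, and note that SchedMD discourages heavy use of Slurm commands inside prolog/epilog scripts, so keep it light):

    #!/bin/bash
    # epilog fragment: record which host each job was submitted from.
    # AllocNode:Sid in 'scontrol show job' holds the submitting host and session id.
    SUBMIT=$(scontrol -o show job "$SLURM_JOB_ID" | grep -o 'AllocNode:Sid=[^ ]*')
    echo "$(date +%FT%T) job=$SLURM_JOB_ID user=$SLURM_JOB_USER $SUBMIT" \
        >> /var/log/slurm/job_submit_hosts.log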


Brian Andrus

On 8/7/2024 6:29 AM, Steffen Grunewald via slurm-users wrote:

On Wed, 2024-08-07 at 08:55:21 -0400, Slurm users wrote:

Warning on that one, it can eat up a ton of database space (depending on
size of environment, uniqueness of environment between jobs, and number of
jobs). We had it on and it nearly ran us out of space on our database host.
That said the data can be really useful depending on the situation.

-Paul Edmon-

On 8/7/2024 8:51 AM, Juergen Salk via slurm-users wrote:

Hi Steffen,

not sure if this is what you are looking for, but with 
`AccountingStoreFlags=job_env´
set in slurm.conf, the batch job environment will be stored in the
accounting database and can later be retrieved with `sacct -j  
--env-vars´
command.

On Wed, 2024-08-07 at 14:56:30 +0200, Slurm users wrote:

What you're looking for might be doable simply by setting the
AccountStoreFlags parameter in slurm.conf. [1]

Be aware, though, that job_env has sometimes been reported to grow quite
large.

I see, I cannot have the cake and eat it at the same time.
Given the size of our users' typical env, I'm dropping the idea for now -
maybe this will come up again in the not-so-far future. (Maybe it's worth
a feature request?)

Thanks everyone!

- Steffen



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Upgrade compute node to 24.05.2

2024-08-15 Thread Brian Andrus via slurm-users
It sounds like the new version was built with different options, and/or 
an install was not done via packages.


If you do use rpms, you could try:

    dnf provides /usr/lib64/slurm/mpi_none.so

If that shows a package that is installed, remove it. If it shows 
nothing, move the file elsewhere and ensure slurmd is happier.
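
For example, the sequence might look like this (assuming an rpm-based install):

    dnf provides /usr/lib64/slurm/mpi_none.so     # which package owns the stale plugin?
    rpm -qf /usr/lib64/slurm/mpi_none.so          # same question, asked of the rpm database
    mv /usr/lib64/slurm/mpi_none.so /root/        # or remove/update the owning package instead
    systemctl restart slurmd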


Brian Andrus

On 8/14/24 17:52, Sid Young via slurm-users wrote:

G'Day all,

I've been upgrading my cluster from 20.11.0 in small steps to get to 
24.05.2. Currently I have all nodes on 23.02.8, the controller on 
24.05.2 and a single test node on 24.05.2. All are CentOS 7.9 (upgrade 
to Oracle Linux 8.10 is Phase 2 of the upgrades).


When I check the slurmd status on the test node I get:

[root@hpc-dev-01 24.05.2]# systemctl status slurmd
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; 
vendor preset: disabled)

   Active: active (running) since Thu 2024-08-15 10:45:15 AEST; 24s ago
 Main PID: 46391 (slurmd)
    Tasks: 1
   Memory: 1.2M
   CGroup: /system.slice/slurmd.service
           └─46391 /usr/sbin/slurmd --systemd

Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each 
NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Node reconfigured 
socket/core boundaries SocketsPerBoard=4:8(hw) CoresPerSocket=16:8(hw)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: Considering each 
NUMA node as a socket
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd version 
24.05.2 started
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: 
plugin_load_from_file: Incompatible Slurm plugin 
/usr/lib64/slurm/mpi_none.so version (23.02.8)
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: Couldn't load 
specified plugin name for mpi/none: Incompatible plugin version
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: error: MPI: Cannot 
create context for mpi/none

Aug 15 10:45:15 hpc-dev-01 systemd[1]: Started Slurm node daemon.
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: slurmd started on 
Thu, 15 Aug 2024 10:45:15 +1000
Aug 15 10:45:15 hpc-dev-01 slurmd[46391]: slurmd: CPUs=64 Boards=1 
Sockets=8 Cores=8 Threads=1 Memory=257778 TmpDisk=15998 Uptime=2898769 
CPUSpecL...ve=(null)

Hint: Some lines were ellipsized, use -l to show in full.
[root@hpc-dev-01 24.05.2]#

We don't use MPI (life science workloads)... should I remove the file? 
If it is version 23.02.8 then doesn't 24.05.2 have that plugin built 
in? There are no references to mpi in the slurm.conf file.




Sid

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-19 Thread Brian Andrus via slurm-users
IIRC, slurm parses the batch file as options until it hits the first 
non-comment line, which includes blank lines.


You may want to double-check some of the gaps in the option section of 
your batch script.


That being said, you say you removed the '&' at the end of the 
command, which should help.


If they are all exiting with exit code 9, you need to look at the code 
for your a.out to see what code 9 means, as that is what is raising the 
error. Slurm sees that and, if it is non-zero, interprets it as a 
failed job.
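
A hedged sketch of how the end of that script could look (the binary path is copied from the script quoted below):

    # run the program in the foreground so the batch script does not exit
    # (and get cleaned up) before a.out finishes
    /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out

    # or, if it really must be backgrounded, wait for it explicitly:
    # /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out &
    # wait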


Brian Andrus

On 8/19/2024 12:50 AM, Arko Roy via slurm-users wrote:
Thanks Loris and Gareth. Here is the job submission script; if you 
find any errors please let me know.
Since I am not the admin but just a user, I think I don't have access 
to the prolog and epilog files.


If the jobs are independent, why do you want to run them all on the same
node?
I am running sequential codes. Essentially 50 copies of the same code 
with a variation in parameter.
Since I am using the Slurm scheduler, the nodes and cores are 
allocated depending upon the
available resources. So there are instances when 20 of them go to 
20 free cores located on a particular
node and the remaining 30 go to the free 30 cores on another node. It 
turns out that only 1 job out of 20 and 1 job
out of 30 are completed successfully with exitcode 0 and the rest get 
terminated with exitcode 9.

for information, i run sjobexitmod -l jobid to check the exitcodes.

--
the submission script is as follows:



#!/bin/bash

# Setting slurm options



# lines starting with "#SBATCH" define your jobs parameters
# requesting the type of node on which to run job
##SBATCH --partition 
#SBATCH --partition=standard

# telling slurm how many instances of this job to spawn (typically 1)

##SBATCH --ntasks 
##SBATCH --ntasks=1
#SBATCH --nodes=1
##SBATCH -N 1
##SBATCH --ntasks-per-node=1



# setting number of CPUs per task (1 for serial jobs)

##SBATCH --cpus-per-task 

##SBATCH --cpus-per-task=1

# setting memory requirements

##SBATCH --mem-per-cpu 
#SBATCH --mem-per-cpu=1G

# propagating max time for job to run

##SBATCH --time 
##SBATCH --time 
##SBATCH --time 
#SBATCH --time 10:0:0
#SBATCH --job-name gstate

#module load compiler/intel/2018_4
module load fftw-3.3.10-intel-2021.6.0-ppbepka
echo "Running on $(hostname)"
echo "We are in $(pwd)"



# run the program

/home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out &



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Brian Andrus via slurm-users

Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
   Request that a minimum of minnodes nodes be allocated to this job. A
   maximum node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1, 
giving you 3 nodes. Check your logs and check your conf to see what your 
defaults are.


Brian Andrus


On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:

Hello,

I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four 
Amd nodes (node[05-08], Feature=amd).

# job file

#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"


env | grep SLURM


# slurm.conf


PartitionName=DEFAULT  MinNodes=1 MaxNodes=UNLIMITED


# log


SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002

SLURM_JOB_NODELIST=node[01-03] <<<=== why three nodes? Shouldn't this still be 
two nodes?

Thank you.



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-29 Thread Brian Andrus via slurm-users
It looks to me that you requested 3 tasks spread across 2 to 4 nodes. 
Realize --nodes is not targeting your nodes named 2 and 4, it is a count 
of how many nodes to use. You only needed 3 tasks/cpus, so that is what 
you were allocated and you have 1 cpu per node, so you get 3 (of up to 
4) nodes. Slurm does not give you 4 nodes because you only want 3 tasks.


You see the result in your variables:

SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)

If you only want 2 nodes, make --nodes=2.

Brian Andrus

On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote:


Hi,


On sbatch's manpage there is this example for <size_string>:


--nodes=1,5,9,13


so either one specifies <minnodes>[-maxnodes] OR <size_string>.


I checked the logs, and there are no reported errors about wrong or ignored 
options.


MG


From: Brian Andrus via slurm-users
Sent: Thursday, August 29, 2024 4:11:25 PM
To:slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=


Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
Request that a minimum of minnodes nodes be allocated to this job. A maximum 
node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving 
you 3 nodes. Check your logs and check your conf to see what your defaults are.

Brian Andrus


On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:

Hello,

I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four 
Amd nodes (node[05-08], Feature=amd).

# job file

#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"


env | grep SLURM


# slurm.conf


PartitionName=DEFAULT  MinNodes=1 MaxNodes=UNLIMITED


# log


SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002

SLURM_JOB_NODELIST=node[01-03] <<<=== why three nodes? Shouldn't this still be 
two nodes?

Thank you.





-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: playing with --nodes=

2024-08-30 Thread Brian Andrus via slurm-users


Looks like it is not doing what you think it should. It does state:

If the number of tasks is given and a number of requested nodes is also 
given, the number of nodes used from that request will be reduced to 
match that of the number of tasks if the number of nodes in the request 
is greater than the number of tasks.


Your number of nodes was reduced because you only requested 3 tasks.

You are correct that it does not specify in detail what more than one 
value would do beyond making it a range of min/max. The way I read it, 
you can put multiple values and the other value(s) would be a max nodes. 
The "size_string" looks like parsing code similar to that used for 
arrays. Nowhere does it say what the effect of that is for --nodes.


You can either figure out what you want to actually do and do that, or 
open a bug with schedmd to add some clarity to the documentation. They 
are more than happy to do that.
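
For comparison, a couple of unambiguous ways to write the request (sketches based on the job file quoted below):

    #SBATCH --ntasks=3
    #SBATCH --nodes=2        # exactly two nodes for the three tasks

    # or give the scheduler a range and let it choose:
    #SBATCH --ntasks=3
    #SBATCH --nodes=1-3      # anywhere from one to three nodes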


Brian Andrus


On 8/29/2024 11:48 PM, Matteo Guglielmi via slurm-users wrote:

I'm sorry, but I still don't get it.


Isn't --nodes=2,4 telling slurm to allocate 2 OR 4 nodes and nothing else?


So, if:


--nodes=2 allocates only two nodes

--nodes=4 allocates only four nodes

--nodes=1-2 allocates min one and max two nodes

--nodes=1-4 allocates min one and max four nodes


what is the allocation rule for --nodes=2,4 which is the so-called size_string 
allocation?


man sbatch says:


Node count can also be specified as size_string. The size_string specification 
identifies what nodes

values should be used. Multiple values may be specified using a comma separated 
list or with a step

function by suffix containing a colon and number values with a "-" separator.

For example, "--nodes=1-15:4" is equivalent to "--nodes=1,5,9,13".

...

The job will be allocated as many nodes as possible within the range specified 
and without delaying the

initiation of the job.

____
From: Brian Andrus via slurm-users
Sent: Thursday, August 29, 2024 7:27:44 PM
To:slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: playing with --nodes=


It looks to me that you requested 3 tasks spread across 2 to 4 nodes. Realize 
--nodes is not targeting your nodes named 2 and 4, it is a count of how many 
nodes to use. You only needed 3 tasks/cpus, so that is what you were allocated 
and you have 1 cpu per node, so you get 3 (of up to 4) nodes. Slurm does not 
give you 4 nodes because you only want 3 tasks.

You see the result in your variables:

SLURM_NNODES=3
SLURM_JOB_CPUS_PER_NODE=1(x3)



If you only want 2 nodes, make --nodes=2.

Brian Andrus

On 8/29/24 08:00, Matteo Guglielmi via slurm-users wrote:

Hi,


On sbatch's manpage there is this example for <size_string>:


--nodes=1,5,9,13


so either one specifies <minnodes>[-maxnodes] OR <size_string>.


I checked the logs, and there are no reported errors about wrong or ignored 
options.


MG

____
From: Brian Andrus via 
slurm-users<mailto:slurm-users@lists.schedmd.com>
Sent: Thursday, August 29, 2024 4:11:25 PM
To:slurm-users@lists.schedmd.com<mailto:slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: playing with --nodes=


Your --nodes line is incorrect:

-N, --nodes=<minnodes>[-maxnodes]|<size_string>
Request that a minimum of minnodes nodes be allocated to this job. A maximum 
node count may also be specified with maxnodes.

Looks like it ignored that and used ntasks with ntasks-per-node as 1, giving 
you 3 nodes. Check your logs and check your conf to see what your defaults are.

Brian Andrus


On 8/29/2024 5:04 AM, Matteo Guglielmi via slurm-users wrote:

Hello,

I have a cluster with four Intel nodes (node[01-04], Feature=intel) and four 
Amd nodes (node[05-08], Feature=amd).

# job file

#SBATCH --ntasks=3
#SBATCH --nodes=2,4
#SBATCH --constraint="[intel|amd]"


env | grep SLURM


# slurm.conf


PartitionName=DEFAULT  MinNodes=1 MaxNodes=UNLIMITED


# log


SLURM_JOB_USER=software
SLURM_TASKS_PER_NODE=1(x3)
SLURM_JOB_UID=1002
SLURM_TASK_PID=49987
SLURM_LOCALID=0
SLURM_SUBMIT_DIR=/home/software
SLURMD_NODENAME=node01
SLURM_JOB_START_TIME=1724932865
SLURM_CLUSTER_NAME=cluster
SLURM_JOB_END_TIME=1724933465
SLURM_CPUS_ON_NODE=1
SLURM_JOB_CPUS_PER_NODE=1(x3)
SLURM_GTIDS=0
SLURM_JOB_PARTITION=nodes
SLURM_JOB_NUM_NODES=3
SLURM_JOBID=26
SLURM_JOB_QOS=lprio
SLURM_PROCID=0
SLURM_NTASKS=3
SLURM_TOPOLOGY_ADDR=node01
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_MEM_PER_CPU=0
SLURM_NODELIST=node[01-03]
SLURM_JOB_ACCOUNT=dalco
SLURM_PRIO_PROCESS=0
SLURM_NPROCS=3
SLURM_NNODES=3
SLURM_SUBMIT_HOST=master
SLURM_JOB_ID=26
SLURM_NODEID=0
SLURM_CONF=/etc/slurm/slurm.conf
SLURM_JOB_NAME=mpijob
SLURM_JOB_GID=1002

SLURM_JOB_NODELIST=node[01-03] <<<=== why three nodes? Shouldn't this still be 
two nodes?

Thank you.







-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Bug? sbatch not respecting MaxMemPerNode setting

2024-09-04 Thread Brian Andrus via slurm-users

Angel,

Unless you are using cgroups and constraints, there is no limit imposed. 
The numbers are used by slurm to track what is available, not what you 
may/may not use. So you could tell slurm the node only has 1GB and it 
will not let you request more than that, but if you do request only 1GB, 
without specific configuration, there is nothing stopping you from using 
more than that.


So your request did not exceed what slurm sees as available (1 cpu using 
4GB), so it is happy to let your script run. I suspect if you look at 
the usage, you will see that 1 cpu spiked high while the others did nothing.
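
For reference, a minimal cgroup.conf sketch that turns those tracked numbers into real limits (assumes TaskPlugin=task/cgroup in slurm.conf; the exact set of options you want may differ):

    # cgroup.conf
    ConstrainCores=yes        # pin tasks to their allocated CPUs
    ConstrainRAMSpace=yes     # contain jobs that exceed their memory request
    ConstrainSwapSpace=yes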


Brian Andrus

On 9/4/2024 1:37 AM, Angel de Vicente via slurm-users wrote:

Hello,

we found an issue with Slurm 24.05.1 and the MaxMemPerNode
setting. Slurm is installed in a single workstation, and thus, the
number of nodes is just 1.

The relevant sections in slurm.conf read:

,
| EnforcePartLimits=ALL
| PartitionName=short   Nodes=. State=UP Default=YES MaxTime=2-00:00:00 
 MaxCPUsPerNode=76  MaxMemPerNode=231000 OverSubscribe=FORCE:1
`

Now, if I submit a job requesting 76 CPUs and each one needing 4000M
(for a total of 304000M), Slurm does indeed respect the MaxMemPerNode
setting and the job is not submitted in the following cases ("-N 1" is
not really necessary, as there is only one node):

,
| $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not 
available
|
| $ sbatch -N 1 -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not 
available
|
| $ sbatch -n 1 -c 76 -p short --mem-per-cpu=4000M test.batch
| sbatch: error: Batch job submission failed: Memory required by task is not 
available
`


But with this submission Slurm is happy:

,
| $ sbatch -n 76 -c 1 -p short --mem-per-cpu=4000M test.batch
| Submitted batch job 133982
`

and the slurmjobcomp.log file does indeed tell me that the memory went
above MaxMemPerNode:

,
| JobId=133982 UserId=..(10487) GroupId=domain users(2000) Name=test 
JobState=CANCELLED Partition=short TimeLimit=45 StartTime=2024-09-04T09:11:17 
EndTime=2024-09-04T09:11:24 NodeList=.. NodeCnt=1 ProcCnt=76 WorkDir=/tmp/. 
ReservationName= Tres=cpu=76,mem=304000M,node=1,billing=76 Account=ddgroup 
QOS=domino WcKey= Cluster=.. SubmitTime=2024-09-04T09:11:17 
EligibleTime=2024-09-04T09:11:17 DerivedExitCode=0:0 ExitCode=0:0
`


What is the best way to report issues like this to the Slurm developers?
I thought of adding it to https://support.schedmd.com/ but it is not
clear to me if that page is only meant for Slurm users with a Support
Contract?

Cheers,


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-06 Thread Brian Andrus via slurm-users
Folks have addressed the obvious config settings, but also check your 
prolog/epilog scripts/settings as well as the .bashrc/.bash_profile and 
stuff in /etc/profile.d/

That may be hanging it up.

Brian Andrus

On 9/5/2024 5:17 AM, Loris Bennett via slurm-users wrote:

Hi,

With

   $ salloc --version
   slurm 23.11.10

and

   $ grep LaunchParameters /etc/slurm/slurm.conf
   LaunchParameters=use_interactive_step

the following

   $ salloc  --partition=interactive --ntasks=1 --time=00:03:00 --mem=1000 
--qos=standard
   salloc: Granted job allocation 18928869
   salloc: Nodes c001 are ready for job

creates a job

   $ squeue --me
JOBID PARTITION NAME USER ST   TIME  NODES 
NODELIST(REASON)
 18928779 interacti interactloris  R   1:05  1 c001

but causes the terminal to block.

 From a second terminal I can log into the compute node:

   $ ssh c001
   [13:39:36] loris@c001 (1000) ~

Is that the expected behaviour or should salloc return a shell directly
on the compute node (like srun --pty /bin/bash -l used to do)?

Cheers,

Loris



--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: How do you guys track which GPU is used by which job ?

2024-10-16 Thread Brian Andrus via slurm-users
Looks like there is a step you would need to do to create the required 
job mapping files:


"The DCGM-exporter can include High-Performance Computing (HPC) job 
information into its metric labels. To achieve this, HPC environment 
administrators must configure their HPC environment to generate files 
that map GPUs to HPC jobs."


It does go on to show the conventions/format of the files.

I imagine you could have some bits in a prolog script that create those 
files as the job starts on the node, and point dcgm-exporter there.
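
A rough prolog sketch of that idea. The mapping directory and one-file-per-GPU layout are assumptions taken from the dcgm-exporter README, so check the exact convention (and the exporter option that points at the directory) there:

    #!/bin/bash
    # prolog fragment, runs as root on the compute node
    MAPPING_DIR=/var/run/dcgm-job-maps        # hypothetical; must match what dcgm-exporter watches
    mkdir -p "$MAPPING_DIR"
    # SLURM_JOB_GPUS should hold the GPU indices allocated to this job on this node
    for gpu in ${SLURM_JOB_GPUS//,/ }; do
        echo "$SLURM_JOB_ID" >> "$MAPPING_DIR/$gpu"
    done
    # a matching epilog should remove this job id from those files again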


Brian Andrus

On 10/16/24 06:10, Sylvain MARET via slurm-users wrote:

Hey guys !

I'm looking to improve GPU monitoring on our cluster. I want to 
install this https://github.com/NVIDIA/dcgm-exporter and saw in the 
README that it can support tracking of job id : 
https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job-mapping-on-dcgm-exporter


However I haven't been able to see any examples on how to do it nor 
does slurm seem to expose this information by default.
Does anyone do this here ? And if so do you have any examples I could 
try to follow ? If you have advise on best practices to monitor GPU 
I'd be happy to hear it out !


Regards,
Sylvain Maret


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Change primary alloc node

2024-11-03 Thread Brian Andrus via slurm-users

Bhaskar,

As I think about it, that assignment of process 0's node may well be 
something that is from your mpi, since that is where you can decide how 
to layout the processes (pack a node or equally, etc). I would look at 
the options/settings that apply to the particular flavor of mpi you are 
using.


For example, in openmpi, it has:

    To order processes' ranks in MPI_COMM_WORLD:

   --rank-by <object>
  Rank in round-robin fashion according to the specified 
object, defaults to slot. Supported options include slot, hwthread, 
core, L1cache, L2cache, L3cache, socket, numa, board, and node.


Brian Andrus

On 11/3/2024 12:06 AM, Bhaskar Chakraborty wrote:

Hi Brian,
Thanks for the response!
However, this particular approach where we need to accept whatever 
slurm gives us as starting node

and deal with it accordingly doesn’t work for us.

I think there should be flexibility in slurm to switch the starting 
node as requested,

through some C API. This is possible in other scheduling systems like LSF.

Any other way to do this with the current slurm code base is welcome.

Regards,
Bhaskar.




On Friday, November 1, 2024, 1:12 AM, Brian Andrus via slurm-users 
 wrote:


Likely many ways to do this, but if you have some code that is
dependent on something, that check could be in the code itself.

So instead of process 0 being the required process to run, it
would be whichever process meets the requirements.

eg:

case "$(hostname)" in
  harold)
      # run harold's stuff here
      ;;
  *)
      # run all other stuff here
      ;;
esac

Takes some coding effort but keeps control of the processes within
your own code.

Brian Andrus

On 10/30/24 09:35, Bhaskar Chakraborty via slurm-users wrote:
Hi,

Is there a way to change/control the primary node (i.e. where the
initial task starts) as part of a job's allocation.

For eg, if a job requires 6 CPUs & its allocation is distributed
over 3 hosts h1, h2 & h3 I find that it always starts the task in
1 particular
node (say h1) irrespective of how many slots were available in the
hosts.

Can we somehow let slurm have the primary node as h2?

Is there any C-API inside select plugin which can do this trick if
we were to control it through the configured select plugin?

Thanks.
-Bhaskar.


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Change primary alloc node

2024-10-31 Thread Brian Andrus via slurm-users
Likely many ways to do this, but if you have some code that is dependent 
on something, that check could be in the code itself.


So instead of process 0 being the required process to run, it would be 
whichever process meets the requirements.


eg:

case "$(hostname)" in
  harold)
      # run harold's stuff here
      ;;
  *)
      # run all other stuff here
      ;;
esac

Takes some coding effort but keeps control of the processes within your 
own code.


Brian Andrus

On 10/30/24 09:35, Bhaskar Chakraborty via slurm-users wrote:

Hi,

Is there a way to change/control the primary node (i.e. where the 
initial task starts) as part of a job's allocation.


For eg, if a job requires 6 CPUs & its allocation is distributed over 
3 hosts h1, h2 & h3 I find that it always starts the task in 1 particular

node (say h1) irrespective of how many slots were available in the hosts.

Can we somehow let slurm have the primary node as h2?

Is there any C-API inside select plugin which can do this trick if we 
were to control it through the configured select plugin?


Thanks.
-Bhaskar.

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: what updates NODEADDR

2024-09-21 Thread Brian Andrus via slurm-users

IIRC, you need to ensure reverse lookup for DNS matches your nodename

Brian Andrus

On 9/20/2024 4:55 PM, Jakub Szarlat via slurm-users wrote:

Hi

I'm using dynamic nodes with "slurmd -Z" with slurm 23.11.1.
Firstly I find that when you do "scontrol show node" it shows the NODEADDR as 
ip rather than the NODENAME. Because I'm playing around with running this in containers 
on docker swarm I find this ip can be wrong. I can force it with scontrol update however 
after a while something updates it to something else again. Does anybody know if this is 
done by slurmd or slurmctld or something else?
How can I stop this from happening?
How can I get the node to register with the hostname rather than ip?

cheers,
Jakub




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: sinfo not listing any partitions

2024-12-02 Thread Brian Andrus via slurm-users

You only have one partition, and it is named 'default'.
You are not allowed to name it that. Name it something else and you 
should be good.
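
For example, something along these lines (the name is arbitrary, just not 'default'):

    PartitionName=compute Nodes=k[001-448] Default=YES MaxTime=INFINITE State=UP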


Brian Andrus

On 11/28/2024 6:52 AM, Patrick Begou via slurm-users wrote:

Hi Kent,

on your management node could you run:
systemctl status slurmctld

and check your 'NodeName=' and 'PartitionName=...' in 
/etc/slurm.conf? In my slurm.conf I have a more detailed description 
and the NodeName keyword starts with an upper case (I don't know if 
slurm.conf is case sensitive):


NodeName=kareline-0-[0-3]  Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 
RealMemory=47900


and it looks like your nodes description is not understood by slurm.

Patrick


Le 27/11/2024 à 17:46, Ryan Novosielski via slurm-users a écrit :
At this point, I’d probably crank up the logging some and see what 
it’s saying in slurmctld.log.


--
#BlackLivesMatter

|| \\UTGERS, |---*O*---
||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ 
RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB 
A555B, Newark

     `'


On Nov 27, 2024, at 11:38, Kent L. Hanson  wrote:

Hey Ryan,
I have restarted the slurmctld and slurmd services several times. I 
hashed the slurm.conf files. They are the same. I ran “sinfo -a” as 
root with the same result.

Thanks,

Kent
*From:*Ryan Novosielski 
*Sent:*Wednesday, November 27, 2024 9:31 AM
*To:*Kent L. Hanson 
*Cc:*slurm-users@lists.schedmd.com
*Subject:*Re: [slurm-users] sinfo not listing any partitions
If you’re sure you’ve restarted everything after the config change, 
are you also sure that you don’t have that stuff hidden from your 
current user? You can try -a to rule that out. Or run as root.

--
#BlackLivesMatter

|| \\UTGERS , 
|---*O*---

||_// the State |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ 
RBHS Campus
||  \\    of NJ | Office of Advanced Research Computing - MSB 
A555B, Newark

     `'


On Nov 27, 2024, at 09:56, Kent L. Hanson via slurm-users
 wrote:
I am doing a new install of Slurm 24.05.3. I have all the
packages built and installed on the headnode and compute node with
the same munge.key, slurm.conf, and gres.conf file.
to run munge and unmunge commands to test munge successfully.
Time is synced with chronyd. I can’t seem to find any useful
errors in the logs. For some reason when I run sinfo no nodes
are listed. I just see the headers for each column. Has anyone
seen this or know what a next step of troubleshooting would be?
I’m new to this and not sure where to go from here. Thanks for
any and all help!
The odd output I am seeing
[username@headnode ~] sinfo
PARTITION AVAIL    TIMELIMIT NODES   STATE NODELIST
(Nothing is output showing status of partition or nodes)
Slurm.conf
ClusterName=slurmkvasir
SlurmctldHost=kadmin2
MpiDefault=none
ProctrackType=proctrack/cgroup
PrologFlags=contain
ReturnToService=2
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
TaskPlugin=task/cgroup
MinJobAge=600
SchedulerType=sched/backfill
SelectType=select/cons_tres
PriorityType=priority/multifactor
AccountingStorageHost=localhost
AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/gpu,cpu,node
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmLogFile=/var/log/slurm/slurmd.log
nodeName=k[001-448]
PartitionName=default Nodes=k[001-448] Default=YES
MaxTime=INFINITE State=up
Slurmctld.log
Error: Configured MailProg is invalid
Slurmctld version 24.05.3 started on cluster slurmkvasir
Accounting_storage/slurmdbd:
clusteracct_storage_p_register_ctld: Registering slurmctld at
port 8617
Error: read_slurm_conf: default partition not set.
Recovered state of 448 nodes
Down nodes: k[002-448]
Recovered information about 0 jobs
Recovered state of 0 reservations
Read_slurm_conf: backup_controller not specified
Select/cons_tres; select_p_reconfigure: select/cons_tres:
reconfigure
Running as primary controller
Slurmd.log
Error: Node configuration differs from hardware: CPUS=1:40(hw)
Boards=1:1(hw) SocketsPerBoard=1:2(hw) CoresPerSocket=1:20(hw)
ThreadsPerCore:1:1(hw)
CPU frequency setting not configured for this node
Slurmd version 24.05.3 started
Slurmd started on Wed, 27 Nov 2

[slurm-users] Re: How to clean up?

2025-02-04 Thread Brian Andrus via slurm-users

Steven,


Looks like you may have had a secondary controller that took over and 
changed your StateSave files.


IF you don't need the job info AND no jobs are running, you can just 
rename/delete your StateSaveLocation directory and things will be 
recreated. Job numbers will start over (unless you set FirstJobId, which 
you should if you want to keep your sacct data).


It also looks like your logging does not have permissions. Change 
SlurmctldLogFile to be something like /var/log/slurm/slurmctld.log and 
set the owner of /var/log/slurm to the slurm user.



Ensure all slurmctld daemons are down, then start the first. Once it is 
up (you can run 'scontrol show config') start the second. Run 'scontrol 
show config' again and you should see both daemons listed as 'up' at the 
end of the output.
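
A rough sketch of that sequence, assuming no running jobs and with paths taken as placeholders from your slurm.conf:

    systemctl stop slurmctld                           # on both controllers
    mv /var/spool/slurmctld /var/spool/slurmctld.old   # StateSaveLocation from slurm.conf
    mkdir -p /var/spool/slurmctld && chown slurm: /var/spool/slurmctld
    # optionally keep job ids increasing past the old ones:
    #   FirstJobId=<next free id>   in slurm.conf
    systemctl start slurmctld                          # primary first, then the backup
    scontrol show config | tail                        # both controllers should show as UP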



-Brian Andrus


On 2/3/2025 7:29 PM, Steven Jones via slurm-users wrote:

From the logs, 2 errors:


8><---
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Starting 
Slurm controller daemon...
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: 
slurmctld: error: chdir(/var/log): Permission denied
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: 
slurmctld: slurmctld version 24.11.1 started on cluster poc-cluster(2175)
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: Started 
Slurm controller daemon.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz slurmctld[1045020]: 
slurmctld: fatal: Can not recover assoc_usage state, incompatible 
version, got 9728 need >= 9984 <= 10752, start with '-i' to ignore 
this. Warning: using -i will lose the data that can't be recovered.
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: 
slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Feb 04 03:08:48 vuwunicoslurmd1.ods.vuw.ac.nz systemd[1]: 
slurmctld.service: Failed with result 'exit-code'.


No idea on "slurmctld: error: chdir(/var/log): Permission denied"  
need more info but the log seems to be written to OK as we can see.


"fatal: Can not recover assoc_usage state, incompatible version,"

This seems to be me attempting to upgrade from ver22 to ver24, but 
Google tells me ver22 "left a mess" and ver24 can't cope. Where would I 
go looking to clean up, please?


regards

Steven



-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] multiple conf-server entries for sackd

2024-12-03 Thread Brian Andrus via slurm-users

Not sure anyone would know, but...

If you are running slurm in HA mode (multiple SlurmctldHost entries) is 
it possible to point sackd to more than one using the --conf-server option?
So either specify --conf-server more than once, or have a 
comma-delimited list of them?


The docs are a little light about that.

Brian Andrus


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: All GPUs are Usable if no Gres is Defined

2025-01-04 Thread Brian Andrus via slurm-users
Ensure cgroups is working and configured to limit access to devices 
(which includes gpus).


Check your cgroup.conf to see that there is an entry for:

    ConstrainDevices=yes


Brian Andrus


On 1/3/2025 10:49 AM, Groner, Rob via slurm-users wrote:
I'm not entirely sure, and I can't vouch for differences in a 
(relatively) older version of Slurm. But I'm pretty sure that on our 
cluster, we have to specify the GRES in the partition in order for 
Slurm to treat them as allocatable resources.  On our interactive 
nodes, we have GPUs but we don't list them as a GRES in the partition, 
which lets anyone on those nodes use them.  On our other partitions, 
we do specify the GRES, and that prevents a user from accessing them 
unless they specify --gres.


Rob



*From:* Jacob Gordon via slurm-users 
*Sent:* Friday, January 3, 2025 11:32 AM
*To:* slurm-users@lists.schedmd.com 
*Subject:* [slurm-users] All GPUs are Usable if no Gres is Defined

Hello,

We have a two node GPU cluster with 8 NVidia GPUs. GRES is currently 
configured and works if a user defines it within their 
sbtach/interactive job submission (--gres=gpu:3). Users only have 
access to the GPUs they request. However, when they omit 
 “--gres=gpu:n”, they can use every GPU, which interferes with running 
jobs that used the gres option. I’m at a loss as to why this is 
happening. Can someone please look at our configuration to see if 
anything stands out?



SLURM Version = 21.08.5


*_Slurm.conf_*

ClusterName=ommit

SlurmctldHost=headnode

ProctrackType=proctrack/cgroup

ReturnToService=2

SlurmdPidFile=/run/slurmd.pid

SlurmdSpoolDir=/var/lib/slurm/slurmd

StateSaveLocation=/var/lib/slurm/slurmctld

SlurmUser=slurm

TaskPlugin=task/cgroup

SchedulerType=sched/backfill

SelectType=select/cons_tres

SelectTypeParameters=CR_Core_Memory

AccountingStorageType=accounting_storage/slurmdbd

# AccountingStorageType for other resources

#

AccountingStorageTRES=gres/gpu

#DebugFlags=CPU_Bind,gres

JobCompType=jobcomp/none

JobAcctGatherType=jobacct_gather/cgroup

SlurmctldDebug=info

SlurmctldLogFile=/var/log/slurm/slurmctld.log

SlurmdDebug=info

SlurmdLogFile=/var/log/slurm/slurmd.log

DefMemPerCPU=4000

#NodeName=n01 CPUs=256 Boards=1 SocketsPerBoard=2 CoresPerSocket=64 
ThreadsPerCore=2 RealMemory=100


NodeName=n01 Gres=gpu:nvidia-l40:8 CPUs=256 Boards=1 SocketsPerBoard=2 
CoresPerSocket=64 ThreadsPerCore=2 RealMemory=100


NodeName=n02 Gres=gpu:nvidia-l40:8 CPUs=256 Boards=1 SocketsPerBoard=2 
CoresPerSocket=64 ThreadsPerCore=2 RealMemory=100


#Gres config for GPUs

GresTypes=gpu

PreemptType=preempt/qos

PreemptMode=REQUEUE

# reset usage after 1 week

PriorityUsageResetPeriod=WEEKLY

# The job's age factor reaches 1.0 after waiting in the

# queue for 2 weeks.

PriorityMaxAge=14-0

# This next group determines the weighting of each of the

# components of the Multifactor Job Priority Plugin.

# The default value for each of the following is 1.

PriorityWeightAge=1000

PriorityWeightFairshare=1

PriorityWeightJobSize=1000

PriorityWeightPartition=1000

PriorityWeightQOS=1500

# Primary partitions

PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

PartitionName=all Nodes=n01,n02 Default=YES MaxTime=01:00:00 
DefaultTime=00:30:00 State=UP


PartitionName=statds Nodes=n01 Default=NO MaxTime=48:00:00 State=UP 
Priority=100 State=UP OverSubscribe=FORCE AllowAccounts=statds


PartitionName=phil Nodes=n02 Default=NO MaxTime=48:00:00 State=UP 
Priority=100 State=UP OverSubscribe=FORCE AllowAccounts=phil


#Set up condo mode

# Condo partitions

PartitionName=phil_condo Nodes=n02 Default=NO MaxTime=48:00:00 
DefaultTime=00:01:00 State=UP Priority=50 OverSubscribe=FORCE 
AllowQos=normal


PartitionName=statds_condo Nodes=n01 Default=NO MaxTime=48:00:00 
DefaultTime=00:01:00 State=UP Priority=50 OverSubscribe=FORCE 
AllowQos=normal


JobSubmitPlugins=lua

*_Gres.conf_*

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia0

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia1

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia2

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia3

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia4

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia5

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia6

NodeName=n01 Name=gpu Type=nvidia-l40 File=/dev/nvidia7

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia0

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia1

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia2

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia3

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia4

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia5

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia6

NodeName=n02 Name=gpu Type=nvidia-l40 File=/dev/nvidia7

*_Cgroup.conf_*

CgroupMountpoint="/sys/fs/cgroup"

CgroupAutomount=yes

CgroupReleaseAgen

[slurm-users] Re: Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions

2025-01-04 Thread Brian Andrus via slurm-users

Run 'sinfo -R' to see the reason any nodes may be down.

It may be as simple as running 'scontrol update state=resume 
nodename=<node>' to bring them back, if they are down. It depends on the 
reason they went down (if that is the issue).


Otherwise, check the job requirements to see what it is asking for that 
does not exist 'scontrol show job xxx'


Brian Andrus

On 1/4/2025 3:41 AM, John Hearns via slurm-users wrote:

Output of sinfo and squeue

Look at slurmd log in an example node also
Tail -f is your friend

On Sat, Jan 4, 2025, 8:13 AM sportlecon sportlecon via slurm-users 
 wrote:


JOBID PARTITION     NAME       USER      ST       TIME  NODES
NODELIST(REASON)
                26       cpu myscript    user1  PD       0:00    
4         (Nodes required for job are DOWN, DRAINED or reserved
for jobs in higher priority partitions)
Anyone can help to  fix this?

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com

To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: 回复: Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

2025-02-20 Thread Brian Andrus via slurm-users

Daniel,

One way to set up a true HA is to configure master-master SQL instances 
on both head nodes. Then have each slurmdbd point to the other SQL 
instance as the backup host.


This is likely not necessary, as all data destined for slurmdbd is cached 
by slurmctld if slurmdbd is unavailable. In the real world, this generally 
gives ample time to recover without issue.
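
A sketch of what that could look like in the config files (head01/head02 are placeholder hostnames):

    # slurmdbd.conf on head01 (mirror it on head02 with the names swapped)
    DbdHost=head01
    StorageHost=localhost
    StorageBackupHost=head02

    # slurm.conf on both head nodes
    AccountingStorageHost=head01
    AccountingStorageBackupHost=head02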


Brian Andrus

On 2/20/2025 6:45 PM, hermes via slurm-users wrote:


Thank you for your insightful suggestions. Placing both slurmdbd and 
slurmctld on the same node is indeed a new structure that we hadn't 
considered before, and it seems to provide a much clearer logic for 
deployment.


Regarding the usage of DbdBackupHost, I would like to confirm my 
understanding of how it works. Does it mean that the DbdBackupHost 
option will only be referenced when the slurmdbd service detects that 
its local database (specified by StorageHost) is unavailable? And I 
guess in that case, the first slurmdbd service would act as a proxy 
that forwards requests to the DbdBackupHost and returns the data from 
there to slurmctld?


From: Daniel Letai
Sent: 2025-02-20 21:56
To: taleinterve...@sjtu.edu.cn
Cc: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Re: how to set slurmdbd.conf if using two 
slurmdb node with HA database?


It's functionally the same with one difference - the configuration 
file is unmodified between nodes, allowing for simple deployment of 
nodes, and automation.


Regarding the backuphost - that depends on your setup. If you can 
ensure the slurmdbd service will stop if the local db replica is not 
healthy, you shouldn't need backuphost. Conversely, if there is no 
health check to ensure replica readiness, configure the backuphost. 
This will require using a different conf file for each node, unless 
setting up a more robust HA clustering scheme.


The other option is to separate the dbd from the db. Put the dbd on 
the ctld nodes (A,B) and let nodes C,D only be DB master replica (not 
dbd).


In slurm.conf on nodes A,B You will then have:

AccountingStorageHost = localhost

(without AccountingStorageBackupHost)

And in slurmdbd.conf you will have:

DbdHost = localhost

(without DbdBackupHost)

StorageHost = nodeC

StorageBackupHost = nodeD

This would mean identical slurm.conf and slurmdbd.conf on both nodes 
A,B, and no slurm conf files or processes on nodes C,D.


This setup assumes that the entire stack (ctld+dbd) is either working 
or not, which is usually true, as either the node is functioning or 
not. If the ctld is working but dbd is not, you will loose connection 
to the DB. If the ctld is not working, the other ctld will take charge 
and use its local dbd, so that scenario is covered.


Adding AccountingStorageBackupHost pointing to the other node is of 
course possible, but will mean different slurm.conf files which slurm 
will complain about.


It will mean that most of the time you will not load balance on the 
multi-master DB replicas. Whether that is a consideration or not is 
for you to decide.


On 20/02/2025 3:57, taleinterve...@sjtu.edu.cn wrote:

Do you mean the second configuration scheme?

I think configuring `dbdhost=localhost` is the same as configuring
`DbdAddr=nodeC` and `DbdAddr=nodeD` on the two nodes respectively.

The key point is whether we should set the DbdBackupHost option
and how it works.

From: Daniel Letai
Sent: 2025-02-19 18:21
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Re: how to set slurmdbd.conf if using two
slurmdb node with HA database?

I'm not sure it will work, didn't test it, but could you just do
`dbdhost=localhost` to solve this?

On 18/02/2025 11:59, hermes via slurm-users wrote:

The deployment scenario is as follows:

nodeA                    nodeB

(slurmctld)              (backup slurmctld)

     | \---/ |

     | /   \ |

nodeC                    nodeD

(slurmdbd)               (backup slurmdbd)

(mysql)   <-- multi-master replica -->   (mysql)

Since the database is multi-master replicated, the slurmdbd
should only talk to the mysql on its own node.

In such case, how should we set the slurmdbd.conf? The conf
file contains options “DbdAddr”, “DbdHost”and “DbdBackupHost”.

Should they be consistent between nodeA-2 and nodeB-2? Such as:

DbdAddr = nodeC  | DbdAddr = nodeC

DbdHost = nodeC  | DbdHost = nodeC

DbdBackupHost = nodeD    | DbdBackupHost = nodeD

StorageHost = nodeC   | StorageHost = nodeD

Or maybe just set different conf files and don't use the
"DbdBackupHost", like:

DbdAddr = nodeC | DbdAddr = nodeD

DbdHost = nodeC     | DbdHost = nodeD

StorageHost = nodeC  | StorageHost = nodeD

I’m quite confused about the usage of DbdAddr and

[slurm-users] slurmrestd via unix socket

2025-04-10 Thread Brian Andrus via slurm-users

All,

Maybe someone has seen this.
I have slurmrestd running listening on port 8081 as well as a unix 
socket at /run/slurmrestd/slurmrestd.sock


I am able to query the port with curl and do a ping. Everything seems 
fine. Other commands work as well.


  "pings": [
    {
  "hostname": "head01",
  "pinged": "UP",
  "responding": true,
  "latency": 1605,
  "mode": "primary",
  "primary": true
    },
    {
  "hostname": "head02",
  "pinged": "UP",
  "responding": true,
  "latency": 2353,
  "mode": "backup",
  "primary": false
    }

When I try doing so via socket, however, my two head nodes show 'DOWN' :
  "pings": [
    {
  "hostname": "head01",
  "pinged": "DOWN",
  "latency": 11784,
  "mode": "primary"
    },
    {
  "hostname": "head02",
  "pinged": "DOWN",
  "latency": 12668,
  "mode": "backup"
    }
Other commands fail with:
  "error_number": 1007,
  "error": "Protocol authentication error",

I'll admit, I don't usually use sockets, so I could easily be 
overlooking something there. Permissions on the socket look right. I am 
getting JSON back, so it is connecting. Note: slurmrestd is running 
under its own user (not root and not slurmuser).
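
For reference, the socket query looks roughly like this (the API version in the path is whatever your build reports):

    curl --unix-socket /run/slurmrestd/slurmrestd.sock 'http://localhost/slurm/v0.0.40/ping'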


Any ideas?

Thanks in advance,

Brian Andrus


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com