[slurm-users] Re: Nodes TRES double what is requested

2024-07-11 Thread Kevin Buckley via slurm-users

On 2024/07/10 16:25, jack.mellor--- via slurm-users wrote:


We are running Slurm 23.02.6.
Our nodes have hyperthreading disabled and we have slurm.conf
set to CPUs=32 for each node (each node has 2 processors with
16 cores each). When we allocate a job, such as salloc -n 32,
it will allocate a whole node, but sinfo shows double the
allocation in the TRES (cpu=64). It also shows in sinfo that
the node has 4294967264 idle CPUs.


What does an

  scontrol show node

tell you about the node(s)?

On our systems, where, sadly, our vendor is unable/unwilling
to turn off SMT/hyperthreading, we see (not all fields shown),
for a fully allocated AMD EPYC 7763 node (so 128 physical cores):

 CoresPerSocket=64
 CPUAlloc=256 CPUEfctv=256 CPUTot=256
 Sockets=2 Boards=1
 ThreadsPerCore=2
 CfgTRES=cpu=256
 AllocTRES=cpu=256

so I guess the question would be,
depending on exactly what you see:

have you explicitly set, or tried setting,

 ThreadsPerCore=1

in the config?
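
For reference, a minimal sketch of an explicit node definition
along those lines (the NodeName and RealMemory figure here are
made up; the counts match the 2 x 16-core description above):

 NodeName=node001 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 CPUs=32 RealMemory=128000

Incidentally, 4294967264 is 2^32 - 32, which looks like an idle
count of 32 - 64 underflowing an unsigned 32-bit value, and so
is consistent with the doubled TRES.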

--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
SMS: +61 4 7497 6266
Eml: kevin.buck...@pawsey.org.au


[slurm-users] Access to --constraint= in Lua cli_filter?

2024-08-18 Thread Kevin Buckley via slurm-users

If I supply a

  --constraint=

option to sbatch/salloc/srun, does the arg appear inside
any object that a Lua CLI Filter could access?

I've tried this basic check

  if options['constraint'] == nil then
      slurm.log_error('constraint is unset')
  end

and seen that the option is, indeed, unset.

Kevin


[slurm-users] Re: Access to --constraint= in Lua cli_filter?

2024-08-19 Thread Kevin Buckley via slurm-users

On 2024/08/19 15:11, Ward Poelmans via slurm-users wrote:


Have a look if you can spot them in:
function slurm_cli_pre_submit(options, pack_offset)
    local env_json = slurm.json_env()
    slurm.log_info("ENV: %s", env_json)
    local opt_json = slurm.json_cli_options(options)
    slurm.log_info("OPTIONS: %s", opt_json)
end

I thought all options could be accessed in the cli filter.


Cheers Ward; however, I'd already dumped the options array
(OK: it's Lua, so make that: table) and not seen anything,
hence wondering if constraints might be in their own
object/array/table.

But no matter: something I spotted in the options["args"]
array/table has since given me something reproducible to
"key off", so that when it is seen I can take a different
path through the filter logic, which is what I was hoping
to do by passing a constraint in.
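
For anyone reading along, a minimal sketch of that kind of
workaround, assuming (as above) that options["args"] is a Lua
table of the positional arguments; the "MAGIC-ARG" marker value
here is made up:

  function slurm_cli_pre_submit(options, pack_offset)
      -- look through the positional arguments for a marker
      -- value that selects the alternative filter logic
      for _, a in ipairs(options["args"] or {}) do
          if a == "MAGIC-ARG" then
              slurm.log_info("marker seen: taking the alternative path")
              -- alternative filter logic goes here
          end
      end
      return slurm.SUCCESS
  end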

There's usually more than one way to skin a cat: and this
cat is now skinless!

Cheers again,
Kevin


[slurm-users] Re: Non-Standard Mail Notification in Job

2024-12-04 Thread Kevin Buckley via slurm-users

On 2024/12/05 05:37, Daniel Miliate via slurm-users wrote:


I'm trying to send myself an email with a custom body/subject as
notification of important events. Is this possible? I've tried a
few things in the script below.

The bits you may have missed:

The emails that Slurm sends out, controlled by the SBATCH directives,
are not sent from the node on which the job runs, but from the
controller, which will typically have been given a mail configuration
that allows it to do so.

In order to send email from a job, you would need to ensure that all
of your compute nodes can send email to the required destinations.

One way to test mail from your compute nodes would be to salloc an
interactive session and see if you can run your mail/mailx commands
from the command line on the node you get allocated.
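
For example, something along these lines (the partition name and
address are placeholders):

  salloc -N 1 -p work
  # then, in the shell on the allocated node:
  echo "test body" | mailx -s "test subject" you@example.com

If that mail never arrives, the compute nodes cannot reach your
mail relay, and no amount of scripting inside the job will change
that.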


[slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?

2025-02-25 Thread Kevin Buckley via slurm-users

On 2025/02/20 21:55, Daniel Letai via slurm-users wrote:

...

Adding AccountingStorageBackupHost pointing to the other node is of course
possible, but will mean different slurm.conf files, which Slurm will complain
about.


Just thought to note that, in general, one way to avoid Slurm
complaining about per-host differences is to have your slurm.conf
Include a file containing the per-host settings.


So, you have a line

Include /etc/slurm/slurm-acct_strge_backup_host.conf

in the slurm.conf on both hosts,


but have different file content, in this case the address in
the one line

AccountingStorageBackupHost=IP.AD.RE.SS

in the included file on each of the two hosts.


The SlurmCtld won't complain about that, but the SlurmDs will run
against a different config on each of the nodes.
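
Put concretely, a minimal sketch of the layout (the filename is
the one above; the addresses are placeholders):

 # /etc/slurm/slurm.conf -- identical on both hosts
 Include /etc/slurm/slurm-acct_strge_backup_host.conf

 # the included file on host A
 AccountingStorageBackupHost=IP.OF.HOST.B

 # the included file on host B
 AccountingStorageBackupHost=IP.OF.HOST.A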


Background:

Older Crays used to have some Slurm infrastructure running on a
node "inside the Cray box" that was accessed via different IP
addresses, depending on whether you were on a compute node (so
"in-the-box") or an "eLogin" node (so "out-of-the-box"), and
that was how we overcame that.

We use the same construct now (on newer HPE/Crays) for energy
accounting, where not all node hardware supports it, and so we
can include

AcctGatherEnergyType=acct_gather_energy/none

or

AcctGatherEnergyType=acct_gather_energy/pm_counters

depending on the node.


Same slurm.conf: no complaining from the SlurmCtld.