[slurm-users] Re: Nodes TRES double what is requested
On 2024/07/10 16:25, jack.mellor--- via slurm-users wrote:

> We are running Slurm 23.02.6. Our nodes have hyperthreading disabled and we
> have slurm.conf set to CPUs=32 for each node (each node has 2 processors with
> 16 cores each). When we allocate a job, such as salloc -n 32, it will allocate
> a whole node, but sinfo shows double the allocation, TRES=64. sinfo also shows
> that the node has 4294967264 idle CPUs.

What does an "scontrol show node" tell you about the node(s)?

On our systems, where, sadly, our vendor is unable/unwilling to turn off
SMT/hyperthreading, we see the following (not all fields shown) for a fully
allocated, dual AMD EPYC 7763 node, so 128 physical cores:

   CoresPerSocket=64
   CPUAlloc=256 CPUEfctv=256 CPUTot=256
   Sockets=2 Boards=1
   ThreadsPerCore=2
   CfgTRES=cpu=256
   AllocTRES=cpu=256

So I guess the question would be, depending on exactly what you see: have you
explicitly set, or tried setting, ThreadsPerCore=1 in the config?

--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
SMS: +61 4 7497 6266
Eml: kevin.buck...@pawsey.org.au
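For reference, a minimal sketch of the sort of node definition that makes Slurm schedule one CPU per physical core, followed by a quick check of what the controller then reports. The node names and counts are illustrative only, not taken from the original poster's cluster:

   # Hypothetical slurm.conf node line for a 2-socket, 16-cores-per-socket box,
   # telling Slurm to ignore SMT threads (32 schedulable CPUs = 2 x 16 x 1):
   NodeName=node[01-04] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 CPUs=32 State=UNKNOWN

   # Then confirm what the controller actually sees for one of those nodes:
   scontrol show node node01 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore|CfgTRES'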
[slurm-users] Access to --constraint= in Lua cli_filter?
If I supply a --constraint= option to an sbatch/salloc/srun, does the argument
appear inside any object that a Lua cli_filter could access?

I've tried this basic check:

   if is_unset(options['constraint']) then
       slurm_errorf('constraint is unset')
   end

and seen that the option is, indeed, unset.

Kevin
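For anyone trying the same check, here is a self-contained sketch of it as a complete pre-submit hook. Note that is_unset and slurm_errorf are not provided by Slurm's Lua API; they are spelled out below as assumed local helpers, and the usual slurm.SUCCESS return value is assumed to be available as in the shipped cli_filter example:

   -- Assumed helpers: neither is part of the Slurm Lua API.
   local function is_unset(v)
       return v == nil or v == ""
   end

   local function slurm_errorf(fmt, ...)
       slurm.log_error(fmt, ...)
   end

   function slurm_cli_pre_submit(options, pack_offset)
       -- Check whether the --constraint value made it into the options table.
       if is_unset(options['constraint']) then
           slurm_errorf('constraint is unset')
       end
       return slurm.SUCCESS
   end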
[slurm-users] Re: Access to --constraint= in Lua cli_filter?
On 2024/08/19 15:11, Ward Poelmans via slurm-users wrote:

> Have a look if you can spot them in:
>
>    function slurm_cli_pre_submit(options, pack_offset)
>        env_json = slurm.json_env()
>        slurm.log_info("ENV: %s", env_json)
>        opt_json = slurm.json_cli_options(options)
>        slurm.log_info("OPTIONS: %s", opt_json)
>    end
>
> I thought all options could be accessed in the cli filter.

Cheers Ward. However, I'd already dumped the options array (OK: it's Lua, so
make that: table) and not seen anything, hence wondering if constraints might
live in their own object/array/table.

No matter, though: something I spotted in the options["args"] array/table has
since given me something reproducible to "key off", so that I can take a
different path through the filter logic when that is seen, which is what I was
hoping to do by passing a constraint in.

There's usually more than one way to skin a cat: and this cat is now skinless!

Cheers again,
Kevin
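A rough illustration of that "key off something reproducible" idea, as a sketch only: rather than indexing options["args"] directly (whose exact shape the thread doesn't spell out), this searches the JSON dump of all CLI options, using the same slurm.json_cli_options() call shown above; the marker string is hypothetical:

   -- Sketch: branch the filter logic when a hypothetical marker string
   -- appears anywhere in the submitted CLI options.
   function slurm_cli_pre_submit(options, pack_offset)
       local opt_json = slurm.json_cli_options(options)
       if string.find(opt_json, "my_marker", 1, true) then
           slurm.log_info("marker seen in submitted options, taking the alternate filter path")
           -- alternate filter logic would go here
       end
       return slurm.SUCCESS
   end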
[slurm-users] Re: Non-Standard Mail Notification in Job
On 2024/12/05 05:37, Daniel Miliate via slurm-users wrote:

> I'm trying to send myself an email notification with a custom body/subject
> for notification of important events. Is this possible? I've tried a few
> things in the script below.

The bits you may have missed: the emails that Slurm sends out, controlled by
the SBATCH directives, are not sent from the node on which the job runs, but
from the controller, which will typically have been given a mail configuration
that allows it to do so.

In order to send email from within a job, you would need to ensure that all of
your compute nodes can send email to the required destinations.

One way to test mail from your compute nodes would be to salloc an interactive
session and see if you can run your mail/mailx commands, from the command
line, on the node you get allocated.
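A quick sketch of that test; the time limit and address are placeholders, and whether salloc drops you straight onto the node depends on your LaunchParameters, hence the srun step:

   # Grab an interactive allocation, step onto the allocated node if salloc
   # leaves you on the login node, then try to mail yourself from it.
   salloc -N 1 -t 00:10:00
   srun --pty bash
   echo "test body from $(hostname)" | mailx -s "mail test from compute node" you@example.com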
[slurm-users] Re: how to set slurmdbd.conf if using two slurmdb node with HA database?
On 2025/02/20 21:55, Daniel Letai via slurm-users wrote:

> ... Adding AccountingStorageBackupHost pointing to the other node is of
> course possible, but will mean different slurm.conf files, which Slurm will
> complain about.

Just thought to note that, in general, one way to avoid Slurm complaining
about per-host differences is to have your slurm.conf Include a file
containing the different per-host settings.

So, you have the line

   Include /etc/slurm/slurm-acct_strge_backup_host.conf

in the slurm.conf on both hosts, but different file content, in this case the
address in the one line

   AccountingStorageBackupHost=IP.AD.RE.SS

in the included file on each of the two hosts. The slurmctld won't complain
about that, but the slurmds will run against a different config on each of
the nodes.

Background: older Crays used to have some Slurm infrastructure running on a
node "inside the Cray box" that was accessed via different IP addresses,
depending on whether you were a compute node, so "in-the-box", or an "eLogin"
node, so "out-of-the-box", and that was how we overcame that.

We use the same construct now (on newer HPE/Crays) for energy accounting
gathering, where not all node hardware supports it, so we can include either

   AcctGatherEnergyType=acct_gather_energy/none

or

   AcctGatherEnergyType=acct_gather_energy/pm_counters

depending on the node. Same slurm.conf: no complaining from the slurmctld.
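Laid out concretely, as a sketch only (the hostnames ctl-a/ctl-b and the addresses below are placeholders, not from the original posts):

   # /etc/slurm/slurm.conf -- byte-identical on ctl-a and ctl-b
   # ... the rest of the shared configuration ...
   Include /etc/slurm/slurm-acct_strge_backup_host.conf

   # /etc/slurm/slurm-acct_strge_backup_host.conf on ctl-a
   AccountingStorageBackupHost=10.0.0.2

   # /etc/slurm/slurm-acct_strge_backup_host.conf on ctl-b
   AccountingStorageBackupHost=10.0.0.1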