[slurm-users] Recover Batch Script Error

2024-02-16 Thread Jason Simms via slurm-users
Hello all, I've used the "scontrol write batch_script" command to output the job submission script from completed jobs in the past, but for some reason, no matter which job I specify, it tells me it is invalid. Any way to troubleshoot this? Alternatively, is there another way - even if a manual da

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-25 Thread Jason Simms via slurm-users
Hello Daniel, In my experience, if you have a high-speed interconnect such as IB, you would do IPoIB. You would likely still have a "regular" Ethernet connection for management purposes, and yes that means both an IB switch and an Ethernet switch, but that switch doesn't have to be anything specia

[slurm-users] Re: pty jobs are killed when another job on the same node terminates

2024-02-28 Thread Jason Simms via slurm-users
Hello Thomas, I know I'm a few days late to this, so I'm wondering whether you've made any progress. We experience this, too, but in a different way. First, though, you may be aware, but you should use salloc rather than srun --pty for an interactive session. That's been the preferred method for

[slurm-users] Re: Enforcing relative resource restrictions in submission script

2024-02-28 Thread Jason Simms via slurm-users
Hello Matthew, You may be aware of this already, but most sites would make these kinds of checks/validations using job_submit.lua. I'm not an expert in that - though plenty of others on this list are - but I'm positive you could implement this type of validation logic. I'd like to say that I've co

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Jason Simms via slurm-users
As a related point, for this reason I mount /var/log separately from /. Ask me how I learned that lesson... Jason On Tue, Apr 16, 2024 at 8:43 AM Jeffrey T Frey via slurm-users <slurm-users@lists.schedmd.com> wrote: > AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n" > is p

[slurm-users] Trying to Track Down root Usage

2024-04-29 Thread Jason Simms via slurm-users
Hello all, Each week, I generate an automated report of the top users by CPU hours. This week, for whatever reason the user root accounted for a massive number of hours: Login Proper Name Used A

[slurm-users] Re: Trying to Track Down root Usage

2024-04-29 Thread Jason Simms via slurm-users
r user root in place? > > sreport accounts resources reserved for a user as well (even if not > used by jobs) while sacct reports job accounting only. > > Best regards > Jürgen > > > * Jason Simms via slurm-users [240429 > 10:47]: > > Hello all, > > > >

[slurm-users] Partition Preemption Configuration Question

2024-05-02 Thread Jason Simms via slurm-users
Hello all, The Slurm docs have me a bit confused... I want to enable job preemption on certain partitions but not others. I *presume* I would set PreemptType=preempt/partition_prio globally, but then on the partitions where I don't want jobs to be preempted, I would set PreemptMode

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Jason Simms via slurm-users
On the one hand, you say you want "to *allocate a whole node* for a single multi-threaded process," but on the other you say you want to allow it to "*share nodes* with other running jobs." Those seem like mutually exclusive requirements. Jason On Thu, Aug 1, 2024 at 1:32 PM Henrique Almeida via

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Jason Simms via slurm-users
I know this doesn't particularly help you, but for me on 23.11.6 it works as expected and immediately drops me onto the allocated node. In answer to your question, yes, as I understand it the default/expected behavior is to return the shell directly. Jason On Thu, Sep 5, 2024 at 8:18 AM Loris Ben

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Jason Simms via slurm-users
Ours works fine, however, without the InteractiveStepOptions parameter. JLS On Thu, Sep 5, 2024 at 9:53 AM Carsten Beyer via slurm-users <slurm-users@lists.schedmd.com> wrote: > Hi Loris, > > we use SLURM 23.02.7 (Production) and 23.11.1 (Testsystem). Our config > contains a second parameter In

[slurm-users] Re: First setup of slurm with a GPU node

2024-11-13 Thread Jason Simms via slurm-users
Hello Patrick, Yeah, I'd recommend upgrading, and I imagine most others will, too. I have found with Slurm that upgrades are nearly mandatory, at least annually or so, mostly because upgrading from much older versions is more challenging and requires bootstrapping. Not sure about the minus sign;

[slurm-users] Best Way to See GPUs in Use?

2025-04-02 Thread Jason Simms via slurm-users
Hello all, Apologies for the basic question, but is there a straightforward, best-accepted method for using Slurm to report on which GPUs are currently in use? I've done some searching and people recommend all sorts of methods, including parsing the output of nvidia-smi (seems inefficient, especia