Hi Davide,
Two things you may want to look into:
1. some (most?) web services have "email-to-service" mechanisms of
some sort: for instance, you can send an email to a Slack channel,
which will create a message from it:
https://slack.com/help/articles/206819278-Send-emails-to-Slack
2. Slurm has
> On Mon, Sep 23, 2024 at 10:49 AM Kilian Cavalotti via slurm-users <
> slurm-users@lists.schedmd.com> wrote:
Hi SchedMD,
I'm sure they will eventually, but do you know when the slides of the
SLUG'24 presentation will be available online at
https://slurm.schedmd.com/publications.html, like previous editions'?
Thanks!
--
Kilian
--
slurm-users mailing list -- slurm-users@lists.schedmd.com
Those CVEs are indeed for different software (one for PMIx, one for
Slurm), even though they're ultimately about the same kind of underlying
problem (chown() being used instead of lchown(), which could lead to
privileged files being taken over).
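For illustration, a minimal sketch of the difference at the shell level (hypothetical paths; run as an unprivileged user, where chown to your own uid is a harmless no-op):

```shell
mkdir -p /tmp/lchown-demo && cd /tmp/lchown-demo
touch target
ln -sf target link
# chown(1) follows symlinks by default, mirroring chown(2): this changes the
# owner of "target", not of "link" itself -- dangerous when the link is
# attacker-controlled and the caller is privileged.
chown "$(id -un)" link
# With -h, chown operates on the symlink itself, mirroring lchown(2), and
# never follows the link.
chown -h "$(id -un)" link
```

Privileged code that hands over files in user-controlled directories should use the lchown()-style call, or open the file first and use fchown() on the descriptor.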
The Slurm patches include more fixes related to permissions
And to close the loop on this, the "smail" fix will be in 23.02.4 when
it's released
https://bugs.schedmd.com/show_bug.cgi?id=17123
Cheers,
--
Kilian
On Mon, Jul 3, 2023 at 9:30 AM Angel de Vicente wrote:
>
> Hello,
>
> Angel de Vicente writes:
>
> > Any idea what could be going on or how to de
On Tue, Jun 23, 2020 at 7:37 AM Bas van der Vlies
wrote:
>
> Which version of slurm do you use? as slurm 19.05:
> * DefCpuPerGPU
Sorry for necroposting and digging up this old thread, but the
DefCpuPerGpu configuration option is actually just a default, which
will happily get overridden by job s
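As a sketch (illustrative values, not from the original thread): the default lives in slurm.conf, and an explicit request on the job's command line wins over it.

```
# slurm.conf (illustrative): give jobs 4 CPUs per GPU unless they ask otherwise
#   PartitionName=gpu DefCpuPerGPU=4 ...
#
# A job can still override the default explicitly:
#   sbatch --gres=gpu:1 --cpus-per-gpu=8 job.sh    # gets 8 CPUs per GPU, not 4
```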
Hi Simon,
On Mon, Mar 6, 2023 at 1:34 PM Simon Gao wrote:
> We are experiencing an issue with deleting any Slurm account.
>
> When running a command like: sacctmgr delete account ,
> following errors are returned and the command failed.
>
> # sacctmgr delete account
> Database is busy or waitin
Hi Sefa,
`scontrol -d show job ` should give you that information:
# scontrol -d show job 2781284 | grep Nodes=
NumNodes=10 NumCPUs=256 NumTasks=128 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
Nodes=sh03-01n29 CPU_IDs=4-6,12-19,22-23,25 Mem=71680 GRES=
Nodes=sh03-01n[38,40] CPU_IDs=0-31 Mem=1638
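If you only want the node-to-CPU mapping, a small awk filter works; the sketch below replays the sample output quoted above (on a live cluster you'd pipe `scontrol -d show job <jobid>` into awk instead):

```shell
sample='Nodes=sh03-01n29 CPU_IDs=4-6,12-19,22-23,25 Mem=71680 GRES=
Nodes=sh03-01n[38,40] CPU_IDs=0-31 Mem=1638'
# Keep only the Nodes= and CPU_IDs= fields of the per-node detail lines.
cpu_map=$(printf '%s\n' "$sample" | awk '/CPU_IDs=/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^(Nodes|CPU_IDs)=/) printf "%s ", $i
    print ""
}')
printf '%s\n' "$cpu_map"
```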
Hi Allan,
On Fri, Dec 9, 2022 at 3:20 PM Carter, Allan wrote:
> If a job is pending only because it needs a license and all are being used,
> can it preempt jobs in a lower priority partition that are using the license?
> Or does preemption only work for compute resources. I've tried to configu
Hi Loris,
On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett
wrote:
> However, I do have a chronic problem with users requesting too much
> memory. My approach has been to try to get people to use 'seff' to see
> what resources their jobs in fact need. In addition each month we
> generate a graphical
Hi Phil,
Link-time optimization (LTO) has been enabled by default in RHEL9:
https://fedoraproject.org/wiki/LTOByDefault
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/developing_c_and_cpp_applications_in_rhel_9/index#ref_link-time-optimization_using-libraries
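If LTO is what's breaking the build, it can be disabled per-build; a sketch assuming an RPM-based build (the `_lto_cflags` macro is what injects `-flto` on RHEL9):

```
# In the spec file:
#   %define _lto_cflags %{nil}
# or, without touching the spec, on the rpmbuild command line:
#   rpmbuild --define '_lto_cflags %{nil}' -ta slurm-*.tar.bz2
```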
On Wed, Jun 2, 2021 at 10:13 PM Ahmad Khalifa wrote:
> How to send a job to a particular gpu card using its ID (0,1,2...etc)?
Well, you can't, because:
1. GPU ids are something of a relative concept:
https://bugs.schedmd.com/show_bug.cgi?id=10933
2. requesting specific GPUs is not supported:
ht
On Tue, May 11, 2021 at 5:55 AM Renfro, Michael wrote:
>
> XDMoD [1] is useful for this, but it’s not a simple script. It does have some
> user-accessible APIs if you want some report automation. I’m using that to
> create a lightning-talk-style slide at [2].
>
> [1] https://open.xdmod.org/
> [2
Hi Joshua,
On Thu, Mar 4, 2021 at 8:38 PM Joshua Baker-LePain wrote:
> slurmd: error: _nvml_get_mem_freqs: Failed to get supported memory
> frequencies
> slurmd: error: for the GPU : Not Supported
> slurmd: 4 GPU system device(s) detected
> slurmd: WARNING: The following autodetected GPUs a
On Wed, Jan 20, 2021 at 12:56 PM Brian Andrus wrote:
> We would need more information.
> At a minimum, what client is it? As this is not a slurm issue, you would need
> to dig into what is causing that behavior with your storage system.
And if the question is how to make sure Slurm won't allocat
Hi Jason,
We're taking the approach proposed in
https://bugs.schedmd.com/show_bug.cgi?id=7919: same RPM everywhere,
but without the dependencies that you don't want installed globally
(like NVML, PMIx...). Of course you need to satisfy those dependencies
some other way on the nodes that require th
On Fri, Feb 21, 2020 at 12:38 AM Benjamin Redling
wrote:
> If there isn't already a better name, I suggest
> "PerilogueInterfacePlugin", because of the following possible historical
> IT-roots:
>
> As "prologue" comes from the Greek "προ", meaning "before", and as
> "epilogue" comes from the Greek
Hi Chris,
On Thu, Dec 12, 2019 at 10:47 AM Christopher Benjamin Coffey
wrote:
> I believe I heard recently that you could limit the number of users jobs that
> accrue age priority points. Yet, I cannot find this option in the man pages.
> Anyone have an idea? Thank you!
It's the *JobsAccrue*
Hi Lev,
On Mon, Dec 2, 2019 at 2:31 PM Lev Lafayette
wrote:
> Do others have a special arrangement for managing jobs during outages, apart
> from "no arrangements, no jobs".
Slurm supports reservations, which can typically be used to make sure
no job runs during a scheduled downtime (but can sti
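A sketch of such a maintenance reservation (hypothetical times and names; requires operator/admin privileges):

```
scontrol create reservation reservationname=downtime \
    starttime=2019-12-16T08:00:00 duration=600 \
    users=root flags=maint,ignore_jobs nodes=ALL
# Jobs whose time limit would run into the reservation are held back by
# the scheduler until the reservation ends.
```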
Hi Jürgen,
I would take a look at the various *KmemSpace options in cgroup.conf;
they can certainly help with this.
Cheers,
--
Kilian
On Thu, Jun 13, 2019 at 2:41 PM Juergen Salk wrote:
>
> Dear all,
>
> I'm just starting to get used to Slurm and play around with it in a small test
> environm
On Thu, Jun 6, 2019 at 11:16 AM Christopher Samuel wrote:
> Sounds like a good reason to file a bug.
Levi did already. Everybody can vote at
https://bugs.schedmd.com/show_bug.cgi?id=7191 :)
Cheers,
--
Kilian
Hi Paul,
I'm wondering about this part in your SchedulerParameters:
### default_queue_depth should be some multiple of the partition_job_depth,
### ideally number_of_partitions * partition_job_depth, but typically the main
### loop exits prematurely if you go over about 400. A partition_job_depth
Hi Ahmet,
Very useful tool for us; we've adopted it!
https://news.sherlock.stanford.edu/posts/a-better-view-at-sherlock-s-resources
Thank you very much for writing it.
Cheers,
--
Kilian
On Wed, Mar 27, 2019, 02:53 mercan wrote:
> Hi;
>
> Except sjstat script, Slurm does not contains a comman
Hi Randy!
> We have a slurm cluster with a number of nodes, some of which have more than
> one GPU. Users select how many or which GPUs they want with srun's "--gres"
> option. Nothing fancy here, and in general this works as expected. But
> starting a few days ago we've had problems on one
On Fri, Jan 18, 2019 at 6:31 AM Prentice Bisbal wrote:
> > Note that if you care about node weights (eg. NodeName=whatever001
> > Weight=2, etc. in slurm.conf), using the topology function will disable it.
> > I believe I was promised a warning about that in the future in a
> > conversation wit
Hi Bill,
On Tue, Nov 13, 2018 at 5:35 PM Bill Broadley wrote:
> (gdb) bt
> #0 _step_dealloc_lps (step_ptr=0x555787af0f70) at step_mgr.c:2092
> #1 post_job_step (step_ptr=step_ptr@entry=0x555787af0f70) at step_mgr.c:4720
> #2 0x55578571d1d8 in _post_job_step (step_ptr=0x555787af0f70) at
>
On Wed, Sep 19, 2018 at 9:21 AM Christopher Benjamin Coffey
wrote:
> The only thing that I've gotten working so far is this:
> sudo -u slurm bash -c "strigger --set -D -n cn15 -p
> /common/adm/slurm/triggers/nodestatus"
>
> So, that will run the nodestatus script which emails when the node cn15 g
Hi Didier,
On Wed, Sep 5, 2018 at 7:39 AM Didier GAZEN
wrote:
> What do you think?
I'd recommend opening a bug at https://bugs.schedmd.com to report your
findings, if you haven't done that already.
This is the best way to get attention of the developers and get this fixed.
Cheers,
--
Kilian
Hi Christian,
On Wed, Aug 22, 2018 at 7:27 AM, Christian Peter
wrote:
> we observed a strange behavior of pam_slurm_adopt regarding the involved
> cgroups:
>
> when we start a shell as a new Slurm job using "srun", the process has
> freezer, cpuset and memory cgroups setup as e.g.
> "/slurm/uid_5
Hi Chris,
On Sun, Aug 19, 2018 at 6:26 PM, Christopher Samuel wrote:
> We are using QOS's for projects which have been granted a fixed set of
> time for higher priority work which works nicely, but have just been
> asked the obvious question "how much time do we have left?".
I _think_ that "scon
On Wed, Aug 15, 2018 at 11:57 AM, Michael Jennings wrote:
> We [...] are planning to investigate clush [...] in the near future.
I can only encourage you to do so, as ClusterShell comes with nice
Slurm bindings out of the box, which allow you, among other things, to
execute commands on all the nodes:
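For instance (a sketch assuming the example Slurm group bindings shipped with ClusterShell are enabled in groups.conf; group source names may differ in your setup):

```
clush -bw @slurmstate:idle uptime           # run on all idle nodes
clush -bw @slurmjob:123456 ps -u "$USER"    # run on all nodes of job 123456
```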
On Wed, Aug 15, 2018 at 7:01 AM, Paul Edmon wrote:
> So we use NHC for our automatic node closer. For reopening we have a series
> of scripts that we use but they are all ad hoc and not formalized. Same
> with closing off subsets of nodes we just have a bunch of bash scripts that
> we have rolle
On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov
wrote:
> I noticed the commit that can be related to this:
>
> https://github.com/SchedMD/slurm/commit/bf4cb0b1b01f3e165bf12e69fe59aa7b222f8d8e
Yes. See also this bug: https://bugs.schedmd.com/show_bug.cgi?id=5240
This commit will be reverted in
On Tue, Jul 10, 2018 at 10:05 AM, Jessica Nettelblad
wrote:
> In the master branch, scontrol write batch_script also has the option to
> write the job script to STDOUT instead of a file. This is what we use in the
> prolog when we gather information for later (possible) troubleshooting. So I
> sup
Hi Nadav,
On Tue, Jun 12, 2018 at 8:18 AM, Nadav Toledo
wrote:
> How can one send a few jobs running in parallel with different cpus
> allocation on the same node?
According to https://slurm.schedmd.com/srun.html#OPT_cpu-bind, you may
want to use "srun --exclusive":
By default, a job step h
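A sketch of what that looks like in a batch script (hypothetical task binaries; at the step level, `--exclusive` makes each srun take its own slice of the job's allocation rather than the whole of it):

```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4

srun --exclusive -n1 -c4 ./task_a &    # step 0: 4 CPUs
srun --exclusive -n1 -c4 ./task_b &    # step 1: a different 4 CPUs
wait                                   # don't exit before both steps finish
```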
Hi Paul,
I'd first suggest upgrading to 17.11.6; I think the first couple of
17.11.x releases had some issues in terms of GRES binding.
Then, I believe you also need to request all of your cores to be
allocated on the same socket, if that's what you want. Something like
--ntasks-per-socket=16.
Her
Hi Andy,
On Mon, Apr 16, 2018 at 8:43 AM, Andy Riebs wrote:
> I hadn't realized that jobs can be scheduled to run on a node that is still
> in "completing" state from an earlier job. We occasionally use epilog
> scripts that can take 30 seconds or longer, and we really don't want the
> next job t
Hi Ryan,
On Mon, Feb 5, 2018 at 8:06 AM, Ryan Novosielski wrote:
> We currently use SLURM 16.05.10 and one of our staff asked how they
> can check for allocated GPUs, as you might check allocated CPUs by
> doing scontrol show node. I could have sworn that you can see both,
> but I see that only C
Hi Miguel,
On Tue, Jan 23, 2018 at 4:41 AM, Miguel Gila wrote:
> Hi Kilian, a question on this: which version of Slurm/Lua are you running
> this against??
Slurm 17.11.x and Lua 5.1
> I don’t seem able to generate the RPM on 17.02.9/Lua 5.2 ; it throws similar
> errors to what I had seen earlie
Hi all,
We (Stanford Research Computing Center) developed a SPANK plugin which
allows users to choose the GPU compute mode [1] for their jobs.
[1]
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-modes
This came from the need to give our users some control on the way GPUs
Hi Jeff,
Quite close:
$ sinfo --Format=nodehost,statelong
Cheers,
--
Kilian