On Monday, 06 March 2023, at 10:15:22 (+0100),
Niels Carl W. Hansen wrote:
Seems there still are some issues with the autofs -
job_container/tmpfs functionality in Slurm 23.02.
If the required directories aren't mounted on the allocated node(s)
before job start, we get:
slurmstepd: error: coul
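(For context, the plugin is configured through job_container.conf; a
minimal example of the kind of setup in question, assuming 23.02 and
with the BasePath value purely illustrative, would be:

    AutoBasePath=true
    BasePath=/var/tmp/slurm
    Shared=true

where Shared=true is, as I understand it, the 23.02-era option meant
to let automounter mounts propagate into the per-job namespace;
whether it fully resolves the autofs problem described above is
exactly what this thread is about.)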
On Wednesday, 01 March 2023, at 10:28:24 (+0100),
Ole Holm Nielsen wrote:
but there may be some significant improvements included in 23.02
TL;DR: I can vouch for this.
The primary problem with the interaction between the new namespace
code and the automounter daemon was simply that the shar
On Tuesday, 29 November 2022, at 08:44:48 (+),
Mark Holliman wrote:
I mentioned Fedora 9 and CentOS 9 (Stream) simply because they tend
to be compatible, and something that works on them is likely to work
on Rocky9.
RHEL 8.x is based on Fedora 28. RHEL 9.x is based on Fedora 34 via
CentOS
On Wednesday, 04 May 2022, at 10:00:57 (-0700),
David Henkemeyer wrote:
I am seeing what I think might be a bug with sacct. When I do the
following:
> sbatch --export=NONE --wrap='uname -a' --exclusive
Submitted batch job 2869585
Then, I ask sacct for the SubmitLine, as such:
> sacc
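For anyone wanting to reproduce this, the lookup in question would be
something along the lines of (job ID taken from the example above; the
%60 width specifier is just to keep sacct from truncating the field on
screen):

    sacct -j 2869585 --format=JobID,JobName,SubmitLine%60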
On Tuesday, 03 May 2022, at 15:46:38 (+0800),
taleinterve...@sjtu.edu.cn wrote:
We need to detect certain problems at job-end time, so we wrote a
detection script for the Slurm epilog, which should drain the node if
the check does not pass.
I know that exiting the epilog with a non-zero code will make Slurm autom
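For what it's worth, a common pattern is to have the epilog do the
draining itself via scontrol, so the Reason string and the exit code
stay under your control (the check command here is a placeholder):

    #!/bin/bash
    # Epilog fragment: drain this node if the post-job check fails.
    if ! /usr/local/sbin/check_node_sanity; then
        scontrol update NodeName="${SLURMD_NODENAME:-$(hostname -s)}" \
            State=DRAIN Reason="epilog check failed (job ${SLURM_JOB_ID})"
    fi
    exit 0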
On Thursday, 27 May 2021, at 08:19:14 (+0200),
Loris Bennett wrote:
Thanks for the detailed explanations. I was obviously completely
confused about what MUNGE does. Would it be possible to say, in very
hand-waving terms, that MUNGE performs a similar role for the access of
processes to nodes a
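(Tangentially, the standard smoke test for whether MUNGE credentials
are honoured between hosts is the one from the MUNGE docs; "somenode"
is a placeholder:

    munge -n | unmunge                 # local encode/decode
    munge -n | ssh somenode unmunge    # cross-host decode

If the second one fails, Slurm authentication between those two hosts
will fail as well.)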
On Tuesday, 25 May 2021, at 14:09:54 (+0200),
Loris Bennett wrote:
> I think my main problem is that I expect logging in to a node with a job
> to work with pam_slurm_adopt but without any SSH keys. My assumption
> was that MUNGE takes care of the authentication, since users' jobs start
> on node
On Thursday, 15 April 2021, at 10:58:31 (-0300),
Heitor wrote:
> I'm trying to setup NHC[0] for our Slurm cluster, but I'm not
> getting it to work properly.
Just for future reference, NHC has its own mailing lists, and even
though your question does relate to Slurm tangentially, it's really an
N
On Wednesday, 03 February 2021, at 18:06:27 (+),
Philip Kovacs wrote:
> I am familiar with the package rename process and it would not have
> the effect you might think it would. If I provide an upgrade path to
> a new package name, e.g. slurm-xxx, the net effect would be to tell
> yum or dnf-ma
On Tuesday, 20 October 2020, at 15:49:25 (+0800),
Kevin Buckley wrote:
> On 2020/10/20 11:50, Christopher Samuel wrote:
> >
> > I forgot I do have access to a SLES15 SP1 system, that has:
> >
> > # rpm -q libmunge2 --provides
> > libmunge.so.2()(64bit)
> > libmunge2 = 0.5.14-4.9.1
> > libmunge2(
On Monday, 14 September 2020, at 13:46:27 (+),
Braun, Ruth A wrote:
> Is there any issue if I set/change the slurm account password? I'm running
> 19.05.x
>
> Current state is locked but I have to reset it periodically:
> # passwd --status slurm
> slurm LK 2014-02-03 -1 -1 -1 -1 (Password
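If the aim is simply that nobody should ever log in as slurm
interactively, locking the account (rather than maintaining a
password) is the usual approach; for a local account, something like:

    # usermod -L slurm                 # lock the password
    # usermod -s /sbin/nologin slurm   # and/or remove the login shell
    # passwd --status slurm            # should then report "LK"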
On Tuesday, 09 June 2020, at 15:26:36 (-0400),
Prentice Bisbal wrote:
> Host-based security is not considered as safe as user-based security, so
> should only be used in special cases.
That's a pretty significant claim, and certainly one that would need
to be backed up with evidence, references,
On Tuesday, 09 June 2020, at 21:27:27 (+0200),
Ole Holm Nielsen wrote:
> Thanks very much, this is really cool! I need to look into the
> HostbasedAuthentication for intra-cluster MPI tasks spawned by SSH (not
> using srun).
>
> Presumably external access still needs to use SSH authorized keys?
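For anyone wanting the nuts and bolts of host-based authentication,
the pieces are roughly these (paths are the usual OpenSSH defaults;
adjust for your distribution):

    /etc/ssh/ssh_config on every node (client side):
        HostbasedAuthentication yes
        EnableSSHKeysign yes
    /etc/ssh/sshd_config on every node (server side):
        HostbasedAuthentication yes

plus the cluster's node names in /etc/ssh/shosts.equiv and their host
keys in /etc/ssh/ssh_known_hosts. Connections from outside that
trusted set still fall back to whatever other methods sshd allows,
public keys included.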
On Tuesday, 09 June 2020, at 12:43:34 (+0200),
Ole Holm Nielsen wrote:
> in which case you need to set up SSH authorized_keys files for such
> users.
I'll admit that I didn't know about this until I came to LANL, but
there's actually a much better alternative than having to create user
key pairs
They do something even better: They allow the user/customer to make
the choice in the spec file! :-) And to be clear, they don't expect
users to be experts in building packages; that's why their Quick-Start
Guide (https://slurm.schedmd.com/quickstart_admin.html) is as thorough
as it is; it even h
On Friday, 01 November 2019, at 10:41:26 (-0700),
Brian Andrus wrote:
> That's pretty much how I did it too.
>
> But...
>
> When you try to run slurmd, it chokes on the missing symbols issue.
I don't yet have a full RHEL8 cluster to test on, and this isn't
really my area of expertise, but have
On Friday, 01 November 2019, at 11:37:37 (-0600),
Michael Jennings wrote:
> I build with Mezzanine, but the equivalent would roughly be this:
>
> rpmbuild -ts slurm-19.05.3-2.tar.bz2
> cat the_above_diff.patch | (cd ~/rpmbuild/SPECS ; patch -p0)
> rpmbuild --with x11 --with
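If it helps anyone deciding which --with flags to pass, the spec file
documents its optional build features in a top-of-file comment block,
at least in the releases I've checked; you can peek at it straight
from the tarball with GNU tar:

    tar xjf slurm-19.05.3-2.tar.bz2 --wildcards -O '*/slurm.spec' | grep -- '--with'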
On Tuesday, 29 October 2019, at 15:11:38 (+),
Christopher Benjamin Coffey wrote:
> Brian, I've actually just started attempting to build slurm 19 on
> centos 8 yesterday. As you say, there are packages missing now from
> repos like:
They're not missing; they're just harder to get at now, for
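If the "missing" ones are the -devel packages that moved into the
PowerTools repository on CentOS 8 (munge-devel, for example), enabling
that repo usually sorts it out:

    # dnf config-manager --set-enabled powertools
    # dnf install munge-devel ...

(The repo ID has varied in capitalisation between point releases;
"dnf repolist --all | grep -i powertools" will show how yours spells
it.)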
On Thursday, 17 October 2019, at 16:50:29 (+),
Goetz, Patrick G wrote:
> Are applications even aware when they've been hit by a SIGSTP? This
> idea of a license being released under these circumstances just
> seems very unlikely.
No, which is why SIGSTOP cannot be caught. The action is carr
On Thursday, 19 September 2019, at 19:27:38 (-0400),
Fulcomer, Samuel wrote:
> I obviously haven't been keeping up with any security concerns over the use
> of Singularity. In a 2-3 sentence nutshell, what are they?
So before I do that, if you have a few minutes, I do think you'll find
it worth y
On Thursday, 19 September 2019, at 20:00:40 (+),
Goetz, Patrick G wrote:
> On 9/19/19 8:22 AM, Thomas M. Payerle wrote:
> > one of our clusters
> > is still running RHEL6, and while containers based on Ubuntu 16,
> > Debian 8, or RHEL7 all appear to work properly,
> > containers based on Ubunt
On Friday, 20 September 2019, at 00:03:28 (+0430),
Mahmood Naderan wrote:
> For the replies. Matlab was an example. I would also like to create
> two containers for OpenFoam with different versions. Then a user can
> choose what he actually wants.
All modern container runtimes support the OCI stan
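In practice the user-facing part looks much the same whichever runtime
you settle on; e.g. with Singularity/Apptainer (the image names and
the $CASE variable are made up for the example):

    srun singularity exec openfoam-7.sif  simpleFoam -case "$CASE"
    srun singularity exec openfoam-10.sif simpleFoam -case "$CASE"

so "choosing a version" is just choosing which image to point at.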
On Thursday, 19 September 2019, at 12:38:43 (+0430),
Mahmood Naderan wrote:
> The question is not directly related to Slurm, but is actually related to
> the people in this community.
>
> For heterogeneous environments, where different operating systems,
> application and library versions are nee
On Monday, 02 September 2019, at 20:02:57 (+0200),
Ole Holm Nielsen wrote:
> We have some users requesting that a certain minimum size of the
> *Available* (i.e., free) TmpFS disk space should be present on nodes
> before a job should be considered by the scheduler for a set of
> nodes.
>
> I bel
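One thing to keep in mind: Slurm's --tmp / TmpDisk accounting is done
against the configured size of the node's scratch filesystem, not the
space actually free at schedule time. A common workaround is to let a
node health checker drain nodes that dip below a threshold, e.g. an
NHC rule along these lines (mount point and threshold are examples):

    * || check_fs_free /scratch 10%

so that low-space nodes drop out of scheduling until they're cleaned
up.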
On Monday, 25 March 2019, at 12:57:46 (+),
Ryan Novosielski wrote:
> If the error message is accurate, the fix may be having the VNC
> server not set DISPLAY equal to localhost:10.0 or similar as SSH
> normally does these days, but to configure it to set DISPLAY to
> fqdn:10.0. We had to do so
On Tuesday, 16 October 2018, at 09:30:13 (-0400),
Dave Botsch wrote:
> Hrm... it looks like the default install of OHPC went with DHA keys
> instead:
>
> .ssh]$ cat config
> # Added by Warewulf 2018-10-08
> Host *
>IdentityFile ~/.ssh/cluster
>StrictHostKeyChecking=no
> $ file cluster
>
On Wednesday, 15 August 2018, at 10:01:19 (-0400),
Paul Edmon wrote:
> On 08/14/2018 05:16 AM, Pablo Llopis wrote:
> >
> >Integration with a possible built-in healthcheck is also something
> >to consider, as the orchestration logic would need to take care of
> >disabling the healthcheck funcionali
On Thursday, 10 May 2018, at 10:09:22 (-0400),
Paul Edmon wrote:
> Not that I am aware of. Since the header isn't really part of the
> script bash doesn't evaluate them as far as I know.
>
> On 05/10/2018 09:19 AM, Dmitri Chebotarov wrote:
> >
> >Is it possible to access environment variables in
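Since the #SBATCH header lines are parsed by sbatch itself and never
seen by the shell, variables in them won't expand. The usual
workarounds are to pass the value on the command line, where the shell
does expand it and where command-line options override in-script
headers, or to generate the script. A trivial example (the PROJECT
variable is just an illustration):

    sbatch --job-name="${PROJECT}_run" --output="${PROJECT}_%j.out" job.sh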
On Thursday, 10 May 2018, at 20:02:37 (+1000),
Chris Samuel wrote:
> For instance there's the LBNL Node Health Check (NHC) system that plugs into
> both Slurm and Torque.
>
> https://slurm.schedmd.com/SUG14/node_health_check.pdf
>
> https://github.com/mej/nhc
>
> At ${JOB-1} we would run our i
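For reference, the Slurm side of that hookup is just a few slurm.conf
lines (interval and path are site choices; /usr/sbin/nhc is the
default NHC install location):

    HealthCheckProgram=/usr/sbin/nhc
    HealthCheckInterval=300
    HealthCheckNodeState=ANY

with the actual checks living in NHC's own /etc/nhc/nhc.conf.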
On Tuesday, 08 May 2018, at 17:00:33 (+),
Chester Langin wrote:
> Is there no way to scancel a list of jobs? Like from job 120 to job
> 150? I see cancelling by user, by pending, and by job name. --Chet
If you're using BASH, you can just do: scancel {120..150}
In other POSIX-compatible s
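For shells without brace expansion, the equivalent would be something
like (assuming seq is available):

    for j in $(seq 120 150); do scancel "$j"; done

or, since scancel accepts multiple job IDs in one invocation:

    scancel $(seq 120 150)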
On Wednesday, 21 March 2018, at 20:14:22 (+0100),
Ole Holm Nielsen wrote:
> Thanks for your friendly advice! I keep forgetting about Systemd
> details, and your suggestions are really detailed and useful for
> others! Do you mind if I add your advice to my Slurm Wiki page?
Of course not! Espec
On Wednesday, 21 March 2018, at 12:08:00 (+0100),
Ole Holm Nielsen wrote:
> One working solution is to modify the slurmd Systemd service file
> /usr/lib/systemd/system/slurmd.service to add a line:
> LimitCORE=0
This is a bit off-topic, but I see this a lot, so I thought I'd
provide a friendly
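I'm guessing the friendly advice here is the usual one about unit
files, so for the archives: rather than editing the packaged file
under /usr/lib/systemd/system/ (which an upgrade will overwrite), put
the delta in a drop-in override:

    # systemctl edit slurmd

then add just the changed setting in the editor it opens, e.g.

    [Service]
    LimitCORE=0

and restart slurmd. The override lands in
/etc/systemd/system/slurmd.service.d/override.conf and survives
package updates.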
On Wednesday, 21 March 2018, at 08:40:32 (-0600),
Ryan Cox wrote:
> UsePAM has to do with how jobs are launched when controlled by
> Slurm. Basically, it sends jobs launched under Slurm through the
> PAM stack. UsePAM is not required by pam_slurm_adopt because it is
> *sshd* and not *slurmd or s
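For concreteness, hooking pam_slurm_adopt into sshd's stack (and not
slurmd's) is typically one line near the end of the account section of
/etc/pam.d/sshd; the action_no_jobs option shown is optional and only
spelled out here for clarity:

    account    required    pam_slurm_adopt.so action_no_jobs=deny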
On Wednesday, 21 March 2018, at 12:05:49 (+0100),
Alexis Huxley wrote:
> > >Depending on the load on the scheduler, this can be slow. Is there
> > >faster way? Perhaps one that doesn't involve communicating with
> > >the scheduler node? Thanks!
>
> Thanks for the suggestion Ole, but we have somet
On Thursday, 15 February 2018, at 16:11:29 (+0100),
Manuel Rodríguez Pascual wrote:
> Although this is not strictly related to Slurm, maybe you can recommend me
> some actions to deal with a particular user.
>
> On our small cluster, currently there are no limits to run applications in
> the fron
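Without knowing what was eventually suggested in this thread, one
low-effort way to rein in heavy use of a login node (assuming systemd
login sessions and cgroups) is a per-user slice limit; the UID and the
values below are only examples:

    # systemctl set-property user-1234.slice CPUQuota=200% MemoryMax=8G

(MemoryMax needs cgroup v2; on v1 the property is MemoryLimit.)
Classic ulimit settings in /etc/security/limits.conf are the
lower-tech alternative.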
On Wednesday, 06 December 2017, at 08:23:10 (-0800),
Jeff White wrote:
> A Web portal is exactly why I am doing this. The remote server is a
> Web server running some software that expects to pass a script to
> sbatch directly. So the SSH stuff you mention doesn't apply.
I'm not sure I agree wi