srun -c17 --pty bash
srun: error: CPU count per node can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available
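The -c/--cpus-per-task flag asks for 17 CPUs for a single task, and a task
cannot span nodes, so it can never fit on 16-core hardware. A minimal sketch
of the usual workaround, requesting 17 one-CPU tasks that Slurm is free to
spread over two nodes (the program name is only a placeholder):

#!/bin/bash
#SBATCH --ntasks=17          # 17 tasks of one CPU each; may be split across nodes
srun ./my_mpi_program        # placeholder for a program that can use distributed tasks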
On 18/01/2018 08:37, Loris Bennett wrote:
Nadav Toledo writes:
> Hey everyone,
>
> We've just set up a slurm cluster with a few nodes, each of which has 16 cores.
> Is it possible to submit a job for 17 cores or more?
> If not, is there a workaround?
>
> Thanks in advance, Nadav
It should be possible. Have you tried? If so, do you get an error?
Ch
Hey everyone,
We've just set up a slurm cluster with a few nodes, each of which has 16 cores.
Is it possible to submit a job for 17 cores or more?
If not, is there a workaround?
Thanks in advance, Nadav
On newer systemd-based systems you can just use timedatectl -- I find
this does everything I need it to do. Although I think on RHEL/CentOS
systems timedatectl is just set up to start chrony, or something like this.
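For reference, a couple of timedatectl invocations, assuming a systemd host:

timedatectl status        # show clock, time zone and NTP synchronisation state
timedatectl set-ntp true  # enable the configured NTP client (chronyd or systemd-timesyncd)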
On 01/14/2018 08:11 PM, Lachlan Musicman wrote:
Hi all,
As part of both Munge and S
On 18/01/18 03:50, Patrick Goetz wrote:
Can anyone shed some light on the situation? I'm very surprised that
a module script isn't just an explicit command that comes with the
lmod package, and am curious as to why this isn't completely
standard.
The module command needs to be able to manipul
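In practice it ends up as a shell function that evaluates the shell code
printed by the Lmod binary, roughly like this simplified sketch (the real
definition ships in Lmod's profile.d script):

module () {
    # $LMOD_CMD prints shell code (export/unset statements) on stdout
    eval "$($LMOD_CMD bash "$@")"
}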
On 01/17/2018 08:12 AM, Ole Holm Nielsen wrote:
John: I would refrain from installing the old default package
"environment-modules" from the Linux distribution, since it doesn't seem
to be maintained any more.
Lmod, on the other hand, is actively maintained and solves some problems
with the o
Hi John
thanks for the info.
We are investigating the slowdown of sssd and I found some bug reports
regarding slow sssd queries with almost the same backtrace. Hopefully an
update of sssd will solve this issue.
We'll let you know if we find a solution.
thanks
ale
On 18/01/18 02:53, Loris Bennett wrote:
This is all very OT, so it might be better to discuss it on, say, the
OpenHPC mailing list, since as far as I can tell Spack, EasyBuild and
Lmod (but not old or new 'environment-modules') are part of OpenHPC.
Another place might be the Beowulf list, all
Hi Ole,
Ole Holm Nielsen writes:
> John: I would refrain from installing the old default package
> "environment-modules" from the Linux distribution, since it doesn't
> seem to be maintained any more.
Is this still true? Here
http://modules.sourceforge.net/
there is a version 4.1.0 which is
On 18/01/18 01:52, Paul Edmon wrote:
We've been typically taking 4G off the top for memory in our slurm.conf
for the system and other processes. This seems to work pretty well.
Where I was working previously we'd discount the memory by the amount
of GPFS page cache too, plus a little for syst
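As an illustration with made-up numbers, on a 64 GiB node this just means
advertising less memory than is physically present (newer Slurm versions can
express the same reservation with MemSpecLimit= on the node line):

NodeName=node[01-08] CPUs=16 RealMemory=61440 State=UNKNOWN
# 65536 MiB installed; advertising 61440 MiB keeps ~4 GiB back for the OS,
# slurmd and the parallel filesystem page cache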
We've been typically taking 4G off the top for memory in our slurm.conf
for the system and other processes. This seems to work pretty well.
-Paul Edmon-
On 01/17/2018 01:44 AM, Marcin Stolarek wrote:
I think that it depends on your kernel and the way the cluster is
booted (for instance initr
Ale,
> As Matthieu said, it seems to be something related to the SSS daemon.
That was a great catch by Matthieu.
> Moreover, only 3 SLURM partitions have the AllowGroups ACL
Correct, which may seem negligent, but after each `scontrol
reconfigure`, slurmctld restart, and/or AllowGroups= partition update,
t
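For anyone following along, an AllowGroups ACL is just a partition attribute
in slurm.conf (partition and group names below are invented), and slurmctld
re-resolves the group members through NSS, which is where sssd comes in:

PartitionName=gpu Nodes=node[05-08] AllowGroups=gpu_users State=UP
scontrol reconfigure   # triggers, among other things, a fresh group lookup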
Hi Bill!
Always glad to contribute to the Lmod cause! ;)
Back to the discussion, I simply gave my contribution based on how we set up
our system. In no way did I intend to say that that is the only way to deploy
software. Yours is definitely a valid alternative, although it requires a
deeper exp
John: I would refrain from installing the old default package
"environment-modules" from the Linux distribution, since it doesn't seem
to be maintained any more.
Lmod, on the other hand, is actively maintained and solves some problems
with the old "environment-modules" software.
There's an e
I’d go slightly further, though I do appreciate the Lmod shout-out! In some
cases, you may not even want the software on the frontend nodes (hear me out
before I retract it).
If it’s a library that requires linking against before it can be used, then you
probably have to have it unless you re
I should also say that Modules should be easy to install on Ubuntu. It will
be the package named "environment-modules"
You probably will have to edit the configuration file a little bit since
the default install will assume all Modules files are local.
You need to set your MODULEPATH to include
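A typical arrangement, assuming the shared module tree lives under
/shared/modulefiles (the path is only an example):

# e.g. in /etc/profile.d/modules-site.sh on every node
export MODULEPATH=/shared/modulefiles:$MODULEPATH
# or, per session:
module use /shared/modulefiles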
I can highly recommend EasyBuild as an easy way to provide software
packages as "modules" to your cluster. We have been very pleased with
EasyBuild in our cluster.
I made some notes about installing EasyBuild in a Wiki page:
https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules
We use CentOS
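As a rough idea of the EasyBuild workflow (the easyconfig name is a
placeholder; `eb --search` shows what really exists):

eb --search numpy                 # find easyconfigs providing numpy
eb <some-easyconfig>.eb --robot   # build it plus all missing dependencies
module use $HOME/.local/easybuild/modules/all   # default prefix; most sites point this at shared storage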
Ciao Elisabetta,
I second John's reply.
On our cluster we install software on the shared parallel filesystem with
EasyBuild and use Lmod as a module front-end. Then users will simply load
software in the job's environment by using the module command.
Feel free to ping me directly if you need sp
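In a job script that pattern looks roughly like this (the module and script
names are site-specific and only illustrative):

#!/bin/bash
#SBATCH --ntasks=1
module load Python      # or whatever name your module tree provides for numpy
python my_script.py     # placeholder for the user's script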
Hi Elisabetta. No, you normally do not need to install software on all the
compute nodes separately.
It is quite common to use the 'modules' environment to manage software like
this
http://www.admin-magazine.com/HPC/Articles/Environment-Modules
Once you have numpy installed on a shared drive on
Hi,
let's say I need to execute a python script with slurm. The script requires
a particular library installed on the system, like numpy.
If the library is not installed on the system, it is necessary to install
it on the master AND the nodes, right? This has to be done on each machine
separately or
Ciao Gennaro!
> > NodeName=node[01-08] CPUs=16 RealMemory=16000 State=UNKNOWN
> > to
> > NodeName=node[01-08] CPUs=16 RealMemory=15999 State=UNKNOWN
> >
> > Now, slurm works and the nodes are running. There is only one minor
> > problem
> >
> > error: Node node04 has low real_memory size (7984
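A quick way to see the value Slurm expects is to ask slurmd on each node and
copy its RealMemory figure into slurm.conf (output abridged and illustrative):

slurmd -C
# NodeName=node04 CPUs=16 ... RealMemory=7984 ...

By that error, node04 apparently has only about 8 GB installed, so it needs
its own NodeName line (or a lower RealMemory) rather than the shared 15999.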
Hi Matthieu & John
this is the backtrace of slurmctld during the slowdown
(gdb) bt
#0 0x7fb0e8b1e69d in poll () from /lib64/libc.so.6
#1 0x7fb0e8617bfa in sss_cli_make_request_nochecks () from
/lib64/libnss_sss.so.2
#2 0x7fb0e86185a3 in sss_nss_make_request () from /lib64/libnss_s
I tend to run a test program on an otherwise idle node, allocating (and
actually using!) more and more memory, and then see when it starts
swapping. I typically end up with between 1 and 1.5 GiB less than what
"free" reports as the total memory.
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Departm
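A shell-level stand-in for the kind of test program Bjørn-Helge describes,
assuming the stress utility is installed (the size is an example; watch free
in a second terminal to see when swapping starts):

stress --vm 1 --vm-bytes 14G --vm-keep --timeout 120s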