Re: [slurm-users] More cores than on one node

2018-01-17 Thread Nadav Toledo
srun -c17 --pty bash
srun: error: CPU count per node can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available

On 18/01/2018 08:37, Loris Bennett wrote: Nadav Toledo writes:
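This error is expected: `-c` (`--cpus-per-task`) asks for that many CPUs for a single task, which must all sit on one node, and 16-core nodes cannot provide 17. A minimal sketch of the usual workaround, assuming the workload can run as multiple tasks (e.g. an MPI program), is to request tasks instead of per-task CPUs so Slurm may spread them across nodes (`job.sh` is a placeholder script name):

```shell
# 17 CPUs as seventeen one-core tasks; Slurm is free to place them on several nodes.
srun -n17 --pty bash        # interactive
sbatch --ntasks=17 job.sh   # batch submission
```

Note that a single multi-threaded process still cannot use more cores than one node physically has; spanning nodes only helps workloads built from multiple processes.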

Re: [slurm-users] More cores than on one node

2018-01-17 Thread Loris Bennett
Nadav Toledo writes: > Hey everyone, > > We've just set up a Slurm cluster with a few nodes, each with 16 cores. > Is it possible to submit a job with 17 cores or more? > If not, is there a workaround? > > Thanks in advance, Nadav It should be possible. Have you tried? If so, do you get an error? Ch

[slurm-users] More cores than on one node

2018-01-17 Thread Nadav Toledo
Hey everyone, We've just set up a Slurm cluster with a few nodes, each with 16 cores. Is it possible to submit a job with 17 cores or more? If not, is there a workaround? Thanks in advance, Nadav

Re: [slurm-users] ntpd or chrony?

2018-01-17 Thread Patrick Goetz
On newer systemd-based systems you can just use timedatectl -- I find this does everything I need it to do. Although I think on RHEL/CentOS systems timedatectl is just set up to start chrony, or something like this. On 01/14/2018 08:11 PM, Lachlan Musicman wrote: Hi all, As part of both Munge and S
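As a sketch of what this looks like in practice (assuming a systemd-based distribution), enabling and verifying time synchronization with timedatectl is just:

```shell
timedatectl set-ntp true   # enable the system NTP client (chrony or systemd-timesyncd under the hood)
timedatectl status         # look for "System clock synchronized: yes"
```

This matters for the thread's context because MUNGE rejects credentials when node clocks drift too far apart.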

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Christopher Samuel
On 18/01/18 03:50, Patrick Goetz wrote: Can anyone shed some light on the situation? I'm very surprised that a module script isn't just an explicit command that comes with the lmod package, and am curious as to why this isn't completely standard. The module command needs to be able to manipul

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Patrick Goetz
On 01/17/2018 08:12 AM, Ole Holm Nielsen wrote: John: I would refrain from installing the old default package "environment-modules" from the Linux distribution, since it doesn't seem to be maintained any more. Lmod, on the other hand, is actively maintained and solves some problems with the o

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread Alessandro Federico
Hi John, thanks for the info. We are investigating the slowdown of sssd and I found some bug reports regarding slow sssd queries with almost the same backtrace. Hopefully an update of sssd will solve this issue. We'll let you know if we find a solution. thanks ale - Original Message

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Christopher Samuel
On 18/01/18 02:53, Loris Bennett wrote: This is all very OT, so it might be better to discuss it on, say, the OpenHPC mailing list, since as far as I can tell Spack, EasyBuild and Lmod (but not old or new 'environment-modules') are part of OpenHPC. Another place might be the Beowulf list, all

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > John: I would refrain from installing the old default package > "environment-modules" from the Linux distribution, since it doesn't > seem to be maintained any more. Is this still true? Here http://modules.sourceforge.net/ there is a version 4.1.0 which is

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-17 Thread Christopher Samuel
On 18/01/18 01:52, Paul Edmon wrote: We've been typically taking 4G off the top for memory in our slurm.conf for the system and other processes.  This seems to work pretty well. Where I was working previously we'd discount the memory by the amount of GPFS page cache too, plus a little for syst

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-17 Thread Paul Edmon
We've been typically taking 4G off the top for memory in our slurm.conf for the system and other processes.  This seems to work pretty well. -Paul Edmon- On 01/17/2018 01:44 AM, Marcin Stolarek wrote: I think that it depends on your kernel and the way the cluster is booted (for instance initr
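The 4 GB rule of thumb above can be turned into a one-liner when generating slurm.conf. A minimal sketch; the 4096 MB reservation is the figure from this thread, not a universal constant:

```shell
# Compute a RealMemory value (in MB, as slurm.conf expects) from the
# node's total RAM, reserving 4 GB for the OS and system daemons.
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
echo "RealMemory=$((total_mb - 4096))"
```

The resulting value goes on the node's `NodeName=` line in slurm.conf.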

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread John DeSantis
Ale, > As Matthieu said it seems something related to SSS daemon. That was a great catch by Matthieu. > Moreover, only 3 SLURM partitions have the AllowGroups ACL Correct, which may seem negligible, but after each `scontrol reconfigure`, slurmctld restart, and/or AllowGroups= partition update, t

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Vanzo, Davide
Hi Bill! Always glad to contribute to the Lmod cause! ;) Back to the discussion, I simply gave my contribution based on how we set up our system. In no way did I intend to say that that is the only way to deploy software. Yours is definitely a valid alternative, although it requires a deeper exp

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Ole Holm Nielsen
John: I would refrain from installing the old default package "environment-modules" from the Linux distribution, since it doesn't seem to be maintained any more. Lmod, on the other hand, is actively maintained and solves some problems with the old "environment-modules" software. There's an e

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Bill Barth
I’d go slightly further, though I do appreciate the Lmod shout-out!: In some cases, you may not even want the software on the frontend nodes (hear me out before I retract it). If it’s a library that requires linking against before it can be used, then you probably have to have it unless you re

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread John Hearns
I should also say that Modules should be easy to install on Ubuntu. It will be the package named "environment-modules". You probably will have to edit the configuration file a little, since the default install will assume all Modules files are local. You need to set your MODULEPATH to include

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Ole Holm Nielsen
I can highly recommend EasyBuild as an easy way to provide software packages as "modules" to your cluster. We have been very pleased with EasyBuild in our cluster. I made some notes about installing EasyBuild in a Wiki page: https://wiki.fysik.dtu.dk/niflheim/EasyBuild_modules We use CentOS
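As a sketch of the workflow these notes describe (the prefix path and easyconfig name below are illustrative placeholders, not taken from the Wiki):

```shell
# Install EasyBuild, then build a package and publish it as a module
# under a shared prefix visible to all compute nodes.
pip install --user easybuild
eb --prefix=/shared/easybuild --robot some-package.eb   # easyconfig name is a placeholder
module use /shared/easybuild/modules/all
module avail
```

`--robot` resolves and builds dependencies automatically, which is most of EasyBuild's appeal for cluster software stacks.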

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Vanzo, Davide
Ciao Elisabetta, I second John's reply. On our cluster we install software on the shared parallel filesystem with EasyBuild and use Lmod as a module front-end. Then users will simply load software in the job's environment by using the module command. Feel free to ping me directly if you need sp

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread John Hearns
Hi Elisabetta. No, you normally do not need to install software on all the compute nodes separately. It is quite common to use the 'modules' environment to manage software like this http://www.admin-magazine.com/HPC/Articles/Environment-Modules Once you have numpy installed on a shared drive on
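A sketch of what this looks like for a user, assuming numpy was installed under a shared prefix and a matching modulefile was written (the module name is hypothetical):

```shell
module avail                  # list software published via modulefiles
module load python/numpy      # hypothetical module name
python -c 'import numpy; print(numpy.__version__)'
```

In a batch script, the same `module load` line simply goes before the srun/python invocation, so every node running the job picks up the shared installation.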

[slurm-users] Slurm and available libraries

2018-01-17 Thread Elisabetta Falivene
Hi, let's say I need to execute a Python script with Slurm. The script requires a particular library installed on the system, like numpy. If the library is not installed on the system, it is necessary to install it on the master AND the nodes, right? Does this have to be done on each machine separately or

Re: [slurm-users] Slurm not starting

2018-01-17 Thread Elisabetta Falivene
Ciao Gennaro!

> NodeName=node[01-08] CPUs=16 RealMemory=16000 State=UNKNOWN
>
> to
>
> NodeName=node[01-08] CPUs=16 RealMemory=15999 State=UNKNOWN
>
> Now, slurm works and the nodes are running. There is only one minor problem:
>
> error: Node node04 has low real_memory size (7984

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-17 Thread Alessandro Federico
Hi Matthieu & John, this is the backtrace of slurmctld during the slowdown:

(gdb) bt
#0 0x7fb0e8b1e69d in poll () from /lib64/libc.so.6
#1 0x7fb0e8617bfa in sss_cli_make_request_nochecks () from /lib64/libnss_sss.so.2
#2 0x7fb0e86185a3 in sss_nss_make_request () from /lib64/libnss_s

Re: [slurm-users] Best practice: How much node memory to specify in slurm.conf?

2018-01-17 Thread Bjørn-Helge Mevik
I tend to run a test program on an otherwise idle node, allocating (and actually using!) more and more memory, and then see when it starts swapping. I typically end up with between 1 and 1.5 GiB less than what "free" reports as the total memory. -- Regards, Bjørn-Helge Mevik, dr. scient, Departm
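This empirical approach can be sketched as a small script (assuming `stress-ng` is installed; the sizes below are placeholders for a 16 GB node):

```shell
# Allocate and actually touch progressively more RAM on an otherwise idle
# node, watching for swap activity; the largest size that does not swap
# is a reasonable RealMemory candidate for slurm.conf.
for gb in 13 14 15; do
    echo "=== testing ${gb} GiB ==="
    stress-ng --vm 1 --vm-bytes "${gb}G" --vm-keep --timeout 30s
    vmstat 1 3    # non-zero si/so columns mean the node started swapping
done
```

`--vm-keep` makes stress-ng hold (and keep touching) the allocation for the whole timeout rather than repeatedly freeing it, which is what forces the kernel to reveal its real headroom.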