Re: [slurm-users] Question about memory allocation

2019-12-16 Thread Marcus Wagner
Dear Mahmood, could you please show the output of scontrol show -d job 119? Best, Marcus. On 12/16/19 5:41 PM, Mahmood Naderan wrote: Excuse me, I still have a problem. Although I freed memory on the nodes as below: RealMemory=64259 AllocMem=1024 FreeMem=61882 Sockets=32 Boards=1 RealMemory…
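For context, the requested diagnostic can be run as below; job ID 119 comes from the thread, and the grep filter is just an illustrative way to narrow the output:

    $ scontrol show -d job 119
    $ scontrol show -d job 119 | grep -i tres    # narrow to the TRES (CPU/memory) fields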

Re: [slurm-users] slurmd.service fails to register

2019-12-16 Thread Marcus Wagner
Hi Dean, first make sure the munge.key is really the same on all systems. The users must also be the same on all systems, since the submission itself is done on the controller. Please also make sure that the systems have the same date and time. After that, restart the munge service and then the sl…
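A minimal verification sketch along these lines (the host name computenode is illustrative):

    $ md5sum /etc/munge/munge.key             # run as root; checksum must match on every host
    $ munge -n | ssh computenode unmunge      # credential should decode with STATUS: Success
    $ date; ssh computenode date              # clocks should agree
    $ systemctl restart munge && systemctl restart slurmd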

[slurm-users] Switching from Moab to Slurm: allocations, resource limits?

2019-12-16 Thread Grigory Shamov
Hi All, I am in the process of switching to Slurm from a Torque/Moab setup. Could you please advise me on the following questions? 1) We use Moab's MAXPE and MAXPS limits per accounting group (which is not the same as a Unix group for us; users' group membership does not match their allocations mem…
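The closest Slurm analogues to per-group MAXPE/MAXPS are association limits set with sacctmgr; a hedged sketch, with the account name and values hypothetical and the Moab mapping only approximate:

    $ sacctmgr modify account physics set GrpTRES=cpu=256              # MAXPE-like cap on in-use CPUs
    $ sacctmgr modify account physics set GrpTRESRunMins=cpu=1440000   # MAXPS-like cap on outstanding CPU-minutes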

Re: [slurm-users] Partition question

2019-12-16 Thread Brian Andrus
There are numerous ways to get this functionality. The simplest is probably to just have a separate partition that only allows job times of 1 hour or less. There are also options that would involve preemption of the longer jobs so the quicker ones could run, priorities, etc. It all depe…
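A slurm.conf sketch of the separate-partition approach (partition names, node list, and times are hypothetical):

    PartitionName=short Nodes=node[01-16] MaxTime=01:00:00 State=UP
    PartitionName=long  Nodes=node[01-16] MaxTime=7-00:00:00 State=UP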

Re: [slurm-users] Upgraded Slurm 17.02 to 19.05, now GRPTRESRunMin limits are applied incorrectly

2019-12-16 Thread Renfro, Michael
Resolved now. On older versions of Slurm, I could have queues without default times specified (just an upper limit, in my case). As of Slurm 18 or 19, I had to add a default time to all my queues to avoid the AssocGrpCPURunMinutesLimit flag. > On Dec 16, 2019, at 2:00 PM, Renfro, Michael wrote…
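The fix described corresponds to a slurm.conf change along these lines (partition name and times are illustrative):

    PartitionName=batch Nodes=node[01-40] DefaultTime=01:00:00 MaxTime=30-00:00:00 State=UP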

[slurm-users] Partition question

2019-12-16 Thread Ransom, Geoffrey M.
Hello, I am looking into switching from Univa (SGE) to Slurm and am figuring out how to implement some of our usage policy in Slurm. We have a Univa queue which uses job classes and RQSes to limit jobs with a run time over 4 hours to only half the available slots (CPU cores), so some slots ar…
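One hedged way to express this in Slurm is two overlapping partitions, so that jobs longer than 4 hours can only reach half the cores (all names and sizes are hypothetical):

    PartitionName=all  Nodes=node[01-20] MaxTime=04:00:00 Default=YES State=UP
    PartitionName=long Nodes=node[01-10] MaxTime=14-00:00:00 State=UP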

[slurm-users] slurmd.service fails to register

2019-12-16 Thread Dean Schulze
I have my controller running (slurmctld and slurmdbd), and my controller and node host can ping each other by name, so they resolve via /etc/hosts settings. When I try to start slurmd.service it shows as active (running), but gives these errors: Unable to register: Zero Bytes were trans…
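One way to get more detail on a registration failure is to run slurmd in the foreground with verbose logging:

    $ sudo slurmd -D -vvv    # -D: stay in the foreground; -vvv: maximum verbosity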

[slurm-users] Where is the slurmstepd location configured?

2019-12-16 Thread Dean Schulze
When I try to start a node it fails with this message: fatal: Unable to find slurmstepd file at /storage/slurm-build/sbin/slurmstepd. The location /storage/slurm-build/sbin/slurmstepd is where the binaries were built by make (I used ./configure --prefix=/storage/slurm-build). After I created the…
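The slurmstepd location appears to be compiled in from the configure prefix rather than read from slurm.conf, so the usual remedy is to rebuild with the prefix you intend to deploy to (the prefix below is illustrative):

    $ ./configure --prefix=/usr/local    # slurmstepd will be expected under <prefix>/sbin
    $ make && sudo make install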

Re: [slurm-users] Upgraded Slurm 17.02 to 19.05, now GRPTRESRunMin limits are applied incorrectly

2019-12-16 Thread Renfro, Michael
Thanks, Ole. I forgot I had that tool already. I'm not seeing where the limits are getting enforced, but now I've narrowed it down to some of my partitions or my job routing Lua plugin: [renfro@login ~]$ hpcshell --reservation=slurm-upgrade --partition=interactive srun: job 232423 queued and…
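To see which association limits the controller is actually applying, a query like this may help (the format fields are assumed to be available in 19.05):

    $ sacctmgr show assoc where user=renfro format=User,Account,Partition,GrpTRESRunMins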

Re: [slurm-users] Upgraded Slurm 17.02 to 19.05, now GRPTRESRunMin limits are applied incorrectly

2019-12-16 Thread Ole Holm Nielsen
Hi Mike, my showuserlimits tool prints user limits from the Slurm database nicely: https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits Maybe this can give you further insight into the source of the problems. /Ole On 16-12-2019 17:27, Renfro, Michael wrote: Hey, folks. I've j…
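A hedged usage sketch for the linked tool (the -u flag is an assumption; check the script's own help):

    $ git clone https://github.com/OleHolmNielsen/Slurm_tools
    $ cd Slurm_tools/showuserlimits
    $ ./showuserlimits -u someuser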

Re: [slurm-users] Question about memory allocation

2019-12-16 Thread Mahmood Naderan
Excuse me, I still have a problem. Although I freed memory on the nodes as below: RealMemory=64259 AllocMem=1024 FreeMem=61882 Sockets=32 Boards=1 RealMemory=120705 AllocMem=1024 FreeMem=115257 Sockets=32 Boards=1 RealMemory=64259 AllocMem=26624 FreeMem=61795 Sockets=32 Boards=1 RealMemor…
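The per-node figures quoted above can be reproduced with scontrol; AllocMem and FreeMem appear on the same line as RealMemory:

    $ scontrol show nodes | grep RealMemory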

Re: [slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
OK, it takes some time for scontrol to update the values. I can now see more free memory, as below: RealMemory=120705 AllocMem=1024 FreeMem=115290 Sockets=32 Boards=1 Thank you, William. Regards, Mahmood On Mon, Dec 16, 2019 at 7:55 PM Mahmood Naderan wrote: > >Memory may be in use by…
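To watch the value converge, a simple polling sketch (the node name is taken from the thread):

    $ watch -n 30 'scontrol show node compute-0-1 | grep RealMemory'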

Re: [slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
>Memory may be in use by running jobs, by tasks outside the control of Slurm, or possibly by NFS buffer cache or similar. You may need to start an ssh session on the node and look. I checked that. For example, on compute-0-1, I see RealMemory=120705 AllocMem=1024 FreeMem=8442 Sock…
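Comparing Slurm's FreeMem with the kernel's own view on the node, as William suggested (node name from the thread):

    $ ssh compute-0-1 free -m
    $ ssh compute-0-1 grep -E 'MemFree|MemAvailable|Cached' /proc/meminfo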

[slurm-users] Upgraded Slurm 17.02 to 19.05, now GRPTRESRunMin limits are applied incorrectly

2019-12-16 Thread Renfro, Michael
Hey, folks. I’ve just upgraded from Slurm 17.02 (way behind schedule, I know) to 19.05. The only thing I’ve noticed going wrong is that my user resource limits aren’t being applied correctly. My typical user has a GrpTRESRunMin limit of cpu=1440000 (1000 CPU-days), and after the upgrade, it app…
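The arithmetic behind that limit, and a hedged sketch of the command that would set it (the user name is hypothetical):

    # 1000 CPU-days x 1440 minutes/day = 1,440,000 CPU-minutes
    $ sacctmgr modify user someuser set GrpTRESRunMins=cpu=1440000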

Re: [slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread William Brown
Memory may be in use by running jobs, by tasks outside the control of Slurm, or possibly by NFS buffer cache or similar. You may need to start an ssh session on the node and look. William On Mon, 16 Dec 2019 at 15:38, Mahmood Naderan wrote: > Hi, > With the following output > >Rea…
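Page-cache pages can make a node look low on free memory even though they are reclaimable; on recent procps, free separates the two:

    $ free -m    # the 'available' column accounts for reclaimable cache, unlike 'free'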

[slurm-users] Small FreeMem is reported by scontrol

2019-12-16 Thread Mahmood Naderan
Hi, With the following output: RealMemory=64259 AllocMem=1024 FreeMem=38620 Sockets=32 Boards=1 RealMemory=120705 AllocMem=1024 FreeMem=309 Sockets=32 Boards=1 RealMemory=64259 AllocMem=1024 FreeMem=59334 Sockets=32 Boards=1 RealMemory=64259 AllocMem=1024 FreeMem=282 Sockets=10 Boards=1…
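A compact way to survey these fields across all nodes (the sinfo output fields are assumed to be available in this Slurm version):

    $ sinfo -N -O nodelist,memory,allocmem,freemem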

[slurm-users] Sreport: given wrong/weird results?

2019-12-16 Thread Thiago Abdo
Hi, I built a small testing cluster before putting it into production. I was testing sreport's capabilities and it is showing some inconsistencies (or maybe/probably I misunderstood something). This Friday our virtual machines were offline, so I would expect sreport to give 0 for all user…
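A sketch of the kind of report being checked, scoped to the offline day (the dates are illustrative):

    $ sreport cluster utilization start=2019-12-13 end=2019-12-14 -t hours
    $ sreport user topusage start=2019-12-13 end=2019-12-14 -t hours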

Re: [slurm-users] get

2019-12-16 Thread Wiegand, Paul
Okay ... obviously an auto-complete error that I failed to check. Please ignore and accept my apologies. > On Dec 16, 2019, at 7:03 AM, Wiegand, Paul wrote: > > unlock stokes-arcc > get stokes-arcc >

[slurm-users] get

2019-12-16 Thread Wiegand, Paul
unlock stokes-arcc get stokes-arcc

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2019-12-16 Thread Beatrice Charton
Hi Marcus and Bjørn-Helge, thank you for your answers. We don’t use Slurm billing; we use system accounting billing. I also confirm that with --exclusive there is a difference between ReqCPUS and AllocCPUS, but --mem-per-cpu was more of a --mem-per-task than a --mem-per-cpu: it was associated with ReqCPU…
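A hedged way to reproduce the observation (job parameters are illustrative; substitute the real job ID):

    $ sbatch --exclusive -n 4 --mem-per-cpu=4000 --wrap 'sleep 60'
    $ sacct -j <jobid> -o JobID,ReqCPUS,AllocCPUS,ReqMem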