Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-10 Thread Kilian Cavalotti
On Tue, Jul 10, 2018 at 10:34 AM, Taras Shapovalov wrote: > I noticed the commit that can be related to this: > > https://github.com/SchedMD/slurm/commit/bf4cb0b1b01f3e165bf12e69fe59aa7b222f8d8e Yes. See also this bug: https://bugs.schedmd.com/show_bug.cgi?id=5240 This commit will be reverted in

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Kilian Cavalotti
On Tue, Jul 10, 2018 at 10:05 AM, Jessica Nettelblad wrote: > In the master branch, scontrol write batch_script also has the option to > write the job script to STDOUT instead of a file. This is what we use in the > prolog when we gather information for later (possible) troubleshooting. So I > sup

Re: [slurm-users] cpu limit issue

2018-07-10 Thread Jeffrey Frey
Check the Gaussian log file for mention of its using just 8 CPUs -- just because there are 12 CPUs available doesn't mean the program uses all of them. As I recall, it will scale back if 12 isn't a good match to the problem.

Re: [slurm-users] cpu limit issue

2018-07-10 Thread Renfro, Michael
Gaussian? Look for NProc=8 or similar lines (NProcShared; could be other options, too) in their input files. There could also be some system-wide parallel settings for Gaussian, but that wouldn’t be the default. > On Jul 10, 2018, at 2:04 PM, Mahmood Naderan wrote: > > Hi, > I see that althoug
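If the input file pins the core count, Slurm's 12-CPU allocation won't change what Gaussian actually uses. A hypothetical Gaussian input header requesting all 12 allocated cores might look like this (`%NProcShared` is standard Gaussian Link 0 syntax; the memory figure and route line are placeholders, not from the thread):

```
%NProcShared=12
%Mem=8GB
# B3LYP/6-31G(d) Opt
```

If the user's file instead says `%NProcShared=8`, the job will use 8 cores regardless of the Slurm limit.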

[slurm-users] cpu limit issue

2018-07-10 Thread Mahmood Naderan
Hi, I see that although I have specified a CPU limit of 12 for a user, his job only utilizes 8 cores. [root@rocks7 ~]# sacctmgr list association format=partition,account,user,grptres,maxwall Partition Account User GrpTRES MaxWall -- -- -- - -
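For reference, a per-user CPU cap like the one described is typically set on the association's GrpTRES field. A sketch of setting and verifying it (the user name here is a placeholder; requires slurmdbd accounting):

```shell
# Cap the user's aggregate running jobs at 12 CPUs (hypothetical user name)
sacctmgr modify user where name=mahmood set grptres=cpu=12

# Verify the association, as in the message above
sacctmgr list association format=partition,account,user,grptres,maxwall
```

Note that GrpTRES limits how many CPUs Slurm will allocate; it does not force an application to use that many.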

Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-10 Thread stolarek.marcin
What is the change in the commit you're thinking about? Original message From: Taras Shapovalov Date: 10/07/2018 19:34 (GMT+01:00) To: slurm-us...@schedmd.com Subject: [slurm-users] DefMemPerCPU is reset to 1 after upgrade Hey guys, When we upgraded to 17.11.7, then on some

[slurm-users] GPU + no_consume

2018-07-10 Thread Félix C . Morency
Hi, I'm currently playing with SLURM 17.11.7, cgroups and a node with 2 GPUs. Everything works fine if I set the GPU to be consumable. Cgroups are doing their jobs and the right device is allocated to the right job. However, it doesn't work if I set `Gres=gpu:no_consume:2`. For some reason, SLURM d

Re: [slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-10 Thread Roberts, John E.
Hi, I ran into this recently after upgrading from 16.05.10 to 17.11.7 and couldn’t run any jobs on any partitions. The only way I got around this was to set this flag on all “NodeName” definitions in slurm.conf: RealMemory=foo, where foo is the total memory of the node in MB. I believe the documen
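A minimal slurm.conf fragment illustrating the workaround described above (the node names, CPU count, and memory figure are placeholders for a specific cluster):

```
# slurm.conf -- declare each node's real memory so memory limits
# are computed from actual RAM rather than the 1 MB default
NodeName=node[001-004] CPUs=16 RealMemory=64000 State=UNKNOWN
```

The value should match (or slightly undercut) the memory reported by `slurmd -C` on each node.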

[slurm-users] DefMemPerCPU is reset to 1 after upgrade

2018-07-10 Thread Taras Shapovalov
Hey guys, When we upgraded to 17.11.7, then on some clusters all jobs are killed with these messages: slurmstepd: error: Job 374 exceeded memory limit (1308 > 1024), being killed slurmstepd: error: Exceeded job memory limit slurmstepd: error: *** JOB 374 ON node002 CANCELLED AT 2018-06-28T0

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Mahmood Naderan
Thank you very much. I can see it. Regards, Mahmood On Tue, Jul 10, 2018 at 9:35 PM, Jessica Nettelblad < jessica.nettelb...@gmail.com> wrote: > Since 17.11, there's a command to write the job script to a file: > "scontrol write batch_script job_id optional_filename > Write the batch script fo

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Jessica Nettelblad
Since 17.11, there's a command to write the job script to a file: "scontrol write batch_script job_id optional_filename Write the batch script for a given job_id to a file. The file will default to slurm-.sh if the optional filename argument is not given. The batch script can only be retrieved by a
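Assuming Slurm 17.11 or later, retrieving a job's submitted script looks like this (job ID 374 is just an example; the command must be run by the job's owner or an administrator):

```shell
# Write the batch script for job 374 to slurm-374.sh (the default name)
scontrol write batch_script 374

# Or choose the output filename explicitly
scontrol write batch_script 374 job374_script.sh
```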

[slurm-users] Fwd: An issue with HOSTNAME env var when using salloc/srun for interactive job with Slurm 17.11.7

2018-07-10 Thread CB
Hi, We've recently upgraded to Slurm 17.11.7 from 16.05.8. We noticed that the environment variable HOSTNAME does not reflect the compute node for an interactive job started with the salloc/srun command. Instead it still points to the submit host, although SLURMD_NODENAME reflects the correct co
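A quick way to reproduce the observation (requires a running Slurm cluster; the single-node allocation is just an example):

```shell
# Compare the inherited HOSTNAME with the node Slurm actually assigned
salloc -N1 srun bash -c 'echo "HOSTNAME=$HOSTNAME  SLURMD_NODENAME=$SLURMD_NODENAME"'
```

If HOSTNAME still names the submit host while SLURMD_NODENAME names the compute node, the job inherited HOSTNAME from the submission environment rather than having it reset on the allocated node.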

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Tina Friedrich
As in the submit script? I believe "scontrol show jobid -dd $JOBID" (with $JOBID being the ID of the job you're after) should show you. (Does for me anyway :) ). Tina On Tuesday, 10 July 2018 20:32:33 BST Mahmood Naderan wrote: > Hi > How can I check the submitted script of a running job based on

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Shenglong Wang
scontrol show job -dd JOBID then search for Command= Best, Shenglong > On Jul 10, 2018, at 12:02 PM, Mahmood Naderan wrote: > > Hi > How can I check the submitted script of a running job based on its jobid? > > > Regards, > Mahmood > >

Re: [slurm-users] Finding submitted job script

2018-07-10 Thread Kandes, Martin
Hi Mahmood, You should be able to find the script like so with the squeue command: [mkandes@comet-ln2 ~]$ squeue -j 17797604 --Format="command:132" COMMAND /home/jlentfer/S2018/cometTS_7_phe_me_proximal_anti_398_2.sb [mkandes@comet-ln2 ~]$ Marty From: slurm-u

[slurm-users] Finding submitted job script

2018-07-10 Thread Mahmood Naderan
Hi How can I check the submitted script of a running job based on its jobid? Regards, Mahmood