> On Mar 27, 2019, at 9:32 PM, Chris Samuel wrote:
>
> On 27/3/19 2:43 pm, Noam Bernstein wrote:
>
>> Hi fellow slurm users - I’ve been using slurm happily for a few months, but
>> now I feel like it’s gone crazy, and I’m wondering if anyone can explain
>> what
Hi fellow slurm users - I’ve been using slurm happily for a few months, but now
I feel like it’s gone crazy, and I’m wondering if anyone can explain what’s
going on. I have a trivial batch script which I submit multiple times, and it
ends up with different numbers of nodes allocated. Does anyone h
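For reference, a minimal sketch of the kind of batch script in question; the
directives and counts are illustrative, not the poster's actual values:

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --nodes=2             # ask for exactly 2 nodes
    #SBATCH --ntasks-per-node=16  # and 16 tasks on each of them
    #SBATCH --time=00:10:00

    srun hostname | sort | uniq -c   # show which nodes were actually allocated

One common way to get varying node counts from identical submissions is to give
only --ntasks without --nodes, in which case slurm is free to spread the tasks
across however many nodes currently have free cores.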
> On Mar 21, 2019, at 12:38 PM, Alex Chekholko wrote:
>
> Hey Graziano,
>
> To make your decision more "data-driven", you can pipe your SLURM accounting
> logs into a tool like XDMOD which will make you pie charts of usage by user,
> group, job, gres, etc.
>
> https://open.xdmod.org/8.0/inde
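To make that concrete, a hedged sketch of dumping accounting data that XDMOD
(or a quick script) could consume; the date range and field list are just
examples:

    # dump completed-job records from the slurm accounting database,
    # pipe-delimited, one line per job step
    sacct --allusers --starttime=2019-01-01 --endtime=2019-03-31 \
          --parsable2 \
          --format=JobID,User,Account,Partition,AllocNodes,AllocCPUS,Elapsed,State \
          > jobs_q1.psv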
Hello fellow slurm users - can anyone explain what SlurmctldDebug=4 means? The
documentation lists the possible string level names, but I have a working
slurm.conf which uses 3 and 4. Is it documented anywhere which string levels
those numbers map to?
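For comparison, a sketch of the string form the slurm.conf man page documents;
whether the old numeric levels map onto these names in order (which would make
3 roughly "info" and 4 roughly "verbose") is an assumption here, not something
confirmed in this thread:

    # slurm.conf excerpt - the documented string levels, in increasing verbosity:
    #   quiet, fatal, error, info, verbose, debug, debug2 ... debug5
    SlurmctldDebug=verbose   # assumed equivalent of the old numeric 4
    SlurmdDebug=info         # assumed equivalent of the old numeric 3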
> On Nov 9, 2018, at 3:14 AM, Bjørn-Helge Mevik wrote:
>
> Noam Bernstein writes:
>
>> Can anyone shed some light on where the _virtual_ memory limit comes from?
>
> Perhaps it comes from a VSizeFactor setting in slurm.conf:
>
> VSizeFactor
>
Can anyone shed some light on where the _virtual_ memory limit comes from?
We're getting jobs killed with the message
slurmstepd: error: Step 3664.0 exceeded virtual memory limit (79348101120 >
72638634393), being killed
Is this a limit that's dictated by cgroup.conf or by some srun option (like
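Following the quoted VSizeFactor suggestion, a hedged slurm.conf sketch; the
percentage is illustrative and the actual value on this cluster is unknown:

    # slurm.conf excerpt (illustrative value)
    # VSizeFactor is a percentage applied to the job's real-memory limit to
    # derive the virtual memory (address space) limit that slurmstepd enforces.
    VSizeFactor=110

With a factor of 110, for example, a job whose real-memory limit worked out to
roughly 61-62 GiB would end up with a virtual limit close to the 72638634393
bytes reported in the error above.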
> ...background, otherwise bash will not process the signal until this command
> finishes
>
> wait # < wait until all the background processes are finished. If a
> signal is received this will stop, process the signal and finish the script.
>
>
> On 7/11/18 21:16, Noam Bernstein
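Putting the quoted advice together, a minimal sketch of the pattern: ask slurm
to signal the batch shell before the time limit, trap it, run the payload in
the background, and wait. The signal name, lead time, and cleanup step are
assumptions for illustration, not anything from the original thread:

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --signal=B:USR1@120   # send SIGUSR1 to the batch shell 120s before the limit

    cleanup() {
        echo "caught SIGUSR1: job is about to hit its time limit" >&2
        # hypothetical cleanup, e.g. copy checkpoints out of scratch
    }
    trap cleanup USR1

    ./long_running_program &      # run in the background so bash can process the signal
    wait                          # returns when the program exits or the trapped signal arrives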
Hi slurm users - I’ve been looking through the slurm prolog/epilog manuals, but
haven’t been able to figure out if there’s a way to get an epilog script
(requested by the user) to run when a job is killed for running out of time,
and have the epilog script be able to detect that (through an env
> On Oct 23, 2018, at 5:35 PM, Noam Bernstein
> wrote:
>
>>
>
>
> Any ideas as to what might be happening?
Could it be that the nodes are missing the RealMemory setting?
Noam
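For what it's worth, a sketch of what that suggestion looks like in practice;
the hostname and sizes below are placeholders:

    # on the compute node: print the values slurmd detects, including RealMemory
    slurmd -C

    # slurm.conf excerpt (illustrative): keep RealMemory at or a little below
    # what slurmd -C reports so the OS has some headroom
    NodeName=node001 CPUs=32 RealMemory=128000 State=UNKNOWN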
> On Oct 20, 2018, at 3:06 AM, Chris Samuel wrote:
>
> On Saturday, 20 October 2018 9:57:16 AM AEDT Noam Bernstein wrote:
>
>> If not, is there another way to do this?
>
> You can use --exclusive for jobs that want whole nodes.
>
> You will likely also want to
Hi - I have a slurm usage question that I haven't been able to figure out from
the docs. We basically have two types of jobs - ones that require entire
nodes, and ones that do not. An additional (minor) complication is that the
nodes have hyperthreading enabled, but we want (usually) to use on
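A hedged sketch of how the two kinds of jobs might be submitted, following the
quoted --exclusive suggestion; the --hint option is a common way to stick to
one thread per core and is an assumption here, not part of the truncated reply:

    # whole-node job: take the node exclusively
    sbatch --exclusive --nodes=1 whole_node_job.sh

    # shared job: request cores only, and avoid the second hardware thread
    sbatch --ntasks=8 --hint=nomultithread shared_job.sh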
> On Oct 10, 2018, at 12:07 PM, Noam Bernstein
> wrote:
>
>
> slurmd -C confirms that indeed slurm understands the architecture, so that’s
> good. However, removing the CPUs entry from the node list doesn’t change
> anything. It still drains the node. If I just remov
ist item it just picks 1 cpu.
Noam
Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil
Hi all - I’m new to slurm, and in many ways it’s been very nice to work with,
but I’m having an issue trying to properly set up thread/core/socket counts on
nodes. Basically, if I don’t specify anything except CPUs, the node is
available, but doesn’t appear to know about cores and hyperthreadin
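A hedged sketch of a node definition that spells out the topology instead of
only CPUs; the counts are placeholders for whatever slurmd -C reports on the
nodes in question:

    # slurm.conf excerpt (illustrative):
    # 2 sockets x 16 cores x 2 hyperthreads = 64 logical CPUs per node
    NodeName=node[001-010] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=192000 State=UNKNOWN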