In terms of dependencies, please think about timing. Currently one loop
takes ~70 minutes, and say there is a queue time T for any job. If you
split the slow part to run serially, one loop takes ~190 minutes + 2T. The
time for N iterations would be ~190N + 570*T versus 70N + T.
---
Professor Laurence M
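
To make the comparison above concrete, here is a minimal Python sketch that
evaluates the two totals exactly as they are written in the message (70N + T
for the current single job, 190N + 570*T for the split, serial version). The
function names and the sample values of N and T are illustrative assumptions,
not part of the original post, and the coefficients are copied from the
message without being re-derived.

```python
# Sketch only: evaluates the two totals as quoted in the message.
# Function names and the sample (N, T) values below are hypothetical.

def single_job_total(n_iterations: int, queue_minutes: float) -> float:
    """Total minutes when the whole loop runs inside one job: 70N + T."""
    return 70 * n_iterations + queue_minutes


def split_job_total(n_iterations: int, queue_minutes: float) -> float:
    """Total minutes quoted for the split, serial version: 190N + 570*T."""
    return 190 * n_iterations + 570 * queue_minutes


if __name__ == "__main__":
    # Compare both totals for a few assumed queue times T (in minutes).
    for n, t in [(10, 0), (10, 30), (10, 120)]:
        print(f"N={n} T={t}: single={single_job_total(n, t)} "
              f"split={split_job_total(n, t)}")
```

With these expressions the split version is slower even at T = 0, and the gap
widens as the queue time grows.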
Dependencies are not an appropriate approach.
---
Professor Laurence Marks (Laurie)
www.numis.northwestern.edu
"Research is to see what everybody else has seen, and to think what nobody
else has thought" Albert Szent-Györgyi
On Wed, Dec 20, 2023, 14:40 Renfro, Michael wrote:
> Is this Northweste
Is this Northwestern’s Quest HPC or another one? I know at least a few of the
people involved with Quest, and I wouldn’t have thought they’d be in dire need
of coaching.
And to follow on with Davide’s point, this really sounds like a case for
submitting multiple jobs with dependencies between t
It is a University "supercomputer", not a national facility. Hence they are
not that expert, which is why I am asking here. I am pretty certain that it
is some form of communication issue, but beyond that it is not clear.
If I get suggestions such as "why don't they look for ABC in XYZ" then I
may
Laurence Marks wrote:
> After some (irreproducible) time, often one of the three slow tasks hangs.
> A symptom is that if I try and ssh into the main node of the subtask (which
> is running 128 mpi on the 4 nodes) I get "Authentication failed".
How about asking an admin to check why it hangs?
On 20-12-2023 15:59, Michael Bernasconi wrote:
I'm trying to get slurm working on an Intel 12th gen CPU. slurmd
instantly fails with the error message "Thread count (24) not multiple
of core count (16)".
I have tried adding "SlurmdParameters=config_overrides" to slurm.conf,
and I have experimen
Not an answer to your question, but if the jobs need to be subdivided, why
not submit smaller jobs?
Also, this does not sound like a slurm problem, but rather a code or
infrastructure issue.
Finally, are you typically able to ssh into the main node of each subtask?
In many places that is not allo
Probably not the answer you’re looking for, but in my environment I simply
disabled hyperthreading in the BIOS. This avoided situations where we had
things like “2 processes running on different threads on the same core while
another core is sitting idle”. If you are more often constrained b
I'm trying to get slurm working on an Intel 12th gen CPU. slurmd instantly
fails with the error message "Thread count (24) not multiple of core count
(16)".
I have tried adding "SlurmdParameters=config_overrides" to slurm.conf, and
I have experimented with various combinations of "Sockets",
"Coresp
Thank you, I struggled with that. Very unintuitive to use “create user” on an
existing user! I think I was actually looking at the answer a few times but
assumed they were doing something else, given the syntax.
From: slurm-users on behalf of Michael
Gutteridge
Reply-To: Slurm User Communi
I know that sounds improbable, but please read on.
I am running a reasonably large job on a University supercomputer (not a
national facility) using 12 nodes with 64 cores each. The job loops through a
sequence of commands some of which are single cpu, but with a slow step
where 3 tasks each with 4 no