And also the DMTCP project.
On 30/10/2020 14:10, Thomas M. Payerle wrote:
On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett
<loris.benn...@fu-berlin.de> wrote:
Hi Zacarias,
Zacarias Benta <zacar...@lip.pt> writes:
> Good morning everyone.
>
> I'm having an "issue", I don't know if it is a "bug or a feature".
Thanks Tom,
You are right, it is suspend and not pending that I would like the job
state to go into.
I'll take a look at the OverTimeLimit flag and see if it helps.
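For reference, OverTimeLimit is a cluster-wide setting in slurm.conf (the number of minutes a job may exceed its time limit before being killed); a minimal sketch, assuming a one-hour grace period:

  # slurm.conf (illustrative value)
  OverTimeLimit=60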
On 30/10/2020 14:10, Thomas M. Payerle wrote:
On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett
<loris.benn...@fu-berlin.de> wrote:
Hello all.
Is there a way to see how much a job, requesting a given set of TRES
for a given time, would be billed according to the scheduler's decisions?
I tried
srun --test-only -t 1-0 -p blade -n 32 --mem 1g
but it only reports the expected start time.
I thought I had seen such an option somewhere, but I can't find it.
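The closest I have found so far (assuming TRESBillingWeights is configured on the partition) is to look at the billing value Slurm attaches to a job that has already been submitted, which is not quite the up-front estimate I am after:

  # show the billing weights configured for the partition
  scontrol show partition blade | grep -i TRESBillingWeights
  # the computed billing TRES appears in the job record after submission
  scontrol show job <jobid> | grep -o 'billing=[^,]*'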
On 30/10/20 14:38, Zacarias Benta wrote:
> I know it sounds kind of silly giving a limit and at the same time
> allowing for exceptions, but we are trying to prevent the waste of
> valuable CPU time.
Then convince your users to use checkpointing. Then use shorter run
times (we have 24h for 'nor
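As an illustration only (the application name and checkpoint interval are placeholders), a DMTCP-based job script can look roughly like this:

  #!/bin/bash
  #SBATCH --time=24:00:00
  # run the application under DMTCP, writing a checkpoint every hour
  dmtcp_launch --interval 3600 ./my_app
  # a later job can resume from the generated restart script, e.g.
  # ./dmtcp_restart_script.sh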
On Fri, Oct 30, 2020 at 5:37 AM Loris Bennett
wrote:
> Hi Zacarias,
>
> Zacarias Benta writes:
>
> > Good morning everyone.
> >
> > I'm having an "issue", I don't know if it is a "bug or a feature".
> > I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10
> > flags=NoDecay". I know
Hi Loris,
Thanks for taking the time to reply to my message.
We do indeed want to limit and not limit at the same time; I know
that it is kind of tricky, but let me try to explain.
Our HPC center currently prevents jobs from running for more than 5 days
straight when users submit single core
Hi Zacarias,
Zacarias Benta writes:
> Good morning everyone.
>
> I'm having an "issue", I don't know if it is a "bug or a feature".
> I've created a QOS: "sacctmgr add qos myqos set GrpTRESMins=cpu=10
> flags=NoDecay". I know the limit is too low, but I just wanted to
> give you guys an example.
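As a side note, the resulting QOS and its flags can be checked with something like the following, using the QOS name from the example above:

  sacctmgr show qos myqos format=Name,GrpTRESMins,Flags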
Hello,
My students' cluster has 12 computers that act as "execution nodes". I have
configured a partition in which these 12 computers are defined. When someone
submits a job that requires only one computer and all 12 computers are available,
the job always runs on the first computer defined in slurm.conf.
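From what I have read, the selection order can be influenced in slurm.conf, for example (values and node names here are only illustrative):

  # spread serial jobs by preferring the least-loaded node
  SelectType=select/cons_tres
  SelectTypeParameters=CR_Core_Memory,CR_LLN
  # or bias the ordering by giving nodes different weights
  NodeName=node[01-12] Weight=1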