On 18/11/20 15:15, Jason Simms wrote:
> Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line
> 108, line 602.
> Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line
> 108, line 602.
Seems some setups report data in a different format, hence the
uninitialized value warnings.
These log lines about the prolog script look very suspicious to me:
[2020-11-18T10:19:35.388] debug: [job 110] attempting to run prolog
[/cm/local/apps/cmd/scripts/prolog]
then
[2020-11-18T10:21:10.121] debug: Waiting for job 110's prolog to complete
[2020-11-18T10:21:10.121] debug: Finis
The epilog script does have exit 0 set at the end. Epilogs exit cleanly
when run.
With log set to debug5 I get the following results for any scancel call.
Submit host slurmctld.log
[2020-11-18T10:19:34.944] _slurm_rpc_submit_batch_job: JobId=110
InitPrio=110503 usec=191
[2020-11-18T10:19:35.
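(As a side note, for anyone reproducing this: the controller's log level can
be raised on the fly with scontrol setdebug, assuming sufficient privileges;
it reverts when slurmctld re-reads its configuration.)

scontrol setdebug debug5   # most verbose level; reproduce the scancel here
scontrol setdebug info     # drop back to the normal level afterwards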
Hi;
Check the epilog return value, which comes from the return value of the last
line of the epilog script. Also, you can add an "exit 0" line as the last
line of the epilog script to ensure you get a zero return value for
testing purposes.
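For example, a minimal epilog sketch along those lines (the path and contents
are just an illustration, not the actual script on that cluster):

#!/bin/bash
# Hypothetical test epilog: do the cleanup work, then exit 0 explicitly so
# the script's return value is not whatever the last cleanup command returned.
logger "epilog finished for job ${SLURM_JOB_ID}"
exit 0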
Ahmet M.
On 18.11.2020 20:00, William Markuske wrote:
Dear Peter,
Thanks for your response. Yes, I am running ProctrackType=proctrack/cgroup.
The behavior that I was seeing with the default seff, and that Diego saw as
well, was simply that seff was not really reporting any information for a
given job. I'm glad it's working for you, but it doesn't for me.
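One way to check whether the accounting fields seff relies on are populated
at all (a hedged suggestion; the job ID here is made up) is to ask sacct
directly:

# Show the usage fields seff would be dividing by
sacct -j 12345 --format=JobID,State,Elapsed,TotalCPU,MaxRSS,ReqMem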
On Wed, 18 Nov 2020 09:15:59 -0500
Jason Simms wrote:
> Dear Diego,
>
> A while back, I attempted to make some edits locally to see whether I
> could produce "better" results. Here is a comparison of the output of
> your latest version, and then mine:
I'm not sure what bug or behavior you're seeing.
Hello,
I am having an odd problem where users are unable to kill their jobs
with scancel. Users can submit jobs just fine, and when a task
completes it closes correctly. However, if a user attempts to
cancel a job via scancel, the SIGKILL signals are sent to the step but
don't complete.
Hi Navin,
I can't help with the sreport problem, but I did recognize the situation
with the gap in job numbers (the use of federation), and jumped in for
that one.
Since this list is populated entirely by volunteers, there is no one
"assigned" to topic areas, but people jump in where they can.
Thank you Andy.
But when I am trying to get the utilization for the months, it says it is
100%.
When I tried to find it using utilization by user, it gave me a very
different value, which I am unable to understand.
deda1x1466:~ # sreport cluster AccountUtilizationByUser start=10/02/20
end=10/02/2
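If I am reading this right, the two numbers come from queries along these
lines (a sketch with illustrative dates, not your exact commands):

# Overall cluster utilization for a period
sreport cluster utilization start=10/01/20 end=11/01/20

# Per-account / per-user usage for the same period
sreport cluster AccountUtilizationByUser start=10/01/20 end=11/01/20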
I see from your subsequent post that you're using a pair of clusters
with a single database, so yes, you are using federation.
The high order bits of the Job ID identify the cluster that ran the job,
so you will typically have a huge gap between ranges of Job IDs.
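If I remember the encoding correctly (worth verifying against your Slurm
version), the local job ID sits in the low 26 bits and the cluster's
federation ID in the bits above it, which is why the ranges end up so far
apart. A small illustration with a made-up ID:

JOBID=$(( (2 << 26) | 123 ))            # hypothetical: cluster 2, local job 123
echo "federated id: $JOBID"             # 134217851
echo "cluster id:   $(( JOBID >> 26 ))" # 2
echo "local id:     $(( JOBID & 0x03FFFFFF ))"  # 123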
Andy
On 11/18/2020 9:15 AM,
Dear Diego,
A while back, I attempted to make some edits locally to see whether I could
produce "better" results. Here is a comparison of the output of your latest
version, and then mine:
[root@hpc bin]# seff 24567
Use of uninitialized value $hash{"2"} in division (/) at /bin/seff line
108, line
Are you using federated clusters? If not, check slurm.conf -- do you
have FirstJobId set?
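A quick way to check that (and the related MaxJobId) on a running controller,
assuming scontrol is available on that host:

scontrol show config | grep -Ei 'firstjobid|maxjobid'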
Andy
On 11/18/2020 8:42 AM, navin srivastava wrote:
While running sacct we found that some job IDs are not listed.
5535566   SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567
While running sacct we found that some job IDs are not listed.
5535566    SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
5535567    SYNTHLIBT+  stdg_defq  stdg_acc  1  COMPLETED  0:0
11016496   jupyter-s+  stdg_defq  stdg_acc  1  RUNNING    0:0
1
> normal
>
> While generating the report, I am able to generate it for the local
> cluster (hpc1) without any issue and it looks good, but from the second
> cluster data it