Hi all,
Thank you for the comments and input.
Yes, it is true: the uppercase was one of the main problems.
After correcting the letter case, the job no longer gets stuck.
However, as Daniel noticed, there is a memory problem.
Running the same script, the job successfully passes the QOS limit.
Howe
On Friday, 15 November 2019 2:13:15 AM PST Loris Bennett wrote:
> If the contents of the column are wider than the column, they
> will be truncated - this is indicated by the '+'.
You can also use the -p option to sacct to make it parseable (which outputs
the full width of fields too).
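
To illustrate the two points above (the '+' truncation marker and the -p option), here is a minimal Python sketch that parses sacct's parsable output. It assumes output of the kind `sacct -p --format=JobID,JobName,State` produces, i.e. pipe-delimited fields with a trailing '|' on every line. The sample text, the uid 1000 and the function name are made up for illustration and are not taken from the job in this thread.

```python
def parse_sacct_parsable(text):
    """Parse pipe-delimited `sacct -p` output into a list of dicts.

    Assumption: the first non-empty line is the header and every
    line ends with one trailing '|' delimiter.
    """
    def split_line(line):
        # Drop only the single trailing delimiter so that an empty
        # last field is still preserved.
        if line.endswith("|"):
            line = line[:-1]
        return line.split("|")

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = split_line(lines[0])
    return [dict(zip(header, split_line(ln))) for ln in lines[1:]]


# Hypothetical sample, shaped like parsable sacct output.
SAMPLE = (
    "JobID|JobName|State|\n"
    "277|1808_Modell_107vh1|CANCELLED by 1000|\n"
    "277.batch|batch|CANCELLED|\n"
)

rows = parse_sacct_parsable(SAMPLE)
for row in rows:
    print(row["JobID"], row["State"])
```

In the parsable form the fields are not cut to the column width, so a value that the default table shows as CANCELLED+ or 277.ba+ appears in full. If you prefer the tabular output, widening a single column with a width suffix such as --format=State%30 should also avoid the truncation.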
--
Thanks!
Prentice
On 11/15/19 6:58 AM, Janne Blomqvist wrote:
> On 14/11/2019 20.41, Prentice Bisbal wrote:
>> Is there any way to see how much a job used the GPU(s) on a cluster
>> using sacct or any other slurm command?
> We have created
> https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a
Thanks! Nice code and just what I needed! A few wrinkles:
a) When reading the Gres from scontrol for each job, on my version this
appears in a TRES record, not as an individual Gres field. Possibly a
version/configuration issue.
b) Converting pid2id from /proc//cgroup is problematic for array jobs.
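
For reference on point b), here is a minimal Python sketch of mapping a process to a job through its cgroup file. It is not the code from the repository above. It assumes the cgroup path contains components of the form job_<id> and optionally step_<name>, which is how Slurm-managed cgroups usually look but may differ with your configuration. The function name and the sample text are hypothetical.

```python
import re

# Assumed layout: ".../job_<jobid>/step_<step>/..." somewhere in the path.
_JOB_RE = re.compile(r"/job_(\d+)(?:/step_([^/\s]+))?")


def job_from_cgroup(cgroup_text):
    """Return (job_id, step) parsed from cgroup file contents, or None.

    `step` is None when the path has no step component.
    """
    for line in cgroup_text.splitlines():
        match = _JOB_RE.search(line)
        if match:
            return int(match.group(1)), match.group(2)
    return None


# Hypothetical contents of a process's cgroup file.
SAMPLE = (
    "11:memory:/slurm/uid_1000/job_4242/step_batch\n"
    "10:cpuset:/\n"
)

result = job_from_cgroup(SAMPLE)
print(result)
```

One possible source of trouble with array jobs (an assumption, not something confirmed here) is that the id found this way is the raw job id, while sacct and squeue usually display array tasks as ArrayJobID_TaskID, so an extra lookup is needed to match the two.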
Again many
On 14/11/2019 20.41, Prentice Bisbal wrote:
> Is there any way to see how much a job used the GPU(s) on a cluster
> using sacct or any other slurm command?
>
We have created
https://github.com/AaltoScienceIT/ansible-role-sacct_gpu/ as a quick
hack to put GPU utilization stats into the comment fie
Loris Bennett writes:
> Hi Uwe,
>
> Uwe Seher writes:
>
>> Hello!
>> What's the meaning of the plus sign? I cannot find anything in the
>> documentation. This is the full output when a job is cancelled:
>>
>> 277 1808_Modell_107vh1 CANCELLED+
>> UNLIMITED 2019
Hi Uwe,
Uwe Seher writes:
> Hello!
> What's the meaning of the plus sign? I cannot find anything in the
> documentation. This is the full output when a job is cancelled:
>
> 277 1808_Modell_107vh1 CANCELLED+ UNLIMITED
> 2019-11-14T11:28:39 2019-11-14T13:12:06
Hello!
What's the meaning of the plus sign? I cannot find anything in the
documentation. This is the full output when a job is cancelled:
277 1808_Modell_107vh1 CANCELLED+
UNLIMITED 2019-11-14T11:28:39 2019-11-14T13:12:06 01:43:27 115
277.ba+