Hi Lech,
I'm glad that it is working out well with the modifications you've put in
place! Yes, there can be a huge volume of jobscripts out there. That’s a pretty
good way of dealing with it! We've backed up 1.1M jobscripts since its
inception 1.5 months ago and aren't too worried yet about t
All,
We have a cluster that is using Azure, and nodes are started up as needed.
I have encountered an interesting situation where a user ran a loop to
launch 100 jobs using srun. The jobs were simple, just running an 'id'
command for testing.
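I assume the loop was roughly along these lines (my reconstruction, not the
user's actual command, and the srun options are guesses):

    for i in $(seq 1 100); do
        srun -N1 -n1 id &
    done
    wait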
The intention was to have 100 jobs on 100 machines. The partiti
Hi Christoph,
I suspect that the answer to both of these is no. When I tried to modify an
account I got ...
$ sudo sacctmgr modify account where name=user1 set account=newaccount1
Can't modify the name of an account
Also, sacctmgr can only reset a user's RawUsage, as it only supports a
va
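For the record, the reset itself would be something along the lines of the
following (the user name is just a placeholder):

    $ sudo sacctmgr modify user where name=user1 set RawUsage=0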
> Hi Chris
>
> You are right in pointing out that the job actually runs, despite the error
> from sbatch. The customer mentions that:
> === start ===
> The problem had the usual scenario: the job script was submitted and executed,
> but the sbatch command returned a non-zero exit status to ecflow, which thus as
Hi Jürgen,
I'm not aware of a Slurm-onic way of doing this. As you've said, this is the
behaviour of cgroups, which Slurm is employing. As I understand it, upon
allocation the page cache is accounted within the calling process's cgroup, and
I'm not aware of a way of preventing the memory resource con
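If it helps to confirm where the pages are being charged, you can peek at the
job's memory cgroup on the node, roughly like this (the path depends on your
cgroup setup, and the uid/job ids here are made up):

    $ grep -E '^(total_cache|total_rss) ' \
        /sys/fs/cgroup/memory/slurm/uid_1000/job_12345/memory.stat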
Hello Chris,
we’ve tried out your archiver and adapted it to our needs; it works quite well.
The changes:
- we get lots of jobs per day, ca. 3k-5k, so storing them as individual files
would waste too many inodes and 4k blocks. Instead everything is written into
two log files (job_script.log and
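For illustration only (not our actual code; $jobscript, the header format and
the log path are placeholders), the append-to-one-file idea boils down to:

    {
        printf '### %s  job %s\n' "$(date -Is)" "$SLURM_JOB_ID"
        cat "$jobscript"
    } >> /var/log/slurm/job_script.log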
Hi Chris
You are right in pointing out that the job actually runs, despite the error
from sbatch. The customer mentions that:
=== start ===
The problem had the usual scenario: the job script was submitted and executed,
but the sbatch command returned a non-zero exit status to ecflow, which thus
assumed the job to b
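For debugging on the submit side, wrapping sbatch to capture its exit status
could look roughly like this (a sketch; the script variable and log path are
placeholders):

    jobid=$(sbatch --parsable "$jobscript" 2>> /tmp/sbatch_errors.log)
    rc=$?
    echo "$(date -Is) rc=$rc jobid=$jobid" >> /tmp/sbatch_errors.log
    exit $rc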
Dear Kilian,
thanks for pointing this out. I should have mentioned that I had
already browsed the cgroup.conf man page up and down but did not find
any specific hints on how to achieve the desired behavior. Maybe I am
still missing something obvious?
Also the kernel cgroups documentation indicate
Christopher Benjamin Coffey writes:
> Hi, you may want to look into increasing the sssd cache length on the
> nodes,
We have thought about that, but it will not solve the problem, only make
it less frequent, I think.
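(For reference, raising the cache lifetime would presumably just mean bumping
entry_cache_timeout in sssd.conf on the nodes, e.g.

    [domain/our_ldap_domain]
    entry_cache_timeout = 86400

with the domain name being a placeholder, followed by restarting sssd. But as
said, that only stretches the window.)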
> and improving the network connectivity to your ldap
> directory.
That is so