[slurm-users] Re: Find out submit host of past job?

2024-08-07 Thread Juergen Salk via slurm-users
ent between jobs, and number of > jobs). We had it on and it nearly ran us out of space on our database host. > That said the data can be really useful depending on the situation. > > -Paul Edmon- > > On 8/7/2024 8:51 AM, Juergen Salk via slurm-users wrote: > > Hi Steffen

[slurm-users] Re: Find out submit host of past job?

2024-08-07 Thread Juergen Salk via slurm-users
Hi Steffen, not sure if this is what you are looking for, but with `AccountingStoreFlags=job_env´ set in slurm.conf, the batch job environment will be stored in the accounting database and can later be retrieved with the `sacct -j <jobid> --env-vars´ command. We find this quite useful for debugging purp
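The setup described above can be sketched as follows; the configuration line goes into slurm.conf, and the job ID below is a placeholder:

```shell
# slurm.conf (restart slurmctld after changing):
#   AccountingStoreFlags=job_env
# Then, to retrieve the stored environment of a past job:
sacct -j 12345 --env-vars
```

Note that storing every job's environment can grow the accounting database considerably, as the thread warns.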

[slurm-users] Re: maxrss reported by sacct is wrong

2024-06-07 Thread Juergen Salk via slurm-users
Hi, to my very best knowledge MaxRSS does report the aggregated memory consumption of all tasks, including all the shared libraries that the individual processes use, even though a shared library is only loaded into memory once regardless of how many processes use it. So shared libraries do count

[slurm-users] Re: cpu distribution question

2024-06-07 Thread Juergen Salk via slurm-users
Hi Alan, unfortunately, process placement in Slurm is kind of black magic for sub-node jobs, i.e. jobs that allocate only a small number of CPUs of a node. I have recently raised a similar question here: https://support.schedmd.com/show_bug.cgi?id=19236 And the bottom line was that to "reall

[slurm-users] Re: Trying to Track Down root Usage

2024-04-29 Thread Juergen Salk via slurm-users
Hi Jason, do or did you maybe have a reservation for user root in place? sreport accounts resources reserved for a user as well (even if not used by jobs) while sacct reports job accounting only. Best regards Jürgen * Jason Simms via slurm-users [240429 10:47]: > Hello all, > > Each week,

[slurm-users] Re: Avoiding fragmentation

2024-04-09 Thread Juergen Salk via slurm-users
Hi Gerhard, I am not sure if this counts as an administrative measure, but we do highly encourage our users to always explicitly specify --nodes=n together with --ntasks-per-node=m (rather than just --ntasks=n*m and omitting the --nodes option, which may lead to cores allocated here and there and eve
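A minimal batch-script sketch of this recommendation (node and task counts, and the program name, are placeholders):

```shell
#!/bin/bash
#SBATCH --nodes=2              # pin the job to exactly two nodes ...
#SBATCH --ntasks-per-node=24   # ... with a fixed task count per node
srun ./my_mpi_program
```

Compared to plain `--ntasks=48`, this keeps the scheduler from scattering the tasks over many partially filled nodes.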

Re: [slurm-users] What happens if GPU GRES exceeds the number of GPUs per node

2024-01-18 Thread Juergen Salk
Hi Wirawan, in general `--gres=gpu:6´ actually means six units of a generic resource named `gpu´ per node. Each unit may or may not be associated with a physical GPU device. I'd check the node configuration for the number of gres=gpu resource units that are configured for that node. scont
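A quick way to compare the configured `gpu` GRES count with the actual hardware, along the lines suggested above (the node name is a placeholder):

```shell
# Configured GRES/TRES as seen by the controller:
scontrol show node gpunode01 | grep -i -e gres -e tres
# Physical GPUs, run on the node itself:
nvidia-smi -L
```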

Re: [slurm-users] HELP: error between compilation and execution on gpu cluster

2023-05-19 Thread Juergen Salk
Hi, I am not sure if this is related to GPUs. I rather think the issue has to do with how your OpenMPI has been built. What does the ompi_info command show? Look for "Configure command line" in the output. Does this include the '--with-slurm' and '--with-pmi' flags? To my very best knowledge, both flags

Re: [slurm-users] Unexpected negative NICE values

2023-05-03 Thread Juergen Salk
Hi Sebastian, maybe it's a silly thought on my part, but do you have the `enable_user_top´ option included in your SchedulerParameters configuration? This would allow regular users to use `scontrol top <jobid>´ to push some of their jobs ahead of other jobs owned by them and this works internally by
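The option in question, as a slurm.conf fragment (reconfigure or restart slurmctld after changing it):

```conf
# slurm.conf — lets regular users run `scontrol top <jobid>` on their own
# pending jobs; internally this adjusts nice values, which can surface as
# unexpected negative NICE values in squeue output.
SchedulerParameters=enable_user_top
```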

Re: [slurm-users] Getting usage reporting from sacct/sreport

2023-03-26 Thread Juergen Salk
Hi Thomas, I think sreport should actually do what you want out of the box if you have permissions to retrieve that information for other users than yourself. In my understanding, sacct is meant for individual job and job step accounting while sreport is more suitable for aggregated cluster usa

Re: [slurm-users] Interactive jobs using "srun --pty bash" and MPI

2022-11-03 Thread Juergen Salk
term > > salloc: Granted job allocation 65537 > > > > which works as advertised (I'm not sure that i miss xterms or not -- at > least on our cluster we dont configure them explicitly as a primary > terminal tool) > > And thanks also Chris and Jason

Re: [slurm-users] Interactive jobs using "srun --pty bash" and MPI

2022-11-02 Thread Juergen Salk
Hi Em, this is most probably because in Slurm version 20.11 the behaviour of srun was changed to no longer allow job steps to overlap by default. An interactive job launched by `srun --pty bash´ always creates a regular step (step <jobid>.0), so mpirun or srun will hang when trying to launch anoth
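Under the behaviour described above (Slurm >= 20.11), a common workaround is to request overlapping steps explicitly; the task count and program name are placeholders:

```shell
# Either per launch step inside the allocation:
srun --overlap -n 4 ./my_mpi_program
# ... or for the whole session, before starting the interactive job:
export SLURM_OVERLAP=1
srun --pty bash
```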

Re: [slurm-users] slurm accounting shows more MaxRSS than physically available memory

2022-11-02 Thread Juergen Salk
Hi Martin, to my very best knowledge MaxRSS does report the aggregated memory consumption of all tasks, including all the shared libraries that the individual processes use, even though a shared library is only loaded into memory once regardless of how many processes use it. So shared librarie

Re: [slurm-users] rpmbuild with custom sysconfdir not working in 21.08.8

2022-05-25 Thread Juergen Salk
Hi, SchedMD also recently changed their online documentation on building RPM packages for Slurm: https://slurm.schedmd.com/quickstart_admin.html They now refer to '_slurm_sysconfdir' macro while it was '_sysconfdir' in previous versions of the documentation. Now it reads: --- snip --- To bui
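Based on the documentation change described above, a build invocation might look like this (the tarball version and target path are placeholders; the macro name is the one the thread says the current docs use):

```shell
rpmbuild -ta slurm-21.08.8.tar.bz2 --define '_slurm_sysconfdir /etc/slurm'
```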

Re: [slurm-users] rpmbuild with custom sysconfdir not working in 21.08.8

2022-05-24 Thread Juergen Salk
Hi Marko, I have had a very similar issue with setting up a custom path for the Slurm configuration files when using the '%_sysconfdir' macro in .rpmmacros, but this also happened with version 21.08.6 to me. Does it work for you if you use '%_slurm_sysconfdir' instead of '%_sysconfdir' macro in

Re: [slurm-users] Phantom jobs in sreport

2022-05-10 Thread Juergen Salk
Hi William, do those jobs show up when you run `sacctmgr show runaway` command? This command will also give you an option to fix them if it finds jobs in that state. For some more details see https://slurm.schedmd.com/sacctmgr.html and slide #14 from https://slurm.schedmd.com/SLUG19/Troubleshoo
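The command mentioned above, spelled out as it appears in the sacctmgr documentation:

```shell
# Lists jobs recorded as running/pending in the database but unknown to the
# controller, and offers to fix them interactively.
sacctmgr show runawayjobs
```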

Re: [slurm-users] Slurm 21.08.8-2 upgrade

2022-05-05 Thread Juergen Salk
Hi John, this is really bad news. We have stopped our rolling update from Slurm 21.08.6 to Slurm 21.08.8-1 today for exactly that reason: State of compute nodes already running slurmd 21.08.8-1 suddenly started flapping between responding and not responding but all other nodes that were still r

Re: [slurm-users] Issues with pam_slurm_adopt

2022-04-08 Thread Juergen Salk
Hi Nicolas, it looks like you have pam_access.so placed in your PAM stack *before* pam_slurm_adopt.so so this may get in your way. In fact, the logs indicate that it's pam_access and not pam_slurm_adopt that denies access in the first place: Apr 8 19:11:32 magi46 sshd[20542]: pam_access(sshd:ac
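A minimal sketch of the ordering issue described above in /etc/pam.d/sshd (module arguments omitted; adapt to your distribution's stack):

```conf
# Rules run top to bottom: with pam_access.so first, a deny from it wins
# before pam_slurm_adopt.so is ever consulted. Reordering (or adjusting
# the access rules) lets pam_slurm_adopt adopt the session first.
account    required     pam_access.so
account    required     pam_slurm_adopt.so
```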

Re: [slurm-users] srun and --cpus-per-task

2022-03-25 Thread Juergen Salk
Hi Bjørn-Helge, that's very similar to what we did as well in order to avoid confusion with core vs. thread vs. CPU counts when hyperthreading is kept enabled in the BIOS. Adding CPUs=<number of physical cores> (not <number of hardware threads>) will tell Slurm to only schedule physical cores. We have SelectType=select/cons_res SelectTypePara
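A hedged slurm.conf sketch of this setup (the node topology and memory figures are placeholders, and the truncated SelectTypeParameters value is assumed to be CR_Core_Memory):

```conf
# Node has 2 sockets x 24 cores x 2 threads = 96 hardware threads, but
# CPUs=48 makes Slurm schedule physical cores only.
NodeName=node[01-16] Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 CPUs=48 RealMemory=192000
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
```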

[slurm-users] Enforce shell options for job environment when submitting job?

2021-12-06 Thread Juergen Salk
Hi, does anybody know a simple way to enforce certain shell options such as set -o errexit (a.k.a. set -e) set -o pipefail and maybe also set -o nounset (a.k.a. set -u) for the job environment at job submission time (without modifying the batch scripts themselves)? Background of t

Re: [slurm-users] Suspending jobs for file system maintenance

2021-10-25 Thread Juergen Salk
age correctly, `scontrol suspend` sends a SIGSTOP to all > > job processes. The processes remain in memory, but are paused. What > > happens to open file handles, since the underlying filesystem goes away > > and comes back? > > > > Thank you, > > > > On Sat

Re: [slurm-users] Suspending jobs for file system maintenance

2021-10-22 Thread Juergen Salk
in a staggered manner. Best regards Jürgen * Paul Edmon [211019 15:15]: > Yup, we follow the same process for when we do Slurm upgrades, this looks > analogous to our process. > > -Paul Edmon- > > On 10/19/2021 3:06 PM, Juergen Salk wrote: > > Dear all, > >

[slurm-users] Suspending jobs for file system maintenance

2021-10-19 Thread Juergen Salk
Dear all, we are planning to perform some maintenance work on our Lustre file system which may or may not harm running jobs. Although failover functionality is enabled on the Lustre servers we'd like to minimize risk for running jobs in case something goes wrong. Therefore, we thought about s

Re: [slurm-users] is there a way to temporarily freeze an account?

2021-10-06 Thread Juergen Salk
Hi, I think setting MaxSubmitJobs=0 for the association should also do the trick if you don't want to code something special in the submit.lua script. E.g. for a single user: sacctmgr update user <username> set maxsubmitjobs=0 Setting MaxSubmitJobs=-1 will then release this limit. Best regards Jür
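The two commands as full invocations (the username is a placeholder):

```shell
# Freeze submissions for one user ...
sacctmgr update user alice set maxsubmitjobs=0
# ... and release the limit again later:
sacctmgr update user alice set maxsubmitjobs=-1
```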

Re: [slurm-users] Calculate the GPU usages

2021-09-01 Thread Juergen Salk
Dear Jeherul, sacct is for job accounting, sreport for cluster usage accounting. Did you maybe have any resource reservations in place for this user during that period of time? To my very best knowledge, resource reservations for one or more users do count in terms of cluster usage as reporte

Re: [slurm-users] PrivateData does not filter the billing info "scontrol show assoc_mgr flags=qos"

2021-08-19 Thread Juergen Salk
Hi Hemanta, is PrivateData also set in your slurmdbd.conf? Best regards Juergen * Hemanta Sahu [210818 15:04]: > I am still searching for a solution for this . > > On Fri, Aug 7, 2020 at 1:15 PM Hemanta Sahu > wrote: > > > Hi All, > > > > I have configured in our test cluster "PrivateDa

Re: [slurm-users] [External] What is an easy way to prevent users run programs on the master/login node.

2021-06-11 Thread Juergen Salk
Hi, I can't speak specifically for arbiter but to my very best knowledge this is just how cgroup memory limits work in general, i.e. both anonymous memory and page cache always count against the cgroup memory limit. This also applies to memory constraints imposed on compute jobs if Constrain

Re: [slurm-users] Maui equivalent Nodeallocationpolicy

2021-06-07 Thread Juergen Salk
* David Chaffin [210607 14:44]: > > we get a lot of small sub-node jobs that we want to pack together. Maui > does this pretty well with the smallest node that will hold the job, > NODEALLOCATIONPOLICY MINRESOURCE > I can't figure out the slurm equivalent. Default backfill isn't working > well.

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-21 Thread Juergen Salk
* Tina Friedrich [210521 16:35]: > If this is simply about quickly accessing nodes that they have jobs on to > check on them - we tell our users to 'srun' into a job allocation (srun > --jobid=XX). Hi Tina, sadly, this does not always work in version 20.11.x any more because of the new non-

Re: [slurm-users] pam_slurm_adopt not working for all users

2021-05-21 Thread Juergen Salk
Hi Loris, this depends largely on whether host-based authentication is configured (which does not seem to be the case for you) and also on how exactly the PAM stack for sshd looks in /etc/pam.d/sshd. As the rules are worked through in the order they appear in /etc/pam.d/sshd, pam_slurm_adopt

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-16 Thread Juergen Salk
* Juergen Salk [210515 23:54]: > * Christopher Samuel [210514 15:47]: > > > > Usage reported in Percentage of Total > > > > > > > > >   Cluster  TRES Name 

Re: [slurm-users] Determining Cluster Usage Rate

2021-05-15 Thread Juergen Salk
* Christopher Samuel [210514 15:47]: > > Usage reported in Percentage of Total > > > > > >   Cluster  TRES Name    Allocated Down PLND Dow    Idle > > Reserved Reported > > - --

Re: [slurm-users] OpenMPI interactive change in behavior?

2021-04-28 Thread Juergen Salk
Hi John, does it work with `srun --overlap ...´ or if you do `export SLURM_OVERLAP=1´ before running your interactive job? Best regards Jürgen * John DeSantis [210428 09:41]: > Hello all, > > Just an update, the following URL almost mirrors the issue we're seeing: > https://github.com/open-

Re: [slurm-users] Jobs that may still be running at X time?

2021-04-16 Thread Juergen Salk
* Ryan Novosielski [210416 21:33]: > Does anyone have a particularly clever way, either built-in or > scripted, to find out which jobs will still be running at > such-and-such time? Hi Ryan, coincidentally, I just did this today. For exactly the same reason. squeue does have a "%L" format opti
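The `%L` format option mentioned above can be used like this (column widths are arbitrary):

```shell
# %L = time remaining, %i = job ID, %u = user; running jobs only
squeue -t RUNNING -o '%12L %10i %u'
```

Jobs whose remaining walltime reaches past the planned downtime can then be picked out of that list.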

Re: [slurm-users] Grp* Resource Limits on User Associations

2021-04-16 Thread Juergen Salk
* Matthias Leopold [210416 19:35]: > can someone please explain to me why it's possible to set Grp* resource > limits on user associations? What's the use for this? Hi Matthias, this probably does not fully answer your question, but Grp* limits on user associations provide the ability to impos

Re: [slurm-users] derived counters

2021-04-13 Thread Juergen Salk
* Heckes, Frank [210413 12:04]: > This result from a mgmt. - question. How long jobs have to wait (in s, min, > h, day) before they getting executed and > how many jobs are waiting (are queued) for each partition in a certain time > interval. > The first one is easy to find with sacct and sub

Re: [slurm-users] Slurm prolog export variable

2021-03-23 Thread Juergen Salk
Hi Mike, for pushing environment variables into the job environment, you'll have to use the TaskProlog script (not the regular Prolog script). The location of the TaskProlog script needs to be defined in slurm.conf, e.g. TaskProlog=/etc/slurm/task_prolog and the standard output of TaskProlog
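A minimal TaskProlog sketch of this mechanism (the variable name and path are assumptions for illustration):

```shell
#!/bin/bash
# task_prolog: each stdout line that begins with "export" is picked up by
# slurmd and injected into the job's environment.
echo "export MY_SCRATCH=/scratch/${SLURM_JOB_ID:-0}"
```

slurm.conf would then point at the script with `TaskProlog=/etc/slurm/task_prolog`.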

Re: [slurm-users] pam_slurm_adopt always claims no active jobs even when they do

2020-10-24 Thread Juergen Salk
Hi Paul, maybe this is totally unrelated but we also have a similar issue with pam_slurm_adopt in case that ConstrainRAMSpace=no is set in cgroup.conf and more than one job is running on that node. There is a bug report open at: https://bugs.schedmd.com/show_bug.cgi?id=9355 As a workaround we

Re: [slurm-users] How to print a user's creation timestamp from the Slurm database?

2020-01-20 Thread Juergen Salk
* Marcus Wagner [200120 09:17]: > I was astonished about the "Modify Clusters" transactions, so I looked a bit > further: > $> sacctmgr list transactions Action="Modify Clusters" -p > 2020-01-15T00:00:12|Modify > Clusters|slurmadm|name='rcc'|control_host='134.61.193.19', > control_port=6750, last

Re: [slurm-users] How to print a user's creation timestamp from the Slurm database?

2020-01-19 Thread Juergen Salk
* Ole Holm Nielsen [200118 12:06]: > When we have created a new Slurm user with "sacctmgr create user name=xxx", > I would like inquire at a later date about the timestamp for the user > creation. As far as I can tell, the sacctmgr command cannot show such > timestamps. Hi Ole, for me (current

Re: [slurm-users] Question concerning cgroups and

2020-01-17 Thread Juergen Salk
Hi Michael, not sure if this is the root cause of your problem, but SchedMD recommends setting TaskAffinity to "no" in cgroup.conf when using both task/affinity and task/cgroup together for TaskPlugin in slurm.conf (see the NOTE for TaskPlugin in slurm.conf). Best regards Ju

Re: [slurm-users] Job completed but child process still running

2020-01-13 Thread Juergen Salk
* Chris Samuel [200113 07:30]: > On 1/13/20 5:55 am, Youssef Eldakar wrote: > > > In an sbatch script, a user calls a shell script that starts a Java > > background process. The job immediately is completed, but the child Java > > process is still running on the compute node. > > > > Is there a

Re: [slurm-users] slurm - not allow a user to submit jobs

2020-01-08 Thread Juergen Salk
Hello Angelines, for me (Slurm 19.05.2) the following command seems to work: sacctmgr update user <username> set maxsubmitjobs=0 Job submission is then rejected with the following message: $ sbatch job.slurm sbatch: error: AssocMaxSubmitJobLimit sbatch: error: Batch job submission failed: Job violates a

Re: [slurm-users] cleanup script after timeout

2019-12-11 Thread Juergen Salk
Hi Brian, can you maybe elaborate on how exactly you verified that your epilog does not run when a job exceeds its walltime limit? Does it run when jobs end normally or when a running job is cancelled by the user? I am asking because in our environment the epilog also runs when a job hits

Re: [slurm-users] SLURM_TMPDIR

2019-12-10 Thread Juergen Salk
Hi Angelines, we create a job specific scratch directory in the prolog script but use the task_prolog script to set the environment variable. In prolog: scratch_dir=/your/path /bin/mkdir -p ${scratch_dir} /bin/chmod 700 ${scratch_dir} /bin/chown ${SLURM_JOB_USER} ${scratch_dir} In task_prolog:
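The prolog part of the scheme above, as a runnable sketch (the base path /tmp/scratch is chosen here only so the fragment runs without root; a real site would use a dedicated local filesystem):

```shell
#!/bin/bash
# Prolog: create a private per-job scratch directory owned by the job user.
scratch_base="/tmp/scratch"
scratch_dir="${scratch_base}/${SLURM_JOB_ID:-test}"
mkdir -p "${scratch_dir}"
chmod 700 "${scratch_dir}"
chown "${SLURM_JOB_USER:-$(id -un)}" "${scratch_dir}"
```

The matching task_prolog would then print `export SLURM_TMPDIR=${scratch_dir}` on stdout so the variable reaches the job environment, as the message describes.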

Re: [slurm-users] Running job using our serial queue

2019-11-04 Thread Juergen Salk
* David Baker [191104 15:14]: > It looks like the downside of the serial queue is that jobs from > different users can interact quite badly. Hi David, what exactly do you mean with "jobs from different users can interact quite badly"? > [...] On the other hand I wonder if our cgroups setup i

Re: [slurm-users] job priority keeping resources from being used?

2019-11-03 Thread Juergen Salk
Hi, maybe I missed it, but what does squeue say in the reason field for your pending jobs that you expect to slip in? Is your partition maybe configured for exclusive node access, e.g. by setting `OverSubscribe=EXCLUSIVE´? Best regards Jürgen -- Jürgen Salk Scientific Software & Compute Ser

Re: [slurm-users] OverMemoryKill Not Working?

2019-10-25 Thread Juergen Salk
Hi Mike, IIRC, I once did some tests with the very same configuration as yours, i.e. `JobAcctGatherType=jobacct_gather/linux´ and `JobAcctGatherParams=OverMemoryKill´ and got this to work as expected: Jobs were killed when they exceeded the requested amount of memory. This was with Slurm 18.08.7.
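The configuration pair referenced above, as a slurm.conf fragment (behaviour as observed with Slurm 18.08.7 in this report):

```conf
# slurm.conf — poll-based memory enforcement without cgroups: jobs that
# exceed their requested memory are killed by the accounting poller.
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherParams=OverMemoryKill
```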

Re: [slurm-users] jobacct_gather/linux vs jobacct_gather/cgroup

2019-10-22 Thread Juergen Salk
Dear Chris, I could not find this warning in the slurm.conf man page. So I googled it and found a reference in the Slurm developers documentation: https://slurm.schedmd.com/jobacct_gatherplugins.html However, this web page says in its footer: "Last modified 27 March 2015". So maybe (means: hop

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Juergen Salk
> On 19-10-08 10:36, Juergen Salk wrote: > > * Bjørn-Helge Mevik [191008 08:34]: > > > Jean-mathieu CHANTREIN writes: > > > > > > > I tried using, in slurm.conf > > > > TaskPlugin=task/affinity, task/cgroup > > > >

Re: [slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?

2019-10-08 Thread Juergen Salk
* Bjørn-Helge Mevik [191008 08:34]: > Jean-mathieu CHANTREIN writes: > > > I tried using, in slurm.conf > > TaskPlugin=task/affinity, task/cgroup > > SelectTypeParameters=CR_CPU_Memory > > MemLimitEnforce=yes > > > > and in cgroup.conf: > > CgroupAutomount=yes > > ConstrainCores=yes > > C

Re: [slurm-users] After reboot nodes are in state = down

2019-09-27 Thread Juergen Salk
* Rafał Kędziorski [190927 14:58]: > > > > > > you may try setting `ReturnToService=2´ in slurm.conf. > > > > > > > Caveat: A spontaneously rebooting machine may create a "black hole" this > > way. > > > > How do you mean this? Could ReturnToService=2 be a problem? > Hi Rafał, black hole syndr

Re: [slurm-users] After reboot nodes are in state = down

2019-09-27 Thread Juergen Salk
Hi Rafał, you may try setting `ReturnToService=2´ in slurm.conf. Best regards Jürgen -- Jürgen Salk Scientific Software & Compute Services (SSCS) Kommunikations- und Informationszentrum (kiz) Universität Ulm Telefon: +49 (0)731 50-22478 Telefax: +49 (0)731 50-22471 * Rafał Kędziorski [190927

Re: [slurm-users] How to modify the normal QOS

2019-09-26 Thread Juergen Salk
* David Baker [190926 14:12]: > > Currently my normal QOS specifies MaxTRESPU=cpu=1280,nodes=32. I've > tried a number of edits, however I haven't yet found a way of > redefining the MaxTRESPU to be "cpu=1280". In the past I have > resorted to deleting a QOS completely and redefining the whole >

Re: [slurm-users] Advice on setting a partition QOS

2019-09-25 Thread Juergen Salk
* David Baker [190925 15:58]: > Thank you for your reply. So, in respond to your suggestion I > submitted a batch of jobs each asking for 2 cpus. Again I was able > to get 32 jobs running at once. Dear David, this was just meant as a test in order to support the assumption that you do not hit

Re: [slurm-users] Advice on setting a partition QOS

2019-09-25 Thread Juergen Salk
Dear David, as it seems, Slurm counts allocated nodes on a per-job basis, i.e. every individual one-core job counts as an additional node even if they all run on one and the same node. Can you allocate 64 CPUs at the same time when requesting 2 CPUs per job? We've also had this (somewhat stra

Re: [slurm-users] Heterogeneous HPC

2019-09-19 Thread Juergen Salk
Hello Mahmood, in our current system (which does not run with Slurm) we have deployed the community edition of Singularity as a software module. https://sylabs.io/singularity/ I have no practical experience yet but from what I've read so far, Singularity is also supposed to work quite well wi

Re: [slurm-users] Maxjobs not being enforced

2019-09-18 Thread Juergen Salk
Dear Tina, probably a stupid question, but is there any other MaxJobs limit defined somewhere else above the user association in the resource limit hierarchy? For example, if MaxJobs=1 in the partition/job QOS and MaxJobs=100 in the user association, the QOS limit takes precedence over the user

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Juergen Salk
* Philip Kovacs [190917 07:43]: > >> I suspect the question, which I also have, is more like: > >> > >>  "What difference does it make whether I use 'srun' or 'mpirun' within > >>    a batch file started with 'sbatch'." > > One big thing would be that using srun gives you resource tracking > an

Re: [slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-17 Thread Juergen Salk
* Loris Bennett [190917 07:46]: > > > >>But I still don't get the point. Why should I favour `srun > >>./my_mpi_program´ > >>over `mpirun ./my_mpi_program´? For me, both seem to do exactly the same > >>thing. No? Did I miss something? > > > >>Best regards > >>Jürgen > > > > Running a single job

[slurm-users] MPI jobs via mpirun vs. srun through PMIx.

2019-09-16 Thread Juergen Salk
Dear all, according to https://slurm.schedmd.com/mpi_guide.html I have built Slurm 19.05 with PMIx support enabled and it seems to work for both OpenMPI and Intel MPI. (I've also set MpiDefault=pmix in slurm.conf.) But I still don't get the point. Why should I favour `srun ./my_mpi_program´ ove

Re: [slurm-users] How can jobs request a minimum available (free) TmpFS disk space?

2019-09-03 Thread Juergen Salk
* Ole Holm Nielsen [190903 11:14]: > How do you dynamically update your gres=localtmp resource according to the > current disk free space? I mean, there is already a TmpFS disk space size > defined in slurm.conf, so how does your gres=localtmp differ from TmpFS? Dear Ole, I think (but please c

Re: [slurm-users] How can jobs request a minimum available (free) TmpFS disk space?

2019-09-03 Thread Juergen Salk
Dear Bjørn-Helge, this is unfortunately no answer to the question but I'd be glad to hear some more thoughts on that, too. We are also going to implement disk quotas for the amount of local scratch space that has been allocated for the job by means of generic resources (e.g. `--gres=scratch:100´

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-15 Thread Juergen Salk
* Andy Georges [190715 16:17]: > > On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote: > > Dear all, > > > > I have configured pam_slurm_adopt in our Slurm test environment by > > following the corresponding documentation: > > > > http

Re: [slurm-users] number of tasks that can run on a node without oversubscribing

2019-07-12 Thread Juergen Salk
Hello, the CPU vs. cores vs. threads issue also confused me at the very beginning. Although, in general, we do not encourage our users to make use of hyperthreading, we have decided to leave it enabled in the BIOS as there are some use cases that are known to benefit from hyperthreading. I think

[slurm-users] pam_slurm_adopt and memory constraints?

2019-07-12 Thread Juergen Salk
Dear all, I have configured pam_slurm_adopt in our Slurm test environment by following the corresponding documentation: https://slurm.schedmd.com/pam_slurm_adopt.html I've set `PrologFlags=contain´ in slurm.conf and also have task/cgroup enabled along with task/affinity (i.e. `TaskPlugin=task/
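The settings named above, collected as a slurm.conf sketch:

```conf
# slurm.conf — prerequisites for pam_slurm_adopt: an "extern" step is
# created at job start so incoming ssh sessions can be adopted into it.
PrologFlags=contain
TaskPlugin=task/affinity,task/cgroup
```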

Re: [slurm-users] ConstrainRAMSpace=yes and page cache?

2019-06-21 Thread Juergen Salk
* Christopher Samuel [190621 09:59]: > On 6/13/19 5:27 PM, Kilian Cavalotti wrote: > > > I would take a look at the various *KmemSpace options in cgroups.conf, > > they can certainly help with this. > > Specifically I think you'll want: > > ConstrainKmemSpace=no > > to fix this. This happens
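The suggested fix, as a cgroup.conf fragment:

```conf
# cgroup.conf — do not constrain kernel memory; per the thread, kmem limits
# can otherwise charge page-cache related kernel objects against the job.
ConstrainKmemSpace=no
```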

Re: [slurm-users] ConstrainRAMSpace=yes and page cache?

2019-06-21 Thread Juergen Salk
e cache for specific > files or directories. > > --- > Sam Gallop > > -Original Message- > From: slurm-users On Behalf Of > Juergen Salk > Sent: 14 June 2019 09:14 > To: Slurm User Community List > Subject: Re: [slurm-users] ConstrainRAMSpace=yes and page

Re: [slurm-users] ConstrainRAMSpace=yes and page cache?

2019-06-14 Thread Juergen Salk
ey can certainly help with this. > > Cheers, -- Kilian > > On Thu, Jun 13, 2019 at 2:41 PM Juergen Salk > wrote: > > > > Dear all, > > > > I'm just starting to get used to Slurm and play around with it in > > a small test environment within our o

[slurm-users] ConstrainRAMSpace=yes and page cache?

2019-06-13 Thread Juergen Salk
Dear all, I'm just starting to get used to Slurm and play around with it in a small test environment within our old cluster. For our next system we will probably have to abandon our current exclusive user node access policy in favor of a shared user policy, i.e. jobs from different users will the