[slurm-users] How are the results produced by 'seff'?

2025-06-12 Thread Loris Bennett via slurm-users
t the memory usage reported by 'seff' is unreliable [2]. Is that indeed the case? Cheers, Loris Footnotes: [1] https://github.com/PrincetonUniversity/jobstats [2] https://doc.dhpc.tudelft.nl/delftblue/Slurm-trouble-shooting/ -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universit

[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-11 Thread Loris Bennett via slurm-users
for individual jobs, when requested. We also don't pre-empt any jobs. Apart from that, I imagine implementing your 'soft' limits robustly might be quite challenging and/or time-consuming, as I am not aware that Slurm has anything like that built in. Cheers, Loris > On Wed,

[slurm-users] Re: Implementing a "soft" wall clock limit

2025-06-11 Thread Loris Bennett via slurm-users
n incentive to specify a shorter wallclock limit, if they can. 'sqos' is just an alias for sacctmgr show qos format=name,priority,maxwall,maxjobs,maxsubmitjobs,maxtrespu%20 Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universität Berlin -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
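For anyone who would rather call the same query from a script than from a shell alias, here is a minimal sketch. The field list is copied from the 'sqos' alias quoted above; the function name `sqos_command` is made up for this example and is not part of Slurm.

```python
import shlex

# Format fields copied from the 'sqos' alias quoted in the message above.
SQOS_FIELDS = ("name", "priority", "maxwall", "maxjobs", "maxsubmitjobs", "maxtrespu%20")


def sqos_command(fields=SQOS_FIELDS):
    """Return the argv list for the sacctmgr call behind the 'sqos' alias.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["sacctmgr", "show", "qos", "format=" + ",".join(fields)]


if __name__ == "__main__":
    # Print the command line so it can be inspected or pasted into a shell.
    print(shlex.join(sqos_command()))
```

On a machine with Slurm installed, the list could be handed to `subprocess.run(sqos_command())`.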

[slurm-users] Re: Large memory jobs stuck Pending. Should use --time parameter?

2025-05-07 Thread Loris Bennett via slurm-users
tion applies only to SchedulerType=sched/backfill. Default: 1440 (1 day), Min: 1, Max: 43200 (30 days). Regards Loris Bennett -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universität Berlin

[slurm-users] Re: How can we put limits on interactive jobs?

2025-04-25 Thread Loris Bennett via slurm-users
e other partitions for interactive work. This is obviously also even more make-shift :-) Cheers, Loris > Thanks a lot, > Ole > > -- > Ole Holm Nielsen > PhD, Senior HPC Officer > Department of Physics, Technical University of Denmark -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie

[slurm-users] Re: Minimum cpu cores per node partition level configuration

2025-04-03 Thread Loris Bennett via slurm-users
resumably, 'small' GPU jobs might potentially have to wait for resources in other partitions, even if resources are free in 'large-gpu'. Do you have other policies which ameliorate this? Cheers, Loris [snip (135 lines)] -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Univer

[slurm-users] Unable to receive password reminder

2025-01-14 Thread Loris Bennett via slurm-users
Hi, Over a week ago I sent the message below to the address I found for the list owner, but have not received a response. Does anyone know how to proceed in this case? Cheers, Loris Start of forwarded message From: Loris Bennett To: Subject: Unable

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-06 Thread Loris Bennett via slurm-users
e data points. Cheers, Loris > -Paul Edmon- > > On 9/5/24 10:22 AM, Loris Bennett via slurm-users wrote: >> Jason Simms via slurm-users writes: >> >>> Ours works fine, however, without the InteractiveStepOptions parameter. >> My assumption is also that default v

[slurm-users] Re: salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
5a * D-20146 Hamburg * Germany > > Phone: +49 40 460094-221 > Fax:+49 40 460094-270 > Email: be...@dkrz.de > URL:http://www.dkrz.de > > Geschäftsführer: Prof. Dr. Thomas Ludwig > Sitz der Gesellschaft: Hamburg > Amtsgericht Hamburg HRB 39784 > > -

[slurm-users] salloc not starting shell despite LaunchParameters=use_interactive_step

2024-09-05 Thread Loris Bennett via slurm-users
into the compute node: $ ssh c001 [13:39:36] loris@c001 (1000) ~ Is that the expected behaviour or should salloc return a shell directly on the compute node (like srun --pty /bin/bash -l used to do)? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universität Berlin -- slurm

[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-19 Thread Loris Bennett via slurm-users
a > echo "Running on $(hostname)" > echo "We are in $(pwd)" > > > # run the program > > /home/arkoroy.sps.iitmandi/ferro-detun/input1/a_1.out & You should not write & at the end of the above command. This will ru

[slurm-users] Re: Unable to run sequential jobs simultaneously on the same node

2024-08-18 Thread Loris Bennett via slurm-users
5 005, India > Email: a...@iitmandi.ac.in > Web: https://faculty.iitmandi.ac.in/~arko/ -- Dr. Loris Bennett (Herr/Mr) FUB-IT, Freie Universität Berlin

[slurm-users] Re: How to exclude master from computing? Set to DRAINED?

2024-06-24 Thread Loris Bennett via slurm-users
ause you are starting 'slurmd' on the node, which implies you do want to run jobs there. Normally you would run only 'slurmctld' and possibly also 'slurmdbd' on your head node. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-11 Thread Loris Bennett via slurm-users
orge > > -- > George Leaver > Research Infrastructure, IT Services, University of Manchester > http://ri.itservices.manchester.ac.uk | @UoM_eResearch

[slurm-users] Re: sbatch: Node count specification invalid - when only specifying --ntasks

2024-06-10 Thread Loris Bennett via slurm-users
k through release notes back to 22.05.10 but can't see anything > obvious (to me). > > Has this behaviour changed? Or, more likely, what have I missed ;-) ? > > Many thanks, > George > > -- > George Leaver > Research Infrastructure, IT Services, University of Man

[slurm-users] Re: diagnosing why interactive/non-interactive job waits are so long with State=MIXED

2024-06-05 Thread Loris Bennett via slurm-users
tSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s > > grep 672204 /var/log/slurmctld > [2024-06-04T15:50:35.627] sched: _slurm_rpc_allocate_resources JobId=672204 > NodeList=(null) usec=852

[slurm-users] Re: GPU GRES verification and some really broad questions.

2024-05-10 Thread Loris Bennett via slurm-users
cluster is not really the fastest so I am planning on having users use the > /tmp/ directory > for speed critical reading and writing, as the OSs have been installed > on NVME drives. Depending on the IO patterns created by a piece of software using the distributed file system might be fine or a

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > > On 4/30/24 3:43 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users >> writes: >> >>> Hi Loris, >>> >>> On 4/30/24 2

[slurm-users] Re: [EXTERN] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
Hi Dietmar, Dietmar Rieder via slurm-users writes: > Hi Loris, > > On 4/30/24 2:53 PM, Loris Bennett via slurm-users wrote: >> Hi Dietmar, >> Dietmar Rieder via slurm-users >> writes: >> >>> Hi, >>> >>> is it possible to have slur

[slurm-users] Re: scheduling according time requirements

2024-04-30 Thread Loris Bennett via slurm-users
'srun ... --pty bash', as far as I understand, the preferred method is to use 'salloc' as above, and to use 'srun' for starting MPI processes. Cheers, Loris > Thanks so much and sorry for the naive question >Dietmar -- Dr. Loris Bennett

[slurm-users] Re: Avoiding fragmentation

2024-04-08 Thread Loris Bennett via slurm-users
rs are important for us because we have a large number of single core jobs and almost all the users, whether doing MPI or not, significantly overestimate the memory requirements of their jobs. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin -- slurm-

[slurm-users] Re: Suggestions for Partition/QoS configuration

2024-04-04 Thread Loris Bennett via slurm-users
f our cores. The downside is that very occasionally nodes may idle because a user has reached his or her cap. However, we usually have enough uncapped users submitting jobs, so that in fact this happens only rarely, such as sometimes at Christmas or New Year. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

[slurm-users] job_submit.lua - uid in Docker cluster

2024-02-14 Thread Loris Bennett via slurm-users
er.uid' has the value 0.0 and is thus not an integer. The only user within the Docker cluster is 'root'. Has anyone come across this issue? Is it to do with the Docker environment or the difference in the OS versions (Lua 5.1.4 vs. 5.3.4, lua-posix 32 vs. 33.3.1)? Cheers, Loris

[slurm-users] Re: Starting a job after a file is created in previous job (dependency looking for soluton)

2024-02-06 Thread Loris Bennett via slurm-users
an specify how many jobs should run simultaneously with the '%' notation: --array=1-200%2 Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin
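To make the '%' notation concrete, here is a small sketch that splits an `--array` value into its task IDs and the limit on simultaneously running tasks. The parser is an illustration written for this listing, not code from Slurm. It handles only the simple forms described in the sbatch documentation: comma-separated values, ranges, an optional `:step`, and an optional `%limit`.

```python
def parse_array_spec(spec):
    """Split an sbatch --array value into (task_ids, max_running).

    Illustrative helper, not part of Slurm. max_running is None when
    no '%' limit is given.
    """
    ids_part, _, limit_part = spec.partition("%")
    max_running = int(limit_part) if limit_part else None

    task_ids = []
    for chunk in ids_part.split(","):
        # A chunk is either a single index, 'first-last', or 'first-last:step'.
        rng, _, step = chunk.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        stop = int(last) if last else start
        task_ids.extend(range(start, stop + 1, int(step) if step else 1))
    return task_ids, max_running


if __name__ == "__main__":
    ids, limit = parse_array_spec("1-200%2")
    print(len(ids), "tasks, at most", limit, "running at once")
```

For `--array=1-200%2` this yields 200 task IDs and a limit of 2, i.e. the array works through all 200 tasks but never runs more than two at a time.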

[slurm-users] Re: SLURM configuration for LDAP users

2024-02-05 Thread Loris Bennett via slurm-users
cifying > LaunchParameters=enable_nss_slurm in the slurm.conf file and put slurm > keyword in passwd/group > entry in the /etc/nsswitch.conf file. Did these, but didn't help either. > > I am bereft of ideas at present. If anyone has real world experience and can >

[slurm-users] Two jobs each with a different partition running on same node?

2024-01-29 Thread Loris Bennett
be in a single partition. Was this indeed the case and is it still the case with version Slurm 23.02.7? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] slurm-config on NFS-volume

2024-01-24 Thread Loris Bennett
nfigless" Slurm: https://slurm.schedmd.com/configless_slurm.html Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] Multifactor fair-share with single account

2024-01-04 Thread Loris Bennett
Hi Kamil, Kamil Wilczek writes: > W dniu 4.01.2024 o 07:56, Loris Bennett pisze: >> Hi Kamil, >> Kamil Wilczek writes: >> >>> Dear All, >>> >>> I have a question regarding the fair-share factor of the multifactor >>> priority a

Re: [slurm-users] Multifactor fair-share with single account

2024-01-03 Thread Loris Bennett
res and thus treated equally by the fair-share mechanism. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) FUB-IT (ex-ZEDAT), Freie Universität Berlin

Re: [slurm-users] Selecting only a subset of GPU's from all available GPU's

2023-12-17 Thread Loris Bennett
would be much nicer if multiple GPUs types passed to '--gres' were ORed. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] partition qos without managing users

2023-11-23 Thread Loris Bennett
e to configure some sort of partition QoS so that the number >>> of >>> jobs or cpus is limited for a single user. >>> So far my testing always depends on creating users within the >>> accounting database however I'd like to avoid managing each user and >>> having to create or sync _all_ LDAP users also within Slurm. >>> Or - are there solutions to sync LDAP or AzureAD users to the Slurm >>> accounting database? >>> Thanks for any input. >>> Best - Eg. >>> >> > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] REST-based CLI tools out there somewhere?

2023-11-10 Thread Loris Bennett
ion in this e-mail or > any attachments. The DRW Companies make no representations that this e-mail > or any attachments are free of computer viruses or other defects. -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Sinfo options not working in SLURM 23.11

2023-10-29 Thread Loris Bennett
> > > > FPGA* up infinite 1 idle >FPGA01 > > Any pointers will help. Why do you think that the output above is wrong? Cheers, Loris > Regards, > > DJ > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Site factor plugin example?

2023-10-24 Thread Loris Bennett
Loris Bennett writes: > Christopher Samuel writes: > >> On 10/13/23 10:10, Angel de Vicente wrote: >> >>> But, in any case, I would still be interested in a site factor >>> plugin example, because I might revisit this in the future. >> >> I don&#x

Re: [slurm-users] Site factor plugin example?

2023-10-17 Thread Loris Bennett
s, I found it. I'll have a go at creating a memory-wasted factor. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Site factor plugin example?

2023-10-15 Thread Loris Bennett
Hello Angel, Angel de Vicente writes: > Hello Loris, > > "Loris Bennett" writes: > >> Did you ever find an example or write your own plugin which you could >> provide as an example? > > I'm afraid not (though I didn't persevere, because for the

Re: [slurm-users] Site factor plugin example?

2023-10-13 Thread Loris Bennett
or plugin to start with. > > Do you know of any examples that can set me in the right direction? Did you ever find an example or write your own plugin which you could provide as an example? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Fairshare: Penalising unused memory rather than used memory?

2023-10-11 Thread Loris Bennett
o a range of unpreferred behaviour) and >> provides a clear motivation to change. Could be done with QOS unless >> you already use that in a conflicting way. >> Gareth >> Get Outlook for Android <https://aka.ms/ghei36> >>

[slurm-users] Fairshare: Penalising unused memory rather than used memory?

2023-10-10 Thread Loris Bennett
interested in knowing whether one can take into account the *requested but unused memory* when calculating usage. Is this possible? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin
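The quantity being asked about can be written down independently of whether Slurm can feed it into fair-share. The sketch below only defines "requested but unused memory" as a usage-like number (MB-seconds). The function name and the idea of passing in requested memory, peak memory and elapsed time per job are assumptions made for illustration; this is not an existing Slurm option.

```python
def unused_memory_mb_seconds(requested_mb, max_used_mb, elapsed_seconds):
    """Memory requested but never used, weighted by job run time.

    Hypothetical metric for illustration. A job that used at least as
    much memory as it requested contributes zero.
    """
    unused_mb = max(requested_mb - max_used_mb, 0)
    return unused_mb * elapsed_seconds


if __name__ == "__main__":
    # Made-up job: 16000 MB requested, 2000 MB peak usage, one hour run time.
    print(unused_memory_mb_seconds(16000, 2000, 3600))
```

Summing this value per user over a period would give a "memory wasted" figure comparable in form to the usual resource-seconds usage.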

Re: [slurm-users] Guarantee minimum amount of GPU resources to a Slurm account

2023-09-13 Thread Loris Bennett
th certain restrictions, such as a shorter maximum run-time. What are the pros and cons of the reservation approach compared with the above partition-based approach? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] squeue: unrecognized option '--array-unique'

2023-09-12 Thread Loris Bennett
Loris Bennett writes: > Hi, > > Since upgrading to 23.02.5, I am seeing the following error > > $ squeue --array-unique > squeue: unrecognized option '--array-unique' > Try "squeue --help" for more information > > The help for 'squeue

[slurm-users] squeue: unrecognized option '--array-unique'

2023-09-12 Thread Loris Bennett
p array-unique --array-unique display one unique pending job array Is this a regression or is something else going on? Regards Loris Bennett -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Submitting jobs from machines outside the cluster

2023-08-28 Thread Loris Bennett
ocal machine and then starts jupyter-lab. The users can then point the browsers on their local machines to a local port and be connected to the session on the compute node. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Is there any public scientific-workflow example that can be run through Slurm?

2023-08-20 Thread Loris Bennett
extflow.io/ It is slightly problematic from our point of view, as it does not yet support job arrays. However, there is development activity going on to address this: https://github.com/nextflow-io/nextflow/issues/1477 Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] stopping job array after N failed jobs in row

2023-08-01 Thread Loris Bennett
ophic failure in sbatch-file. If they fail, usually it's bad and >> there is no >> sense to crunch the remaining thousands of job array jobs. >> >> OT: what is the correct terminology for one item in job array... sub-job? >> job-array-job? :) >> >> cheers >> >> josef >-- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Distribute a single node resources across multiple partitons

2023-07-06 Thread Loris Bennett
any remaining resources on the node are only available via partition A. A second job can only start on N in partition B if no jobs on N are running in partition A. Regards Loris Bennett -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Slurm commands for launching tasks: salloc and sbatch

2023-07-05 Thread Loris Bennett
se in 30 minutes and you will have to leave. With '--deadline' you can decide case by case. Cheers, Loris > Sent from my iPhone > >> On Jul 5, 2023, at 1:43 AM, Loris Bennett wrote: >> >> Mike Mikailov writes: >> >>> About the last point.

Re: [slurm-users] Slurm commands for launching tasks: salloc and sbatch

2023-07-04 Thread Loris Bennett
lt)|minutes|hours|days|weeks]] [snip (36 lines)] > > Queuing system No Yes > > I am not sure what you mean with the last point, since 'salloc' is also > handled by the queueing system. If the resources requested are > currently not available, 'salloc' will

Re: [slurm-users] Slurm commands for launching tasks: salloc and sbatch

2023-07-04 Thread Loris Bennett
for counting > users tasks and run them. However, I have received different > results in cluster performance for the same task (task execution time is too > long in case of salloc). So my question is what is the difference > between these two commands, that can affect on task performance? Thank you > beforehand. > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

[slurm-users] Favouring job arrays over individual jobs?

2023-06-29 Thread Loris Bennett
tiple jobs with identical resource requirements :-( Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Backfill Scheduling

2023-06-27 Thread Loris Bennett
Hi Reed, Reed Dier writes: > On Jun 27, 2023, at 1:10 AM, Loris Bennett > wrote: > > Hi Reed, > > Reed Dier writes: > > Is this an issue with the relative FIFO nature of the priority scheduling > currently with all of the other factors disabled, > or sin

Re: [slurm-users] Backfill Scheduling

2023-06-26 Thread Loris Bennett
array task ID and the way the input files are organised. We are currently not sure about the best way to do this in a suitably generic way. -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] spreading jobs out across the cluster

2023-06-14 Thread Loris Bennett
ering down nodes which are not required. What is your use-case for wanting to spread the jobs out? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] hi-priority partition and preemption

2023-05-24 Thread Loris Bennett
hat partition. We use QOS to set different priorities, but we don't use preemption. > Since I have jobs that must run at specific time and must have priority over > all others, is this the correct way to do? For this I would probably use a recurring reservation. Cheers, Loris > Thanks > > FR -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

[slurm-users] Restrictions for new/inefficient users?

2023-05-24 Thread Loris Bennett
er approach? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Loris Bennett
>> happy about these tools. You're talking about 1 of jobs on one >> hand yet you want fetch the status every 30 seconds? What is the >> point of that other than overloading the scheduler? >> >> We're telling your users not to query the slurm too often and usually >> give 5 minutes as a good interval. You have to let slurm do its job. >> There is no point in querying in a loop every 30 seconds when we're >> talking about large numbers of jobs. >> >> >> Ward -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] snakemake and slurm in general - correction

2023-02-24 Thread Loris Bennett
Loris Bennett writes: > Hi David, > > (Thanks for changing the subject to something more appropriate). > > David Laehnemann writes: > >> Yes, but only to an extent. The linked conversation ends with this: >> >>>> Do you have any best practice about

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
Slurm. I have every sympathy for people working on Open Source projects and am very happy to offer assistance and have commented on lack of support for job arrays in Nextflow here: https://github.com/nextflow-io/nextflow/issues/1477 This is in fact where I learned about the potential nega

Re: [slurm-users] snakemake and slurm in general

2023-02-23 Thread Loris Bennett
s: Slurm's job limits are configurable, see this Wiki page: >> https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#maxjobcount-limit >> >> /Ole >> -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
ly can't create job arrays, and so generates large numbers of jobs with identical resource requirements, which can prevent backfill from working properly. Skimming the documentation for Snakemake, I also could not find any reference to Slurm job arrays, so this could also be an issue. Jus

Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-23 Thread Loris Bennett
y quicker? > > 2) Slurm developers, whether `scontrol` is expected to be quicker from > its implementation and whether using `scontrol` would also be the > option that puts less strain on the scheduler in general? > > Many thanks and best regards, > David -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

[slurm-users] job_res_rm_job: plugin still initializing

2023-02-07 Thread Loris Bennett
SchedMD employee writes I don't think this should ever happen. Has anyone else seen this issue? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Debian dist-upgrade?

2023-01-24 Thread Loris Bennett
ou could also dump your database, find a (virtual) machine running some appropriate RedHat-like OS, create the RPMs for the three versions of Slurm you need, install the first one, import your database and then do the updates. Finally you can dump the database again and import it on your Debian 11 system. That would still be a bit of a faff and so still may not be worth it. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Loris Bennett
g to take place, with each process getting a quarter of a core on average. It is not clear that you will actually increase throughput this way. I would probably first turn on hyperthreading to deal with jobs which have intermittent CPU-usage. Still, since Slurm offers the possibility of oversubscrip

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes: >> On Dec 8, 2022, at 21:30, Kilian Cavalotti >> wrote: >> >> Hi Loris, >> >> On Thu, Dec 8, 2022 at 12:59 AM Loris Bennett >> wrote: >>> However, I do have a chronic problem with users requesting too much >>&

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Ryan Novosielski writes: > On Dec 8, 2022, at 03:57, Loris Bennett wrote: > > Loris Bennett writes: > > Moshe Mergy writes: > > Hi Sandor > > I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02): > > if (job_desc.min_mem_

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
jobs, even if --mem=0 is specified (I guess). Cheers, Loris > ------- > From: slurm-users on behalf of Loris > Bennett > Se

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
Loris Bennett writes: > Moshe Mergy writes: > >> Hi Sandor >> >> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02): >> >> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then >> s

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Loris Bennett
single job. How can I block a --mem=0 request? > > We are running: > > * OS: RHEL 7 > * cgroups version 1 > * slurm: 19.05 > > Thank you, > > Sandor Felho > > Sr Consultant, Data Science & Analytics > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] GPU-node not waking up after power-save

2022-10-13 Thread Loris Bennett
the non-GPUs, which do wake up properly. Thanks for confirming that there is no fundamental issue. Cheers, Loris > Best > > Ümit > > > > From: slurm-users on behalf of Loris > Bennett > Date: Thursday, 13. October 2022 at 08:14 > To: Slurm Users Mailing List

[slurm-users] GPU-node not waking up after power-save

2022-10-12 Thread Loris Bennett
ent energy situation, I was wondering whether this is a problem others have (had). So does power-saving work in general for GPU nodes and, if so, are there any extra steps one needs to take in order to set things up properly? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)" writes: > On Sep 29, 2022, at 10:34 AM, Steffen Grunewald > wrote: > > Hi Noam, > > I'm wondering why one would want to know that - given that there are > approaches to multi-node operation beyond MPI (Charm++ comes to mind)? > > The

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 9/29/22 09:26, Loris Bennett wrote: >> Has anyone already come up with a good way to identify non-MPI jobs which >> request multiple cores but don't restrict themselves to a single node, >> leaving cores idle

Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
t does not help you much, but perhaps something to think about > > On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett > wrote: >> >> Hi, >> >> Has anyone already come up with a good way to identify non-MPI jobs which >> request multiple cores but don't restri

[slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
only one core is actually being used. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] What is the complete logic to calculate node number in job_submit.lua

2022-09-26 Thread Loris Bennett
Hi Ole, Ole Holm Nielsen writes: > Hi Loris, > > On 9/26/22 12:51, Loris Bennett wrote: >>> When designing restriction in job_submit.lua, I found there is no member in >>> job_desc struct can directly be used to determine the node number finally >>> alloca

Re: [slurm-users] What is the complete logic to calculate node number in job_submit.lua

2022-09-26 Thread Loris Bennett
writes: > Hi all: > > > > When designing restriction in job_submit.lua, I found there is no member in > job_desc struct can directly be used to determine the node number finally > allocated to a job. The job_desc.min_nodes seem to > be a close answer, but it will be 0xFFFE when user not s

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
ng thousands of jobs. Once we get them to use job arrays, such problems generally disappear. Cheers, Loris > Regards, > Hermann > > On 9/16/22 9:09 AM, Loris Bennett wrote: >> Hi Hermann, >> Sebastian Potthoff writes: >> >>> Hi Hermann, >>> >>&g

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
e normal Epilog since we wanted to > avoid running slurm as root and I don’t have to worry > about ownership of the output file. Yes, good point. We should look into that. Cheers, Loris > Sebastian > > Am 16.09.2022 um 09:09 schrieb Loris Bennett : > > Hi Hermann, > &

Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Loris Bennett
Hi Hermann, Sebastian Potthoff writes: > Hi Hermann, > > I happened to read along this conversation and was just solving this issue > today. I added this part to the epilog script to make it work: > > # Add job report to stdout > StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep

[slurm-users] Providing users with info on wait time vs. run time

2022-09-15 Thread Loris Bennett
be to aggregate the times over, say, a month and provide the absolute totals and maybe a run-to-wait ratio. Has anyone already done anything like this? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
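The aggregation described here comes down to a little arithmetic on three timestamps per job. The following is a minimal sketch, assuming each job is available as a (submit, start, end) tuple of `datetime` objects, for instance taken from accounting records. The function name and the sample data are invented for this example.

```python
from datetime import datetime


def wait_and_run_totals(jobs):
    """Return (total_wait_s, total_run_s, run_to_wait_ratio).

    'jobs' is an iterable of (submit, start, end) datetimes.
    Wait time is start - submit, run time is end - start.
    """
    total_wait = 0.0
    total_run = 0.0
    for submit, start, end in jobs:
        total_wait += (start - submit).total_seconds()
        total_run += (end - start).total_seconds()
    # Guard against division by zero when no job had to wait.
    ratio = total_run / total_wait if total_wait else float("inf")
    return total_wait, total_run, ratio


if __name__ == "__main__":
    # Made-up sample data, not taken from any real cluster.
    sample = [
        (datetime(2022, 9, 1, 8, 0), datetime(2022, 9, 1, 9, 0), datetime(2022, 9, 1, 11, 0)),
        (datetime(2022, 9, 1, 10, 0), datetime(2022, 9, 1, 10, 30), datetime(2022, 9, 1, 11, 30)),
    ]
    print(wait_and_run_totals(sample))
```

A ratio well above 1 means a user's jobs spend much longer running than queueing; a ratio below 1 means the opposite.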

Re: [slurm-users] slurmctld hanging

2022-07-29 Thread Loris Bennett
nsure that the priorities of this user's jobs are always higher than everyone else's? Cheers, Loris > On Fri, Jul 29, 2022 at 7:00 AM Loris Bennett > wrote: > > Hi Byron, > > byron writes: > > > Hi Loris - about a second > > What is the use-cas

Re: [slurm-users] slurmctld hanging

2022-07-28 Thread Loris Bennett
is causing your slurmdbd to timeout and that is the error you are seeing. Regards Loris > On Thu, Jul 28, 2022 at 2:47 PM Loris Bennett > wrote: > > Hi Byron, > > byron writes: > > > Hi > > > > We recently upgraded slurm from 19.05.7 to 20.11.

Re: [slurm-users] slurmctld hanging

2022-07-28 Thread Loris Bennett
found in the slurmctld log. > > Can anyone suggest how to even start troubleshooting this? Without anything > in the logs I don't know where to start. > > Thanks Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Loris Bennett
il Wilczek pisze: >> Yes, it is possible, we have 63 GPUs. But I have a problem with >> the interpretation of this value. Specifically, I would like >> to know how it is calculated. I couldn't find it in the docs >> (or I'm just bad at searching :)). >> --

Re: [slurm-users] sreport time units explanation

2022-06-22 Thread Loris Bennett
not? How many GPU cards do you have? We have 24 and our top user for the same time period is 4382(27.41%). This seems reasonable to me. As there are 513 hours in the period, your user would have had to have used around 15 cards fairly continuously. Is that not possible? Cheers, Loris >
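The reasoning in this reply is a simple division: usage in the period divided by the length of the period gives the average number of cards in use. The sketch below assumes the sreport figure is expressed in GPU-hours; the snippet does not state the unit, so that is an assumption. The function name is made up for illustration.

```python
def average_cards_in_use(gpu_hours, period_hours):
    """Average number of GPUs busy over the period.

    Assumes usage is given in GPU-hours (an assumption, see lead-in).
    """
    return gpu_hours / period_hours


if __name__ == "__main__":
    # Figures quoted in the message: 4382 for the top user, 513 hours in the period.
    print(round(average_cards_in_use(4382, 513), 2))
```

Read the other way round, 15 cards used continuously for 513 hours would correspond to 15 × 513 = 7695 GPU-hours.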

Re: [slurm-users] multifactor priority calculation

2022-06-14 Thread Loris Bennett
> settings are for example: >>>> >>>> PriorityType=priority/multifactor >>>> PriorityWeightJobSize=10 >>>> AccountingStorageTRES=cpu,mem,gres/gpu >>>> PriorityWeightTRES=cpu=1000,mem=2000,gres/gpu=3000 >>>> >>>>

Re: [slurm-users] Performance tracking of array tasks

2022-05-16 Thread Loris Bennett
in an array and on fairshare to do the rest. Cheers, Loris > Thanks, > > William Dear > > ------ > From: slurm-

Re: [slurm-users] Performance tracking of array tasks

2022-05-16 Thread Loris Bennett
nient wrapper around a bunch of jobs. Each element of a job array still has its own job ID, so you can extract job data the same way you do for a non-array job. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] non-historical scheduling

2022-04-12 Thread Loris Bennett
lso set PriorityWeightFairshare=0 to remove even the effect of the CPU-usage over the day. Cheers, Loris > > > -Original Message- > From: slurm-users On Behalf Of Loris > Bennett > Sent: Tuesday, April 12, 2022 12:06 PM > To: Slurm User Community List > Subject:

Re: [slurm-users] non-historical scheduling

2022-04-12 Thread Loris Bennett
u have received this email in error, kindly delete > it from your computer > system and notify us at the telephone number or email address appearing > above. The writer asserts in respect of this message and attachments all > rights for confidentiality, privilege or privacy to the fulle

Re: [slurm-users] sreport outputs invalid values due to corrupted data

2022-03-09 Thread Loris Bennett
it relates to the jobs table. Is there a way to fix the data ? Run sacctmgr show runawayjobs If any are found you should be offered the option of fixing them. Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Add partition to existing user association

2022-01-24 Thread Loris Bennett
    U = thekla >> >> However, I cannot set a partition: >> >> sacctmgr modify user thekla account=ops set partition=gpu >>  Unknown option: partition=gpu >>  Use keyword 'where' to modify condition >> >> This is not possible? >> >> The only solution I found to that is to delete the association and create it >> again with the partition: >> >> sacctmgr del user thekla account=ops >> >> sacctmgr add user thekla account=ops partition=gpu >> >> Thank you, >> >> Thekla >> >> > -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
already > an > enhancement (Bug 11591) but nothing happened so far... > > Regards, > > Alexander > > > On 10.01.2022 at 11:14, Loris Bennett wrote: >> Hi, >> >> Does setting 'mail_user' in job_submit.lua actually work in Slurm >> 21.08

Re: [slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
_get_job_req_field() contains 'mail_user'. Cheers, Loris Marcus Boden writes: > Hi Loris, > > I can confirm the problem: I am not able to modify the job_desc.mail_user. > Other > values can be modified, though. > > We are also on 21.08.5 > > Best, > Mar

[slurm-users] Does setting 'job_desc.mail_user' in job_submit.lua work?

2022-01-10 Thread Loris Bennett
rts of the plugin work, but they only read other elements of job_desc and do not modify anything. Am I doing something wrong? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-17 Thread Loris Bennett
Hi Diego, Diego Zuccato writes: > Hi Loris. > > On 14/12/2021 14:16, Loris Bennett wrote: > >> spectrum, today, via our Zabbix monitoring, I spotted some jobs with >> unusually high GPU efficiencies which turned out to be doing >> cryptomining :-/ > W

[slurm-users] Nastygramming (was: Prevent users from updating their jobs)

2021-12-17 Thread Loris Bennett
ou use some kind of framework to automate the actual sending of the nastygrams? 2. What metrics do you use for deciding whether a nastygram regarding resource usage needs to be sent? Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-14 Thread Loris Bennett
them very much. At the opposite end of the usage spectrum, today, via our Zabbix monitoring, I spotted some jobs with unusually high GPU efficiencies which turned out to be doing cryptomining :-/ Cheers, Loris -- Dr. Loris Bennett (Herr/Mr) ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de

Re: [slurm-users] Updated "pestat" tool for printing Slurm nodes status including GRES/GPU

2021-12-13 Thread Loris Bennett
Hi Ole, The new version looks good to me. Cheers, Loris Ole Holm Nielsen writes: > Hi Loris, > > I fixed errors in the hostnamelength calculation and formatting. > Could you grab the latest pestat and test it? > > Thanks, > Ole > > On 12/13/21 13:56, Loris Bennett
