[slurm-users] The hostname resolution case sensitive

2024-11-06 Thread Bill via slurm-users
Hi, I want to confirm: is hostname resolution case sensitive in SLURM? Many thanks, Bill

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-01 Thread Bill via slurm-users
a nodelist for, say, those 28 core nodes and then those 64 core nodes. But going back to the original answer, --exclusive is the answer here. You DO know how many cores you need, right? (A scaling study should give you that.) And you DO know the memory footprint from past jobs with similar inputs

[slurm-users] Re: Node (anti?) Feature / attribute

2024-06-14 Thread Bill via slurm-users
We've done this though with job_submit.lua. Mostly with OS updates. We add a feature to everything then proceed. Telling users that adding a feature gets you on the "new" nodes. I can send you the snippet if you're using the job_submit.lua script. Bill On 6/14/24 2:18

[slurm-users] Slurm node history / log ?

2023-07-05 Thread Bill Benedetto
Does anything like that already exist in Slurm? Thanks! - Bill Bill Benedetto bbenede...@goodyear.com The Goodyear Tire & Rubber Co. I don't speak for Goodye

Re: [slurm-users] srun --mem issue

2022-12-08 Thread Bill
MEMORY end Bill On 12/7/22 12:03 PM, Felho, Sandor wrote: TransUnion is running a ten-node site using slurm with multiple queues. We have an issue with the --mem parameter. There is one user who has read the slurm manual and found --mem=0. This gives the maximum memory on the node (500 GiB)

Re: [slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Bill Wichser
Indeed. We use this and BELIEVE that it works, lol! Bill

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        -- root (uid 0) may modify anything
        if modify_uid == 0 then
            return 0
        end
        -- any attempt to change the QOS gets a non-zero (error) return
        if job_desc.qos ~= nil then
            return 1
        end
        return 0
    end

Re: [slurm-users] Changing DefaultAccount for user

2021-11-23 Thread Bill Wichser
I usually add "withassoc" for a show user: sacctmgr show user loris withassoc Bill On 11/23/21 9:07 AM, Loris Bennett wrote: sacctmgr show user loris accounts

Re: [slurm-users] sreport question when specifying partitions=

2021-11-10 Thread Bill Wichser
Dammit! Completely forgot that I have these right here in my home directory! And I probably used your tools last year when I generated the report. Thank you Ole for making me remember! Bill On 11/10/21 3:08 PM, Ole Holm Nielsen wrote: On 10-11-2021 16:56, Bill Wichser wrote: I can't

Re: [slurm-users] sreport question when specifying partitions=

2021-11-10 Thread Bill Wichser
Thanks. You are right, now that I understand the heading in the man page. Not quite what I was hoping for here! Oh well, back to the drawing board. Thanks all. Bill On 11/10/21 12:04 PM, Michael Gutteridge wrote: My read of the sreport manpage on our currently installed version (21.08

[slurm-users] sreport question when specifying partitions=

2021-11-10 Thread Bill Wichser
I can't seem to figure out how to do a query against a partition. sreport cluster AccountUtilizationByUser user=bill cluster=della, no issues. Works as expected. sreport cluster AccountUtilizationByUser Partitions=cpu cluster=della gives me Unknown condition: Partitions=cpu and

Re: [slurm-users] Weird one - deleting a user

2021-07-27 Thread Bill Wichser
The cluster doesn't exist though. This was what I tried first. [root@della5 bill]# sacctmgr show RunawayJobs cluster=tukey sacctmgr: error: Slurmctld running on cluster tukey is not up, can't check running jobs Bill On 7/27/21 4:59 PM, Carlos Fenoy wrote: Hi, You can cleanup

[slurm-users] Weird one - deleting a user

2021-07-27 Thread Bill Wichser
[root@della5 bill]# sacctmgr -i delete user mable Error with request: Job(s) active, cancel job(s) before remove JobID = 602995 C = tukey A = politics U = mable Yup, when a user has an active job they cannot be deleted from the database. The thing is, this cluster tukey has been

[slurm-users] %x in job names

2021-05-28 Thread Bill Barth
'd give a heads up. I don't think our user was being malicious, and their actual -J was #SBATCH -J sd-PBEpvw9040%x Probably a hash and probably machine-generated/unlucky. I hope this helps and is actually a problem report. We're on 18.08.5, so I hope we don't have to go back

Re: [slurm-users] Do not upgrade mysql to 5.7.30!

2020-05-07 Thread Bill Broadley
On 5/6/20 11:30 AM, Dustin Lang wrote: Hi, Ubuntu has made mysql 5.7.30 the default version. At least with Ubuntu 16.04, this causes severe problems with Slurm dbd (v 17.x, 18.x, and 19.x; not sure about 20). I can confirm that it kills slurmdbd on Ubuntu 18.04 as well. I had compiled slurm

Re: [slurm-users] tie a reservation to a QoS?

2019-10-28 Thread Bill Wichser
s something we have not had to deal with as CPU time per 30 day sliding window has been accepted, can be quantitatively shown, and just is a much easier way to schedule when ALL resources can be used. Bill On 10/28/19 11:11 AM, Tina Friedrich wrote: Hello, is there a possibility to tie a r

Re: [slurm-users] 19.05 and GPUs vs GRES

2019-09-05 Thread Bill Broadley
Anyone know if the new GPU support allows having a different number of GPUs per node? I found: https://www.ch.cam.ac.uk/computing/slurm-usage Which mentions "SLURM does not support having varying numbers of GPUs per node in a job yet." I have a user with a particularly flexible code that would

Re: [slurm-users] increasing timelimit on array jobs no longer supported?

2019-06-13 Thread Bill Wichser
Thanks. Had no problem setting the individual element of the array. Just thought that it worked differently in the past! Memory apparently isn't what it used to be! Thanks again, Bill On 6/13/19 10:25 AM, Jacob Jenson wrote: scontrol show job
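Setting the limit on each array element, as described above, could be scripted along these lines. This is only a sketch: the helper name array_timelimit_cmds, the task range and the new limit are illustrative assumptions, not something taken from the thread. It prints the scontrol commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print one "scontrol update" per array element.
# array_timelimit_cmds is a hypothetical helper; nothing is executed here.
# Arguments: <jobid> <first task> <last task> <new TimeLimit>
array_timelimit_cmds() (
    jobid="$1"; task="$2"; last="$3"; limit="$4"
    while [ "$task" -le "$last" ]; do
        printf 'scontrol update jobid=%s_%s TimeLimit=%s\n' "$jobid" "$task" "$limit"
        task=$((task + 1))
    done
)

# Example (task range 0-2 is a placeholder):
#   array_timelimit_cmds 3136818 0 2 30-00:00:00
# Pipe the output to "sh" once the commands look right.
```

Printing rather than executing keeps the sketch safe to try on a login node; raising a time limit normally needs administrator rights anyway.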

[slurm-users] increasing timelimit on array jobs no longer supported?

2019-06-13 Thread Bill Wichser
# scontrol update jobid=3136818 timelimit+=30-00:00:00 scontrol: error: TimeLimit increment/decrement not supported for job arrays This is new to 18.08.7 it appears. Am I just missing something here? Bill

Re: [slurm-users] Nodes not responding... how does slurm track it?

2019-05-15 Thread Bill Broadley
On 5/15/19 12:34 AM, Barbara Krašovec wrote: > It could be a problem with ARP cache. > > If the number of devices approaches 512, there is a kernel limitation in > dynamic > ARP-cache size and it can result in the loss of connectivity between nodes. We have 162 compute nodes, a dozen or so file

[slurm-users] Nodes not responding... how does slurm track it?

2019-05-14 Thread Bill Broadley
My latest addition to a cluster results in a group of the same nodes periodically getting listed as "not-responding" and usually (but not always) recovering. I increased logging up to debug3 and see messages like: [2019-05-14T17:09:25.247] debug: Spawning ping agent for bigmem[1-9],bm[1,7,9-13

[slurm-users] Power9 ACC922

2019-04-16 Thread Bill Wichser
deployment. Danny says he has heard of no problems but that doesn't mean the folks in the trenches haven't seen issues! Thanks, Bill

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Bill Barth
h the others who think that the environment inside the script is likely screwed up. Throwing in a printenv and saving that can't hurt. Bill. -- Bill Barth, Ph.D., Director, HPC bba...@tacc.utexas.edu| Phone: (512) 232-7069 Office: ROC 1.435| Fax: (512) 475-9445

Re: [slurm-users] Lua Job Submit - Setting Features/Constraints

2018-12-19 Thread Bill Wichser
Yes. We use something like this:

    if job_desc.features == nil then
        job_desc.features = "special"
    else
        job_desc.features = job_desc.features .. ",special"
    end

Bill On 12/19/2018 09:27 AM, Kevin Manal

Re: [slurm-users] Slurmctld 18.08.1 and 18.08.3 segfault

2018-11-14 Thread Bill Broadley
On 11/13/18 9:39 PM, Kilian Cavalotti wrote: > Hi Bill, > There are a couple mentions of the same backtrace on the bugtracker, > but that was a long time ago (namely > https://bugs.schedmd.com/show_bug.cgi?id=1557 and > https://bugs.schedmd.com/show_bug.cgi?id=1660, for Slurm 14.

[slurm-users] Slurmctld 18.08.1 and 18.08.3 segfault

2018-11-13 Thread Bill Broadley
After being up since the second week in Oct or so, yesterday our slurm controller started segfaulting. It was compiled/run on ubuntu 16.04.1. Nov 12 14:31:48 nas-11-1 kernel: [2838306.311552] srvcn[9111]: segfault at 58 ip 004b51fa sp 7fbe270efb70 error 4 in slurmctld[40+eb000

Re: [slurm-users] Cgroups and swap with 18.08.1?

2018-10-19 Thread Bill Broadley
On 10/16/18 3:38 AM, Bjørn-Helge Mevik wrote: > Just a tip: Make sure that the kernel has support for constraining swap > space. I believe we once had to reinstall one of our clusters once > because we had forgotten to check that. I tried starting slurmd with -D -v -v -v and got: slurmd: debug:

[slurm-users] Cgroups and swap with 18.08.1?

2018-10-15 Thread Bill Broadley
a 3GB process with --mem=1000:

$ ps acux
USER       PID %CPU %MEM     VSZ     RSS TTY STAT START TIME COMMAND
bill     17698 11.1  1.5 2817020 1015392 ?   D    20:40 0:13 stream\
$ smem
User  Count     Swap      USS      PSS  RSS
bill      1  1795552  1017048  1017076

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Bill
Hi Alex, Try running nvidia-smi before starting slurmd; I also hit this issue. I have to run nvidia-smi before slurmd whenever I reboot the system. Regards, Bill -- Original -- From: Alex Chekholko Date: Tue,Jul 24,2018 6:10 AM To: Slurm User Community List Subject: Re

Re: [slurm-users] siesta jobs with slurm, an issue

2018-07-22 Thread Bill Barth
All I can suggest is to check that all the paths you have provided SIESTA are correct (the path to the executable is clearly fine b/c SIESTA starts, but can it find prime.fdf?). Otherwise start with your local support team. Best, Bill.

Re: [slurm-users] default memory request

2018-07-19 Thread Bill
Thank you Peter, Bill -- Original -- From: Peter Kjellström Date: Thu,Jul 19,2018 9:51 PM To: Bill Cc: Slurm User Community List Subject: Re: [slurm-users] default memory request On Thu, 19 Jul 2018 18:57:09 +0800 "Bill" wrote: > Hi ,

Re: [slurm-users] default memory request

2018-07-19 Thread Bill
Hi, I just found the way: set "DefMemPerCPU=4096" for the partition in slurm.conf. Jobs that do not request memory will then default to 4 GB per allocated CPU. Regards, Bill -- Original -- From: "Bill"; Date: Thu, Jul 19, 2018 06:39 PM To: "Slurm User Community

[slurm-users] default memory request

2018-07-19 Thread Bill
hanks, Bill

Re: [slurm-users] Is It Possible to change the node order for different partition

2018-06-26 Thread Bill
Thank you Brian. Another question: can one node have a different weight in different partitions? E.g. node1 at 0.8 in partition high but 0.5 in partition low? Best regards, Bill -- Original -- From: Brian Andrus Date: Wed,Jun 27,2018 0:44 PM To: slurm-users Subject

[slurm-users] Is It Possible to change the node order for different partition

2018-06-26 Thread Bill
advance, Bill

[slurm-users] PMIX and slurm failure (and fix).

2018-05-17 Thread Bill Broadley
Greetings all, Just wanted to mention I have been building the newest slurm on Ubuntu 18.04. Gcc-7.3 is the default compiler, which means that the various dependencies (munge, libevent, hwloc, netloc, pmix, etc) are already available and built with gcc-7.3. I carefully built slurm-17.11.6 + openmpi

Re: [slurm-users] How to check if there's a reservation

2018-05-11 Thread Bill Wichser
2T06:00:00 StartTime=2018-06-12T06:00:00 You'd need more code around that, obviously, to determine if this starttime might hold up the job. Bill On 05/10/2018 04:23 PM, Prentice Bisbal wrote: Dear Slurm Users, We've started using maintenance reservations. As you would expect, this c
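The "more code around that" could look roughly like the following. It is a sketch under stated assumptions: GNU date is available (for date -d), timestamps use the YYYY-MM-DDTHH:MM:SS form shown above, and the helper names reservation_start, to_epoch and overlaps_reservation are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU date; all helper names are hypothetical.

# Print the first StartTime value found in reservation text on stdin.
reservation_start() {
    grep -o 'StartTime=[0-9T:-]*' | head -n 1 | cut -d= -f2
}

# Convert a timestamp of the form YYYY-MM-DDTHH:MM:SS to epoch seconds.
to_epoch() {
    date -d "$(printf '%s' "$1" | tr 'T' ' ')" +%s
}

# Succeed if a job starting at $1 with a limit of $2 minutes would still
# be running when the reservation starting at $3 begins.
overlaps_reservation() (
    job_end=$(( $(to_epoch "$1") + $2 * 60 ))
    [ "$job_end" -gt "$(to_epoch "$3")" ]
)

# Example:
#   start="$(scontrol show reservation | reservation_start)"
#   overlaps_reservation 2018-06-12T05:00:00 120 "$start" && echo "job would be held"
```

Only the first reservation is considered here; a site with several reservations would want to loop over all StartTime values and also check which nodes each one covers.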

Re: [slurm-users] Slurm-17.11.5 + Pmix-2.1.1/Debugging

2018-05-08 Thread Bill Broadley
On 05/08/2018 05:33 PM, Christopher Samuel wrote: > On 09/05/18 10:23, Bill Broadley wrote: > >> It's possible of course that it's entirely an openmpi problem, I'll >> be investigating and posting there if I can't find a solution. > > One of the cha

[slurm-users] Slurm-17.11.5 + Pmix-2.1.1/Debugging

2018-05-08 Thread Bill Broadley
Greetings all, I have slurm-17.11.5, pmix-1.2.4, and openmpi-3.0.1 working on several clusters. I find srun handy for things like: bill@headnode:~/src/relay$ srun -N 2 -n 2 -t 1 ./relay 1 c7-18 c7-19 size= 1, 16384 hops, 2 nodes in 0.03 sec ( 2.00 us/hop) 1953 KB/sec Building was

Re: [slurm-users] Slurm overhead

2018-04-24 Thread Bill Barth
How do you start it? If you use Sys V style startup scripts, then likely /etc/init.d/slurm stop, but if you're using systemd, then probably systemctl stop slurm.service (but I don’t do systemd). Best, Bill. Sent from my phone > On Apr 24, 2018, at 11:15 AM, Mahmood Naderan wrote: >

Re: [slurm-users] Slurm overhead

2018-04-22 Thread Bill Barth
handle it for them. Maybe you should look into that after you eliminate direct interference from Slurm. Best, Bill. On 4/22/18, 1:06 AM, "

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
memory that the node has (minus some padding for the OS, etc.). Is UsePAM enabled in your slurm.conf? Maybe that’s doing it. Best, Bill. On 4/15

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
wants (cgroups, perhaps?). Best, Bill. On 4/15/18, 1:41 PM, "slurm-users on behalf of Mahmood Naderan" wrote: Excuse me... I

Re: [slurm-users] ulimit in sbatch script

2018-04-15 Thread Bill Barth
/pam.d/sshd file has pam_limits.so in it, that’s probably where the unlimited setting for root is coming from. Best, Bill. On 4/15/18, 1:26 PM

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Bill Barth
better forms of these, but they’re working for us. I guess this counts now as being documented in a public place! Best, Bill. On 3/21/18, 7:49 AM

Re: [slurm-users] What's the best way to suppress core dump files from jobs?

2018-03-21 Thread Bill Barth
, Bill. On 3/21/18, 6:08 AM, "slurm-users on behalf of Ole Holm Nielsen" wrote: We experience problems with MPI jobs dumping lots

Re: [slurm-users] #SBATCH options as Bash script parameters

2018-03-18 Thread Bill Barth
going to depend on the shebang line (as to what’s being invoked) bash? csh? python? perl? /usr/bin/env X? So, I’d be surprised if there was a mode for this. Also, would you expect Slurm to delete any options it used from your command line or leave them? Best, Bill.

Re: [slurm-users] Automatically setting OMP_NUM_THREADS=SLURM_CPUS_PER_TASK?

2018-03-06 Thread Bill Barth
We do the same at TACC in our base module (which happens to be called “TACC”), and then we document it. Best, Bill. On 3/6/18, 5:13 PM, "
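For sites without such a module, the same default could be set from a shell profile script. The following is a sketch, not TACC's module code: the function name set_omp_threads is hypothetical, and falling back to one thread when SLURM_CPUS_PER_TASK is unset is an assumption about the desired policy.

```shell
#!/bin/sh
# Sketch only: set_omp_threads is a hypothetical helper, e.g. for a site
# profile script or job prolog; it is not TACC's module code.
set_omp_threads() {
    # Respect an explicit setting made by the user.
    if [ -z "${OMP_NUM_THREADS:-}" ]; then
        # SLURM_CPUS_PER_TASK is only set when --cpus-per-task was given,
        # so fall back to a single thread otherwise.
        export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    fi
}

# Example: call it once at the top of a batch script.
#   set_omp_threads
```

Leaving a user-supplied OMP_NUM_THREADS untouched avoids surprising people who deliberately run fewer threads than allocated CPUs.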

Re: [slurm-users] Over-riding array limits

2018-02-24 Thread Bill Barth
ThatParameter=100’ or whatever you like to change it. On 2/23/18, 11:13 PM, "slurm-users on behalf of ~Stack~" wrote: Greetings,

Re: [slurm-users] How to deal with user running stuff in frontend node?

2018-02-15 Thread Bill Barth
We kick them off and lock them out until they respond. Disconnections are common enough that it doesn’t always get their attention. Inability to log back in always does. Best, Bill. Sent from my phone. > On Feb 15, 2018, at 9:25 AM, Patrick Goetz wrote: > > The simple solution i

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-07 Thread Bill Barth
e probably other ways to do this, but the infrastructure is now historical and set in some stone. Best, Bill. On 2/7/18, 12:28 AM, "slu

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Bill Barth
file with job records which our local accounting system consumes to decrement allocation balances, if you care to know). Best, Bill. On 2/6/18

Re: [slurm-users] Slurm and available libraries

2018-01-17 Thread Bill Barth
use Lmod to make it available and visible to our users. There are more strategies for this than you can imagine, so settle on a few and keep it simple for you! Best, Bill.

Re: [slurm-users] lmod and slurm

2017-12-20 Thread Bill Barth
://sourceforge.net/p/lmod/mailman/) which is very active and monitored by the author and a very knowledgeable community. Best, Bill. On 12/19/17, 8:43 AM

Re: [slurm-users] introduce short delay starting multiple parallel jobs with srun

2017-11-10 Thread Bill Barth
install is much more recent and does support them) for internal reasons, so we provide the Launcher for folks who have similar needs to you. Best, Bill.