Re: [slurm-users] sacct --name --status filtering

2024-01-11 Thread Drucker, Daniel
Yes, that makes sense. Thank you!

Re: [slurm-users] sacct --name --status filtering

2024-01-10 Thread Christopher Samuel
On 1/10/24 19:39, Drucker, Daniel wrote: What am I misunderstanding about how sacct filtering works here? I would have expected the second command to show the exact same results as the first. You need to specify --end NOW for this to work as expected. From the man page: WITHOUT --jobs AN
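
The advice in the reply above can be sketched as a small shell wrapper. This is a minimal sketch, not the command from the original message: the function name `jobs_by_name_and_state` and the example arguments are hypothetical, while `--name`, `--state`, `--starttime`, and `--endtime` are standard sacct options.

```shell
# Hypothetical helper: list jobs with a given name and state since a start time.
# An explicit --endtime is passed so the state filter does not fall back on
# the time-window defaults described in the sacct man page.
jobs_by_name_and_state() {
    # $1 = job name, $2 = state (e.g. COMPLETED), $3 = start time
    sacct --name="$1" --state="$2" --starttime="$3" --endtime=now
}
```

On a Slurm system this might be called as `jobs_by_name_and_state myjob COMPLETED 2024-01-01`.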

Re: [slurm-users] sacct --name --status filtering

2024-01-10 Thread Drucker, Daniel
> All I can say is that this has to do with --starttime and that you have to > read the manual really carefully about how they interact, including when you > have --endtime set. It’s a bit fiddly and annoying, IMO, and I can never > quite remember how it works. Oh, I think I understand. --start

Re: [slurm-users] sacct --name --status filtering

2024-01-10 Thread Ryan Novosielski
All I can say is that this has to do with --starttime and that you have to read the manual really carefully about how they interact, including when you have --endtime set. It’s a bit fiddly and annoying, IMO, and I can never quite remember how it works.

Re: [slurm-users] sacct runtime performance varies on job status codes

2023-09-01 Thread Michael DiDomenico
I can't directly answer your question, but I suspect there's a missing index somewhere. What I would do is turn on the MySQL query log and look at the SQL and the associated explain plan. It's also possible that, since you're a few revs behind, it's already been fixed in a later version, so you coul

Re: [slurm-users] sacct output in tabular form

2021-08-25 Thread Killian Murphy
sacct --parsable2 | column -s '|' -t | less -S has always been useful to me for a glance at it. On Wed, 25 Aug 2021 at 13:41, Jeffrey T Frey wrote: > You've confirmed my suspicion — no one seems to care for Slurm's standard > output formats :-) At UD we did a Python curses wrapper around the >
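
The pipeline quoted above can be wrapped for reuse. A minimal sketch, assuming the `column` utility is installed; the function name `align_pipe_table` is hypothetical and simply aligns any pipe-delimited text of the kind `sacct --parsable2` emits.

```shell
# Hypothetical helper: align pipe-delimited text into whitespace-padded columns.
align_pipe_table() {
    column -s '|' -t
}

# Usage on a Slurm system (not run here):
#   sacct --parsable2 | align_pipe_table | less -S
```

Keeping the formatting step separate from the sacct call means the same helper can also be applied to saved output files.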

Re: [slurm-users] sacct output in tabular form

2021-08-25 Thread Jeffrey T Frey
You've confirmed my suspicion — no one seems to care for Slurm's standard output formats :-) At UD we did a Python curses wrapper around the parseable output to turn the terminal window into a navigable spreadsheet of output: https://gitlab.com/udel-itrci/slurm-output-wrappers > On Aug 25,

Re: [slurm-users] sacct output in tabular form

2021-08-25 Thread Ole Holm Nielsen
Hi Sven, On 8/25/21 7:41 AM, Sternberger, Sven wrote: this is a simple wrapper for sacct which prints the output from sacct as table. So you can make a "sacctml -j foo --long" even without two 8k displays ;-) This script works nicely, thanks! However, instead of an extremely wide display on

Re: [slurm-users] sacct

2020-06-02 Thread Ole Holm Nielsen
Ole Holm Nielsen Sent: 02 June 2020 10:08 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] sacct On 6/2/20 10:16 AM, Sidhu, Khushwant wrote: When a job is running & I use the command: sacct --format="AveCPU,AveDiskRead,AveDiskWrite,User" -j 12345 I get values for all paramet

Re: [slurm-users] sacct

2020-06-02 Thread Sidhu, Khushwant
Sent: 02 June 2020 10:08 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] sacct On 6/2/20 10:16 AM, Sidhu, Khushwant wrote: > When a job is running & I use the command: > > sacct --format="AveCPU,AveDiskRead,AveDiskWrite,User" -j 12345 > > I get values for all paramet

Re: [slurm-users] sacct

2020-06-02 Thread Ole Holm Nielsen
On 6/2/20 10:16 AM, Sidhu, Khushwant wrote: When a job is running & I use the command: sacct --format="AveCPU,AveDiskRead,AveDiskWrite,User" -j 12345 I get values for all parameters. However, when a job is completed, the same command returns no values for all but ‘user’. Is there a reason

Re: [slurm-users] sacct returns nothing after reboot

2020-05-13 Thread Roger Mason
Hello, Marcus Boden writes: > the default time window starts at 00:00:00 of the current day: > -S, --starttime > Select jobs in any state after the specified time. Default > is 00:00:00 of the current day, unless the '-s' or '-j' > options are used. If the '

Re: [slurm-users] sacct returns nothing after reboot

2020-05-13 Thread Marcus Boden
Hi, the default time window starts at 00:00:00 of the current day: -S, --starttime Select jobs in any state after the specified time. Default is 00:00:00 of the current day, unless the '-s' or '-j' options are used. If the '-s' option is used, then the
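
Because the default window described above starts at midnight of the current day, jobs from before a reboot on an earlier day need an explicit start time. A minimal sketch, assuming a hypothetical helper name `jobs_since`; `--starttime` is the long form of the `-S` option quoted from the man page.

```shell
# Hypothetical helper: list jobs from an explicit start time onward,
# instead of relying on the default window (00:00:00 of the current day).
jobs_since() {
    # $1 = start time, e.g. 2020-05-01 or 2020-05-01T00:00:00
    sacct --starttime="$1"
}
```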

Re: [slurm-users] sacct -c not honor -M clusrername

2020-04-26 Thread Fred Liu
This way is an alternative for “-c”. Is it possible to make “-c“ work with “-M”? Thanks. Fred From: Sudeep Narayan Banerjee Sent: Monday, April 27, 2020 12:33 AM To: Slurm User Community List; Fred Liu Subject: Re: [slurm-users] sacct -c not honor -M clusrername Dear Fred

Re: [slurm-users] sacct -c not honor -M clusrername

2020-04-26 Thread Sudeep Narayan Banerjee
Dear Fred: should be possible sacct --format=user,state --starttime=04/01/19 --endtime=03/31/20 | grep COMPLETED Please let us know if this helps. Thanks & Regards, Sudeep Narayan Banerjee System Analyst | Scientist B Information System Technology Facility Academic Block 5 | Room 110 Indian I

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-02-02 Thread Dr. Thomas Orgis
On Fri, 31 Jan 2020 20:57:16 -0800, Chris Samuel wrote: > You're using a very very very old version of slurm there (15.08) Well, that's what happens when an application gets into the mainstream and is included in the OS distribution. On this cluster, we just try to run with what Ubuntu LTS giv

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-31 Thread Chris Samuel
On 30/1/20 10:20 am, Dr. Thomas Orgis wrote: Matching for user (-u) and Job ID (-j) works, but not -N/-S/-E. So is this just the current state and it's up to me to provide a patch to enable it if I want that behaviour? You're using a very very very old version of slurm there (15.08), you shou

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-30 Thread Dr. Thomas Orgis
On Thu, 30 Jan 2020 19:03:38 +0100, "Dr. Thomas Orgis" wrote: > batch 1548429637 1548429637 - - 0 1 4294536312 > 48 node[09-15,22] (null) > > So, matching for job ID, user name (via numerical uid lookup), > timestamps and the nodes should be possible, it's all there. > > Can someone conf

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-30 Thread Dr. Thomas Orgis
On Thu, 30 Jan 2020 19:07:59 +0300, mercan wrote: > Note: The filetxt plugin records only a limited subset of accounting > information and will prevent some sacct options from proper operation. Thank you for looking this up. But since the filetxt does contain the start/end timestamps and th

Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-30 Thread mercan
hi; From the slurm.conf documentation web page: Note: The filetxt plugin records only a limited subset of accounting information and will prevent some sacct options from proper operation. regards; Ahmet M. On 29.01.2020 21:47, Dr. Thomas Orgis wrote: Hi, I happen to run a small cl

Re: [slurm-users] sacct: job state code CANCELLED+

2019-11-16 Thread Uwe Seher
Hello! I thought it had a deeper meaning, because some such extensions exist for node states. Thank you all! Uwe Seher On Sat, 16 Nov 2019 at 07:39, Chris Samuel wrote: > On Friday, 15 November 2019 2:13:15 AM PST Loris Bennett wrote: > > > If the contents of the column are wider than the c

Re: [slurm-users] sacct: job state code CANCELLED+

2019-11-15 Thread Chris Samuel
On Friday, 15 November 2019 2:13:15 AM PST Loris Bennett wrote: > If the contents of the column are wider than the column, they > will be truncated - this is indicated by the '+'. You can also use the -p option to sacct to make it parseable (which outputs the full width of fields too). --
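
Besides `-p`, the truncation marked by '+' can be avoided by widening individual columns. A minimal sketch, assuming a hypothetical helper name `show_full_state` and arbitrary widths of 30 and 20 characters; the `%N` suffix on a field name is sacct's documented way to set a column width in `--format`.

```shell
# Hypothetical helper: print a job's name and state in columns wide enough
# that values are not cut off and marked with a trailing '+'.
show_full_state() {
    # $1 = job ID
    sacct --jobs="$1" --format=JobID,JobName%30,State%20
}
```

For the job in this thread the call would be `show_full_state 277`.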

Re: [slurm-users] sacct: job state code CANCELLED+

2019-11-15 Thread Loris Bennett
Loris Bennett writes: > Hi Uwe, > > Uwe Seher writes: > >> Hello! >> What's the meaning of the plus sign? I cannot find anything in the >> documentation. This is the full output when a job is cancelled: >> >> 277 1808_Modell_107vh1 CANCELLED+ >> UNLIMITED 2019

Re: [slurm-users] sacct: job state code CANCELLED+

2019-11-15 Thread Loris Bennett
Hi Uwe, Uwe Seher writes: > Hello! > What's the meaning of the plus sign? I cannot find anything in the > documentation. This is the full output when a job is cancelled: > > 277 1808_Modell_107vh1 CANCELLED+ UNLIMITED > 2019-11-14T11:28:39 2019-11-14T13:12:06

Re: [slurm-users] Sacct selecting jobs outside range

2019-10-17 Thread Bjørn-Helge Mevik
Brian Andrus writes: > When running a report to try and get jobs that start during a particular > day, sacct is returning a number of jobs that show as starting/ending > outside the range. > What could cause this? sacct selects jobs that were eligible to run (including actually running) between

Re: [slurm-users] Sacct selecting jobs outside range

2019-10-16 Thread mercan
Hi; Starttime and Endtime apply to jobs in any state, including PENDING. If you want to restrict the output to jobs that were actually running between the start and end times, you should specify the states you want with the -s parameter. Ahmet M. On 16.10.2019 20:31, Brian Andrus wrote: All, When running a report to try and get j
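
The suggestion to combine a time window with a state filter can be sketched as follows. This is only an illustration under assumptions: the helper name `running_jobs_between` and the chosen output fields are hypothetical, and, as noted elsewhere in this thread, the reported Start and End times of a selected job can still lie outside the window.

```shell
# Hypothetical helper: list jobs of all users that were in the RUNNING state
# at some point between two times.
running_jobs_between() {
    # $1 = window start, $2 = window end
    sacct --allusers --state=RUNNING --starttime="$1" --endtime="$2" \
          --format=JobID,User,State,Start,End
}
```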

Re: [slurm-users] sacct command to show time for node to start

2019-09-21 Thread Brian Andrus
Lyn, That was it, thanks! sacct -o reserved Brian On 9/21/2019 9:26 AM, Lyn Gerner wrote: Hey Brian, I think the discussion was in the context of suspend/resume, and it was the Reserved value that effectively represents that time. Regards, Lyn On Sat, Sep 21, 2019 at 9:15 AM Brian Andrus

Re: [slurm-users] sacct command to show time for node to start

2019-09-21 Thread Lyn Gerner
Hey Brian, I think the discussion was in the context of suspend/resume, and it was the Reserved value that effectively represents that time. Regards, Lyn On Sat, Sep 21, 2019 at 9:15 AM Brian Andrus wrote: > There was a command shared at the SLUG that showed how long it took a > node to go fro

Re: [slurm-users] sacct thinks slurmctld is not up

2019-07-18 Thread Riebs, Andy
Brian, FWIW, we just restart slurmctld when this happens. I’ll be interested to hear if there’s a proper fix. Andy From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Brian Andrus Sent: Thursday, July 18, 2019 11:01 AM To: Slurm User Community List Subject: [slurm-use

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-17 Thread Will Dennis
lurm-users-boun...@lists.schedmd.com] On Behalf Of Will Dennis Sent: Wednesday, July 17, 2019 12:56 PM To: Slurm User Community List Subject: Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state Not thinking that the server (which runs both the Slurm controller daemon and the DB) is the issue

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-17 Thread Will Dennis
y performance problems with either the OS or MySQL... -Original Message- From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of Brian W. Johanson Sent: Wednesday, July 17, 2019 10:44 AM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] sacct issue: jobs s

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-17 Thread Brian W. Johanson
On 7/17/19 12:26 AM, Chris Samuel wrote: On 16/7/19 11:43 am, Will Dennis wrote: [2019-07-16T09:36:51.464] error: slurmdbd: agent queue is full (20140), discarding DBD_STEP_START:1442 request So it looks like your slurmdbd cannot keep up with the rate of these incoming steps and is having

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-16 Thread Chris Samuel
On 16/7/19 11:43 am, Will Dennis wrote: [2019-07-16T09:36:51.464] error: slurmdbd: agent queue is full (20140), discarding DBD_STEP_START:1442 request So it looks like your slurmdbd cannot keep up with the rate of these incoming steps and is having to throw away messages. [2019-07-16T09:40:

Re: [slurm-users] sacct issue: jobs staying in "RUNNING" state

2019-07-16 Thread Will Dennis
A few more things to note: - (Should have mentioned this earlier) running Slurm 17.11.7 ( via https://launchpad.net/~jonathonf/+archive/ubuntu/slurm ) - Restarted slurmctld and slurmdbd, but still getting the slurmdbd errors as before in slurmctld.log - Ran "mysqlcheck --databases slurm_acct_db

Re: [slurm-users] sacct end time for failed jobs

2019-03-06 Thread Paul Edmon
Odds are the new version won't help for that.  You will have to do some mysql work to fix it then. -Paul Edmon- On 3/6/2019 1:23 PM, Brian Andrus wrote: I am running the latest and did that, but it didn't change anything. The jobs stay in the runaway state and no changes are made to the dat

Re: [slurm-users] sacct end time for failed jobs

2019-03-06 Thread Brian Andrus
I am running the latest and did that, but it didn't change anything. The jobs stay in the runaway state and no changes are made to the database. Using 18.08.2-1. Maybe try updating to 19.05.0-0pre1? Brian On 3/6/2019 10:06 AM, Paul Edmon wrote: A lot of this is automated in the new version

Re: [slurm-users] sacct end time for failed jobs

2019-03-06 Thread Paul Edmon
A lot of this is automated in the new versions of slurm.  You should just need to run: sacctmgr show runawayjobs It will then give you an option to clean them and slurm will handle the rest.  If you add the -i option it will just clean them automatically. -Paul Edmon- On 3/6/2019 11:58 AM,

Re: [slurm-users] sacct end time for failed jobs

2019-03-06 Thread Cyrus Proctor
Hi Brian, Others probably have better suggestions before going the route I'm about to detail. If you do go this route, be warned, you definitely have the ability to irrevocably lose data or destroy your Slurm accounting database. Do so at your own risk. I got here with Google-foo after being ou

Re: [slurm-users] sacct end time for failed jobs

2019-03-06 Thread Brian Andrus
It shows several jobs that all have "Unknown" for end_time. Some are PENDING and some are RUNNING (none are truly in either state). It asked to fix them, which I did, but nothing seems to have changed. They still show up with that command and in reports. Brian On 3/5/2019 10:34 PM, Chris

Re: [slurm-users] sacct end time for failed jobs

2019-03-05 Thread Chris Samuel
On Tuesday, 5 March 2019 10:07:30 AM PST Brian Andrus wrote: > Does anyone have a process they use to handle empty (aka "Unknown") end > times for jobs that are not running? What does: sacctmgr list runawayjobs say? -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] sacct end time for failed jobs

2019-03-05 Thread Brian Andrus
Hmm. I have it as an issue as well as several jobs that are in the db without an end time, even though they are not running. Not sure how that happened, but I do want to find a good way to clean it up. Without an end time, sacct reports the jobs as if they continue to run and the total elapsed tim

Re: [slurm-users] sacct end time for failed jobs

2019-02-27 Thread Chris Samuel
On Tuesday, 26 February 2019 10:03:34 AM PST Brian Andrus wrote: > One thing I have noticed is that the END field for jobs with a state of > FAILED is "Unknown" but the ELAPSED field has the time it ran. That shouldn't happen, it works fine here (and where I've used Slurm in Australia). $ sacct

Re: [slurm-users] sacct fields AllocCPUS and ReqMem are empty

2018-08-01 Thread Coulter, John Eric
This reply is a little late, but  I filed a bug report a while back for this problem...  https://bugs.schedmd.com/show_bug.cgi?id=4808 This related issue actually describes which fields are available in JobComp/filetxt, which I *assume* is similar to AccountingStorage/filetxt: https://bugs.schedmd

Re: [slurm-users] sacct: error

2018-05-08 Thread Marcel Sommer
Thanks for the hint, Chris! Best regards, Marcel On 04.05.2018 at 16:06, Chris Samuel wrote: > On Friday, 4 May 2018 4:25:04 PM AEST Marcel Sommer wrote: > >> Does anyone have an explanation for this? > > I think you're asking for functionality that is only supported with slurmdbd. > > All the b

Re: [slurm-users] sacct: error

2018-05-07 Thread Eric F. Alemany
Thank you Chris, Marcus, Patrick and Ray. I guess I am still a bit confused. We will see what happens when we run a job asking for the CPUs of the cluster. Eric F. Alemany System Administrator

Re: [slurm-users] sacct: error

2018-05-07 Thread Chris Samuel
On Monday, 7 May 2018 5:41:27 PM AEST Marcus Wagner wrote: > To me it looks like CPUs is the synonym for hardware threads. Interesting, at ${JOB-1} we experimented with HT on a system back in 2013 and I didn't do the slurm.conf side at that time, but then you could only request physical cores a

Re: [slurm-users] sacct: error

2018-05-07 Thread Marcus Wagner
Hi Chris, this is not correct. From the slurm.conf manpage: CPUs: Number of logical processors on the node (e.g. "2").  CPUs and Boards are mutually exclusive. It can be set to the total number of sockets, cores or threads. This can be useful when you want to schedule only  the  cores on a hy

Re: [slurm-users] sacct: error

2018-05-06 Thread Chris Samuel
On Sunday, 6 May 2018 2:58:26 PM AEST Chris Samuel wrote: > Very very interesting - both slurmd and lscpu report 32 cores, but with > differing interpretations of the number of the layout. Meanwhile the AMD > website says these are 16 core CPUs, which means both Slurm and lscpu are > wrong! Of c

Re: [slurm-users] sacct: error

2018-05-05 Thread Chris Samuel
On Sunday, 6 May 2018 2:00:44 AM AEST Eric F. Alemany wrote: > Working on weekends - hey ? [...] This isn't my work. ;-) > It seems as the commands give different result (?) - What do you think ? Very very interesting - both slurmd and lscpu report 32 cores, but with differing interpretations

Re: [slurm-users] sacct: error

2018-05-05 Thread Eric F. Alemany
Hi Chris, Working on weekends, hey? When I run "slurmd -C" on one of my execute nodes, I get: eric@radonc01:~$ slurmd -C NodeName=radonc01 slurmd: Considering each NUMA node as a socket CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64402 UpTime=2-17:35:12 Al

Re: [slurm-users] sacct: error

2018-05-05 Thread Chris Samuel
On Saturday, 5 May 2018 2:45:19 AM AEST Eric F. Alemany wrote: > With Ray suggestion i have a error message for each nodes. Here i am giving > you only one error message from a node. > sacct: error: NodeNames=radonc01 CPUs=32 doesn't match > Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CP

Re: [slurm-users] sacct fields AllocCPUS and ReqMem are empty

2018-05-05 Thread Chris Samuel
On Saturday, 5 May 2018 12:43:32 AM AEST Benjamin Rampe wrote: > I haven't found anything in the documentation that talks about > limitations regarding job accounting. Yeah, the documentation is pretty poor on this. :-( The best I can find is this email to the old slurm-dev list from 6 years ago

Re: [slurm-users] sacct: error

2018-05-04 Thread Eric F. Alemany
Hi Patrick, Hi Ray, Happy Friday! Thank you both for your quick reply. This is what I found out. With Patrick's one-liner it works fine. NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 With Ray's suggestion I get an error message for each node. Here I am g
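
The consistency rule behind the error in this thread (CPUs must equal Sockets*CoresPerSocket*ThreadsPerCore) can be checked by hand with a little shell arithmetic. A minimal sketch; the function names `expected_cpus` and `check_cpus` are hypothetical and not part of Slurm.

```shell
# Hypothetical helper: CPU count implied by a node's topology.
expected_cpus() {
    # $1 = Sockets, $2 = CoresPerSocket, $3 = ThreadsPerCore
    echo $(( $1 * $2 * $3 ))
}

# Hypothetical helper: compare a declared CPUs= value with the topology product.
check_cpus() {
    # $1 = declared CPUs, $2 = Sockets, $3 = CoresPerSocket, $4 = ThreadsPerCore
    if [ "$1" -eq "$(expected_cpus "$2" "$3" "$4")" ]; then
        echo "consistent"
    else
        echo "mismatch"
    fi
}
```

For the NodeName line quoted above, `check_cpus 32 2 8 2` prints `consistent`, since 2 * 8 * 2 = 32. A topology whose product is 16, as in the reported error message, would print `mismatch` against CPUs=32.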

Re: [slurm-users] sacct fields AllocCPUS and ReqMem are empty

2018-05-04 Thread Benjamin Rampe
Hello everybody, On 04/05/18 16:06, Chris Samuel wrote: > On Friday, 4 May 2018 4:25:04 PM AEST Marcel Sommer wrote: >> Does anyone have an explanation for this? > I think you're asking for functionality that is only supported with slurmdbd. I'm interested in that problem too. I haven't found an

Re: [slurm-users] sacct fields AllocCPUS and ReqMem are empty

2018-05-04 Thread Chris Samuel
On Friday, 4 May 2018 4:25:04 PM AEST Marcel Sommer wrote: > Does anyone have an explanation for this? I think you're asking for functionality that is only supported with slurmdbd. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

Re: [slurm-users] sacct: error

2018-05-04 Thread Patrick Goetz
I concur with this. Make sure your nodes are in the /etc/hosts file on the SMS. Also, if you name them by base + numerical sequence, you can configure them with a single line in Slurm (using the example below): NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 Thread

Re: [slurm-users] sacct: error

2018-05-03 Thread Raymond Wan
Hi Eric, On Fri, May 4, 2018 at 6:04 AM, Eric F. Alemany wrote: > # COMPUTE NODES > NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14 > 10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 > ThreadsPerCore=2 State=UNKNOWN > PartitionName=debug Nodes=radonc[01-04] Def

Re: [slurm-users] sacct not shows user

2018-04-27 Thread Chris Samuel
On Friday, 27 April 2018 9:56:14 PM AEST sysadmin.caos wrote: > I'm using AccountingStorageType=accounting_storage/filetxt because I'm > running some tests. With "filetxt", could I get "account" (username) > with sacct? I can't answer definitively (I've not used filetxt) but I think that is just

Re: [slurm-users] sacct not shows user

2018-04-27 Thread sysadmin.caos
I'm using AccountingStorageType=accounting_storage/filetxt because I'm running some tests. With "filetxt", could I get "account" (username) with sacct?

Re: [slurm-users] sacct not shows user

2018-04-26 Thread Chris Samuel
On Thursday, 26 April 2018 8:20:51 PM AEST sysadmin.caos wrote: > It seems "Account" column always shows "(null)" value. Is it normal or > my SLURM has a wrong configuration? Have you defined any accounts and, if so, added people to them? If you haven't set AccountingStorageEnforce to anything t

Re: [slurm-users] sacct not shows user

2018-04-26 Thread Ole Holm Nielsen
Hi, Did you set up Slurm accounting? Some information is in my Wiki https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting /Ole On 04/26/2018 12:20 PM, sysadmin.caos wrote: Hello, when I run "sacct", output is this:    JobID    JobName  Partition    Account  AllocCPUS State ExitCode