Re: [slurm-users] [EXT]Re: srun with &&, |, and > oh my!

2023-01-23 Thread Chandler
Williams, Gareth (IM&T, Black Mountain) wrote on 1/23/23 7:55 PM: Be brave and experiment! How far wrong can you go? Hmm I do love breaking and re-fixing things... srun bash -c "cmd1 infile1 | cmd2 opt2 arg2 | cmd3 opt3 arg3 -- > outfile && cmd4 opt4 arg4" Yes this will work! Thanks!
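A minimal sketch of why the `bash -c` wrapper matters: without it, the submitting shell splits the line at `|`, `>` and `&&` itself, so only the first command would be handed to `srun`. The `LAUNCHER` variable and the stand-in commands (`printf`, `sort`, `uniq`) are assumptions made for illustration, not part of the original thread. They let the quoting be checked on a machine without Slurm.

```shell
# LAUNCHER is a hypothetical variable for this sketch: set LAUNCHER=srun on a
# cluster. Left empty, the pipeline runs locally so the quoting can be checked.
LAUNCHER="${LAUNCHER:-}"
outfile="$(mktemp)"

# The whole pipeline is one quoted string, so the pipes, the redirect and the
# && are interpreted by the inner bash rather than by the submitting shell.
$LAUNCHER bash -c "printf 'b\na\nb\n' | sort | uniq -c > '$outfile' && echo done"

cat "$outfile"
```

With `LAUNCHER=srun`, the inner `bash` and everything in the quoted string should run inside the job step, which is the behaviour confirmed in the thread.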

[slurm-users] srun with &&, |, and > oh my!

2023-01-23 Thread Chandler
I want to run a command like: cmd1 infile1 | cmd2 opt2 arg2 | cmd3 opt3 arg3 -- > outfile && cmd4 opt4 arg4 which runs fine at any prompt. I'm afraid to just put `srun` at the beginning, though: would it run the whole set of commands on the compute node? I don't want to try it because it invol

[slurm-users] Jobs stuck with BeginTime and prolog exit status 99:0

2023-01-09 Thread Chandler Sobel-Sorenson
ed out to be caused by our Bright Cluster Manager license expiring, which is involved with managing the slurm daemons, among many other things.  Since we didn't need paid support any longer, I just opted for the free license.  After renewing it, slurm began operating correctly again. Bes

Re: [slurm-users] [EXT]Re: How to read job accounting data long output? `sacct -l`

2022-12-16 Thread Chandler Sobel-Sorenson
Bjørn-Helge Mevik wrote on 12/14/22 12:19 AM: Chandler Sobel-Sorenson writes: Perhaps there is a way to import it into a spreadsheet? You can use `sacct -P -l`, which gives you a '|' separated output, which should be possible to import in a spread sheet. Awesome thanks!
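One way to take the '|'-separated output into a spreadsheet is to convert it to CSV first. The sketch below is an assumption about how that could be done, not something from the thread: `pipe_to_csv` is a hypothetical helper, and the sample rows are made up so the block runs without Slurm.

```shell
# Convert '|'-separated text on stdin to CSV, quoting every field so that
# commas inside a field do not split it.
pipe_to_csv() {
  awk -F'|' -v OFS=',' '{
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "\"\"", $i)   # escape embedded double quotes
      $i = "\"" $i "\""       # wrap the field in double quotes
    }
    print
  }'
}

# On a cluster this would be: sacct -P -l | pipe_to_csv > jobs.csv
# Made-up sample rows, for illustration only:
sample_csv="$(printf 'JobID|JobName|State\n101|align,v2|COMPLETED\n' | pipe_to_csv)"
printf '%s\n' "$sample_csv"
```

Many spreadsheet programs can also import the raw output directly if '|' is chosen as the delimiter, in which case no conversion is needed.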

Re: [slurm-users] [EXT]Re: How to read job accounting data long output? `sacct -l`

2022-12-16 Thread Chandler Sobel-Sorenson
Awesome thanks! Will Furnass wrote on 12/14/22 1:23 AM: *External Email* If you pipe output into 'less -S' then you get horizontal scrolling. Will

[slurm-users] How to read job accounting data long output? `sacct -l`

2022-12-13 Thread Chandler Sobel-Sorenson
Is there a recommended way to read output from `sacct` with the `-l` or `--long` option?  I have dual monitors and shrunk the terminal's font down to 6 pt or so until I could barely read it, giving me 675 columns.  This was still not enough... Perhaps there is a way of displaying it so the li

[slurm-users] Jobs stuck with BeginTime and prolog exit status 99:0

2022-05-17 Thread Chandler
-- Chandler Sobel-Sorenson (he/him) / Systems Administrator Arizona Genomics Institute www.genome.arizona.edu

Re: [slurm-users] [EXT]Re: only 1 job running

2021-01-28 Thread Chandler
is allocated, leaving none for any other job to run on the unallocated cpus. Brian Andrus On 1/28/2021 2:15 PM, Chandler wrote: Brian Andrus wrote on 1/28/21 13:59: What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc? Well the jobs are only asking for

Re: [slurm-users] [EXT]Re: only 1 job running

2021-01-28 Thread Chandler
Brian Andrus wrote on 1/28/21 13:59: What are the specific requests for resources from a job? Nodes, Cores, Memory, threads, etc? Well the jobs are only asking for 16 CPUs each. The 255 threads is weird though, seems to be related to this, https://askubuntu.com/questions/1182818/dual-amd-ep

Re: [slurm-users] [EXT]Re: only 1 job running

2021-01-28 Thread Chandler
OK I'm getting this same output on nodes n[011-013]: # slurmd -C NodeName=n011 slurmd: error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from your configuration now. slurmd: Considering each NUMA node as a socket slurmd: error:

Re: [slurm-users] only 1 job running

2021-01-28 Thread Chandler
Christopher Samuel wrote on 1/28/21 12:50: Did you restart the slurm daemons when you added the new node?  Some internal data structures (bitmaps) are built based on the number of nodes and they need to be rebuilt with a restart in this situation. https://slurm.schedmd.com/faq.html#add_nodes

Re: [slurm-users] [EXT]Re: only 1 job running

2021-01-28 Thread Chandler
Andy Riebs wrote on 1/28/21 07:53: If the only changes to your system have been the slurm.conf configuration and the addition of a new node, the easiest way to track this down is probably to show us the diffs between the previous and current versions of slurm.conf, and a note about what's differe

Re: [slurm-users] [EXT]Re: only 1 job running

2021-01-28 Thread Chandler
Brian Andrus wrote on 1/28/21 12:07: scontrol update state=resume nodename=n[011-013] I tried that but got: slurm_update error: Invalid node state specified

Re: [slurm-users] only 1 job running

2021-01-27 Thread Chandler
Made a little bit of progress by running sinfo:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq*     up    infinite  3     drain n[011-013]
defq*     up    infinite  1     alloc n010
Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran: s
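The drained nodes in a listing like the one above can be picked out with a short filter. This is only a sketch: `drained_nodes` is a hypothetical helper, it assumes the default six-column `sinfo` layout shown in the message, and the sample text is copied from that message so the block runs without Slurm.

```shell
# Print the NODELIST column for every row whose STATE column starts with
# "drain". Assumes the default layout:
# PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
drained_nodes() {
  awk 'NR > 1 && $5 ~ /^drain/ { print $6 }'
}

# Sample taken from the sinfo output quoted in the message.
sinfo_sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up infinite 3 drain n[011-013]
defq* up infinite 1 alloc n010'

# On a cluster this would be: sinfo | drained_nodes
drained="$(printf '%s\n' "$sinfo_sample" | drained_nodes)"
echo "$drained"
```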

[slurm-users] only 1 job running

2021-01-27 Thread Chandler
to get these other jobs started so our task can be completed in a timely manner, and figure out why only 1 job is running when they all should be running. Thanks -- Chandler Sobel-Sorenson / Systems Administrator Arizona Genomics Institute University of Arizona