Re: [slurm-users] gres names

2020-12-15 Thread Loris Bennett
Hi Erik, Erik Bryer writes: > Thanks for your reply. I can't find NVML in the logs going back to > 11/22. dmesg goes back to the last boot, but has no mention of > NVML. Regarding making one up on my own, how does Slurm know that the string > "xyzzy" corresponds to, e.g., a Tesla GPU? As I understand it, Slurm…

Re: [slurm-users] gres names

2020-12-15 Thread Erik Bryer
Thanks for your reply. I can't find NVML in the logs going back to 11/22. dmesg goes back to the last boot, but has no mention of NVML. Regarding making one up on my own, how does Slurm know that the string "xyzzy" corresponds to, e.g., a Tesla GPU? Thanks, Erik

Re: [slurm-users] Scripts run slower in slurm?

2020-12-15 Thread Christopher Samuel
On 12/14/20 11:20 pm, Alpha Experiment wrote: It is called using the following submission script: #!/bin/bash #SBATCH --partition=full #SBATCH --job-name="Large" source testenv1/bin/activate python3 multithread_example.py You're not asking for a number of cores, so you'll likely only be getting…
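A minimal revision of that submission script would add an explicit CPU request; this is a sketch, reusing the partition, job name, and script names from the original post, with the core count (16) chosen purely for illustration:

```
#!/bin/bash
#SBATCH --partition=full
#SBATCH --job-name="Large"
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16   # explicitly request cores for the multithreaded step
source testenv1/bin/activate
python3 multithread_example.py
```

Without a request like --cpus-per-task, Slurm's default allocation is a single CPU per task, which would match the slowdown being reported.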

Re: [slurm-users] Scripts run slower in slurm?

2020-12-15 Thread Krieger, Donald N.
The times for the two runs suggest that the version run through Slurm is using only one core. Best – Don. Don Krieger, PhD, Research Scientist, Department of Neurological Surgery, University of Pittsburgh

Re: [slurm-users] Scripts run slower in slurm?

2020-12-15 Thread Williams, Gareth (IM&T, Black Mountain)
Hi John, I’ll volunteer an opinion. There are circumstances where Slurm could contribute to slower overall times for tasks, as Slurm can be configured to do pre-job setup and post-job follow-up (prologue/epilogue). However, you are reporting within-task timing, not overall timing, so this is beside…

Re: [slurm-users] Burst to AWS cloud

2020-12-15 Thread Alex Chekholko
Hey Sajesh, Each public cloud vendor provides a standard way to create a virtual private network in their infrastructure and connect that private network to your existing private network for your cluster. The devil is in the networking details. So in that case, you can just treat it as a new rack…

Re: [slurm-users] gres names

2020-12-15 Thread Michael Di Domenico
You can either make them up on your own, or they get spit out by NVML in the slurmd.log file. On Tue, Dec 15, 2020 at 12:55 PM Erik Bryer wrote: > Hi, > Where do I get the gres names, e.g. "rtx2080ti", to use for my GPUs in my node definitions in slurm.conf? > Thanks, > Erik
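To sketch how a made-up type string gets tied to real hardware: the type is just a label, and Slurm maps it to a device through the File= path in gres.conf rather than by parsing the name, so the same string simply has to appear in both files. The node name and device path below are hypothetical:

```
# gres.conf on the compute node (device path is an assumption)
Name=gpu Type=rtx2080ti File=/dev/nvidia0

# slurm.conf node definition referencing the same type string
NodeName=node01 Gres=gpu:rtx2080ti:1 ...
```

Alternatively, with AutoDetect=nvml in gres.conf, slurmd discovers the device types itself, which is where the NVML lines in slurmd.log come from.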

Re: [slurm-users] Burst to AWS cloud

2020-12-15 Thread Sajesh Singh
Brian, Thank you for the info. Will definitely keep your recommendations handy while putting this together. -SS-

Re: [slurm-users] Burst to AWS cloud

2020-12-15 Thread Brian Andrus
I have done that for several clients.
1. Staging data is a pain. The simplest thing was to have it as part of the job script, or have the job itself be dependent upon a separate staging job. Where bandwidth is an issue, we have implemented bbcp.
2. Depending on size and connectivity, you can…
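The "separate staging job" pattern in point 1 can be sketched with two submissions; the script names here are hypothetical:

```
# submit the staging job and capture its job ID
stage_id=$(sbatch --parsable stage_data.sh)

# run the compute job only after staging completes successfully
sbatch --dependency=afterok:${stage_id} compute_job.sh
```

With afterok, the compute job stays pending until the staging job exits with code zero, so a failed transfer never wastes compute time.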

[slurm-users] Burst to AWS cloud

2020-12-15 Thread Sajesh Singh
We are currently investigating the use of the cloud scheduling features within an on-site Slurm installation and were wondering if anyone had any experiences that they wish to share of trying to use this feature. In particular I am interested to know: https://slurm.schedmd.com/elastic_computing.

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread mercan
Oh, yes! Sorry, I confused your email with Alpha Experiment's emails. Ahmet M. On 15.12.2020 at 21:59, Avery Grieve wrote: Hi Ahmet, Thank you for your suggestion. I assume you're talking about the SlurmctldHost field in the slurm.conf file? If so, I've got that variable defined…

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread mercan
Hi; I don't know if this is the problem, but I think the setting "ControlMachine=localhost", and not setting a hostname for the Slurm master node, are not good decisions. How would the compute nodes determine the IP address of the Slurm master node from "localhost"? Also, I suggest not using capital letters for any…
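A minimal sketch of the change being suggested here; the hostname is hypothetical and must resolve to the controller's address identically on every node:

```
# slurm.conf - use a real, resolvable hostname rather than localhost
SlurmctldHost=headnode01

# older configurations use the equivalent legacy keyword:
# ControlMachine=headnode01
```

With "localhost", each compute node's slurmd would try to reach the controller on its own loopback interface, which is consistent with the daemons failing to communicate.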

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread Avery Grieve
I changed my .service file to write to a log. The slurm daemons are running (manual start) on the compute nodes. I get this on startup with the service enabled:
[2020-12-15T18:09:06.412] slurmctld version 20.11.1 started on cluster cluster
[2020-12-15T18:09:06.539] No memory enforcing mechanism configured…

[slurm-users] gres names

2020-12-15 Thread Erik Bryer
Hi, Where do I get the gres names, e.g. "rtx2080ti", to use for my gpus in my node definitions in slurm.conf? Thanks, Erik

Re: [slurm-users] slurmctld daemon error

2020-12-15 Thread Avery Grieve
Maybe a silly question, but where do you find the daemon logs or specify their location? ~Avery Grieve They/Them/Theirs please! University of Michigan On Mon, Dec 14, 2020 at 7:22 PM Alpha Experiment wrote: > Hi, > > I am trying to run slurm on Fedora 33. Upon boot the slurmd daemon is > running…
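The daemon log locations are set in slurm.conf and can be read back from a running controller; a sketch, where the paths shown are common packaging defaults rather than guaranteed values:

```
# ask the running daemons where they are configured to log
scontrol show config | grep -i logfile

# typical slurm.conf entries behind that output:
#   SlurmctldLogFile=/var/log/slurmctld.log
#   SlurmdLogFile=/var/log/slurmd.log

# if no log file is configured, messages go to syslog/journald instead:
journalctl -u slurmctld -u slurmd
```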

[slurm-users] slurm/munge problem: invalid credentials

2020-12-15 Thread Olaf Gellert
Hi all, we are setting up a new test cluster to test some features for our next HPC system. On one of the compute nodes we get these messages in the log:
[2020-12-15T10:00:21.753] error: Munge decode failed: Invalid credential
[2020-12-15T10:00:21.753] auth/munge: _print_cred: ENCODED: Thu Jan 0…
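Two standard checks for "Invalid credential" errors, sketched below with a hypothetical node name: the munge.key must be byte-identical on every node, and the clocks must agree, since MUNGE credentials embed an encode time (the "Thu Jan 0…" timestamp in the log hints at a clock stuck near the epoch on one side):

```
# encode a credential locally and decode it on the failing node
munge -n | ssh node01 unmunge

# compare key checksums between the local node and a remote node
md5sum /etc/munge/munge.key
ssh node01 md5sum /etc/munge/munge.key

# compare clocks
date; ssh node01 date
```

If the unmunge step reports an invalid credential, mismatched keys or clock skew beyond MUNGE's tolerance are the usual culprits.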