Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski
Ah, I see — no, it’s 24.08. That’s why I didn’t find any reference to it. Carry on! :-D

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Jesse Aiton
Yeah, 24.0.8 is the bleeding edge version. I wanted to try the latest in case it was a bug in 20.x.x. I’m happy to go back to any older Slurm version but I don’t think that will matter much if the issue occurs on both Slurm 20 and Slurm 24. git clone https://github.com/SchedMD/slurm.git Thank

Re: [slurm-users] error: Couldn't find the specified plugin name for cred/munge looking at all files

2024-01-23 Thread Ryan Novosielski
On Jan 23, 2024, at 18:14, Jesse Aiton wrote: This is on Ubuntu 20.04 and happens both with Slurm 20.11.09 and 24.0.8 Thank you, Jesse I’m not sure what version you’re actually running, but I don’t believe there is a 24.0.8. The latest version I’m aware of is 23.11.2.

Re: [slurm-users] error

2024-01-18 Thread Ole Holm Nielsen
On 1/18/24 17:42, Felix wrote: I started a new AMD node, and the error is as follows: "CPU frequency setting not configured for this node". The extended log looks like this: [2024-01-18T18:28:06.682] CPU frequency setting not configured for this node [2024-01-18T18:28:06.691] slurmd started on Thu, 18
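The message is informational: slurmd logs it when a node does not expose the kernel cpufreq interface, so the CPU-frequency parameters in slurm.conf have nothing to act on there. A hedged slurm.conf sketch (the governor list shown is commonly the default set, not site-specific advice):

```
# Takes effect only on nodes exposing /sys/devices/system/cpu/cpu*/cpufreq;
# otherwise slurmd logs "CPU frequency setting not configured for this node".
CpuFreqGovernors=OnDemand,Performance,UserSpace
CpuFreqDef=Performance
```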

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-04-03 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 15:51:51 +0200, Ole Holm Nielsen wrote: > As for job scheduling, slurmctld may allocate a job to some powered-off > nodes and then calls the ResumeProgram defined in slurm.conf. From this > point it may indeed take 2-3 minutes before a node is up and running > slurmd, dur

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ole Holm Nielsen
Hi Thomas, I think the Slurm power_save is not problematic for us with bare-metal on-premise nodes, in contrast to the problems you're having. We use power_save with on-premise nodes where we control the power down/up by means of IPMI commands as provided in the scripts which you will find i
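The IPMI approach mentioned above can be sketched as a SuspendProgram wrapper. This is a minimal sketch, not the scripts Ole links to: the "<node>-bmc" hostname pattern and the use of ipmitool's -E flag (password from the IPMI_PASSWORD environment variable) are assumptions, and the command is only echoed so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Hedged sketch of a slurm.conf SuspendProgram: slurmctld passes a hostlist
# expression as $1; expand it and power each node off over IPMI.
suspend_nodes() {
    # scontrol expands "node[01-04]"-style lists; fall back to the raw
    # argument when scontrol is unavailable (e.g. when testing off-cluster).
    for node in $(scontrol show hostnames "$1" 2>/dev/null || echo "$1"); do
        # Replace "echo" with the real call in production. "<node>-bmc" is an
        # assumed BMC naming convention, not a Slurm or ipmitool default.
        echo ipmitool -I lanplus -H "${node}-bmc" -U admin -E chassis power off
    done
}

suspend_nodes "node01"
```

A matching ResumeProgram would issue `chassis power on` the same way; SuspendTimeout and ResumeTimeout in slurm.conf must be generous enough to cover the BMC's power-cycle time.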

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Wed, 29 Mar 2023 14:42:33 +0200, Ben Polman wrote: > I'd be interested in your kludge, we face a similar situation where the > slurmctld node > does not have access to the ipmi network and cannot ssh to machines > that have access. > We are thinking of creating a rest interface to a contro

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Ben Polman
I'd be interested in your kludge; we face a similar situation where the slurmctld node does not have access to the ipmi network and cannot ssh to machines that have access. We are thinking of creating a rest interface to a control server which would be running the ipmi commands Ben On 29-

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-29 Thread Dr. Thomas Orgis
On Mon, 27 Mar 2023 13:17:01 +0200, Ole Holm Nielsen wrote: > FYI: Slurm power_save works very well for us without the issues that you > describe below. We run Slurm 22.05.8, what's your version? I'm sure that there are setups where it works nicely ;-) For us, it didn't, and I was faced with h

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Ole Holm Nielsen
Hi Thomas, FYI: Slurm power_save works very well for us without the issues that you describe below. We run Slurm 22.05.8, what's your version? I've documented our setup in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_cloud_bursting/#configuring-slurm-conf-for-power-saving T

Re: [slurm-users] error: power_save module disabled, NULL SuspendProgram

2023-03-27 Thread Dr. Thomas Orgis
On Mon, 06 Mar 2023 13:35:38 +0100, Stefan Staeglich wrote: > But this did not fix the main error, though it might have reduced how often it > occurs. Has someone observed similar issues? We will try a higher > SuspendTimeout. We had issues with power saving. We powered the idle nodes off, caus

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Christopher Samuel
On 12/1/21 5:51 am, Gestió Servidors wrote: I can’t synchronize beforehand with “ntpdate” because when I run “ntpdate -s my_NTP_server”, I only receive the message “ntpdate: no server suitable for synchronization found”… Yeah, you'll need to make sure your NTP infrastructure is working first. There

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-12-01 Thread Gestió Servidors
Hi, I can't synchronize beforehand with "ntpdate" because when I run "ntpdate -s my_NTP_server", I only receive the message "ntpdate: no server suitable for synchronization found"... Thanks. -- Daniel Ruiz Molina

Re: [slurm-users] Error " slurm_receive_msg_and_forward: Zero Bytes were transmitted or received"

2021-11-30 Thread Nicolas Greneche
Hi, I had the same issue with ntpd. My ntp service on clients did not synchronize because the drift from the ntp server was too large. Maybe you can synchronize with ntpdate before using the ntp service on your clients. Regards, On 30 Nov 2021 12:23, Gestió Servidors wrote: Hello, In last days,
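The reason ntpd itself cannot recover from a large initial offset is its panic threshold: by default ntpd exits rather than step a clock that is more than about 1000 seconds off, which is why a one-time step with ntpdate (or starting ntpd with -g) comes first. A small sketch of that decision, with the offset supplied by hand:

```shell
# Decide whether a clock offset (in seconds) is something ntpd will slew
# away on its own, or needs a one-time step first. 1000 s is ntpd's
# documented default panic threshold ("tinker panic").
check_drift() {
    offset=$1
    abs=${offset#-}          # strip a leading minus sign
    if [ "$abs" -gt 1000 ]; then
        echo "offset ${offset}s: step the clock first (ntpdate or ntpd -g)"
    else
        echo "offset ${offset}s: within ntpd's panic threshold"
    fi
}

check_drift 5
check_drift -86400    # a day off: ntpd would refuse to start syncing
```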

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
Ok, a fresh start after installing the two recommended packages and things appear to be working. Thanks for the help! On 9/23/21, 3:04 PM, "slurm-users on behalf of Hoot Thompson" wrote: Do I need to specify the json path in the configure process? On 9/23/21, 2:45 PM, "slurm-users o

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
Do I need to specify the json path in the configure process? On 9/23/21, 2:45 PM, "slurm-users on behalf of Hoot Thompson" wrote: If this is useful: note that there's no attempt to build anything in the serializer/json directory. Making all in serializer make[4]: Entering directory

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
If this is useful: note that there's no attempt to build anything in the serializer/json directory. Making all in serializer make[4]: Entering directory '/home/ubuntu/slurm-21.08.1/src/plugins/serializer' Making all in url-encoded make[5]: Entering directory '/home/ubuntu/slurm-21.08.1/src/plugins/

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
What's getting built is serializer_url_encoded.a serializer_url_encoded.la serializer_url_encoded.so if this helps. On 9/23/21, 2:10 PM, "slurm-users on behalf of Hoot Thompson" wrote: On Ubuntu 20.04 I installed ... libjson-c-dev libhttp-parser-dev That work? No joy

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Hoot Thompson
On Ubuntu 20.04 I installed ... libjson-c-dev libhttp-parser-dev That work? No joy if so. On 9/23/21, 1:30 PM, "slurm-users on behalf of Ole Holm Nielsen" wrote: On 23-09-2021 16:01, Hoot Thompson wrote: > In upgrading to 21.08.1, slurmctld status reports: > > Sep 23 13:49

Re: [slurm-users] Error when upgrading to 21.08.1

2021-09-23 Thread Ole Holm Nielsen
On 23-09-2021 16:01, Hoot Thompson wrote: In upgrading to 21.08.1, slurmctld status reports: Sep 23 13:49:52 ip-10-10-7-17 systemd[1]: Started Slurm controller daemon. Sep 23 13:49:52 ip-10-10-7-17 slurmctld[1323]: fatal: Unable to find plugin: serializer/json Sep 23 13:49:52 ip-10-10-7-17 s
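Slurm 21.08's configure builds the serializer/json plugin only when it finds the json-c development headers (the REST pieces additionally want http-parser), which matches the missing-plugin error above. A hedged preflight check that the headers are visible before re-running configure; it only tries to preprocess a one-line include:

```shell
# Report whether a development header is visible to the C preprocessor.
check_hdr() {
    if echo "#include <$1>" | cpp >/dev/null 2>&1; then
        echo "found $1"
    else
        echo "missing $1 (install the matching -dev package, e.g. libjson-c-dev)"
    fi
}

check_hdr json-c/json.h     # needed for serializer/json
check_hdr http_parser.h     # needed by slurmrestd components
```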

Re: [slurm-users] error: user not found

2020-09-30 Thread Diego Zuccato
On 30/09/20 12:33, Marcus Wagner wrote: > the submission process runs on the slurmctld, so the user must be known > there. It is. The frontend is the node users use to submit jobs and it's where slurmctld runs. The user is known (he's logged in via ssh). His home is available (NFS share visib

Re: [slurm-users] error: user not found

2020-09-30 Thread Marcus Wagner
Hi Diego, the submission process runs on the slurmctld, so the user must be known there. Best Marcus On 30.09.2020 at 08:37, Diego Zuccato wrote: On 30/09/20 03:49, Brian Andrus wrote: Thanks for the answer. That means the system has no idea who that user is. But which system? Being a

Re: [slurm-users] error: user not found

2020-09-29 Thread Diego Zuccato
On 30/09/20 03:49, Brian Andrus wrote: Thanks for the answer. > That means the system has no idea who that user is. But which system? Being a message generated by slurmctld, I thought it must be the frontend node. But, as I wrote, that system correctly identifies the user (he's logged in, 'id'

Re: [slurm-users] error: user not found

2020-09-29 Thread Brian Andrus
That means the system has no idea who that user is. If you are using /etc/passwd, that file is not synched on the slurm master node(s) If you are part of a domain or other shared directory (ldap, etc), your master is likely not configured right. If you are using SSSD, it is also possible yo
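Brian's checklist boils down to: the uid in the job credential must resolve through the controller's name service. A minimal check to run on the slurmctld host (and on compute nodes for comparison); "alice" is a placeholder username:

```shell
# Verify a user resolves through NSS (covers /etc/passwd, sssd, ldap, ...).
check_user() {
    if getent passwd "$1" >/dev/null; then
        echo "$1 resolves on $(hostname)"
    else
        echo "$1 unknown on $(hostname): check passwd sync or sssd/ldap config"
    fi
}

check_user root     # should resolve on any Linux host
check_user alice
```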

Re: [slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

2020-04-27 Thread Josep Guerrero
Hi again, > > So does someone have any suggestion about what I could try? > > Please have a look at: > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954272 This seems to have worked. Thanks a lot! Just in case someone else is interested, that Debian bug thread suggests the following wor

Re: [slurm-users] Error when running srun: error: task X launch failed: Invalid MPI plugin name

2020-04-27 Thread Gennaro Oliva
Hi Josep, On Mon, Apr 27, 2020 at 12:26:56PM +0200, Josep Guerrero wrote: > does not seem to have support for pmix. There seems to be an "openmpi" > option, > but I haven't been able to find documentation on how it is supposed to work. > So, as I understand the situation, Debian openmpi package

Re: [slurm-users] Error building rpm on CentOS 7

2020-04-08 Thread Alfonso Núñez Slagado
Thanks guys, it took me a while to check the solutions you proposed and both of them work. The mariadb downgrade is a bit tricky using "rpm -e --nodeps", and the solution Ole proposed keeps the system updated with MariaDB 10.4. @Ole, thanks for the guide, it is really useful Alfonso On 7/4/20

Re: [slurm-users] Error building rpm on CentOS 7

2020-04-07 Thread Ole Holm Nielsen
Hi Alfonso, You just need to get the CentOS 7 prerequisites right; check out my Slurm installation Wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms HTH, Ole On 07-04-2020 13:07, Alfonso Núñez Slagado wrote: I'm trying to build rpm packages running follow

Re: [slurm-users] Error building rpm on CentOS 7

2020-04-07 Thread William Brown
Search the list archive; I had the same problem, and it was because I had MariaDB installed but, as the packaging of MariaDB changed, I was missing a required RPM. They split it differently and there is another RPM prerequisite. Can't recall the name just now, but search the archive. William On Tue, 7 Apr

Re: [slurm-users] Error upgrading slurmdbd from 19.05 to 20.02

2020-03-13 Thread Steininger, Herbert
Hi, I guess I found the problem. It seems to come from this file: src/plugins/accounting_storage/mysql/as_mysql_convert.c, in particular from here: --- code --- static int _convert_job_table_pre(mysql_conn_t *mysql_conn, char *cluster_name) { int rc = SLURM_SUCCESS; char *query =

Re: [slurm-users] error: persistent connection experienced an error

2019-12-13 Thread Chris Samuel
On 13/12/19 12:19 pm, Christopher Benjamin Coffey wrote: error: persistent connection experienced an error Looking at the source code that comes from here: if (ufds.revents & POLLERR) { error("persistent connection experienced an error");

Re: [slurm-users] Error when the stdout or stderror path does not exist

2019-03-25 Thread Antonio Knight
I have created a small group of 4 nodes using my lab mates' computers to perform calculations overnight. The algorithm has a random component. I have to run the same program with the same input data several thousand times. To distinguish the executions I have tried to create a folder with the co
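One pitfall with this per-execution-folder pattern: sbatch and srun do not create missing directories for --output/--error, so the folder must exist before submission. A minimal sketch (the paths and job script name are hypothetical; %j is Slurm's job-id filename pattern):

```shell
# Create a per-run directory first, then point --output into it.
run_id=$(date +%Y%m%d-%H%M%S)
outdir="results/$run_id"
mkdir -p "$outdir"
# In a real submission, drop the echo:
echo sbatch --output="$outdir/%j.out" --error="$outdir/%j.err" job.sh
```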

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-13 Thread Prentice Bisbal
I managed to figure out why this conditional wasn't working: shortly before this conditional, I had another conditional that checked for my user_id. If I was submitting a job, it would skip the rest of the job_submit.lua file. I had added this so I could test some new features out that would h

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
Thanks! Prentice On 2/6/19 11:00 AM, Marcus Wagner wrote: Hi Prentice, there, I might help. I've created a table, e.g.: local userflags = {    --  "" = {    -- "bypass"  = 1, # optional, if you want to bypass the submit_plugin    -- "debug"   = 1, # optional, if you want to

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Marcus Wagner
Hi Prentice, there, I might help. I've created a table, e.g.: local userflags = {    --  "" = {    -- "bypass"  = 1, # optional, if you want to bypass the submit_plugin    -- "debug"   = 1, # optional, if you want to get debug messages    -- "param"   = 1, # optional,

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
"Dirty debugging" I like that. I'm going to use that from now on. I have tried that method in the past while debugging other issues. I try not to use it too much, since I don't want these "dirty debugging" messages being seen by users (I don't have a test environment, so I have to test debug in

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-06 Thread Prentice Bisbal
Whew! I have used 'user_id' in a dozen other conditionals that I tested exhaustively. After reading your first e-mail, I thought I was going crazy. I suspect the issue is some sort of subtle typo or syntax error. I use similar conditionals throughout my job_submit.lua script, and they all work

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread mercan
Hi; I think dirty debugging with printf-style output (slurm.log_user) is required, because the Lua of our Slurm installation returns a lot of variables as nil. You can limit the output to a specific user as below: if job_desc.user_name == "mercan" then     slurm.log_user("job_desc.user_id=")     slurm.l

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread Marcus Wagner
Hmm..., no, I was wrong. IT IS 'user_id'. Now I'm a bit dazzled Marcus On 2/4/19 11:27 PM, Prentice Bisbal wrote: Can anyone see an error in this conditional in my job_submit.lua?     if ( job_desc.user_id == 28922 or job_desc.user_id == 41266 ) and ( job_desc.partition == 'general' or job

Re: [slurm-users] Error in job_submit.lua conditional?

2019-02-05 Thread Marcus Wagner
Hi Prentice, I also hate Lua sometimes, as it does not complain when you hope it would. It is called 'userid', not 'user_id', so the first part is false all the time ;) Best Marcus On 2/4/19 11:27 PM, Prentice Bisbal wrote: Can anyone see an error in this conditional in my jo

Re: [slurm-users] Error running jobs with srun

2017-11-09 Thread Elisabetta Falivene
I'll surely produce documentation as soon as I understand how the whole cluster works. (It was something kinda "Here is the root password and the key to the room. You don't need anything else, do you?" :) ) Thanks to your valuable suggestions I was able to work out that the common shared space

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 10:54, Elisabetta Falivene wrote: > I am the admin and I have no documentation :D I'll try The third option. > Thank you very much > Ah. Yes. Well, you will need some sort of drive shared between all the nodes so that they can read and write from a common space. Also, I re

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
I am the admin and I have no documentation :D I'll try the third option. Thank you very much On Thursday 9 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 10:35, Elisabetta Falivene > wrote: > >> Wow, thank you. Is there a way to check which directories the master and >> the n

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 10:35, Elisabetta Falivene wrote: > Wow, thank you. Is there a way to check which directories the master and > the nodes share? > There's no explicit way. 1. Check the cluster documentation written by the cluster admins 2. Ask the cluster admins 3. Run "mount" or "cat /etc/m
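Option 3 above can be narrowed to network filesystems, which are the ones relevant to a shared work directory; run this on the head node and on each compute node and compare the lists:

```shell
# List network filesystem mounts; fall back to a message when none exist
# (or when /proc/mounts is absent, e.g. outside Linux).
grep -E 'nfs|cifs|smbfs|lustre|gpfs|glusterfs' /proc/mounts 2>/dev/null \
    || echo "no network filesystems mounted"
```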

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
Wow, thank you. Is there a way to check which directories the master and the nodes share? On Wednesday 8 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 09:19, Elisabetta Falivene > wrote: > >> I'm getting this message anytime I try to execute any job on my cluster. >> (node

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Lachlan Musicman
On 9 November 2017 at 09:19, Elisabetta Falivene wrote: > I'm getting this message anytime I try to execute any job on my cluster. > (node01 is the name of my first of eight nodes and is up and running) > > Trying a python simple script: > root@mycluster:/tmp# srun python test.py > slurmd[nod