I found the problem. This node was not trying to reach some other machine;
it was the other way around. Another machine, running the controller of a
different Slurm cluster, had this node in its config, so that controller kept
trying to contact this node. I removed the node from that config and all is
fine now.
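The `protocol_version 9472` in the quoted slurmd log fits this diagnosis: it suggests the sender spoke an older Slurm RPC protocol than 23.11. Below is a minimal Python sketch of the arithmetic. It assumes two things that come from my recollection of Slurm's `slurm_protocol_common.h` and upgrade policy, not from this thread: each release's protocol version is encoded as `(N << 8) | 0`, and 23.11 accepts only its own protocol and those of the two previous major releases. The release table and the helpers `decode_protocol_version` and `is_accepted_by_23_11` are hypothetical and meant only for illustration. Check the values against the header in your own Slurm source tree.

```python
# Minimal sketch: decode the protocol_version value reported by slurmd.
# ASSUMPTION: Slurm encodes each release's protocol version as (N << 8) | 0.
# This table is recalled from slurm_protocol_common.h, so verify it against
# the header in your own Slurm source tree before relying on it.
KNOWN_PROTOCOL_VERSIONS = {
    (40 << 8): "23.11",
    (39 << 8): "23.02",
    (38 << 8): "22.05",
    (37 << 8): "21.08",
    (36 << 8): "20.11",
}


def decode_protocol_version(value: int) -> tuple[int, int, str]:
    """Split a 16-bit protocol version into high/low bytes and look up the release."""
    high = value >> 8
    low = value & 0xFF
    release = KNOWN_PROTOCOL_VERSIONS.get(value, "unknown")
    return high, low, release


def is_accepted_by_23_11(value: int) -> bool:
    """ASSUMPTION: 23.11 accepts its own protocol plus 23.02 and 22.05."""
    return (38 << 8) <= value <= (40 << 8)


if __name__ == "__main__":
    v = 9472  # from the log line "protocol_version 9472 not supported"
    high, low, release = decode_protocol_version(v)
    print(f"{v} = 0x{v:04x} -> high byte {high}, low byte {low}, release {release}")
    print("accepted by 23.11:", is_accepted_by_23_11(v))
```

Under these assumptions, 9472 is 0x2500, so N = 37. That would map to 21.08, a release that a 23.11 slurmd rejects. This is consistent with a controller from a different, older cluster contacting this node.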

On Wed, Jun 5, 2024 at 1:12 PM Arnuld <arn...@aganitha.ai> wrote:

> I have built Slurm 23.11.7 on two machines, both running Ubuntu 22.04.
> Slurm runs fine on the first machine but not on the second. The first
> machine is both a controller and a node, while the second machine is only a
> node. On both machines, I built the Slurm Debian package following the Slurm
> docs. The slurmd logs show this:
>
>  error: unpack_header: protocol_version 9472 not supported
>  error: unpacking header
>  error: destroy_forward: no init
>  error: slurm_receive_msg_and_forward: [[host-4.attlocal.net]:38960] failed: Message receive failure
>  error: service_connection: slurm_receive_msg: Message receive failure
>  debug:  _service_connection: incomplete message
>
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com