IIRC, that is because it is trying to do the 'configless' feature of slurm 20 where it uses DNS entries to find the config.

This will happen if /etc/slurm.conf does not exist on the node.

Check that you have that and that it is the same as the one on the master.
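
A minimal shell sketch of that check, to be run on the node. The candidate paths are assumptions: the compiled-in default depends on how Slurm was built (for a source build under /usr/local it may well be /usr/local/etc/slurm.conf rather than /etc/slurm.conf), and `find_slurm_conf` is only an illustrative helper name.

```shell
# Print the first readable slurm.conf among the candidate paths given as arguments.
# Returns non-zero if none of them exists, which is the situation in which
# slurmd would fall back to fetching a remote config.
find_slurm_conf() {
    for candidate in "$@"; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (paths are guesses, adjust to the local build):
#   find_slurm_conf "${SLURM_CONF:-}" /etc/slurm.conf /etc/slurm/slurm.conf /usr/local/etc/slurm.conf
```

Running `md5sum` on the file this prints, on both the node and the master, shows whether the two copies are identical.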

Brian Andrus

On 8/24/2020 7:03 AM, Lars Kloo wrote:

Dear Sean,

'/usr/local/sbin/slurmd -D -vvvv' gave the following error (same as when running from systemctl):

slurmd: error: _fetch_child: failed to fetch remote configs

I have debug level 5 set for both slurmctld and slurmd in slurm.conf, so there may be little more to extract in the form of messages.

I am starting to think that the error is in the set-up files of named, alternatively in the network interface scripts. They should work, but slurmd seems to require more.

- /etc/resolv.conf looks correct, with both internal and external nameservers and domains on the master, and only the internal ones on the client.

- However, tracking the master named log file while starting slurmd on the client, it looks like slurmd is not offered the internal domain. When those attempts are exhausted, slurmd is directed to the external nameserver/domain (which will not give the information necessary). The difference is that the external domain is explicitly given in the named log file, whereas the internal domain is not.

- Disabling IPv6 in /etc/named.conf removes the error messages in the named log file, but the above slurmd error persists.

Possibly, my approach to solving the DNS/SRV problem may be too primitive.
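
One way to see what slurmd is likely asking for: the configless lookup is for an SRV record named `_slurmctld._tcp`, and, assuming the resolver appends the search domains from /etc/resolv.conf in order, the helper below lists the fully qualified names that would be tried. The helper name `srv_query_names` is hypothetical; each name it prints can then be queried by hand with dig.

```shell
# List the SRV names that a search-list lookup of _slurmctld._tcp would try.
# Assumption: the resolver appends each "search"/"domain" entry in order.
srv_query_names() {
    resolv_conf=${1:-/etc/resolv.conf}
    awk '$1 == "search" || $1 == "domain" {
             for (i = 2; i <= NF; i++) print "_slurmctld._tcp." $i
         }' "$resolv_conf"
}

# Usage on the client (needs dig from bind-utils):
#   for name in $(srv_query_names); do dig +short -t SRV "$name"; done
```

If the name under the internal domain returns nothing even though the record is in the zone file, the zone origin and the client's search domain probably do not match.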

Best regards,

Lars

*From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On Behalf Of* Sean Crosby
*Sent:* 24 August 2020 13:44
*To:* Slurm User Community List <slurm-users@lists.schedmd.com>
*Subject:* Re: [slurm-users] [EXT] Slurmd problem on client

Make sure slurmd on the client is stopped, and then run it in verbose mode in the foreground

e.g.

/usr/local/slurm/latest/sbin/slurmd -D -vvvvv

Then post the output

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

On Mon, 24 Aug 2020 at 21:11, Lars Kloo <la...@kth.se <mailto:la...@kth.se>> wrote:

    *UoM notice: *External email. Be cautious of links, attachments,
    or impersonation attempts

    ------------------------------------------------------------------------

    Thanks Sean,

    Yes, the regular slurm commands work from the client.

    The firewalld daemon has been stopped/disabled, and iptables is
    set to let everything through, on both the master and the client.
    I should have mentioned that in the list of prerequisites in my
    initial e-mail.

    Best regards,

    Lars

    *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com
    <mailto:slurm-users-boun...@lists.schedmd.com>] *On Behalf Of* Sean Crosby
    *Sent:* 24 August 2020 12:45
    *To:* Slurm User Community List <slurm-users@lists.schedmd.com
    <mailto:slurm-users@lists.schedmd.com>>
    *Subject:* Re: [slurm-users] [EXT] Slurmd problem on client

    Hi Lars,

    Do the regular slurm commands work from the client?

    e.g.

    squeue

    scontrol show part

    If they don't, it would be a sign of communication problems.

    Is there a software firewall running on the master/client?

    Sean

    --
    Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
    Research Computing Services | Business Services
    The University of Melbourne, Victoria 3010 Australia

    On Mon, 24 Aug 2020 at 20:02, Lars Kloo <la...@kth.se
    <mailto:la...@kth.se>> wrote:

        Hello,

        I have a slurmd problem on a client that I cannot figure out
        how to solve. I would be grateful for any suggestions on
        how to move forward.

        The master computer of a small local compute cluster is
        getting quite old, and therefore I am currently in the process
        of replacing it. I also use one compute node for the
        basic master-client set-up of all programs, including slurm.
        Some basic data: CentOS 7.7, slurm 20.02.4.

        Setting up slurmctld on the master node is (seemingly)
        straightforward. Getting slurmd to work on the client appears
        more complicated. I get the following error message
        (journalctl -xe) when starting slurmd on the client:

        Aug 24 11:01:34 cpu3.calc.cluster slurmd[9002]: error:
        _fetch_child: failed to fetch remote configs

        No useful error messages are obtained from 'systemctl -l
        status slurmd.service' on the client, slurmd.log on the
        client, or slurmctld.log on the master.

        In this context, the following should be noted:

        - root and a test user exist on the master and client, with
          the same uid and gid on both machines

        - ping works in both directions (master <-> client)

        - passphrase-free ssh login works in both directions for both
          root and a test user

        - munged is running with the same key on both machines

        - the same slurm.conf is read from the master and from the client

        - named (bind) has been set up on the master, and nslookup and
          dig work properly on the client

        - the 'forward' zone file of named on the master (DNS) contains
          the recommended SRV record directing slurmctld requests to
          port 6817 on the master (syntax seems ok, i.e. no error messages)
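
To confirm that the SRV record actually resolves from the client, the short answer from dig can be reduced to host:port pairs. This is only a sketch: `parse_srv_answer` is a hypothetical helper, and `master.calc.cluster` in the comments is a placeholder for the real master hostname.

```shell
# Turn `dig +short -t SRV` output ("priority weight port target.") into host:port.
parse_srv_answer() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Usage on the client (domain taken from the client hostname cpu3.calc.cluster):
#   dig +short -t SRV _slurmctld._tcp.calc.cluster | parse_srv_answer
# A working record for the setup described should yield something like
#   master.calc.cluster:6817
```

An empty result means the client's resolver did not get the record, whatever the zone file says.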

        I have also tried to start slurmd in a config-less mode
        (slurm.conf edited on the master) with the suggested
        environment variable set (slurmd on the client). Then, slurmd
        starts without error messages, but slurmctld on the master
        cannot communicate with slurmd on the client.
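
For the configless attempt, two things should line up: slurm.conf on the master needs `SlurmctldParameters=enable_configless` (slurmctld probably has to be restarted after adding it), and slurmd has to be pointed at the controller, either through the SRV record or with `--conf-server`. Below is a sketch of checking the first part; the helper name and the hostname in the usage lines are placeholders.

```shell
# Succeed if the given slurm.conf enables configless mode on the controller.
configless_enabled() {
    grep -Eiq '^[[:space:]]*SlurmctldParameters[[:space:]]*=.*enable_configless' "$1"
}

# Usage on the master, then a foreground test on the client:
#   configless_enabled /path/to/slurm.conf && echo "configless is enabled"
#   slurmd --conf-server master.calc.cluster:6817 -D -vvvv
```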

        Has anyone encountered a similar problem, and if so, how did
        you solve it? Or do you have any suggestions on where to start looking?

        Many thanks for input, and best regards,

        Lars

        //////////////////////////////~~~_/)~~~\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

        Lars Kloo, Prof.

            Tillämpad fysikalisk kemi        Applied Physical Chemistry

            Institutionen för kemi           Dept. of Chemistry

            Kungliga Tekniska högskolan      Royal Inst. of Technology (KTH)

        100 44  STOCKHOLM                SE-100 44 Stockholm

        SWEDEN

        Tel: 08-790 9343                 Tel: +46-8-790 9343

        Fax: 08-790 9349                 Fax: +46-8-790 9349

        E-post: lak...@kth.se            E-mail: lak...@kth.se

        WWW: http://www.kth.se/che/divisions/tfk

        \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\~~~_/)~~~//////////////////////////////
