Yes, I tried it, but with the same result:
openmpi@4.0.3 -cuda +cxx_exceptions fabrics=ucx  -java -legacylaunchers 
-memchecker +pmi schedulers=slurm  -sqlite3 -thread_multiple +vt

WRF compiles, and when you sbatch the job it shows as running, but it doesn't
do anything. We get the same result, with WCHAN=hrtime:
            0 S  4556  87383  87361  0  80   0 - 126676 hrtime ?       00:05:25 real.exe
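
In the `ps -l` line above, the second column (S) is the process state and the eleventh (WCHAN) is the wait channel of a sleeping process. A minimal sketch for pulling just those fields out of such output, assuming the default `ps -l` column layout; the helper name `summarize_ps_l` is made up for illustration:

```shell
#!/bin/sh
# Summarize `ps -l` style lines: print PID, state, WCHAN and command.
# Column positions assume the default `ps -l` layout:
# F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
summarize_ps_l() {
    # Skip the header row (first field "F") and any short or blank lines.
    awk 'NF >= 14 && $1 != "F" { printf "%s %s %s %s\n", $4, $2, $11, $14 }'
}

# Sample line taken from the report above.
printf '%s\n' '0 S  4556  87383  87361  0  80   0 - 126676 hrtime ?       00:05:25 real.exe' | summarize_ps_l
```

On a compute node this could be fed from `ps -l -u "$USER" | summarize_ps_l`. Since `ps -l` truncates the WCHAN name, `ps -o pid,stat,wchan:32,args -C real.exe` may show more of it.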

    ------------------------------

    Message: 2
    Date: Mon, 1 Jun 2020 16:56:05 +0000
    From: "Pritchard Jr., Howard" <howa...@lanl.gov>
    To: Slurm User Community List <slurm-users@lists.schedmd.com>
    Subject: Re: [slurm-users] [EXTERNAL]  problems with OpenMPI 4.0.3
    Message-ID: <20dc51ae-9f58-4b1c-b619-1a2077d5c...@lanl.gov>
    Content-Type: text/plain; charset="utf-8"

    Hi Angelines,

    Could you try reinstalling with fabrics=ucx and rerunning?
    UCX is the preferred way to use InfiniBand in the Open MPI 4.0.x release
    stream.
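
One way to confirm that a rebuilt Open MPI actually contains UCX support is to look for the UCX PML component in the `ompi_info` listing. This is a rough sketch, not a definitive check; the helper name `has_ucx_pml` is made up for illustration, and it assumes `ompi_info` from the rebuilt installation is first in PATH:

```shell
#!/bin/sh
# Report whether an `ompi_info` listing (read from stdin) includes the UCX PML.
has_ucx_pml() {
    grep -Eq 'MCA +pml: +ucx'
}

if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_ucx_pml; then
        echo "UCX PML component found"
    else
        echo "UCX PML component missing; rebuild with fabrics=ucx"
    fi
else
    echo "ompi_info not in PATH; load the Open MPI installation first"
fi
```

If the component is present, exporting `OMPI_MCA_pml=ucx` in the batch script before launching should make Open MPI insist on UCX rather than quietly selecting another transport, which can help narrow down where a job stalls.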

    Howard

    On 6/1/20, 10:29 AM, "slurm-users on behalf of Alberto Morillas,
    Angelines" <slurm-users-boun...@lists.schedmd.com on behalf of
    angelines.albe...@ciemat.es> wrote:

        Hello Howard,

        I installed it with Spack:
        openmpi@4.0.3 -cuda +cxx_exceptions fabrics=verbs -java
        -legacylaunchers -memchecker +pmi schedulers=slurm -sqlite3
        -thread_multiple +vt
        where - means the variant is disabled
        and   + means the variant is enabled.
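
The +/- notation in the spec can be read off mechanically. A small sketch that splits the spec quoted above into enabled and disabled boolean variants; the function name `list_variants` is made up for illustration, and key=value variants such as fabrics=verbs are deliberately skipped:

```shell
#!/bin/sh
# Split a Spack spec string into enabled (+name) and disabled (-name)
# boolean variants. Tokens without a leading sign are ignored.
spec='openmpi@4.0.3 -cuda +cxx_exceptions fabrics=verbs -java -legacylaunchers -memchecker +pmi schedulers=slurm -sqlite3 -thread_multiple +vt'

list_variants() {
    # $1 = sign to select ("+" or "-"), $2 = spec string
    for token in $2; do
        case "$token" in
            "$1"*) printf '%s\n' "${token#?}" ;;  # strip the leading sign
        esac
    done
}

echo "enabled:"
list_variants + "$spec"
echo "disabled:"
list_variants - "$spec"
```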

        Thanks in advance.
        ________________________________________________

        Angelines Alberto Morillas

        Unidad de Arquitectura Informática
        Despacho: 22.1.32
        Telf.: +34 91 346 6119
        Fax:   +34 91 346 6537

        skype: angelines.alberto

        CIEMAT
        Avenida Complutense, 40
        28040 MADRID
        ________________________________________________ 




            ------------------------------

            Message: 2
            Date: Mon, 1 Jun 2020 16:13:11 +0000
            From: "Pritchard Jr., Howard" <howa...@lanl.gov>
            To: Slurm User Community List <slurm-users@lists.schedmd.com>
            Subject: Re: [slurm-users] [EXTERNAL]  problems with OpenMPI 4.0.3
            Message-ID: <ca7fe91c-8104-476f-b9a2-528d23ed3...@lanl.gov>
            Content-Type: text/plain; charset="utf-8"

            Hello Angelines,

            Do you know how the Open MPI 4.0.3 package was configured and
            built? That information would be useful to help diagnose the problem.

            Thanks,

            Howard


            From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf 
of "Alberto Morillas, Angelines" <angelines.albe...@ciemat.es>
            Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
            Date: Friday, May 29, 2020 at 4:25 AM
            To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
            Subject: [EXTERNAL] [slurm-users] problems with OpenMPI 4.0.3

            Good morning,

            We have a cluster with two kinds of InfiniBand cards: ConnectX-4
            and ConnectX-6.
            Open MPI 3.1.3 works fine, but when we started using the ConnectX-6
            cards we moved to Open MPI 4.0.3 (which supports ConnectX-6). Since
            then, programs that run in several stages, first a sequential
            program that in turn calls a parallel program (in our case WRF, but
            we have others with the same problem), suddenly stop:

            ...
            0 S  4556  87383  87361  0  80   0 - 126676 hrtime ?       00:05:25 real.exe
            0 S  4556  87384  87361  0  80   0 - 126677 hrtime ?       00:05:33 real.exe
            0 S  4556  87385  87361  0  80   0 - 126675 hrtime ?       00:05:28 real.exe
            ...
            WCHAN is hrtime, so the processes look as if they are running, but
            they do no actual work.

            We don't know whether this could be a problem between Slurm and
            this version of Open MPI. Any ideas?


            ________________________________________________

            Angelines Alberto Morillas

            Unidad de Arquitectura Informática
            Despacho: 22.1.32
            Telf.: +34 91 346 6119
            Fax:   +34 91 346 6537

            skype: angelines.alberto

            CIEMAT
            Avenida Complutense, 40
            28040 MADRID
            ________________________________________________



            ------------------------------

            Message: 3
            Date: Mon, 1 Jun 2020 16:16:00 +0000
            From: Songpon Srisawai <songpons_...@vistec.ac.th>
            To: Slurm User Community List <slurm-users@lists.schedmd.com>
            Subject: Re: [slurm-users] Slurm Job Count Credit system
            Message-ID: <9666f3be-d648-4ee9-9ad2-80df973f87cc@Spark>
            Content-Type: text/plain; charset="utf-8"

            Thank you very much for your help. I will try to implement your
            suggestion.

            On 1 Jun 2020 22:23 +0700, Renfro, Michael <ren...@tntech.edu>,
            wrote:
            Even without the slurm-bank system, you can enforce a limit on
            resources with a QOS applied to those users. Something like:

            =====

            sacctmgr add qos bank1 flags=NoDecay,DenyOnLimit
            sacctmgr modify qos bank1 set grptresmins=cpu=1000

            sacctmgr add account bank1
            sacctmgr modify account name=bank1 set qos+=bank1

            sacctmgr add user someuser account=bank1
            sacctmgr modify user someuser set qos+=bank1

            =====

            You can do lots with a QOS, including limiting the number of
            simultaneous running jobs, simultaneous running/queued jobs, etc.
            Unfortunately, the NoDecay flag is only documented to work on
            GrpTRESMins, GrpWall, and UsageRaw, not on the job count.

            So if you can live with limiting the number of simultaneous jobs
            instead of a total number of jobs per time period, that's possible
            with QOS. Otherwise, maybe someone else will have an idea.
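
The simultaneous-job limits mentioned above could be set on the same QOS roughly as follows. This is only a sketch: the limit values 4 and 10 are hypothetical, `bank1` is the QOS name from the commands earlier in this message, and the `run` wrapper is a made-up dry-run helper that prints the commands unless APPLY=1 is set:

```shell
#!/bin/sh
# Dry-run helper: prints each command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical limits for the bank1 QOS:
# at most 4 running jobs and 10 running+queued jobs per user.
run sacctmgr -i modify qos bank1 set MaxJobsPerUser=4 MaxSubmitJobsPerUser=10

# Display the QOS afterwards to confirm the limits took effect.
run sacctmgr show qos where name=bank1
```

Note that these limits cap how many jobs a user has in the system at once; they do not count jobs over a time period, which matches the caveat about NoDecay above.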

            --
            Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
            931 372-3601 / Tennessee Tech University

            On May 31, 2020, at 11:35 AM, Songpon Srisawai 
<songpons_...@vistec.ac.th> wrote:

            Hello all,

            I'm a Slurm beginner trying to set up our cluster. I would like
            to know whether there is any Slurm credit/token system plugin that
            works on the number of jobs.

            I found Slurm-bank, which deposits hours into an account, but I
            would like to deposit job tokens instead of hours.

            Thanks for any recommendations,
            Songpon


            End of slurm-users Digest, Vol 32, Issue 2
            ******************************************




    End of slurm-users Digest, Vol 32, Issue 3
    ******************************************
