On 2022-02-22 19:49, Alastair McKinstry wrote:
I can disable pmix support in mpich.

Thanks Alastair.  My tests are now passing with mpich 4.0-3.

Pmix is working fine in openmpi, so it’s an mpich/pmix issue of some
sort (or maybe ch4)

"Working fine" might not be quite the word for it. There's some strange stuff going on with openmpi, at least in multinode execution (RMA). https://github.com/open-mpi/ompi/issues/10026

That's actually the reason why I was rebuilding with mpich. nwchem is completely useless with openmpi in a multi-node job. (Well, they hope the situation will be better with openmpi 5.)

Drew



Regards
Alastair

On 22/02/2022, 14:57, "debian-science-maintainers on behalf of Drew
Parsons"
<debian-science-maintainers-bounces+mckinstry=debian....@alioth-lists.debian.net
on behalf of dpars...@debian.org> wrote:

    Package: mpich
    Followup-For: Bug #1004556

    My guess is that this bug is ongoing in 4.0-2 because of pmix support.

    4.0-2 is still configured --with-pmix=/usr/lib/x86_64-linux-gnu/pmix2
    and still Depends: libpmix2 (>= 4.1.2)

    pmix support was added in 4.0~b1-2 along with ucx.

    I gather ucx was deactivated in 4.0-2 but pmix was not. Looks like pmix
    also needs to go (unless the problem is ch4, which was also added in
    4.0~b1-2). But the error message references PMIX.


    Ironically, I find that an executable compiled against mpich 4.0-1
    fails with mpiexec.mpich (as raised in this bug) but actually passes
    when run with mpiexec.openmpi.  Awkward.

    --
    debian-science-maintainers mailing list
    debian-science-maintain...@alioth-lists.debian.net

https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/debian-science-maintainers
