Ugh. Thanks Drew.
What are the contents of /etc/openmpi/openmpi-mca-params.conf on the node?
Does a simple hello world (see debian/tests/hello*) run without errors in that
environment?
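
To narrow it down, it might also be worth checking whether the abort goes away when the variable named in the error message is set before the MPI process starts. A minimal sketch, assuming it is enough for RDMAV_FORK_SAFE to be present in the environment of the launched process; the helper name run_fork_safe is hypothetical and the command below is only a placeholder for the real test invocation:

```python
import os
import subprocess
import sys


def run_fork_safe(cmd):
    """Run cmd with RDMAV_FORK_SAFE set in the child's environment.

    Hypothetical helper for illustration only; it is not part of
    pytest-mpi or Open MPI.
    """
    env = dict(os.environ)          # copy, so the parent environment is untouched
    env["RDMAV_FORK_SAFE"] = "1"    # assumption: any value enables fork safety
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder command: only confirms the variable reaches the child.
    check = [sys.executable, "-c",
             "import os; print(os.environ.get('RDMAV_FORK_SAFE'))"]
    result = run_fork_safe(check)
    print(result.stdout.strip())
```

If the tests pass with the variable set, that would point at the EFA provider's fork check rather than at UCX or pytest. As the message itself warns, this can cost performance, so it is a diagnostic rather than a fix.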

Regards
Alastair

On 15/01/2021, 08:39, "Drew Parsons" <dpars...@debian.org> wrote:

    Package: libopenmpi3
    Version: 4.1.0-5
    Followup-For: Bug #979041

    There's evidence this libfabric bug is not fully fixed.

    pytest-mpi (0.4-3) is failing tests:

      A process has executed an operation involving a call
      to the fork() system call to create a child process.

      As a result, the libfabric EFA provider is operating in
      a condition that could result in memory corruption or
      other system errors.

      For the libfabric EFA provider to work safely when fork()
      is called, you will need to set the following environment
      variable:
                RDMAV_FORK_SAFE

      However, setting this environment variable can result in
      signficant performance impact to your application due to
      increased cost of memory registration.

      You may want to check with your application vendor to see
      if an application-level alternative (of not using fork)
      exists.

      Your job will now abort.
      Fatal Python error: Aborted

      Current thread 0x00007f2978647740 (most recent call first):
        File "/usr/lib/python3.9/subprocess.py", line 1756 in _execute_child
        File "/usr/lib/python3.9/subprocess.py", line 951 in __init__
        File "/usr/lib/python3/dist-packages/_pytest/pytester.py", line 1193 in popen
        File "/usr/lib/python3/dist-packages/_pytest/pytester.py", line 1234 in run
        File "/tmp/autopkgtest.5dpwa6/build.XvQ/real-tree/tests/conftest.py", line 44 in runpytest_subprocess
        File "/tmp/autopkgtest.5dpwa6/build.XvQ/real-tree/tests/conftest.py", line 52 in runpytest
        File "/tmp/autopkgtest.5dpwa6/build.XvQ/real-tree/tests/test_markers.py", line 113 in test_mpi_xfail_under_mpi
        File "/usr/lib/python3/dist-packages/_pytest/python.py", line 180 in pytest_pyfunc_call



    pytest-mpi is also affected by the UCX bug and possibly a pytest6 problem,
    but I guess they would not trigger this libfabric RDMAV_FORK_SAFE error.


    -- System Information:
    Debian Release: bullseye/sid
      APT prefers unstable
      APT policy: (500, 'unstable'), (1, 'experimental')
    Architecture: amd64 (x86_64)
    Foreign Architectures: i386

    Kernel: Linux 5.10.0-1-amd64 (SMP w/8 CPU threads)
    Kernel taint flags: TAINT_OOT_MODULE
    Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8), LANGUAGE=en_AU:en
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)
    LSM: AppArmor: enabled

    Versions of packages libopenmpi3 depends on:
    ii  libc6                    2.31-9
    ii  libevent-core-2.1-7      2.1.12-stable-1
    ii  libevent-pthreads-2.1-7  2.1.12-stable-1
    ii  libfabric1               1.11.0-2
    ii  libgcc-s1                10.2.1-6
    ii  libhwloc-plugins         2.4.0+dfsg-3
    ii  libhwloc15               2.4.0+dfsg-3
    ii  libibverbs1              33.0-1
    ii  libnl-3-200              3.4.0-1+b1
    ii  libpmix2                 4.0.0-3
    ii  libpsm-infinipath1       3.3+20.604758e7-6.1
    ii  libpsm2-2                11.2.185-1
    ii  libstdc++6               10.2.1-6
    ii  libucx0                  1.10.0~rc1-2
    ii  zlib1g                   1:1.2.11.dfsg-2

    libopenmpi3 recommends no packages.

    libopenmpi3 suggests no packages.

    -- no debconf information
