OK, I think I made some headway, but I would welcome some insight from
somebody more knowledgeable.

I think the problem is a potential mix-up between the internal and the
external PMIx library in Open MPI.

In my setup the call to
`rc = PMIx_Get(&p, key, pinfo, sz, &pval);`
at ext3x_client.c:656 fills the pval buffer with 3 entries.
```
(gdb) p ((pmix_info_t*) pval->data.darray->array)[1].value.type
$2 = 56
```
The second entry (index 1), whose key is pmix.topo2, has value.type = 56,
which is outside the range of supported value types. I think the last
supported entry is PMIX_REGEX (46), in
./debian/build-gfortran/opal/mca/pmix/pmix3x/pmix/include/pmix_common.h

However, in /usr/lib/x86_64-linux-gnu/pmix2/include/pmix_common.h the list
goes further, ending in PMIX_COMPRESSED_BYTE_OBJECT (59), with 56 being
PMIX_TOPO.
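
To make the mismatch concrete, here is a minimal sketch, not the actual Open MPI code. It assumes the type ids are contiguous up to the last entry of each header. All names (`BUNDLED_LAST_TYPE`, `SYSTEM_TOPO_TYPE`, `SYSTEM_LAST_TYPE`, `convert_type`, the `SKETCH_*` codes) are hypothetical; only the numbers 46, 56 and 59 come from the headers quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the values quoted above; the real
 * definitions live in the two pmix_common.h headers. */
enum {
    BUNDLED_LAST_TYPE = 46,  /* PMIX_REGEX, last entry in the bundled header */
    SYSTEM_TOPO_TYPE  = 56,  /* PMIX_TOPO in the system header */
    SYSTEM_LAST_TYPE  = 59   /* PMIX_COMPRESSED_BYTE_OBJECT in the system header */
};

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/* Mimics a converter compiled against the bundled header: any type id
 * beyond the last one it knows ends up in the error branch, like the
 * "default: rc = OPAL_ERROR;" case of a switch. */
static int convert_type(uint16_t type)
{
    if (type <= BUNDLED_LAST_TYPE) {
        return SKETCH_SUCCESS;
    }
    return SKETCH_ERROR;
}
```

With these assumptions, `convert_type(46)` succeeds, while `convert_type(56)` and `convert_type(59)` both fail, which matches a value produced by the newer system library being rejected by code built against the older bundled header.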

If I hack a bit:

bill@odin:~/src/openmpi-4.1.0$ diff -u ./debian/build-gfortran/opal/mca/pmix/ext3x/ext3x.c~ ./debian/build-gfortran/opal/mca/pmix/ext3x/ext3x.c
--- ./debian/build-gfortran/opal/mca/pmix/ext3x/ext3x.c~        2021-05-07 11:21:38.000000000 +0300
+++ ./debian/build-gfortran/opal/mca/pmix/ext3x/ext3x.c 2021-05-07 16:51:48.223653488 +0300
@@ -1239,6 +1239,8 @@
         }
         kv->data.envar.separator = v->data.envar.separator;
         break;
+    case 56:
+        break;
     default:
         /* silence warnings */
         rc = OPAL_ERROR;

I can get further, but then I hit the following error:

bill@odin:~/src/openmpi-4.1.0$ mpirun.openmpi -v -v -v --host thor ls
[odin:442452] PACK-OPAL-VALUE: UNSUPPORTED TYPE 0 FOR KEY pmix.topo2
[odin:442452] [[61465,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/odls/base/odls_base_default_fns.c at line 250
[odin:442452] [[61465,0],0] ORTE_ERROR_LOG: Error in file ../../../../../orte/mca/plm/base/plm_base_launch_support.c at line 552
--------------------------------------------------------------------------
An internal error has occurred in ORTE:

[[61465,0],0] FORCE-TERMINATE AT (null):1 - error ../../../../../orte/mca/plm/base/plm_base_launch_support.c(553)

This is something that should be reported to the developers.
--------------------------------------------------------------------------

which is a more reasonable error anyway.
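
One plausible reading of "UNSUPPORTED TYPE 0" is that the added `case 56: break;` accepts the value but never sets a type on the converted copy, so a later packing step sees type 0. The sketch below illustrates that reading only. It assumes the destination value starts out zeroed, which has not been checked against the Open MPI source, and every name in it (`sk_value_t`, `sk_convert`, `sk_pack`, the `SK_*` constants) is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SK_UNDEF  0    /* hypothetical "no type set" marker */
#define SK_STRING 3    /* arbitrary id for a type the converter knows */
#define SK_TOPO   56   /* the id that the hack lets through */

typedef struct {
    const char *key;
    uint16_t    type;
} sk_value_t;

/* Converter with the hack applied: type 56 is accepted, but nothing
 * is copied and dst->type keeps its zeroed initial value. */
static int sk_convert(sk_value_t *dst, const char *key, uint16_t src_type)
{
    memset(dst, 0, sizeof *dst);   /* assumption: values start out zeroed */
    dst->key = key;
    switch (src_type) {
    case SK_STRING:
        dst->type = SK_STRING;
        break;
    case SK_TOPO:                  /* mirrors "case 56: break;" */
        break;
    default:
        return -1;
    }
    return 0;
}

/* Packer that refuses values whose type was never set. */
static int sk_pack(const sk_value_t *v)
{
    return (v->type == SK_UNDEF) ? -1 : 0;
}
```

Under these assumptions, converting the pmix.topo2 entry succeeds but leaves `type` at 0, and `sk_pack` then rejects it, so the failure moves from the conversion step to the packing step instead of going away.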

-- 
Vassilis Virvilis
