I am running openmpi-4.0.2 (self-compiled with GDS patches) on an up-to-date OpenBSD 6.6-stable system, with a Go program that calls Clang MPI routines. On particular hardware (details available on request), readv and writev calls randomly fail with "Timeout" and "Permission denied" errors, respectively, when one machine communicates with another across the ethernet. The errors don't occur between cores on the same machine. The man pages for readv and writev don't document either error as a possibility. Modifying the MPI code to retry readv and writev doesn't help.
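For reference, a retry around writev usually follows the pattern sketched below. This is only a minimal illustration, not the actual change made to the Open MPI source: the helper name writev_all is hypothetical, and it assumes a blocking descriptor. It retries on EINTR, resumes after short writes, and logs the raw errno so that an undocumented value is at least visible.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): write every buffer in
 * iov[0..iovcnt) to fd. Retries on EINTR and resumes after short writes.
 * The iov array is modified in place. Returns the number of bytes
 * written, or -1 with errno left as writev set it. */
ssize_t writev_all(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            int saved = errno;          /* fprintf may clobber errno */
            fprintf(stderr, "writev: errno %d (%s)\n", saved, strerror(saved));
            errno = saved;
            return -1;
        }
        total += n;

        /* Skip buffers that were written completely. */
        size_t left = (size_t)n;
        int before = iovcnt;
        while (iovcnt > 0 && left >= iov->iov_len) {
            left -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Trim the buffer that was written only in part. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + left;
            iov->iov_len -= left;
        }
        /* No bytes written and nothing skipped: give up instead of spinning. */
        if (n == 0 && iovcnt == before)
            break;
    }
    return total;
}
```

A readv counterpart would have the same shape. A wrapper like this only helps with transient conditions; it cannot mask an error that the kernel keeps returning on every attempt.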
Any ideas on what is going on here and how it could be fixed? The problem doesn't occur with Linux, but I would really rather stay with OpenBSD. Details can be provided as needed.

Dave Raymond

PS -- Martin Reindl has been very helpful in getting me this far.

On 12/22/19, Martin Reindl <[email protected]> wrote:
> It shouldn't run the GDS components with the environment set up like
> this. But who knows with a beast like openmpi.
>
> I'll file a bug report with the PMIx developers, for now I've committed
> the update with your input. Thanks!
>
> Martin
>
> On 19.12.19 at 13:48, Raymond, David wrote:
>> Hello Martin,
>>
>> It is odd that I already had "PMIX_MCA_gds=hash" and still got the
>> problem on a Beowulf with multiple boxes connected by ethernet.
>>
>> Since the lack of appropriate "#ifdefs" seems like an oversight on the
>> part of the openmpi developers, I think it would be appropriate to
>> push it upstream. Are you prepared to do that? I could try, but it
>> would take me a while to educate myself on this.
>>
>> Best,
>>
>> Dave
>>
>> On 12/19/19, Martin Reindl <[email protected]> wrote:
>>> [moved to ports@]
>>>
>>> On Tue, Dec 17, 2019 at 04:16:25PM -0700, Raymond, David wrote:
>>>> Martin,
>>>>
>>>> I have been using openmpi 4.0.2 on my computer system and I found a
>>>> bug that is provoked by running a job (a Go program interfaced to the
>>>> Clang MPI package) on multiple machines connected by ethernet. This
>>>> crashes the program with the following output:
>>> [...]
>>>>
>>>> I traced this to the fact that OpenBSD's version of pthreads doesn't
>>>> have "pthread_mutexattr_setpshared". It turns out that the
>>>> configuration file undefines a flag if this is so, but the actual code
>>>> doesn't pay any attention to this. I fixed the problem by putting
>>>> appropriate ifdefs around the code generating the error, which itself
>>>> is simple error checking code. This seems to work. I have attached
>>>> two patches for the 4.0.2 source.
>>>
>>> Hello Dave,
>>>
>>> Thanks for your input, I've updated the 4.0.2 diff.
>>>
>>> We already were aware of the problem with 4.0.1 back in June and worked
>>> around the problem by setting PMIX_MCA_gds=hash before execution to
>>> avoid GDS/ds21 and GDS/ds12.
>>>
>>> Your diff is of course a much better way, do you want to try to push it
>>> upstream?
>>>
>>> -m
>>>
>>> Index: Makefile
>>> ===================================================================
>>> RCS file: /cvs/ports/devel/openmpi/Makefile,v
>>> retrieving revision 1.28
>>> diff -u -p -u -p -r1.28 Makefile
>>> --- Makefile 28 Jun 2019 11:05:11 -0000 1.28
>>> +++ Makefile 19 Dec 2019 07:18:30 -0000
>>> @@ -2,9 +2,8 @@
>>>
>>>  COMMENT = open source MPI-3.1 implementation
>>>
>>> -V = 4.0.1
>>> +V = 4.0.2
>>>  DISTNAME = openmpi-$V
>>> -REVISION = 0
>>>
>>>  SHARED_LIBS += mca_common_dstore 0.0 # 1.0
>>>  SHARED_LIBS += mca_common_monitoring 0.0 # 60.0
>>> Index: distinfo
>>> ===================================================================
>>> RCS file: /cvs/ports/devel/openmpi/distinfo,v
>>> retrieving revision 1.4
>>> diff -u -p -u -p -r1.4 distinfo
>>> --- distinfo 27 Jun 2019 13:52:00 -0000 1.4
>>> +++ distinfo 19 Dec 2019 07:18:30 -0000
>>> @@ -1,2 +1,2 @@
>>> -SHA256 (openmpi-4.0.1.tar.gz) = 5V4hP+CaIUq58scirP2L97ObvBgA5LekZNON8V5wf1k=
>>> -SIZE (openmpi-4.0.1.tar.gz) = 17513706
>>> +SHA256 (openmpi-4.0.2.tar.gz) = ZigFhw6GoUceWXObDDTG+QBODHoi2waFYtU4jsRCGQQ=
>>> +SIZE (openmpi-4.0.2.tar.gz) = 17373487
>>> Index: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
>>> ===================================================================
>>> RCS file: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
>>> diff -N patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c
>>> --- /dev/null 1 Jan 1970 00:00:00 -0000
>>> +++ patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds12_gds_ds12_lock_pthread_c 19 Dec 2019 07:18:30 -0000
>>> @@ -0,0 +1,20 @@
>>> +$OpenBSD$
>>> +
>>> +Index: opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c
>>> +--- opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c.orig
>>> ++++ opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds12/gds_ds12_lock_pthread.c
>>> +@@ -132,12 +132,14 @@ pmix_status_t pmix_gds_ds12_lock_init(pmix_common_dsto
>>> +         PMIX_ERROR_LOG(rc);
>>> +         goto error;
>>> +     }
>>> ++#ifdef HAVE_PTHREAD_SHARED
>>> +     if (0 != pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) {
>>> +         pthread_rwlockattr_destroy(&attr);
>>> +         rc = PMIX_ERR_INIT;
>>> +         PMIX_ERROR_LOG(rc);
>>> +         goto error;
>>> +     }
>>> ++#endif
>>> + #ifdef HAVE_PTHREAD_SETKIND
>>> +     if (0 != pthread_rwlockattr_setkind_np(&attr,
>>> +                 PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP)) {
>>> Index: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
>>> ===================================================================
>>> RCS file: patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
>>> diff -N patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c
>>> --- /dev/null 1 Jan 1970 00:00:00 -0000
>>> +++ patches/patch-opal_mca_pmix_pmix3x_pmix_src_mca_gds_ds21_gds_ds21_lock_pthread_c 19 Dec 2019 07:18:30 -0000
>>> @@ -0,0 +1,21 @@
>>> +$OpenBSD$
>>> +
>>> +Index: opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
>>> +--- opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c.orig
>>> ++++ opal/mca/pmix/pmix3x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c
>>> +@@ -182,12 +182,15 @@ pmix_status_t pmix_gds_ds21_lock_init(pmix_common_dsto
>>> +         PMIX_ERROR_LOG(rc);
>>> +         goto error;
>>> +     }
>>> ++
>>> ++#ifdef HAVE_PTHREAD_MUTEXATTR_SETPSHARED
>>> +     if (0 != pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)) {
>>> +         pthread_mutexattr_destroy(&attr);
>>> +         rc = PMIX_ERR_INIT;
>>> +         PMIX_ERROR_LOG(rc);
>>> +         goto error;
>>> +     }
>>> ++#endif
>>> +
>>> +     segment_hdr_t *seg_hdr = (segment_hdr_t*)lock_item->seg_desc->seg_info.seg_base_addr;
>>> +     seg_hdr->num_locks = local_size;
>>> Index: pkg/PLIST
>>> ===================================================================
>>> RCS file: /cvs/ports/devel/openmpi/pkg/PLIST,v
>>> retrieving revision 1.5
>>> diff -u -p -u -p -r1.5 PLIST
>>> --- pkg/PLIST 27 Jun 2019 13:52:00 -0000 1.5
>>> +++ pkg/PLIST 19 Dec 2019 07:18:30 -0000
>>> @@ -143,15 +143,6 @@ lib/openmpi/mca_compress_gzip.so
>>>  lib/openmpi/mca_crs_none.a
>>>  lib/openmpi/mca_crs_none.la
>>>  lib/openmpi/mca_crs_none.so
>>> -lib/openmpi/mca_dfs_app.a
>>> -lib/openmpi/mca_dfs_app.la
>>> -lib/openmpi/mca_dfs_app.so
>>> -lib/openmpi/mca_dfs_orted.a
>>> -lib/openmpi/mca_dfs_orted.la
>>> -lib/openmpi/mca_dfs_orted.so
>>> -lib/openmpi/mca_dfs_test.a
>>> -lib/openmpi/mca_dfs_test.la
>>> -lib/openmpi/mca_dfs_test.so
>>>  lib/openmpi/mca_errmgr_default_app.a
>>>  lib/openmpi/mca_errmgr_default_app.la
>>>  lib/openmpi/mca_errmgr_default_app.so
>>> @@ -221,9 +212,6 @@ lib/openmpi/mca_iof_tool.so
>>>  lib/openmpi/mca_mpool_hugepage.a
>>>  lib/openmpi/mca_mpool_hugepage.la
>>>  lib/openmpi/mca_mpool_hugepage.so
>>> -lib/openmpi/mca_notifier_syslog.a
>>> -lib/openmpi/mca_notifier_syslog.la
>>> -lib/openmpi/mca_notifier_syslog.so
>>>  lib/openmpi/mca_odls_default.a
>>>  lib/openmpi/mca_odls_default.la
>>>  lib/openmpi/mca_odls_default.so
>>> @@ -288,6 +276,9 @@ lib/openmpi/mca_reachable_weighted.so
>>>  lib/openmpi/mca_regx_fwd.a
>>>  lib/openmpi/mca_regx_fwd.la
>>>  lib/openmpi/mca_regx_fwd.so
>>> +lib/openmpi/mca_regx_naive.a
>>> +lib/openmpi/mca_regx_naive.la
>>> +lib/openmpi/mca_regx_naive.so
>>>  lib/openmpi/mca_regx_reverse.a
>>>  lib/openmpi/mca_regx_reverse.la
>>>  lib/openmpi/mca_regx_reverse.so
>>> @@ -315,9 +306,6 @@ lib/openmpi/mca_rml_oob.so
>>>  lib/openmpi/mca_routed_binomial.a
>>>  lib/openmpi/mca_routed_binomial.la
>>>  lib/openmpi/mca_routed_binomial.so
>>> -lib/openmpi/mca_routed_debruijn.a
>>> -lib/openmpi/mca_routed_debruijn.la
>>> -lib/openmpi/mca_routed_debruijn.so
>>>  lib/openmpi/mca_routed_direct.a
>>>  lib/openmpi/mca_routed_direct.la
>>>  lib/openmpi/mca_routed_direct.so
>>>
>>
>>
>
>

--
David J. Raymond
[email protected]
http://physics.nmt.edu/~raymond
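
For readers who don't want to dig through the quoted patches, the guard they add has roughly the shape sketched below. This is a standalone illustration rather than PMIx code: the function name init_lock is hypothetical, and HAVE_PTHREAD_MUTEXATTR_SETPSHARED is assumed to be defined by the build's configure step only on systems that provide the call.

```c
#include <pthread.h>

/* Hypothetical standalone sketch of the guard used in the patches.
 * HAVE_PTHREAD_MUTEXATTR_SETPSHARED is assumed to come from configure;
 * where it is undefined, the process-shared request is skipped instead
 * of being treated as a fatal error.
 * Returns 0 on success, -1 on failure. */
int init_lock(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ok = 1;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;

#ifdef HAVE_PTHREAD_MUTEXATTR_SETPSHARED
    /* Only request a process-shared mutex where the call is available. */
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
        ok = 0;
#endif

    if (ok && pthread_mutex_init(m, &attr) != 0)
        ok = 0;

    pthread_mutexattr_destroy(&attr);
    return ok ? 0 : -1;
}
```

Note that with the guard compiled out, the resulting mutex is process-private, so code that relies on sharing it across processes would still need another mechanism.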

