On 2020-04-06 09:56, Drew Parsons wrote:
On 2020-04-06 01:48, Gilles Filippini wrote:
Drew Parsons a écrit le 05/04/2020 à 18:57 :
Another option is to create an environment variable to force h5py to
load the mpi version even when run in a serial environment without
mpirun. Easy enough to set up, though I'm interested to see if
"mpirun -n 1 dh_auto_build" or a variation of that is viable. Maybe
%:
mpirun -n 1 dh $@ --with python3 --buildsystem=pybuild
This way, the test cases run against python3.7 are OK, but it fails
against python3.8 with:
I: pybuild base:217: cd /build/bitshuffle-z2ZvpN/bitshuffle-0.3.5/.pybuild/cpython3_3.8_bitshuffle/build; python3.8 -m unittest discover -v
[pinibrem15:43725] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[pinibrem15:43725] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
E: pybuild pybuild:352: test: plugin distutils failed with: exit code=1: cd /build/bitshuffle-z2ZvpN/bitshuffle-0.3.5/.pybuild/cpython3_3.8_bitshuffle/build; python3.8 -m unittest discover -v
dh_auto_test: error: pybuild --test -i python{version} -p "3.7 3.8" returned exit code 13
But the HDF5 error is no longer present with python3.7, so this seems
like a promising approach.
Strange again. I would have expected the same behaviour in python3.8
and python3.7, whether successful or unsuccessful.
Putting dh inside mpirun seems to be interfering with process spawning:
once MPI is initialised (for the python3.7 test), it's not reinitialised
for the python3.8 one, leaving it in a bad state for that test.
Something like that.
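For what it's worth, the once-per-process rule is what makes the wrapped-dh layout fragile. A purely illustrative guard (not h5py's or Open MPI's actual code) sketches the constraint that MPI_Init may run at most once per process:

```python
# Illustrative only: MPI_Init may be called at most once per process
# lifetime, so a second initialisation attempt in the same process
# is an error rather than a fresh start.
_mpi_initialized = False

def mpi_init_once():
    """Simulate the once-per-process MPI initialisation constraint."""
    global _mpi_initialized
    if _mpi_initialized:
        raise RuntimeError("MPI already initialised in this process")
    _mpi_initialized = True
```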
It's only in the tests where h5py is invoked that we get the problems.
This variant works, applying mpirun separately for each test run:
override_dh_auto_test:
set -e; \
for py in `py3versions -s -v`; do \
mpirun -n 1 pybuild --test -i python{version} -p $$py; \
done
(could use mpirun -n $(NPROC) for real mpi testing).
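Spelled out, that NPROC variant might look like this in debian/rules (NPROC as a variable name is an assumption, defaulting here to the host CPU count via nproc):

```makefile
# Hypothetical parallel-test variant; NPROC is illustrative.
NPROC ?= $(shell nproc)

override_dh_auto_test:
	set -e; \
	for py in `py3versions -s -v`; do \
		mpirun -n $(NPROC) pybuild --test -i python{version} -p $$py; \
	done
```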
Do we want to use this as a solution? Or would you prefer an environment
variable that h5py can check to allow mpi invocation on a serial
process?
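To make the environment-variable idea concrete, a check inside h5py might look something like the sketch below. The variable name H5PY_FORCE_MPI is an assumption for illustration, not an existing h5py option; OMPI_COMM_WORLD_SIZE is the environment variable Open MPI's mpirun sets in launched processes.

```python
import os

def want_mpi(environ=os.environ):
    """Decide whether to initialise the MPI build of h5py.

    H5PY_FORCE_MPI is a hypothetical opt-in for serial processes;
    OMPI_COMM_WORLD_SIZE indicates we were launched under mpirun.
    """
    if environ.get("H5PY_FORCE_MPI") == "1":
        return True  # user explicitly forces MPI without mpirun
    return "OMPI_COMM_WORLD_SIZE" in environ
```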
Note that this means bitshuffle as built now is expressly tied in with
hdf5-mpi and h5py-mpi (this seems intentional by debian/rules and
debian/control, though the Build-Depends must be updated to
python3-h5py-mpi). It's a separate question whether it's desirable to
also support a hdf5-serial build of bitshuffle. Likewise we need to
think about what we want to happen when bitshuffle is invoked in a
serial process.
I think part of the confusion here is that bitshuffle (at least in the
tests) is double-handling the HDF5 library: direct calls on the one
hand, and indirect use through h5py on the other.