Dear PETSc users,

I’ve been a happy PETSc user since version 3.3, using it both under Ubuntu 
(from 14.04 up to 20.04) and CentOS (from 5 to 8).

I use it as an optional component of a parallel Fortran code (which, BTW, also 
uses METIS) and, wherever allowed, I have installed MPI myself (both MPICH and 
OpenMPI) with PETSc on top of it, without ever running into trouble (despite 
being, myself, as dumb as one can be at this).

I did this with the GNU compilers and, less extensively, the Intel compilers, 
on a range of different systems (from virtual machines to workstations to 
actual clusters).

So far so good.

Today I find myself needing to deploy my application to Windows 10 users, 
which means giving them a folder containing all the executables and libraries 
needed to run it, including the MPI runtime. Unfortunately, I also have to 
rely on free tools (I can’t afford Intel for the moment).
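
To make the "folder with all the executables and libraries" idea concrete, 
here is a minimal sketch of how such a folder could be assembled from a 
Cygwin shell. It assumes that ldd is available and that the resolved DLL 
paths contain no spaces; the function name bundle_deps and the folder name 
are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: copy an executable and the non-Windows DLLs it depends on
# into a single folder. bundle_deps is a hypothetical helper name.
bundle_deps() {
    exe=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp "$exe" "$dest/" || return 1
    # ldd prints lines such as "cygwin1.dll => /usr/bin/cygwin1.dll (0x...)".
    # Keep the resolved path (third field) and skip DLLs that live under the
    # Windows system directory, since those ship with the OS.
    # Assumption: resolved paths contain no spaces.
    ldd "$exe" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' |
        grep -vi '/windows/' |
        while IFS= read -r dll; do
            cp "$dll" "$dest/"
        done
}
```

It would be called once per shipped binary, for example 
bundle_deps ./my.exe dist and then again for the MPI launcher, so that the 
launcher’s own dependencies end up in the same folder.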

To the best of my knowledge, and counting far-from-optimal solutions too, my 
options would then be: virtual machines and WSL1; Cygwin; MSYS2-MinGW64; 
cross-compiling with MinGW64 from within Linux; PGI + Visual Studio + Cygwin 
(not sure about this one).

I know this is largely unsupported, but I was wondering if there is, 
nonetheless, some general (and more official) knowledge available on the 
matter. What I tried so far:


  1.  Virtual machines and WSL1: both work like a charm, just like in the 
native OS, but are very far from ideal for distribution purposes.


  2.  Cygwin with GNU compilers (as opposed to using Intel and Visual Studio): 
I was unable to compile MPI myself as I am used to doing on Linux, so I just 
went all in and let PETSc do everything for me (using static linking): 
download and install MPICH, BLAS, LAPACK, METIS and HYPRE. Everything just 
worked (so far I have only compiled and run trivial tests) and I am able to 
use everything from within a Cygwin terminal (even with the executables and 
their dependencies placed outside Cygwin). Still, even within Cygwin, I can’t 
switch to, say, the Cygwin Open MPI mpirun/mpiexec for an MPI program compiled 
against the PETSc-built MPICH (things run, but not as expected). The trouble 
starts when I try to use cmd.exe (which I pictured as the more natural way to 
launch things in Windows). In particular, using (note that \ is used in 
cmd.exe, while / was used in the Cygwin terminal):

.\mpiexec.hydra.exe -np 8 .\my.exe

Nothing happens unless I press Enter a second time. Things seem to work then, 
but if I try to run a serial executable with the command above I get the 
following errors (which do not occur when using the Cygwin terminal):

[proxy:0:0@Dell7540-Paolo] HYDU_sock_write (utils/sock/sock.c:286): write error 
(No such process)
[proxy:0:0@Dell7540-Paolo] HYD_pmcd_pmip_control_cmd_cb 
(pm/pmiserv/pmip_cb.c:935): unable to write to downstream stdin
[proxy:0:0@Dell7540-Paolo] HYDT_dmxu_poll_wait_for_event 
(tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:0@Dell7540-Paolo] main (pm/pmiserv/pmip.c:206): demux engine error 
waiting for event
[mpiexec@Dell7540-Paolo] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert 
(!closed) failed
[mpiexec@Dell7540-Paolo] HYDT_dmxu_poll_wait_for_event 
(tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@Dell7540-Paolo] HYD_pmci_wait_for_completion 
(pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec@Dell7540-Paolo] main (ui/mpich/mpiexec.c:336): process manager error 
waiting for completion

Just for the sake of completeness, I also tried using the Intel and Microsoft 
MPI redistributables, which might be more natural candidates, instead of the 
PETSc-built version of the MPI runtime (they are MPICH derivatives, after 
all). But, running with:

mpiexec -np 1 my.exe

I get the following error with Intel:

[cli_0]: write_line error; fd=440 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=440 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(467):
MPID_Init(140).......: channel initialization failed
MPID_Init(421).......: PMI_Get_appnum returned -1
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(467):
MPID_Init(140).......: channel initialization failed
MPID_Init(421).......: PMI_Get_appnum returned -1

And the following error with MS-MPI:

[unset]: unable to decode hostport from 44e5747b-d19e-4ea8-ac7a-ec2102cabb21
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(467):
MPID_Init(140).......: channel initialization failed
MPID_Init(403).......: PMI_Init returned -1
[unset]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(467):
MPID_Init(140).......: channel initialization failed
MPID_Init(403).......: PMI_Init returned -1

This happens regardless of the number of processes, although more processes 
produce more copies of the message. However, both Intel MPI and MS-MPI are 
able to run a serial Fortran executable built with Cygwin. I believe I did 
everything correctly, and adding -localhost didn’t help (actually, it caused 
further problems with how mpiexec interpreted the command-line arguments).


  3.  Cygwin with MinGW64 compilers: I never managed to compile MPI, not even 
through PETSc.



  4.  MSYS2 + MinGW64 compilers: I understand that MinGW is not well 
supported, probably because of how it handles paths, but I wanted to give it a 
try because it should be more “native” and there seem to be relevant examples 
out there of people who managed to do it. I first tried the MSYS2 MPI 
distribution: I produced the .mod file from the mpi.f90 file in the 
distribution (I tried my best with different hacks for the known limitations 
of this file, which are also present in the official MS-MPI distribution) and 
tried it with my code without PETSc, but compilation failed with a strange 
MPI-related error (an argument mismatch between two unrelated MPI calls in the 
code, which makes no sense to me). In contrast, simple MPI tests 
(hello-world-like) worked as expected. Then I decided to follow this:



https://doc.freefem.org/introduction/installation.html#compilation-on-windows



but the exact same type of error came up (the MPI calls involved were 
different, but the error was the same). Trying again from scratch (i.e., 
without all the things I did in the beginning to compile my code), the same 
error came up while compiling some of the FreeFEM dependencies (this time not 
even in MPI calls).



As a side note, there seems to be an official effort to port PETSc to MSYS2 
(https://github.com/okhlybov/MINGW-packages/tree/whpc/mingw-w64-petsc), but it 
hasn’t made it into the official packages yet, which I interpret as a warning.



  5.  I didn’t try cross-compiling with MinGW from Linux, as I thought it 
couldn’t be any better than doing it from MSYS2.
  6.  I didn’t try PGI, as I didn’t know whether I would then be able to make 
PETSc work.
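
For reference, the "let PETSc do everything" build in the Cygwin with GNU 
compilers attempt above corresponds roughly to a configure invocation like the 
sketch below. The exact command is not recorded in this message: the compiler 
names are assumptions, and the option list is only inferred from the packages 
mentioned (MPICH, BLAS/LAPACK, METIS, HYPRE, static linking). The script just 
assembles and prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: options for a static PETSc build with downloaded dependencies.
# Compiler names (gcc, g++, gfortran) are assumptions about the Cygwin setup.
PETSC_CONFIGURE_OPTS="--with-cc=gcc --with-cxx=g++ --with-fc=gfortran \
--with-shared-libraries=0 \
--download-mpich --download-fblaslapack \
--download-metis --download-hypre"

# Print the command; it is meant to be run from the PETSc source directory.
echo "./configure $PETSC_CONFIGURE_OPTS"
```

Because MPICH is downloaded and built by PETSc here, the matching launcher is 
the mpiexec installed under the PETSc architecture directory, not one from 
another MPI distribution.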

So, here are some questions I have with respect to where I stand now and the 
points above:


     *   I haven’t seen the MSYS2-MinGW64 toolchain mentioned at all in 
official documentation/discussions. Should I definitely abandon it (despite 
some people reporting it as working) because of known issues?
     *   What about the PGI route? I don’t see it mentioned either. I guess it 
would require some work on win32fe.
     *   For my Cygwin-GNU route (basically what is described in the PFLOTRAN 
documentation), am I expected to run from the Cygwin terminal, or should the 
Windows prompt work as well? Are the need for a second Enter and the 
mishandling of serial executables a sign of something wrong with the Windows 
prompt?
     *   More generally, is there a known working, albeit unofficial, route 
given my constraints (free + Fortran + Windows + MPI + PETSc)?

Thanks for your attention and your great work on PETSc

Best regards

Paolo Lampitella
