Dear Jeff, Ralf, and Manuel,
Some good news: I added -pthread to both the compile and the link command for
az_tutorial_with_MPI.f, and I also compiled AZTEC itself with -pthread.
The code now runs fine for np=1 and np=2.
Now the bad news: when I try running with 3, 4, or more processors I get a
similar error message:
mpirun -np 3 sample
[cluster:25805] *** Process received signal ***
[cluster:25805] Signal: Segmentation fault (11)
[cluster:25805] Signal code: (128)
[cluster:25805] Failing at address: (nil)
[cluster:25805] [ 0] /lib/libpthread.so.0 [0x7fbe20cb5a80]
[cluster:25805] [ 1] /shared/lib/libmpi.so.0 [0x7fbe221325f7]
[cluster:25805] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7fbe22160a48]
[cluster:25805] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25805] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25805] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25805] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25805] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25805] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25805] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7fbe209721a6]
[cluster:25805] [10] sample [0x4073b9]
[cluster:25805] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25805 on node cluster exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
When I try running on 4 processors I get the message twice (from two of the
processes):
mpirun -np 4 sample
[cluster:25946] *** Process received signal ***
[cluster:25946] Signal: Segmentation fault (11)
[cluster:25946] Signal code: (128)
[cluster:25946] Failing at address: (nil)
[cluster:25947] *** Process received signal ***
[cluster:25947] Signal: Segmentation fault (11)
[cluster:25947] Signal code: (128)
[cluster:25947] Failing at address: (nil)
[cluster:25946] [ 0] /lib/libpthread.so.0 [0x7f4ae4c6ba80]
[cluster:25946] [ 1] /shared/lib/libmpi.so.0 [0x7f4ae60e85f7]
[cluster:25946] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7f4ae6116a48]
[cluster:25946] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25946] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 0] /lib/libpthread.so.0 [0x7f7dc5350a80]
[cluster:25946] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25946] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25946] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25946] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25946] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7f4ae49281a6]
[cluster:25946] [10] sample [0x4073b9]
[cluster:25946] *** End of error message ***
[cluster:25947] [ 1] /shared/lib/libmpi.so.0 [0x7f7dc67cd5f7]
[cluster:25947] [ 2] /shared/lib/libmpi.so.0(PMPI_Wait+0x38)
[0x7f7dc67fba48]
[cluster:25947] [ 3] sample(md_wrap_wait+0x17) [0x41ccba]
[cluster:25947] [ 4] sample(AZ_find_procs_for_externs+0x5bf) [0x4177e7]
[cluster:25947] [ 5] sample(AZ_transform+0x1c3) [0x418372]
[cluster:25947] [ 6] sample(az_transform_+0x84) [0x407943]
[cluster:25947] [ 7] sample(MAIN__+0x19a) [0x407708]
[cluster:25947] [ 8] sample(main+0x2c) [0x44e00c]
[cluster:25947] [ 9] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7f7dc500d1a6]
[cluster:25947] [10] sample [0x4073b9]
[cluster:25947] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 25946 on node cluster exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Attached is the AZTEC file md_wrap_mpi_c.c; it might give you a further hint.
Rachel
Dr. Rachel Gordon
Senior Research Fellow Phone: +972-4-8293811
Dept. of Aerospace Eng. Fax: +972-4-8292030
The Technion, Haifa 32000, Israel email: rgor...@tx.technion.ac.il
On Thu, 2 Sep 2010, Ralf Wildenhues wrote:
Hello Rachel, Jeff,
* Rachel Gordon wrote on Thu, Sep 02, 2010 at 01:35:37PM CEST:
The cluster I am trying to run on has only the openmpi MPI version.
So, mpif77 is equivalent to mpif77.openmpi, and mpicc is equivalent
to mpicc.openmpi.
I changed the Makefile, replacing gfortran with mpif77 and gcc with mpicc.
The compile and link stages ran without problems:
mpif77 -O -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000
-DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o
az_tutorial_with_MPI.f
mpif77 az_tutorial_with_MPI.o -O -L../lib -laztec -o sample
Can you retry, but this time add -pthread to both the compile and the link
commands?
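For the tutorial, that would be something like (your commands above, with
-pthread added):
mpif77 -O -pthread -I../lib -DMAX_MEM_SIZE=16731136 -DCOMM_BUFF_SIZE=200000
-DMAX_CHUNK_SIZE=200000 -c -o az_tutorial_with_MPI.o az_tutorial_with_MPI.f
mpif77 -pthread az_tutorial_with_MPI.o -O -L../lib -laztec -o sample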
There were other reports on the Open MPI devel list that some pthread
flags have gone missing somewhere. It may well be that this caused the
Open MPI libraries themselves to be built wrongly, or just the
application; I'm not sure. But the segfault inside libpthread is
suspicious.
Thanks,
Ralf
But again when I try to run 'sample' I get:
mpirun -np 1 sample
[cluster:24989] *** Process received signal ***
[cluster:24989] Signal: Segmentation fault (11)
[cluster:24989] Signal code: Address not mapped (1)
[cluster:24989] Failing at address: 0x100000098
[cluster:24989] [ 0] /lib/libpthread.so.0 [0x7f5058036a80]
[cluster:24989] [ 1] /shared/lib/libmpi.so.0(MPI_Comm_size+0x6e)
[0x7f50594ce34e]
[cluster:24989] [ 2] sample(parallel_info+0x24) [0x41d2ba]
[cluster:24989] [ 3] sample(AZ_set_proc_config+0x2d) [0x408417]
[cluster:24989] [ 4] sample(az_set_proc_config_+0xc) [0x407b85]
[cluster:24989] [ 5] sample(MAIN__+0x54) [0x407662]
[cluster:24989] [ 6] sample(main+0x2c) [0x44e8ec]
[cluster:24989] [ 7] /lib/libc.so.6(__libc_start_main+0xe6)
[0x7f5057cf31a6]
[cluster:24989] [ 8] sample [0x407459]
[cluster:24989] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 24989 on node cluster
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
/*====================================================================
* ------------------------
* | CVS File Information |
* ------------------------
*
* $RCSfile: md_wrap_mpi_c.c,v $
*
* $Author: tuminaro $
*
* $Date: 1998/12/21 19:36:24 $
*
* $Revision: 5.3 $
*
* $Name: $
*====================================================================*/
#ifndef lint
static char *cvs_wrapmpi_id =
"$Id: md_wrap_mpi_c.c,v 5.3 1998/12/21 19:36:24 tuminaro Exp $";
#endif
/*******************************************************************************
* Copyright 1995, Sandia Corporation. The United States Government retains a *
* nonexclusive license in this software as prescribed in AL 88-1 and AL 91-7. *
* Export of this program may require a license from the United States *
* Government. *
******************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>
int gl_rbuf = 3;
int gl_sbuf = 3;
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int the_proc_name = -1;
void get_parallel_info(int *proc, int *nprocs, int *dim)
{
MPI_Comm_size(MPI_COMM_WORLD, nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, proc);
*dim = 0;
the_proc_name = *proc;
} /* get_parallel_info */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_read(char *buf, int bytes, int *source, int *type, int *flag)
{
int err, buffer = 1;
MPI_Status status;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Recv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
&status);
}
else {
err = MPI_Recv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
&status);
}
if (err != 0) (void) fprintf(stderr, "MPI_Recv error = %d\n", err);
MPI_Get_count(&status,MPI_BYTE,&buffer);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
if (bytes != 0) bytes = buffer;
return bytes;
} /* md_read */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_write(char *buf, int bytes, int dest, int type, int *flag)
{
int err;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
if (err != 0) (void) fprintf(stderr, "MPI_Send error = %d\n", err);
return 0;
} /* md_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_iread(void *buf, int bytes, int *source, int *type,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-reading communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
*******************************************************************************/
{
int err = 0;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
request);
}
else {
err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, MPI_COMM_WORLD,
request);
}
return err;
} /* md_wrap_iread */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_write(void *buf, int bytes, int dest, int type, int *flag)
/*******************************************************************************
Machine dependent wrapped message-sending communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
*******************************************************************************/
{
int err = 0;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD);
}
return err;
} /* md_wrap_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_wait(void *buf, int bytes, int *source, int *type, int *flag,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-wait communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
flag:
*******************************************************************************/
{
int count;
MPI_Status status;
if ( MPI_Wait(request, &status) ) {
(void) fprintf(stderr, "MPI_Wait error\n");
exit(-1);
}
MPI_Get_count(&status, MPI_BYTE, &count);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
/* return the count, which is in bytes */
return count;
} /* md_wrap_wait */
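/* Editor's sketch (not in the original AZTEC file): how these wrappers pair
 * up in practice.  md_wrap_iread posts the MPI_Irecv whose completion is
 * later awaited in md_wrap_wait -- the MPI_Wait that shows up as PMPI_Wait
 * in the backtraces above.  The partner rank and tag below are hypothetical.
 */
#if 0
static void example_exchange(void)
{
   MPI_Request req;
   char        rbuf[64];
   int         src  = 0;      /* hypothetical partner rank */
   int         type = 123;    /* hypothetical message tag  */
   int         flag, len;

   md_wrap_iread(rbuf, sizeof(rbuf), &src, &type, &req); /* posts MPI_Irecv */
   /* ... the partner calls md_wrap_write() with a matching tag ... */
   len = md_wrap_wait(rbuf, sizeof(rbuf), &src, &type, &flag, &req);
   (void) len;   /* number of bytes actually received */
}
#endif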
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_wrap_iwrite(void *buf, int bytes, int dest, int type, int *flag,
MPI_Request *request)
/*******************************************************************************
Machine dependent wrapped message-sending (nonblocking) communication
routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
*******************************************************************************/
{
int err = 0;
if (bytes == 0) {
err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, MPI_COMM_WORLD,
request);
}
else {
err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, MPI_COMM_WORLD,
request);
}
return err;
} /* md_wrap_iwrite */
/********************************************************************/
/* NEW WRAPPERS to handle MPI Communicators */
/********************************************************************/
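/* Editor's note (not in the original AZTEC file): the md_mpi_* wrappers
 * below receive the communicator as an int* and reinterpret that storage
 * as a C MPI_Comm through a pointer cast.  That is only safe when an
 * MPI_Comm fits in, and is aligned like, the int slot it was packed into.
 * Under Open MPI on 64-bit platforms MPI_Comm is an 8-byte pointer, so the
 * dereference can read garbage.  If the stored value is really a Fortran
 * communicator handle, the portable conversion is MPI_Comm_f2c, roughly:
 */
#if 0   /* sketch only; md_comm_from_handle is a hypothetical helper */
static MPI_Comm md_comm_from_handle(int *icomm)
{
   return MPI_Comm_f2c((MPI_Fint) *icomm);
}
#endif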
void parallel_info(int *proc,int *nprocs,int *dim, MPI_Comm comm)
{
MPI_Comm_size(comm, nprocs);
MPI_Comm_rank(comm, proc);
*dim = 0;
the_proc_name = *proc;
} /* parallel_info */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_iread(void *buf, int bytes, int *source, int *type,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-reading communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
/* reinterpret the int-typed handle as a C MPI_Comm (see the note above) */
comm = (MPI_Comm *) icomm;
if (*type == -1) *type = MPI_ANY_TAG;
if (*source == -1) *source = MPI_ANY_SOURCE;
if (bytes == 0) {
err = MPI_Irecv(&gl_rbuf, 1, MPI_BYTE, *source, *type, *comm,
request);
}
else {
err = MPI_Irecv(buf, bytes, MPI_BYTE, *source, *type, *comm,
request);
}
return err;
} /* md_mpi_iread */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_write(void *buf, int bytes, int dest, int type, int *flag,
int *icomm)
/*******************************************************************************
Machine dependent wrapped message-sending communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
comm = (MPI_Comm *) icomm;
if (bytes == 0) {
err = MPI_Send(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm);
}
else {
err = MPI_Send(buf, bytes, MPI_BYTE, dest, type, *comm);
}
return err;
} /* md_mpi_write */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_wait(void *buf, int bytes, int *source, int *type, int *flag,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-wait communication routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
source: Source processor number.
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int count;
MPI_Status status;
if ( MPI_Wait(request, &status) ) {
(void) fprintf(stderr, "MPI_Wait error\n");
exit(-1);
}
MPI_Get_count(&status, MPI_BYTE, &count);
*source = status.MPI_SOURCE;
*type = status.MPI_TAG;
/* return the count, which is in bytes */
return count;
} /* md_mpi_wait */
/******************************************************************************/
/******************************************************************************/
/******************************************************************************/
int md_mpi_iwrite(void *buf, int bytes, int dest, int type, int *flag,
MPI_Request *request, int *icomm)
/*******************************************************************************
Machine dependent wrapped message-sending (nonblocking) communication
routine for MPI.
Author: Scott A. Hutchinson, SNL, 9221
=======
Return code: int
============
Parameter list:
===============
buf: Beginning address of data to be sent.
bytes: Length of message in bytes.
dest: Destination processor number.
type: Message type
flag:
icomm: MPI Communicator
*******************************************************************************/
{
int err = 0;
MPI_Comm *comm;
comm = (MPI_Comm *) icomm;
if (bytes == 0)
err = MPI_Isend(&gl_sbuf, 1, MPI_BYTE, dest, type, *comm, request);
else
err = MPI_Isend(buf, bytes, MPI_BYTE, dest, type, *comm, request);
return err;
} /* md_mpi_iwrite */