Machine details:
     Fedora 30:  5.6.13-100.fc30.x86_64
     gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
     GNU Fortran (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

LAPACK/BLAS: whatever came with the machine, located in /usr/lib64.  I did not compile them myself.

I'll try two things: (1) rebuild with a different BLAS/LAPACK and (2) set a breakpoint in ieee_handler() to see when and where it is getting called.

Also, for completeness, here are the rest of the error messages from the run:

   Thread 1 "feap" received signal SIGFPE, Arithmetic exception.
   0x00007f0fe77e5be1 in ieeeck_ () from /lib64/liblapack.so.3


   [0]PETSC ERROR: ------------------------------------------------------------------------
   [0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
   [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
   [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
   [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
   [0]PETSC ERROR: likely location of problem given in stack below
   [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
   [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
   [0]PETSC ERROR:       INSTEAD the line number of the start of the function
   [0]PETSC ERROR:       is given.
   [0]PETSC ERROR: [0] LAPACKgesvd line 32 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
   [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues_GMRES line 14 /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c
   [0]PETSC ERROR: [0] KSPComputeExtremeSingularValues line 57 /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c
   [0]PETSC ERROR: [0] PCGAMGOptProlongator_AGG line 1107 /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c
   [0]PETSC ERROR: User provided function() line 0 in  unknown file (null)

   Program received signal SIGABRT: Process abort signal.


On 6/13/20 9:04 AM, Barry Smith wrote:

   The LAPACK routine ieeeck_ intentionally does a divide by zero to check if the system can handle it without generating an exception. It doesn't have anything to do with the particular matrix data passed to LAPACK.
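
   To make the idea of a deliberate divide by zero concrete, here is a minimal C sketch loosely modeled on the first few checks of that kind. The function name ieee_check_sketch and its exact sequence of tests are assumptions made for illustration; this is not the LAPACK source. It assumes IEEE 754 doubles and a compiler run without options such as -ffast-math.

```c
/* Hypothetical helper (not LAPACK code): probes whether division by zero
   quietly yields signed infinities and whether NaN compares unequal to
   itself. Returns 1 if the arithmetic behaves that way, 0 otherwise.
   The volatile qualifiers keep the compiler from folding the divisions
   away at compile time, so they really execute at run time. */
int ieee_check_sketch(double zero_in, double one_in)
{
  volatile double zero = zero_in, one = one_in;
  volatile double posinf, neginf, negzro, nan1;

  posinf = one / zero;            /* deliberate divide by zero */
  if (posinf <= one) return 0;

  neginf = -one / zero;           /* should be -infinity */
  if (neginf >= zero) return 0;

  negzro = one / (neginf + one);  /* 1/(-infinity) should be -0.0 */
  if (negzro != zero) return 0;

  nan1 = posinf + neginf;         /* infinity + (-infinity) should be NaN */
  if (nan1 == nan1) return 0;

  return 1;
}
```

   With floating point traps disabled, ieee_check_sketch(0.0, 1.0) returns 1 on an IEEE 754 system. If divide-by-zero trapping is enabled when it runs, the first division raises SIGFPE instead, and the inputs play no role in that.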

    In KSPComputeExtremeSingularValues_GMRES() we have the code structure

  ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
#if !defined(PETSC_USE_COMPLEX)
  PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
#else
  PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,realpart+N,&lierr));
#endif
  if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr);
  ierr = PetscFPTrapPop();CHKERRQ(ierr);

   So PETSc tries to turn off trapping of floating point exceptions before calling the LAPACK routines that eventually lead to the exception.

PetscErrorCode PetscFPTrapPush(PetscFPTrap trap)
{
  PetscErrorCode         ierr;
  struct PetscFPTrapLink *link;

  PetscFunctionBegin;
  ierr           = PetscNew(&link);CHKERRQ(ierr);
  link->trapmode = _trapmode;
  link->next     = _trapstack;
  _trapstack     = link;
  if (trap != _trapmode) {ierr = PetscSetFPTrap(trap);CHKERRQ(ierr);}
  PetscFunctionReturn(0);
}

PetscErrorCode PetscSetFPTrap(PetscFPTrap flag)
{
  char *out;

  PetscFunctionBegin;
  /* Clear accumulated exceptions.  Used to suppress meaningless messages from f77 programs */
  (void) ieee_flags("clear","exception","all",&out);
  if (flag == PETSC_FP_TRAP_ON) {
    /*
      To trap more fp exceptions, including underflow, change the line below to
      if (ieee_handler("set","all",PetscDefaultFPTrap)) {
    */
    if (ieee_handler("set","common",PetscDefaultFPTrap)) (*PetscErrorPrintf)("Can't set floatingpoint handler\n");
  } else if (ieee_handler("clear","common",PetscDefaultFPTrap)) (*PetscErrorPrintf)("Can't clear floatingpoint handler\n");

  _trapmode = flag;
  PetscFunctionReturn(0);
}
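
  The push/pop bookkeeping in the two listings above can be sketched in isolation. This is a simplified model, not PETSc code: the names TrapMode, trap_push, trap_pop, set_trap and trap_current are hypothetical, and set_trap only records the mode in a variable rather than reprogramming the floating point unit.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the trap modes and the saved-mode stack. */
typedef enum { TRAP_OFF = 0, TRAP_ON = 1 } TrapMode;

struct TrapLink {
  TrapMode         saved;  /* mode that was recorded before the push */
  struct TrapLink *next;
};

static TrapMode         current_mode = TRAP_OFF;
static struct TrapLink *trap_stack   = NULL;

/* A real implementation would change the hardware trap state here;
   this sketch only records the requested mode. */
void set_trap(TrapMode mode) { current_mode = mode; }

TrapMode trap_current(void) { return current_mode; }

/* Save the recorded mode, then switch only if the request differs. */
int trap_push(TrapMode mode)
{
  struct TrapLink *link = malloc(sizeof *link);
  if (!link) return 1;
  link->saved = current_mode;
  link->next  = trap_stack;
  trap_stack  = link;
  if (mode != current_mode) set_trap(mode);
  return 0;
}

/* Restore the mode saved by the matching push. */
int trap_pop(void)
{
  struct TrapLink *link = trap_stack;
  if (!link) return 1;  /* pop without a matching push */
  if (link->saved != current_mode) set_trap(link->saved);
  trap_stack = link->next;
  free(link);
  return 0;
}
```

  One thing the sketch makes visible: the switch happens only when the requested mode differs from the recorded one. If trapping were enabled by code that bypasses this bookkeeping, a push of TRAP_OFF while the recorded mode is already TRAP_OFF would not reset anything.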

  So either the ieee_handler clear is not working on your system, or some other code sets ieee_handler to trap divide by zero AFTER PETSc has called ieee_handler.

  Running git grep -i ieee_handler shows that neither the reference BLAS/LAPACK nor OpenBLAS appears to call ieee_handler.

  We need to know which LAPACK/BLAS you are using and how they were compiled.

  Some Fortran compilers/linkers set nonstandard exception handlers, but since PETSc clears them I don't know how they could get set again.

  You could try setting a breakpoint in ieee_handler in gdb to find all the places it gets called; maybe this will lead to the cause.

  Barry


On Jun 13, 2020, at 1:30 AM, Sanjay Govindjee <s...@berkeley.edu> wrote:

I have an FEA problem that I am trying to solve with GAMG.  The problem solves just fine with direct solvers (mumps, superlu) and iterative solvers (gmres, ml, hypre-boomer, etc.).

However, with GAMG I am getting a divide by zero that I am having trouble tracking down.  Below is the gdb stack trace and the source lines going up the stack.

When I run in valgrind the problem runs fine (and gets the correct answer). Valgrind reports nothing of note (just lots of indirectly lost blocks  related to PMP_INIT).

I'm only running on one processor.

Any suggestions on where to start to trace the problem?

-sanjay

    #0  0x00007fb262dc5be1 in ieeeck_ () from /lib64/liblapack.so.3
    #1  0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
    #2  0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
    #3  0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
    #4  0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
    #5  0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
    #6  0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
    (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
    #7  0x00007fb264dfe69a in KSPComputeExtremeSingularValues
    (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
    #8  0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
    Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
    /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
    #9  0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
    /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
    #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
    /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
    #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
    #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
    b=0x1259f30, x=0x125d910) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
    #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
    x=0x125d910) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
    #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
    <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
    <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
        at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
    #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
    #16 0x000000000044afba in psolve (stype=-3, b=..., fp=...,
    factor=.TRUE., solve=.TRUE., cfr=.FALSE., prnt=.TRUE.) at
    psolve.f:212
    #17 0x00000000006b7393 in pmacr1 (lct=..., ct=..., j=3,
    _lct=_lct@entry=15) at pmacr1.f:578
    #18 0x00000000005c247b in pmacr (initf=.FALSE.) at pmacr.f:578
    #19 0x000000000044ff20 in pcontr () at pcontr.f:1307
    #20 0x0000000000404d9b in feap () at feap86.f:162
    #21 main (argc=<optimized out>, argv=<optimized out>) at feap86.f:168
    #22 0x00007fb261aaef43 in __libc_start_main () from /lib64/libc.so.6
    #23 0x0000000000404dde in _start ()

    (gdb) list
    1       <built-in>: No such file or directory.
    (gdb) up
    #1  0x00007fb262dc5332 in ilaenv_ () from /lib64/liblapack.so.3
    (gdb) up
    #2  0x00007fb262dbbcef in dlasq2_ () from /lib64/liblapack.so.3
    (gdb) up
    #3  0x00007fb262dbb78c in dlasq1_ () from /lib64/liblapack.so.3
    (gdb) up
    #4  0x00007fb262da1e2e in dbdsqr_ () from /lib64/liblapack.so.3
    (gdb) up
    #5  0x00007fb262960110 in dgesvd_ () from /lib64/liblapack.so.3
    (gdb) up
    #6  0x00007fb264e74b66 in KSPComputeExtremeSingularValues_GMRES
    (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/impls/gmres/gmreig.c:32
    32        PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("N","N",&bn,&bn,R,&bN,realpart,&sdummy,&idummy,&sdummy,&idummy,work,&lwork,&lierr));
    (gdb) up
    #7  0x00007fb264dfe69a in KSPComputeExtremeSingularValues
    (ksp=0x1816560, emax=0x7ffc5010e7c8, emin=0x7ffc5010e7d0) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:64
    64          ierr =
    (*ksp->ops->computeextremesingularvalues)(ksp,emax,emin);CHKERRQ(ierr);
    (gdb) up
    #8  0x00007fb264b44a1f in PCGAMGOptProlongator_AGG (pc=0x12f3d30,
    Amat=0x11a2630, a_P=0x7ffc5010ebe0) at
    /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/agg.c:1145
    1145          ierr = KSPComputeExtremeSingularValues(eksp, &emax,
    &emin);CHKERRQ(ierr);
    (gdb) up
    #9  0x00007fb264b248a1 in PCSetUp_GAMG (pc=0x12f3d30) at
    /home/sg/petsc-3.13.2/src/ksp/pc/impls/gamg/gamg.c:557
    557               ierr = pc_gamg->ops->optprolongator(pc,
    Aarr[level], &Prol11);CHKERRQ(ierr);
    (gdb) up
    #10 0x00007fb264d8535b in PCSetUp (pc=0x12f3d30) at
    /home/sg/petsc-3.13.2/src/ksp/pc/interface/precon.c:898
    898         ierr = (*pc->ops->setup)(pc);CHKERRQ(ierr);
    (gdb) up
    #11 0x00007fb264e01a93 in KSPSetUp (ksp=0x128dd80) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:376
    376       ierr = PCSetUp(ksp->pc);CHKERRQ(ierr);
    (gdb) up
    #12 0x00007fb264e057af in KSPSolve_Private (ksp=0x128dd80,
    b=0x1259f30, x=0x125d910) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:633
    633       ierr = KSPSetUp(ksp);CHKERRQ(ierr);
    (gdb) up
    #13 0x00007fb264e086b9 in KSPSolve (ksp=0x128dd80, b=0x1259f30,
    x=0x125d910) at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/itfunc.c:853
    853       ierr = KSPSolve_Private(ksp,b,x);CHKERRQ(ierr);
    (gdb) up
    #14 0x00007fb264e46216 in kspsolve_ (ksp=0x832670
    <__pfeapc_MOD_kspsol>, b=0x832698 <__pfeapc_MOD_rhs>, x=0x8326a0
    <__pfeapc_MOD_sol>, __ierr=0x7ffc5010f358)
        at
    /home/sg/petsc-3.13.2/src/ksp/ksp/interface/ftn-auto/itfuncf.c:266
    266     *__ierr = KSPSolve(
    (gdb) up
    #15 0x000000000043298d in usolve (flags=..., b=...) at usolve.F:313
    313               call KSPSolve         (kspsol, rhs, sol, ierr)




