On Thu, 05 Dec 2019 10:22:11 +0000
Stuart Henderson <s...@spacehopper.org> wrote:

> On 5 December 2019 01:15:09 Matthew Hull <castersupm...@verizon.net> wrote:
> 
> > I'm interested in guile2 (because I do some programming in Scheme) and
> > powerpc because I have a Mac Mini G4 with OpenBSD 6.5 installed.
> >
> > The package is marked broken for powerpc...
> >
> > Does the default make include "-g" or "-ggdb" flags??? Would a build
> > with -O0 -ggdb be a practical debugging option??? If so, how could those
> > flags be propagated "from the top"?
> >
> make clean=all
> make DEBUG="-O0 -g" install
> 
> Gdb in base is old and doesn't work too well - use a newer one from 
> packages: pkg_add gdb and use the "egdb" command.

Hello Matt.  For some reason, I didn't receive your mails.  I did
receive Stuart's reply, and other mails sent to ports@.  This problem
is at my end: I'm using GMail.  I'm reading your mails through the
archives at MARC.

Your backtrace https://marc.info/?l=openbsd-ports&m=157566079007497&w=2
shows where Guile crashes, but doesn't provide enough information to fix
the problem.  I have a PowerBook G4, so I have reproduced the crash and
gotten more info, but still don't know the fix.  My PowerBook5,4 runs a
snapshot of OpenBSD macppc 6.6-current from a few days ago, with a ports
tree from about 2 weeks ago, including lang/guile2 version 2.2.6p0.
Your OpenBSD 6.5 would have lang/guile2 version 2.2.4p0.

Your backtrace shows a crash at "vm-engine.c:573 NEXT (0);".  I got the
crash in the same place.  The macro "NEXT (0);" expands to code that
reads ip[0].  In my crash, egdb can't access *ip, so the read of ip[0]
is probably what caused the segfault.

This code for the "call" instruction in vm-engine.c assigns ip just
before doing "NEXT (0);":

      if (SCM_LIKELY (SCM_PROGRAM_P (FP_REF (0))))
        ip = SCM_PROGRAM_CODE (FP_REF (0));
      else
        ip = (scm_t_uint32 *) vm_apply_non_program_code;

      APPLY_HOOK ();

      NEXT (0);

By looking at macro definitions, I concluded that "FP_REF (0)" gets
(vp->fp - 1)->as_scm, a pointer to a Scheme object; and SCM_PROGRAM_*
interpret the object as a scm_t_cell.  I printed this cell in egdb.

(gdb) print ip
$19 = (scm_t_uint32 *) 0x33955378
(gdb) print *ip
Cannot access memory at address 0x33955378
(gdb) print *(scm_t_cell *)((vp->fp - 1)->as_scm)
$20 = {word_0 = 0x45, word_1 = 0x33955378}

Here 0x45 is scm_tc7_program (so SCM_PROGRAM_P is true), and 0x33955378
is the bad pointer that SCM_PROGRAM_CODE gets from word_1.  Some code
might put bad pointers in program objects.  I modified guile to look for
such code.  I added a global "scm_t_uint32 aaa;" and added some checks
like "aaa = *pointer".  One such check crashed at vm-engine.c:1654
"make-closure":

      UNPACK_24 (op, dst);
      offset = ip[1];
      UNPACK_24 (ip[2], nfree);

      // FIXME: Assert range of nfree?
      SYNC_IP ();
      closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16),
                                  nfree + 2);
      aaa = *(ip + offset);
      SCM_SET_CELL_WORD_1 (closure, ip + offset);
      // FIXME: Elide these initializations?
      for (n = 0; n < nfree; n++)
        SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
      SP_SET (dst, closure);
      NEXT (3);

(gdb) print ip   
$12 = (scm_t_uint32 *) 0xcf1ea3b8
(gdb) print offset
$13 = -1005191168
(gdb) print *(ip + offset)
Cannot access memory at address 0xdf76a3b8
(gdb) print ip[1]
Cannot access memory at address 0xcf1ea3bc

I can't read ip[1] in the core dump, but the program did read ip[1] in
"offset = ip[1];" before the crash.  The call to scm_inline_words(), to
allocate the scm_tc7_program object, seems to have also freed the memory
where ip points.  This might be a problem with the garbage collector.

I also can't read ip[0] and ip[3] in the core dump.  If the program
didn't run "aaa = *(ip + offset);", it would crash when "NEXT (3);"
reads ip[3].  This doesn't make sense, because the original crash was
not at this "NEXT (3);", but at that other "NEXT (0);".  I seem to have
changed the behavior of the garbage collector.  I wonder if the GC scans
global variables, and my added "aaa" caused the change.

The garbage collector is from devel/boehm-gc version 7.6.0p3.  I did

$ cd /usr/ports/devel/boehm-gc
$ make test

and all 15 tests passed.  If the garbage collector has a problem, these
tests don't expose the problem.  I still don't know how to fix the
problem in Guile.  --George
