On 12/8/19 12:42 PM, George Koehler wrote:
On Thu, 05 Dec 2019 10:22:11 +0000
Stuart Henderson <s...@spacehopper.org> wrote:

On 5 December 2019 01:15:09 Matthew Hull <castersupm...@verizon.net> wrote:

I'm interested in guile2 (because I do some programming in Scheme) and
powerpc because I have a Mac Mini G4 with OpenBSD 6.5 installed.

The package is marked broken for powerpc...

Does the default make include "-g" or "-ggdb" flags?  Would a build
with -O0 -ggdb be a practical debugging option?  If so, how could those
flags be propagated "from the top"?

make clean=all
make DEBUG="-O0 -g" install

Gdb in base is old and doesn't work too well - use a newer one from
packages: pkg_add gdb and use the "egdb" command.

Hello Matt.  For some reason, I didn't receive your mails.  I did
receive Stuart's reply, and other mails sent to ports@.  This problem
is at my end: I'm using GMail.  I'm reading your mails through the
archives at MARC.

Your backtrace https://marc.info/?l=openbsd-ports&m=157566079007497&w=2
shows where Guile crashes, but doesn't provide enough information to fix
the problem.  I have a PowerBook G4, so I have reproduced the crash and
gotten more info, but still don't know the fix.  My PowerBook5,4 runs a
snapshot of OpenBSD macppc 6.6-current from a few days ago, with a ports
tree from about 2 weeks ago, including lang/guile2 version 2.2.6p0.
Your OpenBSD 6.5 would have lang/guile2 version 2.2.4p0.

Your backtrace shows a crash at "vm-engine.c:573 NEXT (0);".  I got the
crash in the same place.  The macro "NEXT (0);" has a part that reads
ip[0].  In my crash, I can't access *ip, so ip[0] probably caused the
crash by segfault.

This code in vm-engine.c "call" assigns ip before doing "NEXT (0);":

       if (SCM_LIKELY (SCM_PROGRAM_P (FP_REF (0))))
         ip = SCM_PROGRAM_CODE (FP_REF (0));
       else
         ip = (scm_t_uint32 *) vm_apply_non_program_code;

       APPLY_HOOK ();

       NEXT (0);
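
To illustrate why a bad ip faults at "NEXT (0);" and not at the
assignment, here is a minimal sketch of a dispatch loop of the same
general shape.  It is not the code from vm-engine.c: the opcodes, the
simplified NEXT macro and the run() helper are all made up for the
example.  The point is only that NEXT is the first place that reads
through ip.

```c
#include <stdint.h>

/* Hypothetical opcodes for this sketch; they are not Guile's. */
enum { OP_HALT = 0, OP_INC = 1 };

/* Simplified analogue of NEXT (n): advance ip by n words, then fetch
   the opcode at ip[0].  The fetch is the first read through ip, so a
   dangling ip would fault here, not where ip was assigned. */
#define NEXT(n) do { ip += (n); op = ip[0]; goto dispatch; } while (0)

/* Run a tiny bytecode program and return how many OP_INC it executed. */
static int run(const uint32_t *ip)
{
    uint32_t op;
    int acc = 0;

    NEXT(0);
dispatch:
    switch (op & 0xff) {
    case OP_INC:
        acc++;
        NEXT(1);
    case OP_HALT:
    default:
        return acc;
    }
}
```

Calling run() on the words { OP_INC, OP_INC, OP_HALT } returns 2.
Calling it with a pointer to unmapped memory would crash inside the
first NEXT(0), which matches the shape of the backtrace.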

By looking at macro definitions, I concluded that "FP_REF (0)" gets
(vp->fp - 1)->as_scm, a pointer to a Scheme object; and SCM_PROGRAM_*
interpret the object as a scm_t_cell.  I printed this cell in egdb.

(gdb) print ip
$19 = (scm_t_uint32 *) 0x33955378
(gdb) print *ip
Cannot access memory at address 0x33955378
(gdb) print *(scm_t_cell *)((vp->fp - 1)->as_scm)
$20 = {word_0 = 0x45, word_1 = 0x33955378}

Here 0x45 is scm_tc7_program (so SCM_PROGRAM_P is true), and 0x33955378
is the bad pointer that SCM_PROGRAM_CODE gets from word_1.  Some code
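
As a rough sketch of how I read that cell: the struct and helpers
below are simplified stand-ins, not Guile's real definitions, and the
7-bit tag mask is my assumption about how the tc7 type check works.

```c
#include <stdint.h>

/* Simplified stand-in for scm_t_cell on a 32-bit target. */
struct cell {
    uint32_t word_0;   /* type tag plus flags */
    uint32_t word_1;   /* for a program: address of its bytecode */
};

#define TC7_PROGRAM 0x45u   /* value seen in the gdb output */

/* Assumption: the tc7 type tag is the low 7 bits of word_0. */
static int is_program(const struct cell *c)
{
    return (c->word_0 & 0x7fu) == TC7_PROGRAM;
}

/* The value that would be assigned to ip for this program. */
static uint32_t program_code_addr(const struct cell *c)
{
    return c->word_1;
}
```

For the cell {0x45, 0x33955378}, is_program() is true and
program_code_addr() gives 0x33955378, the address gdb cannot read.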
might put bad pointers in program objects.  I modified guile to look for
such code.  I added a global "scm_t_uint32 aaa;" and added some checks
like "aaa = *pointer".  One such check crashed at vm-engine.c:1654
"make-closure":

       UNPACK_24 (op, dst);
       offset = ip[1];
       UNPACK_24 (ip[2], nfree);

       // FIXME: Assert range of nfree?
       SYNC_IP ();
       closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16),
                                   nfree + 2);
       aaa = *(ip + offset);
       SCM_SET_CELL_WORD_1 (closure, ip + offset);
       // FIXME: Elide these initializations?
       for (n = 0; n < nfree; n++)
         SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
       SP_SET (dst, closure);
       NEXT (3);
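
The "aaa = *pointer" checks are just forced reads.  A minimal sketch
of the idea follows; the PROBE macro is a hypothetical name, and the
volatile qualifier is an addition for this sketch (my actual global
was a plain scm_t_uint32) so that the compiler cannot drop the load.

```c
#include <stdint.h>

/* Global sink for probe reads.  volatile is an assumption added here
   to keep the compiler from optimizing the read away. */
volatile uint32_t aaa;

/* Force a 32-bit load through p.  If p points to unmapped memory, the
   process faults at the probe, close to the code that produced p. */
#define PROBE(p) (aaa = *(const volatile uint32_t *)(p))
```

With a valid pointer the probe only copies the word into aaa; with a
bad one it moves the crash to the line that created the pointer.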

(gdb) print ip
$12 = (scm_t_uint32 *) 0xcf1ea3b8
(gdb) print offset
$13 = -1005191168
(gdb) print *(ip + offset)
Cannot access memory at address 0xdf76a3b8
(gdb) print ip[1]
Cannot access memory at address 0xcf1ea3bc

I can't read ip[1] in the core dump, but the program did read ip[1] in
"offset = ip[1];" before the crash.  The call to scm_inline_words(), to
allocate the scm_tc7_program object, seems to have also freed the memory
where ip points.  This might be a problem with the garbage collector.

I also can't read ip[0] and ip[3] in the core dump.  If the program
didn't run "aaa = *(ip + offset);", it would crash when "NEXT (3);"
reads ip[3].  This doesn't make sense, because the original crash was
not at this "NEXT (3);", but at that other "NEXT (0);".  I seem to have
changed the behavior of the garbage collector.  I wonder if the GC scans
global variables, and my added "aaa" caused the change.

The garbage collector is from devel/boehm-gc version 7.6.0p3.  I did

$ cd /usr/ports/devel/boehm-gc
$ make test

and all 15 tests passed.  If the garbage collector has a problem, these
tests don't expose the problem.  I still don't know how to fix the
problem in Guile.  --George


Thanks George.  This is good information.  I'm traveling for the next 2 weeks, but I'm taking the G4 Mini with me in case I have time to work on it.  Thanks again for looking into it.
