On 12/8/19 12:42 PM, George Koehler wrote:
On Thu, 05 Dec 2019 10:22:11 +0000
Stuart Henderson <s...@spacehopper.org> wrote:
On 5 December 2019 01:15:09 Matthew Hull <castersupm...@verizon.net> wrote:
I'm interested in guile2 (because I do some programming in Scheme) and
powerpc because I have a Mac Mini G4 with OpenBSD 6.5 installed.
The package is marked broken for powerpc...
Does the default make include "-g" or "-ggdb" flags? Would a build
with -O0 -ggdb be a practical debugging option? If so, how could those
flags be propagated "from the top"?
make clean=all
make DEBUG="-O0 -g" install
Gdb in base is old and doesn't work too well - use a newer one from
packages: pkg_add gdb and use the "egdb" command.
Hello Matt. For some reason, I didn't receive your mails. I did
receive Stuart's reply, and other mails sent to ports@. This problem
is at my end: I'm using GMail. I'm reading your mails through the
archives at MARC.
Your backtrace https://marc.info/?l=openbsd-ports&m=157566079007497&w=2
shows where Guile crashes, but doesn't provide enough information to fix
the problem. I have a PowerBook G4, so I have reproduced the crash and
gotten more info, but still don't know the fix. My PowerBook5,4 runs a
snapshot of OpenBSD macppc 6.6-current from a few days ago, with a ports
tree from about 2 weeks ago, including lang/guile2 version 2.2.6p0.
Your OpenBSD 6.5 would have lang/guile2 version 2.2.4p0.
Your backtrace shows a crash at "vm-engine.c:573 NEXT (0);". I got the
crash in the same place. The macro "NEXT (0);" has a part that reads
ip[0]. In my crash, I can't access *ip, so the read of ip[0] probably
caused the segfault.
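To illustrate why a bad ip faults inside "NEXT (0);", here is a minimal
sketch of a threaded-dispatch loop. It is a toy model, not Guile's
vm-engine.c: the opcodes OP_HALT, OP_INC and OP_ADD and the function
run() are hypothetical, and the real NEXT macro does more work. The
only point is that NEXT (n) advances ip and then reads ip[0], so an ip
that points at unmapped memory crashes at the fetch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes; hypothetical, not Guile's instruction set. */
enum { OP_HALT = 0, OP_INC = 1, OP_ADD = 2 };

/* Run a tiny program and return the accumulator. */
static int32_t run(const uint32_t *ip)
{
    int32_t acc = 0;
    uint32_t op;

    assert(ip != NULL);

/* Advance ip by n words, then fetch the next opcode word.
   The fetch is the ip[0] read: if ip points at unmapped memory,
   this is where the process faults. */
#define NEXT(n) do { ip += (n); op = ip[0]; goto dispatch; } while (0)

    NEXT(0);                      /* first fetch, as after a call */

dispatch:
    switch (op & 0xff) {
    case OP_INC:
        acc += 1;
        NEXT(1);                  /* one-word instruction */
    case OP_ADD:
        acc += (int32_t) ip[1];   /* operand is the following word */
        NEXT(2);                  /* two-word instruction */
    case OP_HALT:
    default:
        return acc;
    }
#undef NEXT
}
```

In this model the instruction that set ip is not the one that crashes;
the fault shows up one step later, at the next fetch, which matches a
backtrace that points at "NEXT (0);".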
This code in vm-engine.c "call" assigns ip before doing "NEXT (0);":
if (SCM_LIKELY (SCM_PROGRAM_P (FP_REF (0))))
  ip = SCM_PROGRAM_CODE (FP_REF (0));
else
  ip = (scm_t_uint32 *) vm_apply_non_program_code;
APPLY_HOOK ();
NEXT (0);
By looking at macro definitions, I concluded that "FP_REF (0)" gets
(vp->fp - 1)->as_scm, a pointer to a Scheme object; and SCM_PROGRAM_*
interpret the object as a scm_t_cell. I printed this cell in egdb.
(gdb) print ip
$19 = (scm_t_uint32 *) 0x33955378
(gdb) print *ip
Cannot access memory at address 0x33955378
(gdb) print *(scm_t_cell *)((vp->fp - 1)->as_scm)
$20 = {word_0 = 0x45, word_1 = 0x33955378}
Here 0x45 is scm_tc7_program (so SCM_PROGRAM_P is true), and 0x33955378
is the bad pointer that SCM_PROGRAM_CODE gets from word_1. Some code
might put bad pointers in program objects. I modified guile to look for
such code. I added a global "scm_t_uint32 aaa;" and added some checks
like "aaa = *pointer". One such check crashed at vm-engine.c:1654
"make-closure":
UNPACK_24 (op, dst);
offset = ip[1];
UNPACK_24 (ip[2], nfree);
// FIXME: Assert range of nfree?
SYNC_IP ();
closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16),
                            nfree + 2);
aaa = *(ip + offset);
SCM_SET_CELL_WORD_1 (closure, ip + offset);
// FIXME: Elide these initializations?
for (n = 0; n < nfree; n++)
  SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
SP_SET (dst, closure);
NEXT (3);
(gdb) print ip
$12 = (scm_t_uint32 *) 0xcf1ea3b8
(gdb) print offset
$13 = -1005191168
(gdb) print *(ip + offset)
Cannot access memory at address 0xdf76a3b8
(gdb) print ip[1]
Cannot access memory at address 0xcf1ea3bc
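The address in the gdb output above can be checked by hand. The sketch
below redoes the pointer arithmetic, assuming 32-bit pointers and
4-byte scm_t_uint32 words as on macppc. The helper names word_address()
and swap32() are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address of (ip + offset) for a 32-bit pointer to 4-byte words,
   with wraparound modulo 2^32 as on a 32-bit machine. */
static uint32_t word_address(uint32_t ip, int32_t offset)
{
    return ip + (uint32_t) offset * 4u;
}

/* Reverse the four bytes of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

With ip = 0xcf1ea3b8 and offset = -1005191168, word_address() gives
0xdf76a3b8, the address gdb refused to read. The bit pattern of that
offset is 0xc4160000, and swap32() of it is 0x000016c4 (5828). That is
only an observation about the numbers in this dump, not a diagnosis.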
I can't read ip[1] in the core dump, but the program did read ip[1] in
"offset = ip[1];" before the crash. The call to scm_inline_words(), to
allocate the scm_tc7_program object, seems to have also freed the memory
where ip points. This might be a problem with the garbage collector.
I also can't read ip[0] and ip[3] in the core dump. If the program
didn't run "aaa = *(ip + offset);", it would crash when "NEXT (3);"
reads ip[3]. This doesn't make sense, because the original crash was
not at this "NEXT (3);", but at that other "NEXT (0);". I seem to have
changed the behavior of the garbage collector. I wonder if the GC scans
global variables, and my added "aaa" caused the change.
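For context, a conservative collector such as Boehm GC does treat the
program's static data as roots by default. The toy model below shows
what that means; it is not Boehm's implementation, and struct block and
scan_roots() are hypothetical names. Any root word whose value falls
inside a heap block keeps that block alive, whether or not the word was
meant as a pointer. Whether "aaa" had that effect here is not
established.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A heap block as the toy collector sees it: a half-open address range. */
struct block { uintptr_t start, end; int marked; };

/* Conservatively scan n root words: any word whose value falls inside
   a block is treated as a pointer to it and marks that block live. */
static void scan_roots(const uintptr_t *roots, size_t n,
                       struct block *blocks, size_t nblocks)
{
    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < nblocks; j++)
            if (roots[i] >= blocks[j].start && roots[i] < blocks[j].end)
                blocks[j].marked = 1;
}
```

In this model, adding one more global word to the roots can change
which blocks survive a collection, and so can change where a later
crash appears.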
The garbage collector is from devel/boehm-gc version 7.6.0p3. I did
$ cd /usr/ports/devel/boehm-gc
$ make test
and all 15 tests passed. If the garbage collector has a problem, these
tests don't expose the problem. I still don't know how to fix the
problem in Guile. --George
Thanks George. This is good information. I'm traveling the next 2
weeks but I'm taking the G4 Mini with me in case I have time to work on
it. Thanks again for looking into it.