On Thu, 05 Dec 2019 10:22:11 +0000 Stuart Henderson <s...@spacehopper.org> wrote:
> On 5 December 2019 01:15:09 Matthew Hull <castersupm...@verizon.net> wrote:
>
> > I'm interested in guile2 (because I do some programming in Scheme) and
> > powerpc because I have a Mac Mini G4 with OpenBSD 6.5 installed.
> >
> > The package is marked broken for powerpc...
> >
> > Does the default make in include "-g" or "-ggdb" flags???  Would a build
> > with -O0 -ggdb be a practical debugging option???  If so, how could those
> > flags be propagated "from the top"?
>
> make clean=all
> make DEBUG="-O0 -g" install
>
> Gdb in base is old and doesn't work too well - use a newer one from
> packages: pkg_add gdb and use the "egdb" command.

Hello Matt.  For some reason, I didn't receive your mails.  I did
receive Stuart's reply, and other mails sent to ports@.  This problem
is at my end: I'm using GMail.  I'm reading your mails through the
archives at MARC.

Your backtrace
https://marc.info/?l=openbsd-ports&m=157566079007497&w=2
shows where Guile crashes, but doesn't provide enough information to
fix the problem.  I have a PowerBook G4, so I have reproduced the
crash and gotten more info, but still don't know the fix.

My PowerBook5,4 runs a snapshot of OpenBSD macppc 6.6-current from a
few days ago, with a ports tree from about 2 weeks ago, including
lang/guile2 version 2.2.6p0.  Your OpenBSD 6.5 would have lang/guile2
version 2.2.4p0.

Your backtrace shows a crash at "vm-engine.c:573 NEXT (0);".  I got
the crash in the same place.  The macro "NEXT (0);" has a part that
reads ip[0].  In my crash, I can't access *ip, so ip[0] probably
caused the crash by segfault.

This code in vm-engine.c "call" assigns ip before doing "NEXT (0);":

      if (SCM_LIKELY (SCM_PROGRAM_P (FP_REF (0))))
        ip = SCM_PROGRAM_CODE (FP_REF (0));
      else
        ip = (scm_t_uint32 *) vm_apply_non_program_code;

      APPLY_HOOK ();

      NEXT (0);

By looking at macro definitions, I concluded that "FP_REF (0)" gets
(vp->fp - 1)->as_scm, a pointer to a Scheme object; and SCM_PROGRAM_*
interpret the object as a scm_t_cell.
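
To make that reading of the macros concrete, here is a simplified
model in C.  It is only a sketch, not Guile's code: model_cell,
model_program_p, model_program_code and the MODEL_* constants are
made-up names, the tag is assumed to live in the low 7 bits of
word_0, and 0x45 is the tag value that egdb shows for word_0 in this
crash.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for scm_t_cell on a 32-bit machine.
   Hypothetical model, not taken from Guile's headers. */
typedef struct {
  uint32_t word_0;   /* type tag in the low bits, other data above it */
  uint32_t word_1;   /* for a program object: address of its bytecode */
} model_cell;

/* Assumptions: the tc7 tag is the low 7 bits of word_0, and the tag
   for program objects is 0x45 (the value seen in this crash). */
#define MODEL_TC7_MASK    0x7fu
#define MODEL_TC7_PROGRAM 0x45u

/* Rough equivalent of SCM_PROGRAM_P: does the tag say "program"? */
static int model_program_p (const model_cell *c)
{
  return (c->word_0 & MODEL_TC7_MASK) == MODEL_TC7_PROGRAM;
}

/* Rough equivalent of SCM_PROGRAM_CODE: return word_1 as the new ip.
   Only the tag is checked here; the address itself is not validated,
   so a stale word_1 goes straight into ip. */
static uint32_t model_program_code (const model_cell *c)
{
  assert (model_program_p (c));
  return c->word_1;
}
```

The point of the sketch is that a correct tag says nothing about
word_1: the tag check can pass while the code pointer is bad.
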
I printed this cell in egdb.

(gdb) print ip
$19 = (scm_t_uint32 *) 0x33955378
(gdb) print *ip
Cannot access memory at address 0x33955378
(gdb) print *(scm_t_cell *)((vp->fp - 1)->as_scm)
$20 = {word_0 = 0x45, word_1 = 0x33955378}

Here 0x45 is scm_tc7_program (so SCM_PROGRAM_P is true), and
0x33955378 is the bad pointer that SCM_PROGRAM_CODE gets from word_1.

Some code might put bad pointers in program objects.  I modified
guile to look for such code.  I added a global "scm_t_uint32 aaa;"
and added some checks like "aaa = *pointer".  One such check crashed
at vm-engine.c:1654 "make-closure":

      UNPACK_24 (op, dst);
      offset = ip[1];
      UNPACK_24 (ip[2], nfree);

      // FIXME: Assert range of nfree?

      SYNC_IP ();
      closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16),
                                  nfree + 2);
      aaa = *(ip + offset);
      SCM_SET_CELL_WORD_1 (closure, ip + offset);
      // FIXME: Elide these initializations?
      for (n = 0; n < nfree; n++)
        SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
      SP_SET (dst, closure);
      NEXT (3);

(gdb) print ip
$12 = (scm_t_uint32 *) 0xcf1ea3b8
(gdb) print offset
$13 = -1005191168
(gdb) print *(ip + offset)
Cannot access memory at address 0xdf76a3b8
(gdb) print ip[1]
Cannot access memory at address 0xcf1ea3bc

I can't read ip[1] in the core dump, but the program did read ip[1]
in "offset = ip[1];" before the crash.  The call to
scm_inline_words(), to allocate the scm_tc7_program object, seems to
have also freed the memory where ip points.  This might be a problem
with the garbage collector.

I also can't read ip[0] and ip[3] in the core dump.  If the program
didn't run "aaa = *(ip + offset);", it would crash when "NEXT (3);"
reads ip[3].  This doesn't make sense, because the original crash was
not at this "NEXT (3);", but at that other "NEXT (0);".  I seem to
have changed the behavior of the garbage collector.  I wonder if the
GC scans global variables, and my added "aaa" caused the change.

The garbage collector is from devel/boehm-gc version 7.6.0p3.
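
A side note on the make-closure numbers above: the addresses egdb
reports follow from plain 32-bit pointer arithmetic on ip and offset.
The sketch below redoes that arithmetic in C.  It assumes 32-bit
pointers and 4-byte scm_t_uint32 words; model_word_address is a
made-up helper, not a Guile function.

```c
#include <stdint.h>

/* Address of ip[offset] on a 32-bit machine where ip points to 4-byte
   words (scm_t_uint32).  The sum is done in uint32_t so that it wraps
   modulo 2^32, the way a 32-bit pointer would.
   Hypothetical helper for illustration only. */
static uint32_t model_word_address (uint32_t ip, int32_t offset)
{
  return ip + (uint32_t) offset * 4u;
}
```

With ip = 0xcf1ea3b8, an offset of -1005191168 gives 0xdf76a3b8 and an
offset of 1 gives 0xcf1ea3bc, the two addresses in the egdb output.
As an unsigned 32-bit value, the same offset is 0xc4160000.
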
I did

  $ cd /usr/ports/devel/boehm-gc
  $ make test

and all 15 tests passed.  If the garbage collector has a problem,
these tests don't expose the problem.  I still don't know how to fix
the problem in Guile.

--George