On 12/12/19 2:05 PM, Charlene Wendling wrote:
Hi,
On Wed, 11 Dec 2019 20:50:00 -0500
George Koehler wrote:
I believe that the files in WRKSRC/prebuilt/32-bit-big-endian are
broken: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26854
The diff below adds a post-extract target that moves away the prebuilt
files, so the build ignores them. This fixes the build for me, but
the build is slow, takes about 24 hours on my G4 at 666 MHz.
On Sun, 8 Dec 2019 13:42:38 -0500
George Koehler <kern...@gmail.com> wrote:
... Some code
might put bad pointers in program objects. I modified guile to
look for such code. I added a global "scm_t_uint32 aaa;" and added
some checks like "aaa = *pointer". One such check crashed at
vm-engine.c:1654 "make-closure":
UNPACK_24 (op, dst);
offset = ip[1];
UNPACK_24 (ip[2], nfree);
// FIXME: Assert range of nfree?
SYNC_IP ();
closure = scm_inline_words (thread, scm_tc7_program | (nfree << 16), nfree + 2);
aaa = *(ip + offset);
SCM_SET_CELL_WORD_1 (closure, ip + offset);
// FIXME: Elide these initializations?
for (n = 0; n < nfree; n++)
SCM_PROGRAM_FREE_VARIABLE_SET (closure, n, SCM_BOOL_F);
SP_SET (dst, closure);
NEXT (3);
(gdb) print ip
$12 = (scm_t_uint32 *) 0xcf1ea3b8
(gdb) print offset
$13 = -1005191168
(gdb) print *(ip + offset)
Cannot access memory at address 0xdf76a3b8
(gdb) print ip[1]
Cannot access memory at address 0xcf1ea3bc
I can't read ip[1] in the core dump, but the program did read ip[1]
in "offset = ip[1];" before the crash. The call to
scm_inline_words(), to allocate the scm_tc7_program object, seems
to have also freed the memory where ip points. This might be a
problem with the garbage collector.
The failure to read ip[1] was a red herring. Before the crash, `ip`
pointed to an mmap(2) file. In ktrace(1), the file was somewhere
under prebuilt/32-bit-big-endian. This mapping disappeared in the
core dump, so GDB can't access it.
`offset` -1005191168 is 0xc4160000. This looks like the wrong byte
order. The correct value might be 0x000016c4 = 5828. This would make
more sense, if ip + offset should be inside the file!
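The arithmetic above can be checked with a short C sketch. The helper names `swap32` and `target_of` are hypothetical and are not part of guile. The address calculation assumes 32-bit addresses and that `ip` points to 32-bit words, so each unit of `offset` is 4 bytes and the sum wraps modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper (not from guile): reverse the four bytes of a
   32-bit word. */
static uint32_t
swap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: the address of ip + offset when ip points to
   32-bit words on a machine with 32-bit addresses.  Unsigned
   arithmetic wraps modulo 2^32, which mimics that address space. */
static uint32_t
target_of (uint32_t ip, int32_t offset)
{
  return ip + (uint32_t) offset * 4u;
}
```

With the values from the GDB session, `swap32 (0xc4160000)` gives 0x000016c4 (5828), and `target_of (0xcf1ea3b8, -1005191168)` gives 0xdf76a3b8, the address that GDB could not access. With the offset 5828 the target would be 0xcf1efec8, a short distance past `ip`.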
modules/system/vm/assembler.scm can byte-swap values when it emits
bytecode for a different-endian machine. If a little-endian machine
wrote the prebuilt/32-bit-big-endian files, and assembler.scm forgot
to swap `offset`, then it would cause this bug.
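To illustrate the kind of mistake suspected here, the following is a minimal sketch of emitting a 32-bit word for a target whose byte order may differ from the host's. It is not guile's assembler code; `emit_u32`, `read_u32_big`, and `enum target_order` are names invented for this example.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only; this is not guile's
   assembler code. */
enum target_order { TARGET_LITTLE, TARGET_BIG };

/* Store a 32-bit value into buf in the byte order of the target
   machine, independent of the byte order of the host. */
static void
emit_u32 (unsigned char *buf, uint32_t value, enum target_order order)
{
  int i;
  for (i = 0; i < 4; i++)
    {
      int shift = (order == TARGET_BIG) ? 24 - 8 * i : 8 * i;
      buf[i] = (unsigned char) (value >> shift);
    }
}

/* Read a 32-bit value from buf the way a big-endian machine would. */
static uint32_t
read_u32_big (const unsigned char *buf)
{
  return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16)
         | ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
}
```

If 5828 is emitted with `TARGET_BIG`, a big-endian reader gets 5828 back. If it is emitted with `TARGET_LITTLE` by mistake, the same reader sees 0xc4160000, which is the bad `offset` from the core dump.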
powerpc might be the only 32-bit-big-endian arch where OpenBSD builds
packages. mips64 and sparc64 might be 64-bit-big-endian (but there is
no prebuilt/64-bit-big-endian, so those arches would bootstrap without
prebuilt files), and the other arches might be *-little-endian.
With no prebuilt files, the build ran some slow "bootstrap" commands
on my 666 MHz cpu. (The MPC7447A in my PowerBook G4 can run at 1333
MHz using apmd(8) and apm -A, but I left it at 666 MHz.) The first
bootstrap command took more than 100 minutes. The second command took
just over 4 hours. The next commands continued overnight, and the
whole build might have taken almost 24 hours. The build passes most
tests:
SKIP: test-pthread-create-secondary
FAIL: test-stack-overflow
FAIL: test-out-of-memory
==================================
2 of 38 tests failed
(1 test was not run)
Here's the diff. I didn't set REVISION because powerpc had no
package, and I guess that other arches would ignore
prebuilt/32-bit-big-endian.
--George
It builds and works fine here. It's certainly slow to build; I hope
upstream will provide a working bootstrap. I think we should keep `mv'
as it makes the situation clear.
Thanks a lot :)
OK cwen@
Index: Makefile
===================================================================
RCS file: /cvs/ports/lang/guile2/Makefile,v
retrieving revision 1.23
diff -u -p -r1.23 Makefile
--- Makefile 16 Jul 2019 21:29:41 -0000 1.23
+++ Makefile 12 Dec 2019 01:02:07 -0000
@@ -3,8 +3,6 @@
# When updating, check that x11/gnome/aisleriot MODGNOME_CPPFLAGS
# references the proper guile2 includes directory
-BROKEN-powerpc= Segmentation fault (core dumped)
-
COMMENT= GNU's Ubiquitous Intelligent Language for Extension
# '
@@ -51,6 +49,10 @@ CONFIGURE_ARGS= --program-suffix=${V}
# Needed because otherwise regress tests won't build:
# warning: format '%ji' expects type 'intmax_t', but argument 4 has
# type 'scm_t_intmax'
CONFIGURE_ARGS += --disable-error-on-warning
+
+# powerpc: Prevent "Segmentation fault (core dumped)" during build.
+post-patch:
+ mv ${WRKSRC}/prebuilt/32-bit-big-endian{,-broken}
post-install:
install -d ${PREFIX}/share/guile/site/${V}/
To summarize what seems to have happened to guile2, according to my
understanding of guile: over time, some popular Scheme functions become
standardized by committee, are added to the Scheme canon, and ship
pre-compiled (in this case, compiled by guile). Groups of these compiled
objects can be included in user programs with include directives. So if
the pre-compiled built-in Scheme functions are missing during a package
build (moved away, as in this patch), guile rebuilds them. There are
quite a lot of them, which explains why they ship pre-built: it takes
a while to compile all the official built-in add-ons to Scheme.
In a similar way, new user-defined functions are compiled on the fly and
added to a cache so that they don't need recompilation later
(unless the source was changed).
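The caching idea can be sketched as follows. This assumes that a cached object counts as stale when the source has a newer modification time; the rule guile actually applies may differ. `struct cache_entry` and `needs_compile` are hypothetical names, not guile APIs.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache record: whether a compiled object exists, and
   when it was written.  Not a guile data structure. */
struct cache_entry
{
  bool present;
  time_t compiled_mtime;
};

/* Decide whether the source must be compiled again.  Assumption: the
   cached object is stale when the source is newer than it. */
static bool
needs_compile (const struct cache_entry *entry, time_t source_mtime)
{
  if (!entry->present)
    return true;                /* nothing cached yet */
  return source_mtime > entry->compiled_mtime;
}
```

Under this assumption, a missing entry or a source file newer than the cached object triggers a compile, and an unchanged source reuses the cached object.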
I am curious how some pre-built things got "wrong-endianed", though.
Maybe at a version change, something was altered but one of the
pre-builts didn't get flagged as needing rebuild.
Guile 2.2 implements significant efficiencies over 1.8 and even 2.0, as
much as a 30% speedup, so I am very grateful for this fix.
Regards,
Matt