Package: ghc
Version: 7.6.3-7
Severity: important
Tags: upstream patch

Hi,

ghc has been removed from the archive on s390x because it hangs randomly
during the build process. This has been reported upstream as ticket
#7993, which has not progressed so far. In the meantime the same issue has
been reported as ticket #8134 for powerpc64. It turns out that the problem
is the same and that it affects 64-bit big-endian platforms.

A patch is provided in upstream ticket #8134, and has been committed
upstream. I have tried this patch and I have been able to build ghc
successfully 3 times in a loop after bootstrapping it from the last
available binary in snapshot.d.o.

I have attached this patch to this bug report for convenience, so that
it can be dropped into debian/patches. Would it be possible to upload
a fixed version with this patch? I will then take care of bootstrapping
the binary on s390x again and uploading the package to the archive.

Thanks,
Aurelien

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: s390x

Kernel: Linux 3.2.0-4-s390x (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
commit a4b1a43542b11d09dd3b603d82c5a0e99da67d74
Author: Austin Seipp <aus...@well-typed.com>
Date:   Fri Nov 1 22:17:01 2013 -0500

    Fix loop on 64bit Big-Endian platforms (#8134)
    
    This is a fun one.
    
    In the RTS, `cas` expects a pointer to StgWord which will translate to
    unsigned long (8 bytes under LP64.) But we had previously declared
    token_locked as *StgBool* - which evaluates to 'int' (4 bytes under
    LP64.) That means we fail to provide enough storage for the cas
    primitive, causing it to corrupt memory on a 64bit platform.
    
    Hilariously, this somehow did not affect little-endian platforms (ARM,
    x86, etc) before. That's because to clear our lock token, we would say:
    
        token_locked = 0;
    
    But because token_locked is 32bits technically, this only writes to
    half of the 64bit quantity. On a Big-Endian machine, this won't do
    anything. That is, token_locked starts as 0:
    
     / token_locked
     |
     v
     0x00000000
    
    and the first cas modifies the memory to:
    
     / valid    / corrupted
     |          |
     v          v
     0x00000000 0x00000001
    
    We then clear token_locked, but this doesn't change the corrupted 4
    bytes of memory. And then we try to lock the token again, spinning until
    it is released - clearly a deadlock.
    
    Related: Windows (amd64) doesn't follow LP64, but LLP64, where both
    int and long are 4 bytes, so this shouldn't change anything on these
    platforms.
    
    Thanks to Reid Barton for helping the diagnosis. Also, thanks to Jens
    Peterson who confirmed this also fixes building GHC on Fedora/ppc64 and
    Fedora/s390x.
    
    Authored-by: Gustavo Luiz Duarte <gustav...@linux.vnet.ibm.com>
    Signed-off-by: Austin Seipp <aus...@well-typed.com>

diff --git a/rts/STM.c b/rts/STM.c
index e342ebf..bea0356 100644
--- a/rts/STM.c
+++ b/rts/STM.c
@@ -949,7 +949,7 @@ void stmPreGCHook (Capability *cap) {
 static volatile StgInt64 max_commits = 0;
 
 #if defined(THREADED_RTS)
-static volatile StgBool token_locked = FALSE;
+static volatile StgWord token_locked = FALSE;
 
 static void getTokenBatch(Capability *cap) {
   while (cas((void *)&token_locked, FALSE, TRUE) == TRUE) { /* nothing */ }
