Source: openmpi
Version: 1.6.5-8
Severity: important
Tags: patch
User: debian-al...@lists.debian.org
Usertags: alpha
Justification: Causes FTBFS in other packages that built in the past.

The atomic operations defined for Alpha in openmpi can cause erratic
behaviour, including segfaults, which makes packages that build-depend
on the openmpi binary packages fail to build. This has probably only
surfaced now because of a binNMU of openmpi compiled with a newer gcc
that has tighter requirements of asm constructs.

The current definition of opal_atomic_cmpset_32() in
opal/include/opal/sys/alpha/atomic.h has the following statement:

   __asm __volatile__ (
                        "1: ldl_l %0, %1 \n\t"
                        "cmpeq %0, %2, %0 \n\t"
                        "beq %0, 2f \n\t"
                        "mov %3, %0 \n\t"
                        "stl_c %0, %1 \n\t"
                        "beq %0, 1b \n\t"
                        "jmp 3f \n"
                        "2: mov $31, %0 \n"
                        "3: \n"
                        : "=&r" (ret), "+m" (*addr)
                        : "r" (oldval), "r" (newval)
                        : "memory");

However, the "jmp 3f" instruction is an assembler macro and expands to
two CPU instructions that use CPU register t12 ($27) and the global
pointer ($29) to construct the address of label 3 and jump there. The
modification of t12 is not in the clobber list of the asm statement,
and the use of the global pointer is not listed in the asm inputs.

So, for example, in the function orte_plm_base_check_job_completed()
defined in the source file orte/mca/plm/base/plm_base_launch_support.c
a segfault results because the compiler sets up t12 as the pointer to
a struct, inserts the atomic asm code above, then accesses a field in
the structure via t12, which has been corrupted by the inserted atomic
asm. As a result the executable /usr/bin/orted segfaults in the test
suite of mpi4py, hence the build failure of mpi4py on Alpha [1].

Further segfaults occur in functions such as opal_atomic_add_32()
defined in the source file opal/include/opal/sys/atomic_impl.h. The
compiler determines that they are leaf functions that make no use of
the global pointer, so it does not initialise the global pointer, but
the inserted asm code does in fact use the global pointer, so, kaboom!

The asm code above is weird anyway.
The "mov $31, %0" statement, which clears the output %0 to zero, is
superfluous: %0 must already be zero, because the only entry to label 2
is from the "beq %0, 2f" and that statement only branches to label 2 if
the output %0 is zero! On the success path stl_c leaves a non-zero
value in %0, so execution can simply fall through to label 2 and the
"jmp 3f" is not needed either.

So I recommend the following more efficient version of the asm code:

   __asm__ __volatile__ (
                        "1: ldl_l %0, %1 \n\t"
                        "cmpeq %0, %2, %0 \n\t"
                        "beq %0, 2f \n\t"
                        "mov %3, %0 \n\t"
                        "stl_c %0, %1 \n\t"
                        "beq %0, 1b \n\t"
                        "2: \n"
                        : "=&r" (ret), "+m" (*addr)
                        : "r" (oldval), "r" (newval)
                        : "memory");

I attach a patch that fixes both opal_atomic_cmpset_32() and
opal_atomic_cmpset_64() on Alpha. With the patch applied, openmpi
builds to completion, and mpi4py also builds to completion against the
fixed openmpi.

Cheers
Michael.

[1] http://buildd.debian-ports.org/status/package.php?p=mpi4py

Index: openmpi-1.6.5/opal/include/opal/sys/alpha/atomic.h
===================================================================
--- openmpi-1.6.5.orig/opal/include/opal/sys/alpha/atomic.h
+++ openmpi-1.6.5/opal/include/opal/sys/alpha/atomic.h
@@ -87,18 +87,16 @@ static inline void opal_atomic_wmb(void)
 static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
                                          int32_t oldval, int32_t newval)
 {
-   int32_t ret;
+   int32_t ret;
 
-   __asm __volatile__ (
+   __asm__ __volatile__ (
                         "1: ldl_l %0, %1 \n\t"
                         "cmpeq %0, %2, %0 \n\t"
                         "beq %0, 2f \n\t"
                         "mov %3, %0 \n\t"
                         "stl_c %0, %1 \n\t"
                         "beq %0, 1b \n\t"
-                        "jmp 3f \n"
-                        "2: mov $31, %0 \n"
-                        "3: \n"
+                        "2: \n"
                         : "=&r" (ret), "+m" (*addr)
                         : "r" (oldval), "r" (newval)
                         : "memory");
@@ -141,9 +139,7 @@ static inline int opal_atomic_cmpset_64(
                         "mov %3, %0 \n\t"
                         "stq_c %0, %1 \n\t"
                         "beq %0, 1b \n\t"
-                        "jmp 3f \n"
-                        "2: mov $31, %0 \n"
-                        "3: \n"
+                        "2: \n"
                         : "=&r" (ret), "+m" (*addr)
                         : "r" (oldval), "r" (newval)
                         : "memory");
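
For anyone who wants to convince themselves that the shortened sequence
still returns the right value on every path, here is a rough
single-threaded model of it in plain C. It is only a sketch and not
part of the patch: the names model_cmpset_32(), model_add_32(),
model_store_conditional() and sc_failures_left are made up for
illustration, the store-conditional is faked with a counter instead of
real ldl_l/stl_c hardware semantics, and the add loop is merely written
in the style of a compare-and-set retry loop, not copied from
atomic_impl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob: how many times the modelled stl_c fails before it
 * succeeds, so that the "beq %0, 1b" retry branch can be exercised. */
static int sc_failures_left = 0;

/* Model of stl_c: returns 1 and stores on success, returns 0 and
 * leaves memory untouched on failure. */
static int32_t model_store_conditional(volatile int32_t *addr, int32_t value)
{
    if (sc_failures_left > 0) {
        sc_failures_left--;
        return 0;
    }
    *addr = value;
    return 1;
}

/* Model of the patched asm; 'ret' plays the role of output %0. */
static int model_cmpset_32(volatile int32_t *addr,
                           int32_t oldval, int32_t newval)
{
    int32_t ret;

    assert(addr != NULL);
    for (;;) {
        ret = *addr;                              /* 1: ldl_l %0, %1   */
        ret = (ret == oldval);                    /* cmpeq %0, %2, %0  */
        if (ret == 0)                             /* beq %0, 2f        */
            break;                                /* ret is already 0  */
        ret = newval;                             /* mov %3, %0        */
        ret = model_store_conditional(addr, ret); /* stl_c %0, %1      */
        if (ret == 0)                             /* beq %0, 1b        */
            continue;
        break;                                    /* fall through, ret == 1 */
    }
    return ret;                                   /* 2:                */
}

/* Illustrative add built on the compare-and-set model: retry until the
 * value read is still the value in memory when the swap is attempted. */
static int32_t model_add_32(volatile int32_t *addr, int32_t delta)
{
    int32_t oldval;

    do {
        oldval = *addr;
    } while (0 == model_cmpset_32(addr, oldval, oldval + delta));
    return oldval + delta;
}
```

With a variable holding 5, model_cmpset_32(&v, 5, 7) returns 1 and
leaves 7 in v, while a following model_cmpset_32(&v, 5, 9) returns 0
and leaves v alone. Neither exit needs an explicit clear of the result
or a jump over one, which is the point of the simplification.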