Problem with stores and loads from unions and proposed fix
This problem showed up in a PDP10 C version of GCC I'm responsible for and took a good while to track down. The fix is in generic gcc code so even though my PDP10 compiler is not an official gcc version and I haven't been successful at creating a failing program on the Intel compiler it seems like it should cause problems elsewhere so I figured I should pass it on. Here's a union that allows referencing bits in a word in different ways (the PDP10 has a 36 bit word, but that doesn't seem to be an issue here) union { int word; struct { unsigned long w0 : 32; unsigned long pad : 4; } i32; struct { unsigned long s0 : 16; unsigned long s1 : 16; unsigned long pad : 4; } i16; struct { unsigned long b0 : 8; unsigned long b1 : 8; unsigned long b2 : 8; unsigned long b3 : 8; unsigned long pad : 4; } i8; } u32; u32.word = ; /* in a subsequent different basic block which is guaranteed to be reached with u32 unchanged */ u32.i8.b1 = ... ; ... = u32.word ; CSE detects that the same subexpression is used in two places and substitutes a reaching register for the reference to u32.word without noticing that the memory has been modified by the bit field reference. Adding a call to invalidate_any_buried_refs(dest) flags the memory reference in such a way that it is not ignored and the erroneous CSE optimization is not done. --- gcse.c (revision 156482) +++ gcse.c (working copy) @@ -4783,7 +4783,12 @@ compute_ld_motion_mems (void) else ptr->invalid = 1; } - } + else +{ + /* Make sure there isn't a buried store somewhere. */ + invalidate_any_buried_refs (dest); +} + } else invalidate_any_buried_refs (PATTERN (insn)); } Thanks to anyone who can help determine whether this is a problem for other gcc versions and getting a fix into the gcc source. Martin Chaney XKL, LLC
Re: Problem with stores and loads from unions and proposed fix
Richard Guenther wrote: On Sat, Feb 6, 2010 at 3:44 AM, Martin Chaney wrote: This problem showed up in a PDP10 C version of GCC I'm responsible for and took a good while to track down. The fix is in generic gcc code so even though my PDP10 compiler is not an official gcc version and I haven't been successful at creating a failing program on the Intel compiler it seems like it should cause problems elsewhere so I figured I should pass it on. Here's a union that allows referencing bits in a word in different ways (the PDP10 has a 36 bit word, but that doesn't seem to be an issue here) union { int word; struct { unsigned long w0 : 32; unsigned long pad : 4; } i32; struct { unsigned long s0 : 16; unsigned long s1 : 16; unsigned long pad : 4; } i16; struct { unsigned long b0 : 8; unsigned long b1 : 8; unsigned long b2 : 8; unsigned long b3 : 8; unsigned long pad : 4; } i8; } u32; u32.word = ; /* in a subsequent different basic block which is guaranteed to be reached with u32 unchanged */ u32.i8.b1 = ... ; ... = u32.word ; CSE detects that the same subexpression is used in two places and substitutes a reaching register for the reference to u32.word without noticing that the memory has been modified by the bit field reference. Adding a call to invalidate_any_buried_refs(dest) flags the memory reference in such a way that it is not ignored and the erroneous CSE optimization is not done. --- gcse.c (revision 156482) +++ gcse.c (working copy) @@ -4783,7 +4783,12 @@ compute_ld_motion_mems (void) else ptr->invalid = 1; } - } + else +{ + /* Make sure there isn't a buried store somewhere. */ + invalidate_any_buried_refs (dest); +} + } else invalidate_any_buried_refs (PATTERN (insn)); } Thanks to anyone who can help determine whether this is a problem for other gcc versions and getting a fix into the gcc source. Please specify the GCC version this happens with and show the RTL before gcse. Richard. Martin Chaney XKL, LLC My PDP10 compiler is based on GCC 4.3.0. (I'm still trying to port the 4.4 gimplification changes.) Pasted below is the RTL from the pass prior to GCSE. Without the fix, GCSE will replace the use of REG 48 in insn 70 with a reaching register which contains the value (mem:SI (reg:SI 16 )) and delete insn 69. With the fix, this section of code is not modified by GCSE. (note 66 65 67 15 [bb 15] NOTE_INSN_BASIC_BLOCK) (insn 67 66 68 15 testunion.c:52 (set (reg:SI 46) (const_int 0 [0x0])) 66 {*movsi} (nil)) (insn 68 67 69 15 testunion.c:52 (set (zero_extract:SI (mem/s/j/c:SI (reg/f:SI 16 ) [0+0 S4 A36]) (const_int 8 [0x8]) (const_int 24 [0x18])) (reg:SI 46)) 48 {*insv_memsi} (expr_list:REG_EQUAL (const_int 0 [0x0]) (nil))) (insn 69 68 70 15 testunion.c:53 (set (reg:SI 48) (mem/s/c:SI (reg/f:SI 16 ) [0+0 S4 A36])) 66 {*movsi} (nil)) (insn 70 69 71 15 testunion.c:53 (set (reg:SI 48) (lshiftrt:SI (reg:SI 48) (neg:SI (const_int -4 [0xfffc] 155 {LSH_right} (nil)) (insn 71 70 73 15 testunion.c:53 (set (reg:SI 48) (and:SI (reg:SI 48) (const_int 65535 [0x]))) 170 {*andsi3} (nil)) (insn 73 71 76 15 testunion.c:53 (set (reg/v:SI 32 [ sum ]) (plus:SI (reg/v:SI 32 [ sum ]) (reg:SI 48))) 98 {*addsi3} (nil)) ;; End of basic block 15 -> ( 8)
gcc compiler for pdp10
Hi, I'm am the proprietor of a gcc compiler for the PDP10 architecture. (This is a compiler previously worked on by Lars Brinkhoff who left XKL some while before I joined XKL. It's possible some of you may have been familiar with him or the compiler from that time.) The compiler is currently in a state where it is synched with the both the 4.3 and 4.4 branches, and it passes the testsuite tests (with the exception of some I've flagged as expected failures for the pdp10). My employer is happy to release my work on the gcc compiler back to the gcc community and I've sent in a request for the necessary forms. The PDP10 architecture is unusual in various ways that distinguish it from the mainstream architectures supported by the gcc compiler and this has made the development of this compiler a significant task. Undoubtedly I've made customizations in inappropriate ways. I'm seeking contacts with people who might be able to advise me on how to cleanup my implementation to reduce the amount of #ifdef __PDP10_H__ I've sprinkled liberally throughout the source. Also, if its possible to get simple changes made to prevent breaking my PDP10 version and that are otherwise innocuous that would be wonderful. For example, the PDP10 word size is 36 bits; Fairly recently people have taken to writing code that assumes word size is a power of 2 even when it's straightforward to write in a manner that doesn't make that assumption. Considering the large number of files customized to get the PDP10 compiler working, I'm not sure whether it's possible to get it to build directly from the gcc trunk, but it would be nice to work toward that goal. Some other things which distinguish the PDP10 architecture from assumptions in the gcc code base include: its variety of formats of pointers only one of which can be viewed as an integer and that one is capable of referencing only word aligned data, a functional difference between signed and unsigned integers, and peculiarities to the use of PDP10 byte arrays which are very difficult to describe. Any help or advise would be appreciated. Martin Chaney XKL, LLC