RE: postreload problem using reload_inm and SECONDARY_RELOAD macros
Further hints: the immediate that gcc wants to move into R_REGS is a (const_int 8), as shown in the error message below:

arithmetic.c:197: error: insn does not satisfy its constraints:
(insn 1505 903 1506 0 (set (reg:SI 25 $R9)
        (const_int 8 [0x8])) 0 {movsi_internal} (nil)
    (nil))
arithmetic.c:197: internal compiler error: in reload_cse_simplify_operands, at postreload.c:391

So I added some debug traces to the function 'my_secondary_input_reload_class' to understand why (const_int 8) is not reloaded into C_REGS. 'my_secondary_input_reload_class' is called several times, but never with the RTL expression (const_int 8). I also checked the 'my_preferred_reload_class' function, and I don't see the expression (const_int 8) there either.

Selim
Are there plans to maintain the CFG throughout the compilation?
I would like to analyse code using the GCC CFG; however, some steps invalidate it, notably delay slot scheduling. Are there plans to move toward "how it ought to work" as outlined in:

http://gcc.gnu.org/wiki/basic_block_graph

Peter.
CCmode size
Hi,

genmodes.c has the following comment:

    /* Again, nothing more need be said.  For historical reasons,
       the size of a CC mode is four units.  */
    validate_mode (m, UNSET, UNSET, UNSET, UNSET, UNSET);

    m->bytesize = 4;

Now, this is probably ok for _most_ archs, but for my arch, where a word == byte == 16 bits, this causes a lot of pain. Is there a way (macro?) to change this to m->bytesize = 1; in the backend without hardcoding it into genmodes.c?

Cheers,
--
PMatos
RE: CCmode size
>genmodes.c has the following comment:
>
>    /* Again, nothing more need be said.  For historical reasons,
>       the size of a CC mode is four units.  */
>    validate_mode (m, UNSET, UNSET, UNSET, UNSET, UNSET);
>
>    m->bytesize = 4;
>
>Now, this is probably ok for _most_ archs but for my arch where a word == byte
>== 16 bits, this causes a lot of pain. Is there a way (macro?) to change this
>to m->bytesize = 1; in the backend without hardcoding it into genmodes.c?

It would seem that making it equal to word size (whatever that is on the platform) or size of the int type would be a way to make this better. Would that have any bad consequences for other platforms?

paul
RE: CCmode size
Quoting paul_kon...@dell.com:

> It would seem that making it equal to word size (whatever that is on the
> platform) or size of the int type would be a way to make this better.
> Would that have any bad consequences for other platforms?

For MXP (16-bit word addressed, with 128 bit vector registers) I had this:

2008-04-25  J"orn Rennecke

        * genmodes.c (vector_class):
        (complete_mode): Allow bytesize to have been set for MODE_CC.

     case MODE_CC:
       /* Again, nothing more need be said.  For historical reasons,
-         the size of a CC mode is four units.  */
-      validate_mode (m, UNSET, UNSET, UNSET, UNSET, UNSET);
+         the size of a CC mode defaults to four units.  */
+      if (m->bytesize != blank_mode.bytesize)
+        validate_mode (m, UNSET, SET, UNSET, UNSET, UNSET);
+      else
+        {
+          validate_mode (m, UNSET, UNSET, UNSET, UNSET, UNSET);
+          m->bytesize = 4;
+        }

-      m->bytesize = 4;
       m->ncomponents = 1;
       m->component = 0;
       break;

Then there was also:

ChangeLog.ARC:  (SIZED_CC_MODE): New macro.
genmodes.c:#define SIZED_CC_MODE(N, Y) (CC_MODE (N)->bytesize = (Y))

Of course, MXP needed a few more patches to support MODE_VECTOR_PARTIAL_INT and MODE_VECTOR_CC modes.

In mxp-modes.def, I had then:

VECTOR_MODES (INT, 4);   /* V4QI V2HI */
VECTOR_MODES (INT, 8);   /* V8QI V4HI V2SI */
VECTOR_MODES (INT, 16);  /* V16QI V8HI V4SI V2DI */
PARTIAL_INT_MODE (SI);   /* Needed to make V2PSI / V4PSI.  */
VECTOR_MODE (PARTIAL_INT, PSI, 2); /* V2PSI, flags for DImode arithmetic.  */
VECTOR_MODE (PARTIAL_INT, PSI, 4); /* V4PSI, flags for V2DImode arithmetic.  */
VECTOR_MODES (FLOAT, 8);   /* V2SF */
VECTOR_MODES (FLOAT, 16);  /* V4SF V2DF */
#define CC_MODES(N) SIZED_CC_MODE (N, 2); \
  VECTOR_MODE (CC, N, 2); VECTOR_MODE (CC, N, 4); VECTOR_MODE (CC, N, 8)
CC_MODES (CCI);  /* Ordinary integer flags.  */
CC_MODES (CCZN); /* Only zero / negative flag relevant.  */
CC_MODES (CCZ);  /* Only zero flag relevant.  */
VECTOR_MODE (CC, CC, 2); /* V2CCmode - flag clobber for DI arithmetic.  */
VECTOR_MODE (CC, CC, 4); /* V4CCmode - flag clobber for V2DI arithmetic.  */
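
The defaulting rule in the patch above can be illustrated outside of genmodes.c. The following is only a minimal, self-contained sketch: struct mode_data here is a cut-down stand-in, and BYTESIZE_UNSET, set_cc_mode_size and complete_cc_mode are hypothetical names chosen for this illustration, not the real genmodes.c declarations. It shows the idea that a CC mode keeps a size the target already set and falls back to four units otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinel meaning "the target did not set a size".  The real
   genmodes.c compares against blank_mode.bytesize instead.  */
#define BYTESIZE_UNSET (-1)

/* Cut-down stand-in for the mode record; only the fields needed here.  */
struct mode_data {
    const char *name;
    int bytesize;
};

/* Plays the role of SIZED_CC_MODE (N, Y): record an explicit size.  */
void set_cc_mode_size(struct mode_data *m, int bytesize)
{
    assert(m != NULL && bytesize > 0);
    m->bytesize = bytesize;
}

/* Plays the role of the patched MODE_CC case: keep an explicit size,
   otherwise default to the historical four units.  */
void complete_cc_mode(struct mode_data *m)
{
    assert(m != NULL);
    if (m->bytesize == BYTESIZE_UNSET)
        m->bytesize = 4;
}
```

With this model, a mode declared with an explicit size of 2 keeps that size after completion, while an untouched CC mode still ends up with a size of 4, which is the backward-compatible behaviour the patch aims for.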
IRA misses register range overlap
In the msp430 back end, hard registers 4 through 15 are HImode, with adjacent register sequences used for SImode and DImode. In preparation for a library call, I'm emitting RTL that assigns values directly to reg:SI 4.

Despite that, in gcc 4.5.x IRA chooses reg:HI 4 as the destination for a pseudo-register for a preceding assignment, and does nothing to preserve the value over the span where the register is part of an SI value. The subsequence:

(insn 2 4 3 2 (set (reg/v:HI 38 [ x ])
        (reg:HI 15 r15 [ x ])) test.c:28 21 {*movhi2}
     (expr_list:REG_DEAD (reg:HI 15 r15 [ x ])
        (nil)))

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(note 6 3 10 2 NOTE_INSN_DELETED)

(insn 10 6 11 2 (set (reg:SI 8 r8)
        (mem/c/i:SI (symbol_ref:HI ("seed") [flags 0x2] <var_decl 0x7f032064f960 seed>) [2 seed+0 S4 A16])) test.c:14 24 {*movsi2}
     (nil))

(insn 11 10 12 2 (set (reg:SI 4 r4)
        (const_int 33614 [0x834e])) test.c:14 24 {*movsi2}
     (nil))

with:

 insn=2, live_throughout: 1, dead_or_set: 15, 38
 insn=10, live_throughout: 1, 38, dead_or_set: 8, 9
 insn=11, live_throughout: 1, 8, 9, 38, dead_or_set: 4, 5
 insn=12, live_throughout: 1, 38, dead_or_set: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

becomes:

(insn 2 4 3 2 (set (reg/v:HI 4 r4 [orig:38 x ] [38])
        (reg:HI 15 r15 [ x ])) test.c:28 21 {*movhi2}
     (nil))

(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)

(note 6 3 10 2 NOTE_INSN_DELETED)

(insn 10 6 11 2 (set (reg:SI 8 r8)
        (mem/c/i:SI (symbol_ref:HI ("seed") [flags 0x2] <var_decl 0x7f032064f960 seed>) [2 seed+0 S4 A16])) test.c:14 24 {*movsi2}
     (nil))

(insn 11 10 12 2 (set (reg:SI 4 r4)
        (const_int 33614 [0x834e])) test.c:14 24 {*movsi2}
     (nil))

and the subsequent reference to reg:HI 4 (formerly reg/v:HI 38) has value 33614 instead of the user's parameter.

Could somebody suggest where I should look to understand why this is happening and how it should be fixed?

Thanks.

Peter
Re: IRA misses register range overlap
On 09/15/2011 11:16 AM, Peter Bigot wrote:
> In the msp430 back end, hard registers 4 through 15 are HImode, with
> adjacent register sequences used for SImode and DImode. In preparation
> for a library call, I'm emitting RTL that assigns values directly to
> reg:SI 4.
>
> Despite that, in gcc 4.5.x IRA chooses reg:HI 4 as the destination
> for a pseudo-register for a preceding assignment, and does nothing to
> preserve the value over the span where the register is part of an SI
> value.
>
> [...]
>
> and the subsequent reference to reg:HI 4 (formerly reg/v:HI 38) has value
> 33614 instead of the user's parameter.
>
> Could somebody suggest where should I look to understand why this is
> happening and how should it be fixed?

The best way is to file a bug at http://gcc.gnu.org/bugzilla/. You should submit the test (the smaller the test, the better) and how to reproduce it: how to build gcc (configure options) and how to call the built gcc to reproduce the results.

I think you could look at the IRA dump file and check that the allocno for p38 conflicts with hard regs 4 *and* 5. If it does not, the problem is in the conflict calculation. Otherwise, it might be in IRA hard register assignment or in reload (the worst case).

But with only the info you provided, it is very hard to say what is wrong.
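
The conflict being asked about (p38 against hard regs 4 *and* 5) comes down to a range-overlap test on consecutive hard registers. Below is a small standalone sketch of that arithmetic, not IRA's actual data structures: struct hard_reg_range, regs_for_size and ranges_overlap are hypothetical names, and the 2-byte register width is taken from the description of the msp430 registers as HImode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a value that starts at hard register REGNO and
   needs NREGS consecutive registers occupies [REGNO, REGNO + NREGS).  */
struct hard_reg_range {
    unsigned regno;
    unsigned nregs;
};

/* Number of registers needed for a value of SIZE_BYTES when each hard
   register holds REG_BYTES, rounded up.  */
unsigned regs_for_size(unsigned size_bytes, unsigned reg_bytes)
{
    assert(reg_bytes > 0);
    return (size_bytes + reg_bytes - 1) / reg_bytes;
}

/* Two half-open ranges overlap when each one starts before the other
   one ends.  */
bool ranges_overlap(struct hard_reg_range a, struct hard_reg_range b)
{
    return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}
```

Under these assumptions, reg:SI 4 covers r4 and r5, so a pseudo that is live across insn 11 must not be given r4 or r5. If a back-end macro reported one register for SImode instead of two, only r4 would be seen as clobbered, which is the kind of wrong value worth ruling out first.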
Re: [google] Merge trunk into google/integration
LGTM

Ollie

On Wed, Sep 14, 2011 at 3:29 PM, Diego Novillo wrote:
>
> This merge brings google/integration up to rev 178783.  I also
> merged rev 178833 to get the testsuite validation script I
> committed to trunk yesterday.
>
> Simon, Ollie, I expect our internal builder to fail until I
> incorporate validate_failures.py into it.  It's a catch-22, but
> it is easier to keep the local changes to the builder than the
> whole merge.
>
> I have reverted all the xfail/skip markers we used to have.  I
> moved the ones that still fail to the new xfail manifest file in
> contrib/testsuite-management (we'll likely need manifests for
> other platforms as well).
>
> Tested on x86_64.  Committed to google/integration.
>
>
> 2011-09-14  Diego Novillo
>
>        Mainline merge rev 178783.
>        Cherry pick mainline rev 178833.
>
> 2011-09-14  Diego Novillo
>
> contrib/ChangeLog.google-integration
>
>        * testsuite-management/x86_64-unknown-linux-gnu.xfail: New.
>
> gcc/testsuite/ChangeLog.google-integration
>
>        * g++.dg/tree-prof/partition2.C: Revert to mainline variant.
>        * g++.dg/tree-ssa/pr41186.C: Likewise.
>        * gcc.dg/cproj-fails-with-broken-glibc.c: Likewise.
>        * gcc.dg/guality/sra-1.c: Likewise.
>        * gcc.dg/guality/vla-1.c: Likewise.
>        * gcc.dg/guality/vla-2.c: Likewise.
>        * gcc.dg/inline_3.c: Likewise.
>        * gcc.dg/inline_4.c: Likewise.
>        * gcc.dg/tree-ssa/vrp47.c: Likewise.
>        * gcc.dg/uninit-B.c: Likewise.
>        * gcc.dg/uninit-pr19430.c: Likewise.
>        * gcc.dg/unroll_2.c: Likewise.
>        * gcc.dg/unroll_3.c: Likewise.
>        * gcc.dg/unroll_4.c: Likewise.
>        * gcc.target/i386/pr27827.c: Likewise.
>        * gcc.target/i386/sse4_1-blendps-2.c: Likewise.
>        * gcc.target/i386/sse4_1-blendps.c: Likewise.
>
> libmudflap/ChangeLog.google-integration
>
>        * testsuite/libmudflap.c++/pass55-frag.cxx: Revert to
>        mainline variant.
>
> libstdc++-v3/ChangeLog.google-integration:
>
>        * testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
>        Revert to mainline variant.
>        *
> testsuite/23_containers/vector/requirements/dr438/constructor_1_neg.cc:
>        Likewise.
>        *
> testsuite/23_containers/vector/requirements/dr438/constructor_2_neg.cc:
>        Likewise.
>        * testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
>        Likewise.
>
> diff --git
> a/svnclient/contrib/testsuite-management/x86_64-unknown-linux-gnu.xfail
> b/svnclient/contrib/testsuite-management/x86_64-unknown-linux-gnu.xfail
> new file mode 100644
> index 000..b3e86a5
> --- /dev/null
> +++ b/svnclient/contrib/testsuite-management/x86_64-unknown-linux-gnu.xfail
> @@ -0,0 +1,59 @@
> +# These tests fail in trunk in all configurations.
> +FAIL: 23_containers/vector/requirements/dr438/assign_neg.cc (test for
> errors, line 1222)
> +FAIL: 23_containers/vector/requirements/dr438/assign_neg.cc (test for excess
> errors)
> +FAIL: 23_containers/vector/requirements/dr438/constructor_1_neg.cc (test for
> excess errors)
> +FAIL: 23_containers/vector/requirements/dr438/constructor_1_neg.cc (test for
> errors, line 1152)
> +FAIL: 23_containers/vector/requirements/dr438/constructor_2_neg.cc (test for
> excess errors)
> +FAIL: 23_containers/vector/requirements/dr438/constructor_2_neg.cc (test for
> errors, line 1152)
> +FAIL: 23_containers/vector/requirements/dr438/insert_neg.cc (test for excess
> errors)
> +FAIL: 23_containers/vector/requirements/dr438/insert_neg.cc (test for
> errors, line 1263)
> +FAIL: gcc.dg/cproj-fails-with-broken-glibc.c execution test
> +XPASS: gcc.dg/inline_3.c (test for excess errors)
> +XPASS: gcc.dg/inline_4.c (test for excess errors)
> +FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times dom1 "x[^ ]* & y" 1
> +XPASS: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12)
> +XPASS: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41)
> +XPASS: gcc.dg/uninit-pr19430.c (test for warnings, line 32)
> +XPASS: gcc.dg/unroll_2.c (test for excess errors)
> +XPASS: gcc.dg/unroll_3.c (test for excess errors)
> +XPASS: gcc.dg/unroll_4.c (test for excess errors)
> +FAIL: libmudflap.c++/pass55-frag.cxx ( -O) execution test
> +
> +# The following tests are failing with gold. The LTO plugin is not resolving
> +# names properly. Only builds configured to use gold will show these.
> +UNRESOLVED: gcc.c-torture/execute/20010209-1.c execution, -O2 -flto
> -flto-partition=none
> +UNRESOLVED: gcc.c-torture/execute/20010209-1.c execution, -O2 -flto
> +FAIL: gcc.c-torture/execute/20010209-1.c compilation, -O2 -flto (internal
> compiler error)
> +FAIL: gcc.c-torture/execute/20010209-1.c compilation, -O2 -flto
> -flto-partition=none (internal compiler error)
> +
> +# These tests fail in trunk when compiled with -m32.
> +FAIL: boehm-gc.c/thread_leak_test.c -O2 (test for excess errors)
> +FAIL: gcc.target/i386/pr27827.c scan-assembler fmul[ \t]*%st
> +FAIL: gfort
Re: should sync builtins be full optimization barriers?
> > I'd say they should be optimization barriers too (and at the tree level
> > they I think work that way, being represented as function calls), so if
> > they don't act as memory barriers in RTL, the *.md patterns should be
> > fixed.  The only exception should be IMHO the __SYNC_MEM_RELAXED
> > variants - if the CPU can reorder memory accesses across them at will,
> > why shouldn't the compiler be able to do the same as well?
>
> Agreed, so we have a bug in all released versions of GCC. :(

I wouldn't go that far.  They *used* to be compiler barriers, but clearly something broke at some point without anyone noticing.  We don't know how many versions are affected until we debug it.  For all we know it broke in 4.5 and 4.4 is fine.

There's no reference to a GCC bug report about this in the thread.  Did the folks over at the libdispatch project never think to file one?  Or does a bug report exist and my search skills are weak?

r~
Re: should sync builtins be full optimization barriers?
On 09/15/2011 06:19 PM, Richard Henderson wrote:
> I wouldn't go that far.  They *used* to be compiler barriers,
> but clearly something broke at some point without anyone noticing.
> We don't know how many versions are affected until we debug it.
> For all we know it broke in 4.5 and 4.4 is fine.

4.4 is not necessarily fine; it may also be that an unrelated 4.5 change exposed a latent bug.

But indeed Richard Sandiford mentioned offlist that perhaps the ALIAS_SET_MEMORY_BARRIER machinery broke. Fixing the bug in 4.5/4.6/4.7 will definitely shed more light.

> There's no reference to a GCC bug report about this in the thread.
> Did the folks over at the libdispatch project never think to file one?

I asked them to attach a preprocessed testcase somewhere, but they haven't done so yet. :(

Paolo
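
For readers following the thread, the property under discussion can be shown with a small C sketch. It relies only on the GCC documentation's statement that most __sync builtins, including __sync_fetch_and_add, are considered full barriers; shared_data, ready_flag, publish and try_consume are hypothetical names for this illustration, and the code does not reproduce the libdispatch failure mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state for the illustration.  */
int shared_data;
int ready_flag;

/* Writer: the plain store to shared_data must not be moved below the
   builtin, by the CPU or by the compiler, if the builtin really is a
   full barrier.  That is the property the thread is about.  */
void publish(int value)
{
    shared_data = value;
    __sync_fetch_and_add(&ready_flag, 1);
}

/* Reader: adding zero acts as an atomic read of the flag with the same
   barrier semantics; the plain load of shared_data must stay after it.  */
int try_consume(int *out)
{
    assert(out != NULL);
    if (__sync_fetch_and_add(&ready_flag, 0) == 0)
        return 0;
    *out = shared_data;
    return 1;
}
```

If the compiler were free to sink the store in publish past the builtin, a reader on another thread could see ready_flag set and still read a stale shared_data, which is why a missing optimization barrier would be a correctness bug and not just a missed optimization.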
[google] Merged gcc-4_6-branch -> google/gcc-4_6
This merge adds the testsuite validation script to google/gcc-4_6. Merged up to rev 178854. Validated on x86_64. Diego.
Re: IRA misses register range overlap
On Thu, Sep 15, 2011 at 10:34 AM, Vladimir Makarov wrote:
> On 09/15/2011 11:16 AM, Peter Bigot wrote:
>>
>> In the msp430 back end, hard registers 4 through 15 are HImode, with
>> adjacent register sequences used for SImode and DImode.  In preparation
>> for a library call, I'm emitting RTL that assigns values directly to
>> reg:SI 4.
>>
>> Despite that, in gcc 4.5.x IRA chooses reg:HI 4 as the destination
>> for a pseudo-register for a preceding assignment, and does nothing to
>> preserve the value over the span where the register is part of an SI
>> value.
>>
>> [...]
>>
>> and the subsequent reference to reg:HI 4 (formerly reg/v:HI 38) has value
>> 33614 instead of the user's parameter.
>>
>> Could somebody suggest where should I look to understand why this is
>> happening and how should it be fixed?
>>
> The best way is to file a bug to http://gcc.gnu.org/bugzilla/.  You should
> submit the test (the smaller the test, the better) and how to reproduce it:
> how to build gcc (configure options) and how to call the built gcc to
> reproduce results.

Unfortunately, the former msp430 maintainers never pushed the back-end upstream, so filing a bug on a target that isn't part of gcc is unlikely to get much attention. It's also pretty specific to the machine description, so I doubt it could be reproduced on another target.

I was hoping for more of a "yes, that happens if you don't [missed back-end requirement here]", or even a "no, that shouldn't be happening".

It looks almost as if the fact that I'm generating RTL that references the hard registers directly is ignored by IRA for conflict resolution, which seems to occur only among the registers that it's responsible for assigning. I'll look again through the docs to see if there are some hints that I'm missing a step.

If anybody else has further suggestions or insights I'd appreciate them.

Thanks.

Peter

> I think you could look at ira dump file and check that allocno for p38
> conflicting with hard reg 4 *and* 5.  If it is not, the problem is in
> conflict calculation.  Otherwise, it might be IRA hard register assignment
> or in reload (the worst case).
>
> But having only info you provided it is very hard to say what is wrong.
Re: Are there plans to maintain the CFG throughout the compilation?
Thanks Ian. Any idea what the size of the problem would be, perhaps starting with backends that don't choose to vandalise things themselves?
Re: IRA misses register range overlap
On 09/15/2011 03:06 PM, Peter Bigot wrote:
> On Thu, Sep 15, 2011 at 10:34 AM, Vladimir Makarov wrote:
>> On 09/15/2011 11:16 AM, Peter Bigot wrote:
>>> In the msp430 back end, hard registers 4 through 15 are HImode, with
>>> adjacent register sequences used for SImode and DImode.  In preparation
>>> for a library call, I'm emitting RTL that assigns values directly to
>>> reg:SI 4.
>>>
>>> Despite that, in gcc 4.5.x IRA chooses reg:HI 4 as the destination
>>> for a pseudo-register for a preceding assignment, and does nothing to
>>> preserve the value over the span where the register is part of an SI
>>> value.
>>>
>>> [...]
>>>
>>> and the subsequent reference to reg:HI 4 (formerly reg/v:HI 38) has value
>>> 33614 instead of the user's parameter.
>>>
>>> Could somebody suggest where should I look to understand why this is
>>> happening and how should it be fixed?
>>>
>> The best way is to file a bug to http://gcc.gnu.org/bugzilla/.  You should
>> submit the test (the smaller the test, the better) and how to reproduce it:
>> how to build gcc (configure options) and how to call the built gcc to
>> reproduce results.
> Unfortunately, the former msp430 maintainers never pushed the back-end
> upstream, so filing a bug on a target that isn't part of gcc is
> unlikely to get much attention.  It's also pretty specific to the
> machine description, so I doubt it could be reproduced on another
> target.
>
> I was hoping for more of a "yes, that happens if you don't [missed
> back-end requirement here]", or even a "no, that shouldn't be
> happening".

It should not be happening. It is a bug. It should be fixed in RA (IRA or reload). IRA/reload works for many targets where the same situation occurs, so it is hard to say what is wrong without more info.

That said, RA is directed by many machine-dependent macros, and one macro might return a wrong value (e.g. the number of registers needed to hold a value of a given mode). But that is less probable.
Re: IRA misses register range overlap
On Thu, Sep 15, 2011 at 4:09 PM, Vladimir Makarov wrote:
> On 09/15/2011 03:06 PM, Peter Bigot wrote:
>>
>> On Thu, Sep 15, 2011 at 10:34 AM, Vladimir Makarov wrote:
>>>
>>> On 09/15/2011 11:16 AM, Peter Bigot wrote:
>>>> In the msp430 back end, hard registers 4 through 15 are HImode, with
>>>> adjacent register sequences used for SImode and DImode.  In preparation
>>>> for a library call, I'm emitting RTL that assigns values directly to
>>>> reg:SI 4.
>>>>
>>>> Despite that, in gcc 4.5.x IRA chooses reg:HI 4 as the destination
>>>> for a pseudo-register for a preceding assignment, and does nothing to
>>>> preserve the value over the span where the register is part of an SI
>>>> value.
>>>>
>>>> [...]
>>>>
>>>> and the subsequent reference to reg:HI 4 (formerly reg/v:HI 38) has
>>>> value 33614 instead of the user's parameter.
>>>>
>>>> Could somebody suggest where should I look to understand why this is
>>>> happening and how should it be fixed?
>>>>
>>> The best way is to file a bug to http://gcc.gnu.org/bugzilla/.  You
>>> should
>>> submit the test (the smaller the test, the better) and how to reproduce
>>> it:
>>> how to build gcc (configure options) and how to call the built gcc to
>>> reproduce results.
>>
>> Unfortunately, the former msp430 maintainers never pushed the back-end
>> upstream, so filing a bug on a target that isn't part of gcc is
>> unlikely to get much attention.  It's also pretty specific to the
>> machine description, so I doubt it could be reproduced on another
>> target.
>>
>> I was hoping for more of a "yes, that happens if you don't [missed
>> back-end requirement here]", or even a "no, that shouldn't be
>> happening".
>>
> It should not be happening.  It is a bug.  It should be fixed in RA (IRA or
> reload).  IRA/reload works for many targets where the same situation occurs.
> So it is hard to say what is wrong without more info.

Based on what you've said I've provided source and the before/after IRA dump files in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50427.

I'll continue to dig into this; suggestions welcome.

Peter

> Although RA is directed by many machine-dependent macros and one macro might
> return a wrong value (e.g. number registers needed to hold value of a
> mode).  But it is less probable.
gcc-4.5-20110915 is now available
Snapshot gcc-4.5-20110915 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110915/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 178897

You'll find:

 gcc-4.5-20110915.tar.bz2             Complete GCC

  MD5=92277bf6896948d5ede50ad1210aa9c8
  SHA1=baf856ececfd00d192d330f1d1f56687f4a928cf

Diffs from 4.5-20110908 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
RE: A case that PRE optimization hurts performance
Hi Richard,

I slightly changed the case to be like below,

int f(char *t) {
    int s=0;

    while (*t && s != 1) {
        switch (s) {
        case 0:   /* path 1 */
            s = 2;
            break;
        case 2:   /* path 2 */
            s = 3; /* changed */
            break;
        default:  /* path 3 */
            if (*t == '-')
                s = 2;
            break;
        }
        t++;
    }

    return s;
}

"-O2" is still worse than "-O2 -fno-tree-pre".

"-O2 -fno-tree-pre" result is

f:
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movzbl  (%edx), %ecx
        jmp     .L14
        .p2align 4,,7
        .p2align 3
.L5:
        movl    $2, %eax
.L7:
        addl    $1, %edx
        cmpl    $1, %eax
        movzbl  (%edx), %ecx
        je      .L3
.L14:
        testb   %cl, %cl
        je      .L3
        testl   %eax, %eax
        je      .L5
        cmpl    $2, %eax
        .p2align 4,,5
        je      .L17
        cmpb    $45, %cl
        .p2align 4,,5
        je      .L5
        addl    $1, %edx
        cmpl    $1, %eax
        movzbl  (%edx), %ecx
        jne     .L14
        .p2align 4,,7
        .p2align 3
.L3:
        popl    %ebp
        .p2align 4,,2
        ret
        .p2align 4,,7
        .p2align 3
.L17:
        movb    $3, %al
        .p2align 4,,3
        jmp     .L7

While "-O2" result is

f:
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        pushl   %ebx
        movzbl  (%edx), %ecx
        jmp     .L14
        .p2align 4,,7
        .p2align 3
.L5:
        movl    $1, %ebx
        movl    $2, %eax
.L7:
        addl    $1, %edx
        testb   %bl, %bl
        movzbl  (%edx), %ecx
        je      .L3
.L14:
        testb   %cl, %cl
        je      .L3
        testl   %eax, %eax
        je      .L5
        cmpl    $2, %eax
        .p2align 4,,5
        je      .L16
        cmpb    $45, %cl
        .p2align 4,,5
        je      .L5
        cmpl    $1, %eax
        setne   %bl
        addl    $1, %edx
        testb   %bl, %bl
        movzbl  (%edx), %ecx
        jne     .L14
        .p2align 4,,7
        .p2align 3
.L3:
        popl    %ebx
        popl    %ebp
        ret
        .p2align 4,,7
        .p2align 3
.L16:
        movl    $1, %ebx
        movb    $3, %al
        jmp     .L7

You may notice that register ebx is introduced, and some more instructions around ebx are generated as well, i.e.

        setne   %bl
        testb   %bl, %bl

I agree with you that in theory PRE does the right thing to minimize the computation cost on gimple level. However, the problem is that the cost of converting the comparison result to a bool value is not considered, so it actually makes the binary code worse.
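
To make the effect easier to see at source level, the sketch below pairs the modified test case with a hand-written C approximation of what the "-O2" assembly does. This is an illustration only: f_pre_like and keep_going are hypothetical names, the parameter is declared const for convenience, and the second function is written by hand from the listing above, not taken from a GIMPLE dump. It keeps the value of (s != 1) in a separate variable that each path has to set, which mirrors the extra %ebx traffic.

```c
#include <assert.h>
#include <stddef.h>

/* The modified test case, unchanged in behaviour.  */
int f(const char *t)
{
    int s = 0;

    assert(t != NULL);
    while (*t && s != 1) {
        switch (s) {
        case 0:                 /* path 1 */
            s = 2;
            break;
        case 2:                 /* path 2 */
            s = 3;
            break;
        default:                /* path 3 */
            if (*t == '-')
                s = 2;
            break;
        }
        t++;
    }
    return s;
}

/* Hand-written approximation of the PRE'd loop: the loop condition
   (s != 1) lives in keep_going, which plays the role of %ebx.  Paths 1
   and 2 store a known constant; path 3 materialises the comparison as
   a value (the setne) and the loop header tests it again (the testb).  */
int f_pre_like(const char *t)
{
    int s = 0;
    int keep_going = 1;         /* s == 0, so (s != 1) is true */

    assert(t != NULL);
    while (*t && keep_going) {
        switch (s) {
        case 0:
            s = 2;
            keep_going = 1;
            break;
        case 2:
            s = 3;
            keep_going = 1;
            break;
        default:
            if (*t == '-') {
                s = 2;
                keep_going = 1;
            } else {
                keep_going = (s != 1);
            }
            break;
        }
        t++;
    }
    return s;
}
```

Both functions return the same result for every input; the difference is only that the second form needs an extra live value and an extra test per iteration, which is the cost the message above argues is not accounted for.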
For this case, as I summarized below, to complete the same functionality "With PRE" is worse than "Without PRE" for all three paths,

* Without PRE,

Path1:
        movl    $2, %eax
        cmpl    $1, %eax
        je      .L3

Path2:
        movb    $3, %al
        cmpl    $1, %eax
        je      .L3

Path3:
        cmpl    $1, %eax
        jne     .L14

* With PRE,

Path1:
        movl    $1, %ebx
        movl    $2, %eax
        testb   %bl, %bl
        je      .L3

Path2:
        movl    $1, %ebx
        movb    $3, %al
        testb   %bl, %bl
        je      .L3

Path3:
        cmpl    $1, %eax
        setne   %bl
        testb   %bl, %bl
        jne     .L14

Do you have any more thoughts?

Thanks,
-Jiangning

> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, August 02, 2011 5:23 PM
> To: Jiangning Liu
> Cc: gcc@gcc.gnu.org
> Subject: Re: A case that PRE optimization hurts performance
>
> On Tue, Aug 2, 2011 at 4:37 AM, Jiangning Liu
> wrote:
> > Hi,
> >
> > For the following simple test case, PRE optimization hoists
> computation
> > (s!=1) into the default branch of the switch statement, and finally
> causes
> > very poor code generation. This problem occurs in both X86 and ARM,
> and I
> > believe it is also a problem for other targets.
> >
> > int f(char *t) {
> >    int s=0;
> >
> >    while (*t && s != 1) {
> >        switch (s) {
> >        case 0:
> >            s = 2;
> >            break;
> >        case 2:
> >            s = 1;
> >            break;
> >        default:
> >            if (*t == '-')
> >                s = 1;
> >            break;
> >        }
> >        t++;
> >    }
> >
> >    return s;
> > }
> >
> > Taking X86 as an example, with option "-O2" you may find 52
> instructions
> > generated like below,
> >
> > :
> >   0:   55                      push   %ebp
> >   1:   31 c0                   xor    %eax,%eax
> >   3:   89 e5                   mov    %esp,%ebp
> >   5:   57                      push   %edi
> >   6:   56                      push   %esi
> >   7:   53                      push   %ebx
> >   8:   8b 55 08                mov    0x8(%ebp),%edx