[Bug rtl-optimization/32283] Missed induction variable optimization
--- Comment #9 from ramana dot radhakrishnan at celunite dot com 2007-09-05 11:46 ---
The above-mentioned testcase in Comment #8 works OK and generates auto-increments. I'd still be interested in looking at why the volatile case cannot work. Adding Zdenek to the CC for this case.

--
ramana dot radhakrishnan at celunite dot com changed:

           What    |Removed                     |Added
                 CC|                            |rakdver at atrey dot karlin
                   |                            |dot mff dot cuni dot cz

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32283
[Bug tree-optimization/33404] New: Predictive commoning + ivopts possibly introducing extra sign extensions.
Hi,

There's a difference in the code generated between O2 and O3 in the case below.

void fred(long in, short *out1)
{
  int i;
  for (i=0;i<100;i++)
    out1[i+1] = out1[i]*in;
}

With O2 we generate at expand time:

fred (in, out1)
{
  unsigned int ivtmp.24;

  :
  ivtmp.24 = (unsigned int) out1;

  :
  MEM[index: ivtmp.24, offset: 2] = (short int) (in * (long int) MEM[index: ivtmp.24]);
  ivtmp.24 = ivtmp.24 + 2;
  if (ivtmp.24 != (unsigned int) (out1 + 200)) goto ; else goto ;

  :
  return;
}

With O3 we generate:

fred (in, out1)
{
  unsigned int ivtmp.23;
  short int D__lsm0.18;
  long int D.1212;

  :
  D__lsm0.18 = *out1;
  ivtmp.23 = 1;

  :
  D.1212 = (long int) D__lsm0.18 * in;
  D__lsm0.18 = (short int) D.1212;
  MEM[base: out1, index: ivtmp.23 * 2] = D__lsm0.18;
  ivtmp.23 = ivtmp.23 + 1;
  if (ivtmp.23 != 101) goto ; else goto ;

  :
  return;
}

--
Summary: Predictive commoning + ivopts possibly introducing extra sign extensions.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ramana dot radhakrishnan at celunite dot com
GCC build triplet: i686-linux-gnu
GCC host triplet: i686-linux-gnu
GCC target triplet: arm-none-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33404
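To make the difference between the two dumps easier to see, here is a hand-written C sketch of the two loop shapes. It is only a source-level approximation, not compiler output: the function names fred_reload and fred_carried and the variable names carried and prod are made up for illustration, with carried standing in for D__lsm0.18. The second form has to widen the carried short to long on every iteration, which is presumably where the extra sign extension in the summary comes from.

```c
/* Source-level sketch of the two loop shapes; all names are illustrative. */

/* Shape of the O2 dump: reload out1[i] from memory on every iteration. */
void fred_reload(long in, short *out1)
{
    int i;
    for (i = 0; i < 100; i++)
        out1[i + 1] = (short)(out1[i] * in);
}

/* Shape of the O3 dump: carry the last stored value in a scalar
   instead of reloading it (the role played by D__lsm0.18). */
void fred_carried(long in, short *out1)
{
    short carried = out1[0];
    unsigned int i;
    for (i = 1; i != 101; i++) {
        /* The short is widened to long here on every iteration. */
        long prod = (long)carried * in;
        carried = (short)prod;
        out1[i] = carried;
    }
}
```

Both functions should leave the same contents in a 101-element array, so the difference is only in how the value reaches the multiply.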
[Bug tree-optimization/33508] New: tree struct aliasing goes into a loop marking call clobbers.
l 11 kB ( 0%) ggc
 tree PRE              :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall     192 kB ( 1%) ggc
 tree FRE              :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall     113 kB ( 0%) ggc
 tree code sinking     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall       5 kB ( 0%) ggc
 tree forward propagate:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      40 kB ( 0%) ggc
 tree conservative DCE :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree loop bounds      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       4 kB ( 0%) ggc
 tree iv optimization  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      38 kB ( 0%) ggc
 tree loop init        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      12 kB ( 0%) ggc
 tree SSA to normal    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall      31 kB ( 0%) ggc
 tree SSA verifier     :   4.07 ( 0%) usr   0.27 ( 2%) sys   5.43 ( 0%) wall      99 kB ( 0%) ggc
 tree STMT verifier    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   0.96 ( 0%) usr   0.06 ( 0%) sys   9.88 ( 0%) wall   25260 kB (66%) ggc
 forward prop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      32 kB ( 0%) ggc
 CSE                   :   0.06 ( 0%) usr   0.01 ( 0%) sys   0.46 ( 0%) wall      18 kB ( 0%) ggc
 dead code elimination :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      17 kB ( 0%) ggc
 dead store elim2      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      24 kB ( 0%) ggc
 loop analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      15 kB ( 0%) ggc
 CPROP 1               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      15 kB ( 0%) ggc
 PRE                   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       6 kB ( 0%) ggc
 CPROP 2               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall      15 kB ( 0%) ggc
 bypass jumps          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      14 kB ( 0%) ggc
 auto inc dec          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       7 kB ( 0%) ggc
 CSE 2                 :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       8 kB ( 0%) ggc
 combiner              :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall      37 kB ( 0%) ggc
 regmove               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       2 kB ( 0%) ggc
 scheduling            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall      39 kB ( 0%) ggc
 local alloc           :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      98 kB ( 0%) ggc
 global alloc          :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall      35 kB ( 0%) ggc
 reload CSE regs       :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall      52 kB ( 0%) ggc
 thread pro- & epilogue:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      34 kB ( 0%) ggc
 peephole 2            :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       7 kB ( 0%) ggc
 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       4 kB ( 0%) ggc
 final                 :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.74 ( 0%) wall       0 kB ( 0%) ggc
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :3617.35          13.09          4205.33             38504 kB

--
Summary: tree struct aliasing goes into a loop marking call clobbers.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ramana dot radhakrishnan at celunite dot com
GCC build triplet: i686-linux-gnu
GCC host triplet: i686-linux-gnu
GCC target triplet: arm-none-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33508
[Bug tree-optimization/33508] tree struct aliasing goes into a loop marking call clobbers.
--- Comment #1 from ramana dot radhakrishnan at celunite dot com 2007-09-20 10:44 --- Created an attachment (id=14229) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14229&action=view) testcase. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33508
[Bug tree-optimization/33508] [4.3 Regression] tree struct aliasing goes into a loop marking call clobbers.
--- Comment #6 from ramana dot radhakrishnan at celunite dot com 2007-09-20 13:52 ---
(In reply to comment #4)
> Created an attachment (id=14230)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14230&action=view) [edit]
> patch fixing the problem
>
> This fixes it. The idea is to keep track of which parent vars we need to add
> all subvars to the call clobbered list in a bitmap and process them after the
> first walk.
>

Yep it does - Thanks for the quick fix. I am testing it now and will let you know in a bit.

(In reply to comment #5)
> 4.2 doesn't have this extra loop.
>

--
ramana dot radhakrishnan at celunite dot com changed:

           What    |Removed                     |Added
            Version|4.3.0                       |unknown

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33508
[Bug rtl-optimization/34849] Missed autoincrement oppurtunities thanks to a different basic block structure.
--- Comment #3 from ramana dot radhakrishnan at celunite dot com 2008-01-18 14:37 ---
Add CC

--
ramana dot radhakrishnan at celunite dot com changed:

           What    |Removed                     |Added
                 CC|                            |pranav dot bhandarkar at
                   |                            |gmail dot com, dave at
                   |                            |icerasemi dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34849
[Bug rtl-optimization/34849] Missed autoincrement oppurtunities thanks to a different basic block structure.
--- Comment #2 from ramana dot radhakrishnan at celunite dot com 2008-01-18 14:35 ---
(In reply to comment #1)
> Which optimization level?

-O2.

>
> Why does cross-jumping not optimize this?
>

Will check on cross-jumping and get back.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34849
[Bug rtl-optimization/34849] New: Missed autoincrement oppurtunities thanks to a different basic block structure.
Whilst investigating a missed optimization opportunity in comparison to gcc 3.4, I came across this case.

void foo (int n, int in[], int res[])
{
  int i;
  for (i=0; i: if (n > 0) goto ; else goto ;

  :
  i = 0;
  ivtmp.19 = 0;

  :
  if (MEM[base: in, index: ivtmp.19] != 0) goto ; else goto ;

  :
  MEM[base: res, index: ivtmp.19] = 4660;
  goto ;

  :
  MEM[base: res, index: ivtmp.19] = 39030;

  :
  i = i + 1;
  ivtmp.19 = ivtmp.19 + 4;
  if (n > i) goto ; else goto ;

  :
  return;
}

Notice that ivtmp.19 could be used for post-increment based addressing modes. GCC 3.4 did not have a separate basic block for the else case: that block got merged with the tail block of the loop, so auto-inc could be generated on the else side, though not on the if side.

Can be reproduced with today's head of 4.3.0.

--
Summary: Missed autoincrement oppurtunities thanks to a different basic block structure.
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ramana dot radhakrishnan at celunite dot com
GCC build triplet: i686-linux-gnu
GCC host triplet: i686-linux-gnu
GCC target triplet: arm-none-eabi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34849
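As a rough illustration of the addressing pattern being asked for, the sketch below contrasts an index-based loop with a pointer-walking one. Note the assumptions: the original testcase is truncated in this report, so foo_indexed is only a reconstruction from the stores visible in the GIMPLE dump (4660 and 39030), and both function names are hypothetical. The second form is hand-written source, not what GCC emits; it just shows a shape where each pointer is used once per iteration and then advanced, which is what a post-increment addressing mode maps onto.

```c
/* Index-based form, reconstructed from the GIMPLE dump above.
   This is NOT the original testcase, which is truncated. */
void foo_indexed(int n, const int in[], int res[])
{
    int i;
    for (i = 0; i < n; i++) {
        if (in[i] != 0)
            res[i] = 4660;
        else
            res[i] = 39030;
    }
}

/* Pointer-walking form: each pointer is dereferenced once and then
   bumped, the source-level analogue of post-increment addressing. */
void foo_postinc(int n, const int *in, int *res)
{
    for (; n > 0; n--) {
        int v = (*in++ != 0) ? 4660 : 39030;
        *res++ = v;
    }
}
```

Both functions write the same values to res; the second simply funnels the two stores into one, which sidesteps the extra basic block on the else path.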
[Bug c++/32716] [4.2/4.3 Regression] Wrong code generation. Alias and C++ virtual bases problem.
--- Comment #4 from ramana dot radhakrishnan at celunite dot com 2007-07-10 15:14 ---
(In reply to comment #3)
> Fixed with "take3.diff".
>

Did you forget to attach take3.diff?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32716
[Bug tree-optimization/32721] New: CCP removes volatile qualifiers.
With today's trunk on a private port, consider the following testcase.

volatile int spinlock[2];

void main (void)
{
  volatile int * spinlock0;
  volatile int * spinlock1;

  spinlock0 = &spinlock[0];
  spinlock1 = &spinlock[1];

  *spinlock0 = 0;
  *spinlock1 = 0;

  while (*spinlock0);
}

CCP folds this into the following form:

Simulating block 4
Simulating block 3
Substituing values and folding statements

Folded statement: *spinlock0_1 = 0;
            into: spinlock[0] = 0;
Folded statement: *spinlock1_2 = 0;
            into: spinlock[1] = 0;
Folded statement: D.1498_3 = *spinlock0_1;
            into: D.1498_3 = spinlock[0];

main ()
{
  volatile int * spinlock1;
  volatile int * spinlock0;
  int D.1498;

  :
  spinlock0_1 = &spinlock[0];
  spinlock1_2 = &spinlock[1];
  spinlock[0] = 0;
  spinlock[1] = 0;

  :
  D.1498_3 = spinlock[0];   ---> This folding seems to be wrong.
  if (D.1498_3 != 0) goto ; else goto ;

  :
  return;
}

--
Summary: CCP removes volatile qualifiers.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ramana dot radhakrishnan at celunite dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32721
[Bug tree-optimization/32721] CCP removes volatile qualifiers.
--- Comment #3 from ramana dot radhakrishnan at celunite dot com 2007-07-10 20:14 ---
(In reply to comment #2)
> As the decl is volatile as well this is clearly a bogus optimization.
>

Putting a breakpoint on evaluate_stmt in tree-ssa-ccp.c shows that stmt_ann of the stmt does not have has_volatile_ops set to true.

(gdb) p stmt
(gdb) pt
sizes-gimplified unsigned SI size unit size align 32 symtab 0 alias set -1 canonical type 0xb7d44bd0> visited var def_stmt version 1> arg 1 constant invariant arg 0 arg 0 arg 1 >> try.c:5>
(gdb)
(gdb) p *(stmt->base->ann)
$17 = {common = {type = STMT_ANN, aux = 0x0, value_handle = 0x0}, vdecl = {
    common = {type = STMT_ANN, aux = 0x0, value_handle = 0x0},
    out_of_ssa_tag = 0, base_var_processed = 0, used = 0,
    need_phi_state = NEED_PHI_STATE_NO, in_vuse_list = 1, in_vdef_list = 0,
    is_heapvar = 0, call_clobbered = 1, noalias_state = NO_ALIAS_GLOBAL,
    mpt = 0xb7cb6804, symbol_mem_tag = 0x0, partition = 0, base_index = 0,
    current_def = 0x0, subvars = 0x0, escape_mask = 3084210192}, fdecl = {
    common = {type = STMT_ANN, aux = 0x0, value_handle = 0x0},
    reference_vars_info = 0xb7eed528}, stmt = {common = {type = STMT_ANN,
      aux = 0x0, value_handle = 0x0}, bb = 0xb7eed528, operands = {
      def_ops = 0xb7cb6804, use_ops = 0x0, vdef_ops = 0x0, vuse_ops = 0x0,
      stores = 0x0, loads = 0x0}, addresses_taken = 0xb7d55010, uid = 0,
    references_memory = 0, modified = 0, has_volatile_ops = 0,
    makes_clobbering_call = 0}}

Shouldn't has_volatile_ops get set to true in this case because the stmt essentially has one volatile operand here?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32721
[Bug rtl-optimization/32283] Missed induction variable optimization
--- Comment #14 from ramana dot radhakrishnan at celunite dot com 2007-11-27 11:00 ---
(In reply to comment #13)
> This patch sometimes confuses loop2_doloop. On ia64 this prevents use of
> countable loop branch machine idiom (br.cloop). On the example used in this
> thread loop2_doloop complains:
>
> Loop 1 is simple:
> simple exit 5 -> 6
> infinite if: (expr_list:REG_DEP_TRUE (subreg:SI (and:DI (plus:DI (minus:DI
> (reg:DI 391)
> (reg:DI 370 [ ivtmp.16 ]))
> (const:DI (plus:DI (symbol_ref:DI ("a") [flags 0x2] 0x2abd7000 a>)
> (const_int -2 [0xfffe]
> (const_int 1 [0x1])) 0)
> (nil))
> number of iterations: (lshiftrt:DI (plus:DI (minus:DI (reg:DI 392)
> (reg:DI 370 [ ivtmp.16 ]))
> (const_int -2 [0xfffe]))
> (const_int 1 [0x1]))
> upper bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
>
> The "infinite if" condition is:
> ((r391 - r370) + ('a' - 2)) & 1 == 1
> where r370 is &(a[i]) and r391 is len*sizeof(a[0]), so that r391+'a' is
> &a[len]. Of course, such "infinite if" condition is always false, but
> loop2_doloop does not see that.
>

This is pretty much the case that makes things worse after this patch, even on the private port I work on. Debugging this on ia64 or on my port shows the same point at which the detection fails, though mine fails for the SI case rather than the DI case. The infinite-if case is detected in loop-iv.c:iv_number_of_iterations when it can't simplify the expression quoted above.

I looked through tree-ssa-ivopts but I can't see how this can be fixed there unless we change it in loop-iv.c. I wonder if we could use DF info to recursively find the reaching definitions of r391 and r370 at the insn and substitute their RHS into the expression to simplify it further. Whether that effort would be worthwhile depends on the number of such cases that we detect in any useful benchmark. My guess is that since this is a pretty normal construct, we'd find it in quite a number of loops that are rather self-respecting.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32283
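To see why the quoted "infinite if" condition can never hold, it may help to spell the arithmetic out in C. The sketch below is only an illustration under stated assumptions: 2-byte elements (inferred from the "-2" constant and the shift by 1 in the dump), a flat address space where pointer-to-integer conversion is linear, and hypothetical names elem_t, byte_distance, infinite_if and iterations. The variables r370 and r391 mirror the registers as described in comment #13.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: elements are 2 bytes wide, as the dump suggests. */
typedef uint16_t elem_t;

/* (r391 - r370) + ('a' - 2), i.e. &a[len] - &a[i] - sizeof(a[0]) in bytes. */
static uintptr_t byte_distance(const elem_t *a, size_t len, size_t i)
{
    uintptr_t r370 = (uintptr_t)&a[i];                /* address of a[i] */
    uintptr_t r391 = (uintptr_t)(len * sizeof a[0]);  /* len * sizeof(a[0]) */
    return (r391 - r370) + ((uintptr_t)a - sizeof a[0]);
}

/* The "infinite if" test: is the low bit of the byte distance set? */
int infinite_if(const elem_t *a, size_t len, size_t i)
{
    return (byte_distance(a, len, i) & 1) == 1;
}

/* The "number of iterations" expression: byte distance shifted right by 1. */
uintptr_t iterations(const elem_t *a, size_t len, size_t i)
{
    return byte_distance(a, len, i) >> 1;
}
```

Because both &a[len] and &a[i] are offsets from the same base by multiples of 2 bytes, the base address cancels and the distance is (len - i - 1) * 2, whose low bit is always clear. That cancellation is exactly the simplification that the RTL-level analysis would need to see through.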