[Bug tree-optimization/40057] New: Incorrect right shift by 31 with long long
The following code, compiled with GCC 4.4 at -O1, produces a wrong result for the shift-and-AND operation. Bit 31 of the variable 'var' in function shiftTest evaluates to '1' instead of '0'. Compiling with -O0, however, produces the right result.

#include "stdio.h"

typedef unsigned long long ulonglong;

int shiftTest (const ulonglong var)
{
  ulonglong predicate = (var >> 31ULL) & 1ULL;
  if (predicate == 0ULL)
    {
      return 0;
    }
  return -1;
}

int main (void)
{
  ulonglong var = 0x1682a9aaaULL;
  printf ("Bit 31 of 0x%llx is %llu\n", var, (var >> 31ULL) & 1ULL);
  int result = shiftTest (var);
  if (result == 0)
    {
      printf ("Bit 31 is 0 - Correct!\n");
    }
  else
    {
      printf ("Bit 31 is 1 - Incorrect!\n");
    }
  return 0;
}

--
Summary: Incorrect right shift by 31 with long long
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057
[Bug tree-optimization/40057] Incorrect right shift by 31 with long long
--- Comment #1 from rahul at icerasemi dot com 2009-05-07 11:11 ---
I suspect the tree-ter optimisation pass: compiling with -O1 -fno-tree-ter produces the right result. Using -fdump-tree-optimized shows the SSA GIMPLE changing from

shiftTest (const ulonglong var)
{
  int D.1842;

  :
  if (var >> 31 & 1 == 0)
    goto ;
  else
    goto ;

  :
  D.1842 = -1;
  goto ;

  :
  D.1842 = 0;

  :
  return D.1842;
}

to

shiftTest (const ulonglong var)
{
  ulonglong predicate;
  int D.1842;
  const ulonglong D.1839;

  :
  D.1839 = var >> 31;
  predicate = D.1839 & 1;
  if (predicate == 0)
    goto ;
  else
    goto ;

  :
  D.1842 = -1;
  goto ;

  :
  D.1842 = 0;

  :
  return D.1842;
}

Does the complex expression "var >> 31 & 1 == 0" cause problems during the RTL expansion phase? Are the precedences of the SHIFT and AND operations maintained by the expression replacement phase?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057
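To make the precedence question concrete, here is a small sketch of what the flat text "var >> 31 & 1 == 0" would mean if it were read as C source. This is only an illustration: a GIMPLE dump is not C source, and the helper names bit31_is_clear and flat_reading are hypothetical. In C, == binds tighter than &, so the flat text groups as (var >> 31) & (1 == 0), which is always zero.

```c
#include <stdint.h>

/* Intended meaning of the test: isolate bit 31, then compare with zero. */
int bit31_is_clear(uint64_t v)
{
    return ((v >> 31) & 1u) == 0;
}

/* The same tokens grouped the way C precedence would group the flat text
   "v >> 31 & 1 == 0": the comparison (1 == 0) is evaluated first and
   yields 0, so the AND is always 0. Parentheses are written out here
   only to make that grouping explicit. */
int flat_reading(uint64_t v)
{
    return (int)((v >> 31) & (uint64_t)(1 == 0));
}
```

With the value from the report, bit31_is_clear(0x1682a9aaaULL) returns 1 because bit 31 of that constant is clear, while flat_reading returns 0 for every input.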
[Bug middle-end/40057] Incorrect right shift by 31 with long long
--- Comment #11 from rahul at icerasemi dot com 2009-05-07 15:57 --- Confirmed issue resolved. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057
[Bug middle-end/30905] [4.3 Regression] Fails to cross-jump
--- Comment #15 from rahul at icerasemi dot com 2009-06-11 17:38 ---
GCC 4.4 is still missing this fix. GCC-4.4.1 (20090507) on x86_64 produces the following with O2/O3:

kernel:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $1, (%esp)
        call    gen_int
        testl   %eax, %eax
        je      .L2
        movl    a, %edx
        movl    %edx, %ecx
        andl    $3, %ecx
        leal    (%ecx,%edx), %edx
        movl    %edx, a
        movl    b, %edx
        movl    %edx, %ecx
        orl     $3, %ecx
        leal    (%ecx,%edx), %edx
        movl    %edx, b
.L7:
        movl    a+4, %eax
        movl    %eax, %edx
        andl    $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, a+4
        movl    b+4, %eax
        movl    %eax, %edx
        orl     $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, b+4
        leave
        ret
        .p2align 4,,7
        .p2align 3
.L2:
        movl    a, %eax
        movl    %eax, %edx
        andl    $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, a
        movl    b, %eax
        movl    %eax, %edx
        orl     $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, b
        jmp     .L7

Any reason why this shouldn't go into 4.4?

--
rahul at icerasemi dot com changed:

           What    |Removed                     |Added
                 CC|                            |rahul at icerasemi dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30905
[Bug tree-optimization/41026] New: invariant address load inside loop
gcc --version
gcc (GCC) 4.4.1 20090507 (prerelease)

The following test compiled with gcc -S -Os

struct struct_t {
  int* data;
};

void testAddr (struct struct_t* sp, int len)
{
  int i;
  for (i = 0; i < len; i++)
    {
      sp->data[i] = 0;
    }
}

generates the following code for x86

testAddr :
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        pushl   %ebx
        movl    12(%ebp), %edx
        jmp     .L2
.L3:
        movl    (%ecx), %ebx            <-- invariant address load
        movl    $0, (%ebx,%eax,4)
        incl    %eax
.L2:
        cmpl    %edx, %eax
        jl      .L3
        popl    %ebx
        popl    %ebp
        ret

Whereas making the intent explicit like so

void testAddr (struct struct_t* sp, int len)
{
  int i;
  int *p = sp->data;
  for (i = 0; i < len; i++)
    {
      p[i] = 0;
    }
}

generates

testAddr :
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    12(%ebp), %ecx
        movl    (%eax), %edx            <-- now outside the loop
        xorl    %eax, %eax
        jmp     .L2
.L3:
        movl    $0, (%edx,%eax,4)
        incl    %eax
.L2:
        cmpl    %ecx, %eax
        jl      .L3
        popl    %ebp
        ret

Why can't we move the address load outside the loop in the first case?

--
Summary: invariant address load inside loop
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux
GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026
[Bug tree-optimization/41026] invariant address load inside loop with -Os.
--- Comment #4 from rahul at icerasemi dot com 2009-08-13 15:46 ---
Confirmed. Introducing loop header copying at -Os resolves the problem. On our port, this not only helps move the invariant load outside the loop, but also correctly uses an auto-increment addressing mode via the AutoInc patches we use. Other examples also confirm that header copying enables more induction variables to be identified, and hence more post-increment opportunities.

Do better loop analysis and the resulting potential for further optimizations outweigh the cost of copying the loop header? It would be ideal to relax the loop header copy predicate for -Os and select an appropriate threshold. It is currently set at 20 insns; perhaps a lower value to start with.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026
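As a rough source-level picture of what loop header copying does, the sketch below rotates the loop by hand: the exit test is duplicated in front of the loop, and the loop body becomes a do-while. The function names zero_plain and zero_rotated are hypothetical, and this is only an analogy for the compiler transformation, not GCC's implementation. Once the loop is known to run at least once, the load of sp->data has an obvious place to go: the block guarded by the copied test. Hoisting it assumes the int stores in the loop cannot modify sp->data itself.

```c
struct struct_t {
    int *data;
};

/* Original shape: the exit test sits in the loop header, so the body
   (including the load of sp->data) may execute zero times. */
void zero_plain(struct struct_t *sp, int len)
{
    int i;
    for (i = 0; i < len; i++)
        sp->data[i] = 0;
}

/* Hand-rotated shape: the header test is copied in front of the loop.
   Inside the guard the body runs at least once, so the invariant load
   can sit before the loop without being executed needlessly. */
void zero_rotated(struct struct_t *sp, int len)
{
    int i = 0;
    if (i < len) {
        int *p = sp->data;   /* hoisted invariant load */
        do {
            p[i] = 0;
            i++;
        } while (i < len);
    }
}
```

Both functions write the same elements; the rotated form costs one extra copy of the comparison, which is the code size trade-off the comment asks about.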
[Bug rtl-optimization/20070] If-conversion can't match equivalent code, and cross-jumping only works for literal matches
--- Comment #29 from rahul at icerasemi dot com 2009-09-04 14:51 ---
I am testing Steven's crossjumping patch attached here. With CoreMark we see a 1% increase in performance when using -Os. Other proprietary tests show a ~0.5% decrease in code size. The patch, however, does not fix PR30905.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070
[Bug tree-optimization/41026] invariant address load inside loop with -Os.
--- Comment #6 from rahul at icerasemi dot com 2009-09-11 10:03 ---
An interesting regression results as a side effect of loop header copying (this occurs even in vanilla -O2). If I modify my original test case to

struct struct_t {
  int* data;
};

void testAddr (struct struct_t* sp, int len)
{
  short i;
  for (i = 0; i < len; i++)
    {
      sp->data[len-i-1] = 0;
    }
}

The index is now a short, and I have purposefully added an int to form the final induction variable. With gcc -S -O2 -fdump-tree-all, I get the following SSA

  short int i;
  int * D.1220;
  long unsigned int D.1219;
  long unsigned int D.1218;
  long unsigned int D.1217;
  int D.1216;
  int D.1215;
  int * D.1214;

  :
  goto ;

  :
  D.1214_6 = sp_5(D)->data;
  D.1215_7 = (int) i_1;
  D.1216_8 = len_4(D) - D.1215_7;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 ={v} 0;
  i_13 = i_1 + 1;

  :
  # i_1 = PHI <0(2), i_13(3)>
  D.1215_3 = (int) i_1;
  if (D.1215_3 < len_4(D))
    goto ;
  else
    goto ;

  :
  return;

The following copy propagation and/or FRE passes identify D.1215_7 as a copy of D.1215_3 and we get

  :
  D.1214_6 = sp_5(D)->data;
  D.1216_8 = len_4(D) - D.1215_3;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  i_13 = i_1 + 1;

Loop header copying introduces a PHI for D.1215

  :
  D.1215_19 = 0;
  if (D.1215_19 < len_4(D))
    goto ;
  else
    goto ;

  :
  # i_20 = PHI
  # D.1215_21 = PHI
  D.1214_6 = sp_5(D)->data;
  D.1216_8 = len_4(D) - D.1215_21;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  i_13 = i_20 + 1;
  D.1215_3 = (int) i_13;
  if (D.1215_3 < len_4(D))
    goto ;
  else
    goto ;

This causes IVOpts below, and all subsequent optimisations, to fall over.

  :
  D.1214_6 = sp_5(D)->data;
  D.1238_7 = (unsigned int) len_4(D);
  D.1239_1 = D.1238_7 + 0x0;
  __builtin_loop_start (1, D.1239_1);
  D.1241_24 = (unsigned int) len_4(D);

  :
  # D.1215_21 = PHI <0(3), D.1215_3(5)>
  # ivtmp.13_14 = PHI <0(3), ivtmp.13_18(5)>
  __builtin_loop_iteration (1);
  D.1216_8 = len_4(D) - D.1215_21;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  D.1240_19 = ivtmp.13_14 + 1;
  D.1215_23 = (int) D.1240_19;
  D.1215_3 = D.1215_23;
  ivtmp.13_18 = ivtmp.13_14 + 1;
  if (ivtmp.13_18 != D.1241_24)
    goto ;
  else
    goto ;

On this test, using -fno-tree-copy-prop -fno-tree-pre results in better optimizations. This implies that copy propagating (across blocks) or FREing potential induction variables is undesirable. A less ideal alternative is to disable loop header copying when dealing with type-promoted loop indices.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026
[Bug tree-optimization/23821] [4.3/4.4/4.5 Regression] DOM and VRP creating harder to optimize code
--- Comment #25 from rahul at icerasemi dot com 2009-09-25 14:26 ---
Do the fixes in comment #11 and #24 alone solve the missed induction variable problem? I'm using the 4.4.1 release branch and it doesn't seem to work for me. After DOM I get

  # i_10 = PHI
  i_5 = i_10 + 1;

and PHI propagation turns this into

  i_5 = x_4 + 1;

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23821
[Bug tree-optimization/23821] [4.3/4.4/4.5 Regression] DOM and VRP creating harder to optimize code
--- Comment #28 from rahul at icerasemi dot com 2009-09-25 17:10 --- Sorry, I also had changes to move loop header copying before FRE from http://gcc.gnu.org/ml/gcc/2009-09/msg00434.html. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23821
[Bug tree-optimization/41488] New: IVOpts cannot coalesce multiple induction variables
Using the GCC 4.4.1 release version and compiling the following test with gcc -O2 -fdump-tree-all

struct struct_t {
  int* data;
};

void testAutoIncStruct (struct struct_t* sp, int start, int end)
{
  int i;
  for (i = 0; i+start < end; i++)
    {
      sp->data[i+start] = 0;
    }
}

the IVOpts dump shows that the induction variables (start and ivtmp.32) cannot be coalesced

testAutoIncStruct (struct struct_t * sp, int start, int end)
{
  unsigned int D.1283;
  unsigned int D.1284;
  int D.1282;
  unsigned int ivtmp.32;
  int * pretmp.17;
  int i;
  int * D.1245;
  unsigned int D.1244;
  unsigned int D.1243;

  :
  if (start_3(D) < end_5(D))
    goto ;
  else
    goto ;

  :
  pretmp.17_22 = sp_6(D)->data;
  D.1282_23 = start_3(D) + 1;
  ivtmp.32_25 = (unsigned int) D.1282_23;
  D.1283_27 = (unsigned int) end_5(D);
  D.1284_28 = D.1283_27 + 1;

  :
  # start_20 = PHI
  # ivtmp.32_7 = PHI
  D.1243_9 = (unsigned int) start_20;
  D.1244_10 = D.1243_9 * 4;
  D.1245_11 = pretmp.17_22 + D.1244_10;
  *D.1245_11 = 0;
  start_26 = (int) ivtmp.32_7;
  start_4 = start_26;
  ivtmp.32_24 = ivtmp.32_7 + 1;
  if (ivtmp.32_24 != D.1284_28)
    goto ;
  else
    goto ;

  :
  goto ;

  :
  return;
}

The problem arises from the expression "i + start" being identified as a common expression between the header and the latch. This seems to create an extra induction variable and a PHI in the latch. If we disable tree FRE and tree copy propagation with

gcc -O2 -fno-tree-fre -fno-tree-copy-prop

we get

  :
  pretmp.17_23 = sp_6(D)->data;
  D.1287_27 = (unsigned int) end_5(D);
  D.1288_28 = (unsigned int) start_3(D);
  D.1289_29 = D.1287_27 - D.1288_28;
  D.1290_30 = (int) D.1289_29;

  :
  # i_20 = PHI
  D.1241_7 = pretmp.17_23;
  D.1284_26 = (unsigned int) start_3(D);
  D.1285_25 = (unsigned int) i_20;
  D.1286_24 = D.1284_26 + D.1285_25;
  MEM[base: pretmp.17_23, index: D.1286_24, step: 4] = 0;
  i_12 = i_20 + 1;
  if (i_12 != D.1290_30)
    goto ;
  else
    goto ;

The induction variable and the memory reference are now correctly identified.

--
Summary: IVOpts cannot coalesce multiple induction variables
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488
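For comparison, the loop in this report can be written by hand with a single induction variable. The sketch below is only a source-level illustration of the coalesced form one would hope IVOpts reaches; the names zero_two_ivs and zero_one_iv are hypothetical, and the equivalence assumes i + start does not overflow and that the stores do not modify sp->data.

```c
struct struct_t {
    int *data;
};

/* Shape from the report: the counter i and the sum i + start both
   advance on every iteration. */
void zero_two_ivs(struct struct_t *sp, int start, int end)
{
    int i;
    for (i = 0; i + start < end; i++)
        sp->data[i + start] = 0;
}

/* Hand-coalesced shape: one induction variable j runs from start to
   end and serves as both the exit test operand and the array index. */
void zero_one_iv(struct struct_t *sp, int start, int end)
{
    int *p = sp->data;
    int j;
    for (j = start; j < end; j++)
        p[j] = 0;
}
```

Both functions clear the same half-open range [start, end) of sp->data.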
[Bug tree-optimization/41488] IVOpts cannot coalesce multiple induction variables
--- Comment #1 from rahul at icerasemi dot com 2009-09-28 12:45 --- See http://gcc.gnu.org/ml/gcc/2009-09/msg00432.html for some followup. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488
[Bug tree-optimization/41834] New: Missed "may be uninitialized warning" on array reference
Using GCC 4.4.1 and the command

gcc -O2 -Wall -Wextra

on the following test

#include <stdio.h>

int foo (int b)
{
  int a[10], c, i;
  for (i = 0; i < b; i++)
    {
      a[i] = b;
      c = b;
    }
  if (a[2] == 5 && c == 5)
    {
      printf("hello world\n");
    }
  return 0;
}

I get

testWarn.c: In function 'foo':
testWarn.c:5: warning: 'c' may be used uninitialized in this function

However, a warning for a[2] being possibly uninitialized is missing. If I understand correctly, this should be handled by the late warning pass, which runs just after DCE. Looking at the post-DCE dump

foo (int b)
{
  unsigned int D.1282;
  int i;
  int c;
  int a[10];
  _Bool D.1243;
  _Bool D.1242;
  _Bool D.1241;
  int D.1240;

  :
  if (b_5(D) > 0)
    goto ;
  else
    goto ;

  :
  # i_21 = PHI <0(2), i_8(3)>
  D.1282_25 = (unsigned int) i_21;
  MEM[base: &a, index: D.1282_25, step: 4] = b_5(D);
  i_8 = i_21 + 1;
  if (i_8 != b_5(D))
    goto ;
  else
    goto ;

  :
  # c_17 = PHI
  D.1240_9 = a[2];
  D.1241_10 = D.1240_9 == 5;
  D.1242_11 = c_17 == 5;
  D.1243_12 = D.1242_11 & D.1241_10;
  if (D.1243_12 != 0)
    goto ;
  else
    goto ;

  :
  __builtin_puts (&"hello world"[0]);

  :
  return 0;
}

there is a path to bb 4 which does not initialize a. Why do we not generate a warning? Is it due to a missing PHI for a?

--
Summary: Missed "may be uninitialized warning" on array reference
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41834
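Independently of whether the compiler warns, the read of a[2] in the test above is only well defined when the loop has stored to it, that is when b is at least 3. A minimal sketch of a variant with every path initialized follows. The name foo_initialized, the extra i < 10 bound, and the changed return value are assumptions added here so the example can be exercised safely; they are not part of the reported test.

```c
#include <stdio.h>

/* Variant of the reported function with a and c initialized on every
   path. Returns 1 when the message is printed, 0 otherwise (the
   original always returns 0). The "i < 10" bound is an addition that
   keeps stores inside a[] for large b. */
int foo_initialized(int b)
{
    int a[10] = {0};
    int c = 0;
    int i;

    for (i = 0; i < b && i < 10; i++) {
        a[i] = b;
        c = b;
    }
    if (a[2] == 5 && c == 5) {
        puts("hello world");
        return 1;
    }
    return 0;
}
```

With b equal to 2 or less, the loop never writes a[2], which is the path the report says goes undiagnosed; here that path simply reads the zero initializer.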
[Bug tree-optimization/47059] New: compiler fails to coalesce loads/stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059

Summary: compiler fails to coalesce loads/stores
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ra...@icerasemi.com
CC: sdkteam-...@icerasemi.com
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu

Consider the following test case compiled with GCC 4.5.1 (x86) and the following command:

gcc -S -Os test.c

struct struct1 {
  void *data;
  unsigned short f1;
  unsigned short f2;
};
typedef struct struct1 S1;

struct struct2 {
  int f3;
  S1 f4;
};
typedef struct struct2 S2;

extern void foo (S1 *ptr);
extern S2 gstruct2_var;
extern S1 gstruct1_var;

static S1 bar (const S1 *ptr) __attribute__ ((always_inline));

static S1 bar (const S1 *ptr)
{
  S1 ls_var = *ptr;
  foo (&ls_var);
  return ls_var;
}

int main ()
{
  S2 *ps_var;
  ps_var = &gstruct2_var;
  ps_var->f4 = bar (&gstruct1_var);
  return 0;
}

We get:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $32, %esp
        movl    gstruct1_var, %eax
        movl    gstruct1_var+4, %edx
        movl    %eax, -16(%ebp)
        leal    -16(%ebp), %eax
        pushl   %eax
        movl    %edx, -12(%ebp)
        call    foo
        movl    -16(%ebp), %eax
        movl    -4(%ebp), %ecx
        movl    %eax, gstruct2_var+4
        movl    -12(%ebp), %eax         <-- load1 [ebp - 12] @ 4 bytes
        movw    %ax, gstruct2_var+8     <-- store1 [gstruct2_var + 8] @ 2 bytes
        movw    -10(%ebp), %ax          <-- load2 [ebp - 10] @ 2 bytes
        movw    %ax, gstruct2_var+10    <-- store2 [gstruct2_var + 10] @ 2 bytes
        xorl    %eax, %eax
        leave
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.5.1"
        .section        .note.GNU-stack,"",@progbits

With GCC 4.4.1 we get:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $32, %esp
        movl    gstruct1_var, %eax
        movl    gstruct1_var+4, %edx
        movl    %eax, -16(%ebp)
        leal    -16(%ebp), %eax
        movl    %edx, -12(%ebp)
        pushl   %eax
        call    foo
        movl    -12(%ebp), %eax         <-- Load1 [ebp - 12] @ 4 bytes
        movl    -4(%ebp), %ecx
        movl    %eax, gstruct2_var+8    <-- Store1 [gstruct2_var + 8] @ 4 bytes
        movl    -16(%ebp), %eax
        movl    %eax, gstruct2_var+4
        xorl    %eax, %eax
        leave
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.4.1"
        .section        .note.GNU-stack,"",@progbits

The extra loads and stores appear to be the result of a change to SRA, which now fully scalarizes structure members f1 and f2. With GCC 4.4.1 the access to these fields is done using a BIT_FIELD_REF, which combines the two loads and stores.

Talking to MartinJ on IRC, I was told that the changes to SRA make scalarization of aggregates more aggressive. In the past there was some functionality to try to combine appropriate components into BIT_FIELD_REFs so as to reduce the number of loads/stores. This has been removed from 4.5 in favour of simplicity of the Gimple IR and working towards generic MEM_REFs. The plan is to introduce new IR constructs to load/store individual bits, and to decide in a separate gimple pass how to combine them. But this will only be available in 4.7+.

We also have the exact same issue on our port, and it causes a significant performance regression on our software.
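The coalescing the report asks for relies on f1 and f2 being adjacent in memory, so that one access of twice the width covers both. The sketch below checks that layout with offsetof and copies both fields with a single memcpy. It is an illustration only: the helper names are hypothetical, the adjacency is a property of common ABIs rather than a guarantee of the C standard, and whether a compiler turns the memcpy into one load and one store is not something this example asserts.

```c
#include <stddef.h>
#include <string.h>

struct struct1 {
    void *data;
    unsigned short f1;
    unsigned short f2;
};
typedef struct struct1 S1;

/* Returns 1 when f2 starts immediately after f1, i.e. there is no
   padding between the two fields. */
int f1_f2_are_adjacent(void)
{
    return offsetof(S1, f2) == offsetof(S1, f1) + sizeof(unsigned short);
}

/* Copies f1 and f2 as one contiguous byte range instead of two
   separate field assignments. Only meaningful when
   f1_f2_are_adjacent() holds. */
void copy_f1_f2(S1 *dst, const S1 *src)
{
    memcpy((char *)dst + offsetof(S1, f1),
           (const char *)src + offsetof(S1, f1),
           2 * sizeof(unsigned short));
}
```

On the i686 target from the report, data occupies bytes 0 to 3, so f1 and f2 sit at offsets 4 and 6, matching the -12(%ebp) and -10(%ebp) accesses in the listings above.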
[Bug tree-optimization/47059] compiler fails to coalesce loads/stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059 --- Comment #1 from Rahul Kharche 2011-01-15 12:32:01 UTC --- Created attachment 22974 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22974 Patch Vs 4.5.2 Rev 167088
[Bug tree-optimization/47059] compiler fails to coalesce loads/stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059 --- Comment #2 from Rahul Kharche 2011-01-15 12:43:27 UTC --- This issue also exists on the trunk. I am in the process of bootstrap testing this for i686-pc-linux-gnu. I will send out this patch once it checks out. The attached patch is Vs 4.5.2 Rev 167088.
[Bug rtl-optimization/43515] New: Basic block re-ordering unconditionally disabled for Os
Basic block re-ordering appears to be unconditionally disabled when optimizing for size, irrespective of whether -freorder-blocks was specified on the command line. This applies to all versions from 4.4.1 to 4.5. As suggested in the following discussion, this is incorrect behaviour:

http://gcc.gnu.org/ml/gcc/2010-03/msg00365.html

--
Summary: Basic block re-ordering unconditionally disabled for Os
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43515
[Bug rtl-optimization/43515] Basic block re-ordering unconditionally disabled for Os
--- Comment #3 from rahul at icerasemi dot com 2010-03-26 12:25 ---
The following test in 'rest_of_handle_reorder_blocks'

  if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
      && optimize_function_for_speed_p (cfun))
    {
      ...
    }

suggests that when optimize_size is true, reordering would not run even if I were to use -freorder-blocks as a command line option or a function attribute?

I also just noticed that PR41396 is related.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43515
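The effect of the quoted condition can be modelled as a plain boolean function. This is a hypothetical model, not GCC code: the function names are invented, the booleans stand in for the GCC flags and for the result of optimize_function_for_speed_p, and the second function is only one possible reading of "honour the flag", not the fix that was adopted.

```c
#include <stdbool.h>

/* Model of the condition as quoted: reordering runs only when a
   reorder flag is set AND the function is optimized for speed. */
bool reorder_runs_as_quoted(bool reorder_blocks,
                            bool reorder_blocks_and_partition,
                            bool optimize_for_speed)
{
    return (reorder_blocks || reorder_blocks_and_partition)
           && optimize_for_speed;
}

/* Hypothetical alternative in which an explicit flag alone is enough,
   regardless of the size/speed setting. */
bool reorder_runs_if_flag_honoured(bool reorder_blocks,
                                   bool reorder_blocks_and_partition)
{
    return reorder_blocks || reorder_blocks_and_partition;
}
```

Under the quoted condition, reorder_runs_as_quoted(true, false, false) is false: an explicit -freorder-blocks has no effect once the function is being optimized for size.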
[Bug tree-optimization/42614] New: FRE optimizes away valid code after IPA inlining
On the following test case compiled with the GCC 4.4.1 release version and the following command line

gcc -S -O2 -finline-functions-called-once -fdump-tree-all-details -fdump-ipa-all fail.c

typedef struct SEntry
{
  unsigned char num;
} TEntry;

typedef struct STable
{
  TEntry data[2];
} TTable;

TTable *init ();
int fake_expect (int, int);
void fake_assert (int);

void expect_func (int a, unsigned char *b) __attribute__ ((noinline));

static inline void inlined_wrong (TEntry *entry_p, int flag);

void inlined_wrong (TEntry *entry_p, int flag)
{
  unsigned char index;
  entry_p->num = 0;
  if (!flag)
    fake_assert (0);
  for (index = 0; index < 1; index++)
    entry_p->num++;
  asm ("before");
  if (entry_p->num)
    {
      fake_assert(0);
      asm ("#here");
    }
}

void expect_func (int a, unsigned char *b)
{
  if (fake_expect ((a == 0), 0))
    fake_assert (0);
  if (fake_expect ((b == 0), 0))
    fake_assert (0);
}

void broken ()
{
  unsigned char index = 0;
  TTable *table_p = init();
  inlined_wrong (&(table_p->data[1]), 1);
  expect_func (0, &index);
  inlined_wrong ((TEntry *)0xf00f, 1);
  LocalFreeMemory (&table_p);
}

we get after FRE:

broken ()
{
  unsigned char index;
  unsigned char D.1321;
  unsigned char D.1320;
  unsigned char index;
  unsigned char D.1316;
  unsigned char D.1315;
  struct TTable * table_p;
  unsigned char index;
  struct TEntry * D.1281;
  struct TTable * table_p.1;
  struct TTable * table_p.0;

  :
  index = 0;
  table_p.0_1 = init ();
  table_p = table_p.0_1;
  table_p.1_2 = table_p.0_1;
  D.1281_3 = &table_p.1_2->data[1];
  table_p.1_2->data[1].num = 0;
  goto ;

  :
  D.1315_4 = D.1281_3->num;
  D.1316_5 = D.1315_4 + 1;
  D.1281_3->num = D.1316_5;
  index_7 = index_6 + 1;

  :
  # index_6 = PHI <0(2), index_7(3)>
  if (index_6 == 0)
    goto ;
  else
    goto ;

  :
  __asm__ __volatile__("before");
  D.1315_8 = 0;
  expect_func (0, &index);
  61455B->num ={v} 0;
  goto ;

  :
  D.1320_10 ={v} 61455B->num;
  D.1321_11 = D.1320_10 + 1;
  61455B->num ={v} D.1321_11;
  index_13 = index_12 + 1;

  :
  # index_12 = PHI <0(5), index_13(6)>
  if (index_12 == 0)
    goto ;
  else
    goto ;

  :
  __asm__ __volatile__("before");
  D.1320_14 ={v} 61455B->num;
  if (D.1320_14 != 0)
    goto ;
  else
    goto ;

  :
  fake_assert (0);
  __asm__ __volatile__("#here");

  :
  LocalFreeMemory (&table_p);
  return;
}

Note that the check "if (entry_p->num)" and its associated block are completely eliminated. The dumps indicate:

Replaced table_p with table_p.0_1 in table_p.1_2 = table_p;
Replaced table_p.1_2->data[1].num with 0 in D.1315_8 = table_p.1_2->data[1].num;
Removing basic block 6
;; basic block 6, loop depth 0, count 0
;; prev block 5, next block 7
;; pred: 5 [39.0%] (true,exec)
;; succ: 7 [100.0%] (fallthru,exec)
  :
  fake_assert (0);
  __asm__ __volatile__("#here");

If the same code is compiled with the function "inlined_wrong" declared as

static inline void inlined_wrong (TEntry *entry_p, int flag) __attribute__ ((always_inline));

the generated code is correct, with the check in place, suggesting that ipa-inline is troublesome while early inlining works okay?

--
Summary: FRE optimizes away valid code after IPA inlining
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42614
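To state the expected behaviour in executable form, here is a reduced sketch of inlined_wrong with a counting stub in place of fake_assert. It is an assumption-laden reduction: the asm statements are omitted, the names inlined_wrong_model, assert_calls, and count_assert_calls are hypothetical, and the code is not claimed to reproduce the miscompile. It only records what a correct compilation must do: after the one-iteration loop, entry_p->num is 1, so the guarded block has to execute.

```c
typedef struct SEntry {
    unsigned char num;
} TEntry;

/* Counts how often the stubbed fake_assert is reached. */
static int assert_calls;

void fake_assert(int cond)
{
    (void)cond;
    assert_calls++;
}

/* Same control flow as inlined_wrong in the report, minus the asm
   statements. */
void inlined_wrong_model(TEntry *entry_p, int flag)
{
    unsigned char index;

    entry_p->num = 0;
    if (!flag)
        fake_assert(0);
    for (index = 0; index < 1; index++)
        entry_p->num++;
    if (entry_p->num)       /* num is 1 here, so this must be taken */
        fake_assert(0);
}

/* Runs the model once with flag set and reports how many times
   fake_assert was called; a correct build returns 1. */
int count_assert_calls(void)
{
    TEntry e;

    assert_calls = 0;
    inlined_wrong_model(&e, 1);
    return assert_calls;
}
```

In the FRE dump above, the replacement of table_p.1_2->data[1].num with 0 is what removes this guarded block for the first inlined call.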
[Bug tree-optimization/42620] New: FRE optimizes away valid code after IPA inlining
This report is a verbatim resubmission of the report filed as bug 42614.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42620
[Bug tree-optimization/42614] FRE optimizes away valid code after IPA inlining
--- Comment #3 from rahul at icerasemi dot com 2010-01-05 11:30 --- *** Bug 42620 has been marked as a duplicate of this bug. *** -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42614
[Bug tree-optimization/42620] FRE optimizes away valid code after IPA inlining
--- Comment #1 from rahul at icerasemi dot com 2010-01-05 11:30 ---
Accidentally added due to a browser refresh. This bug is a duplicate of PR42614.

*** This bug has been marked as a duplicate of 42614 ***

--
rahul at icerasemi dot com changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42620
[Bug rtl-optimization/20070] If-conversion can't match equivalent code, and cross-jumping only works for literal matches
--- Comment #32 from rahul at icerasemi dot com 2010-01-11 12:34 --- I will re-test on our port and report my findings, cheers! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070
[Bug tree-optimization/45195] New: incorrect "array subscript above bounds" warning
Using GCC 4.4.1 and the following command, the test generates an "array subscript is above array bounds" warning.

gcc -S -Os test.c -Wall

void foo (int b[2][6])
{
  int i = 0;
  for (i = 0; i < 6; i++)
    {
      int *pb = &b[1][i];
      *pb = 0;
    }
}

Output from VRP looks like

foo (int[6] * b)
{
  int i;
  unsigned int D.1240;
  unsigned int i.0;

  :
  goto ;

  :
  # i_16 = PHI
  i.0_6 = (unsigned int) i_16;
  D.1240_7 = i.0_6 + 6;
  (*b_4(D))[D.1240_7] = 0;      <-- warning generated here
  i_10 = i_16 + 1;

  :
  # i_1 = PHI
  if (i_1 <= 5)
    goto ;
  else
    goto ;

  :
  return;

  :
  # i_14 = PHI <0(2)>
  goto ;
}

In the statement (*b_4(D))[D.1240_7] = 0, the range of b_4 appears to be [0 5] while the range of the index D.1240_7 is [6 11].

--
Summary: incorrect "array subscript above bounds" warning
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
GCC build triplet: i686-pc-linux
GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45195
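The index range [6 11] in the dump comes from flattening b[1][i] into an offset from the start of the first row. The sketch below computes that flat offset from byte addresses; the helper name flat_index is hypothetical. It shows that every b[1][i] for i in 0..5 lies inside the 12-element object, even though the flattened index exceeds the bound of a single 6-element row.

```c
#include <stddef.h>

/* Returns the position of b[1][i], counted in ints from the start of
   the two-row array that b points to. The offset is computed through
   byte pointers so that the arithmetic stays within the whole
   object. */
size_t flat_index(int (*b)[6], int i)
{
    return (size_t)((char *)&b[1][i] - (char *)b) / sizeof(int);
}
```

For a caller holding int b[2][6], flat_index(b, i) is 6 + i, which matches D.1240_7 = i.0_6 + 6 in the VRP output.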
[Bug tree-optimization/45195] incorrect "array subscript above bounds" warning
--- Comment #2 from rahul at icerasemi dot com 2010-08-06 08:01 --- Confirmed, fix for PR41317 avoids forwarding ARRAY_REFs to their use and fixes this issue. Does this fix hinder any optimizations? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45195