[Bug tree-optimization/40057] New: Incorrect right shift by 31 with long long

2009-05-07 Thread rahul at icerasemi dot com
The following code, compiled with GCC 4.4 at -O1, produces a wrong result for
the shift-and-AND operation. Bit 31 of the variable 'var' in function shiftTest
evaluates to '1' instead of '0'.

Compiling with -O0, however, produces the right result.


#include <stdio.h>

typedef unsigned long long ulonglong;

int shiftTest (const ulonglong var)
{
  ulonglong predicate = (var >> 31ULL) & 1ULL;

  if (predicate == 0ULL)
    {
      return 0;
    }
  return -1;
}


int main (void)
{
  ulonglong var = 0x1682a9aaaULL;

  printf ("Bit 31 of 0x%llx is %llu\n", var, (var >> 31ULL) & 1ULL);

  int result = shiftTest (var);

  if (result == 0)
    {
      printf ("Bit 31 is 0 - Correct!\n");
    }
  else
    {
      printf ("Bit 31 is 1 - Incorrect!\n");
    }
  return 0;
}
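
One way to cross-check the expected value without relying on the 64-bit shift
is to read bit 31 from the low 32-bit half of the value. The sketch below is
only an illustration; the helper names bit31_via_shift and bit31_via_low_word
are made up here and are not part of the original test case.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31 computed the same way as in shiftTest: 64-bit shift, then mask. */
static int bit31_via_shift(uint64_t v)
{
    return (int)((v >> 31) & 1u);
}

/* Reference: bit 31 is the top bit of the low 32-bit half, so no
   64-bit shift is involved. */
static int bit31_via_low_word(uint64_t v)
{
    uint32_t lo = (uint32_t)(v & 0xffffffffu);
    return (int)(lo >> 31);
}
```

For 0x1682a9aaa the low word is 0x682a9aaa, whose top bit is clear, so both
helpers should return 0 on a correctly behaving compiler.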


-- 
   Summary: Incorrect right shift by 31 with long long
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057



[Bug tree-optimization/40057] Incorrect right shift by 31 with long long

2009-05-07 Thread rahul at icerasemi dot com


--- Comment #1 from rahul at icerasemi dot com  2009-05-07 11:11 ---
I suspect the tree-ter optimisation pass: compiling with -O1 -fno-tree-ter
produces the right result. Using -fdump-tree-optimized shows the SSA GIMPLE
changing from

shiftTest (const ulonglong var)
{
  int D.1842;

:
  if (var >> 31 & 1 == 0)
goto ;
  else
goto ;

:
  D.1842 = -1;
  goto ;

:
  D.1842 = 0;

:
  return D.1842;

}

to

shiftTest (const ulonglong var)
{
  ulonglong predicate;
  int D.1842;
  const ulonglong D.1839;

:
  D.1839 = var >> 31;
  predicate = D.1839 & 1;
  if (predicate == 0)
goto ;
  else
goto ;

:
  D.1842 = -1;
  goto ;

:
  D.1842 = 0;

:
  return D.1842;

}

Does the complex expression "var >> 31 & 1 == 0" cause problems during RTL
expansion phase?
Are the precedences of the SHIFT and AND operations maintained by the
expression replacement phase?
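
As a point of reference for the precedence question: as far as I understand,
the dump text is a pretty-print of GIMPLE rather than C source, so it need not
follow C's rules. If the same text were read as C, however, "==" would bind
tighter than "&", which is not the grouping the source intends. A small sketch
of the two groupings (the function names are illustrative only):

```c
#include <assert.h>

typedef unsigned long long ulonglong;

/* How C groups the unparenthesised text "var >> 31 & 1 == 0":
   shift first, then ==, then &, i.e. (var >> 31) & (1 == 0). */
static int grouped_as_c(ulonglong var)
{
    return ((var >> 31) & (ulonglong)(1 == 0)) != 0;
}

/* The grouping intended by the source: ((var >> 31) & 1) == 0. */
static int grouped_as_intended(ulonglong var)
{
    return ((var >> 31) & 1ULL) == 0ULL;
}
```

Under the C grouping the condition is always false, whereas the intended
grouping is true exactly when bit 31 is clear.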


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057



[Bug middle-end/40057] Incorrect right shift by 31 with long long

2009-05-07 Thread rahul at icerasemi dot com


--- Comment #11 from rahul at icerasemi dot com  2009-05-07 15:57 ---
Confirmed issue resolved.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40057



[Bug middle-end/30905] [4.3 Regression] Fails to cross-jump

2009-06-11 Thread rahul at icerasemi dot com


--- Comment #15 from rahul at icerasemi dot com  2009-06-11 17:38 ---
GCC4.4 is still missing this fix. GCC-4.4.1 (20090507) on x86_64 produces the
following with O2/O3

kernel:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $1, (%esp)
        call    gen_int
        testl   %eax, %eax
        je      .L2
        movl    a, %edx
        movl    %edx, %ecx
        andl    $3, %ecx
        leal    (%ecx,%edx), %edx
        movl    %edx, a
        movl    b, %edx
        movl    %edx, %ecx
        orl     $3, %ecx
        leal    (%ecx,%edx), %edx
        movl    %edx, b
.L7:
        movl    a+4, %eax
        movl    %eax, %edx
        andl    $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, a+4
        movl    b+4, %eax
        movl    %eax, %edx
        orl     $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, b+4
        leave
        ret
        .p2align 4,,7
        .p2align 3
.L2:
        movl    a, %eax
        movl    %eax, %edx
        andl    $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, a
        movl    b, %eax
        movl    %eax, %edx
        orl     $3, %edx
        leal    (%edx,%eax), %eax
        movl    %eax, b
        jmp     .L7

Any reason why this shouldn't go into 4.4?


-- 

rahul at icerasemi dot com changed:

   What|Removed |Added

 CC||rahul at icerasemi dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30905



[Bug tree-optimization/41026] New: invariant address load inside loop

2009-08-10 Thread rahul at icerasemi dot com
gcc --version
gcc (GCC) 4.4.1 20090507 (prerelease)

The following test compiled with
gcc -S -Os

struct struct_t {
  int* data;
};

void testAddr (struct struct_t* sp, int len)
{
  int i;
  for (i = 0; i < len; i++)
    {
      sp->data[i] = 0;
    }
}

generates the following code for x86

testAddr:
        pushl   %ebp
        xorl    %eax, %eax
        movl    %esp, %ebp
        movl    8(%ebp), %ecx
        pushl   %ebx
        movl    12(%ebp), %edx
        jmp     .L2
.L3:
        movl    (%ecx), %ebx            <-- invariant address load
        movl    $0, (%ebx,%eax,4)
        incl    %eax
.L2:
        cmpl    %edx, %eax
        jl      .L3
        popl    %ebx
        popl    %ebp
        ret

Whereas making the intent explicit like so

void testAddr (struct struct_t* sp, int len)
{
  int i;
  int *p = sp->data;
  for (i = 0; i < len; i++)
    {
      p[i] = 0;
    }
}

generates

testAddr:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %eax
        movl    12(%ebp), %ecx
        movl    (%eax), %edx            <-- now outside the loop
        xorl    %eax, %eax
        jmp     .L2
.L3:
        movl    $0, (%edx,%eax,4)
        incl    %eax
.L2:
        cmpl    %ecx, %eax
        jl      .L3
        popl    %ebp
        ret

Why can't we move the address load outside the loop in the first case?


-- 
   Summary: invariant address load inside loop
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux
  GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026



[Bug tree-optimization/41026] invariant address load inside loop with -Os.

2009-08-13 Thread rahul at icerasemi dot com


--- Comment #4 from rahul at icerasemi dot com  2009-08-13 15:46 ---
Confirmed. Enabling loop header copying for Os resolves the problem.
On our port, this not only helps move the invariant load outside the loop, but
also correctly uses an auto-increment address mode via the AutoInc patches we
use. Other examples also confirm that header copying enables more induction
variables to be identified, and hence more post-increment opportunities.

Do better loop analysis and the resulting potential for further optimizations
outweigh the cost of copying the loop header? It would be ideal to relax the
loop header copy predicate for Os and select an appropriate threshold. It is
currently set at 20 insns; perhaps a lower value to start with.
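
To make the trade-off concrete, here is a hand-written C model of what copying
the loop header amounts to for the testAddr example. This is only a sketch: the
function name testAddrCopiedHeader is invented for illustration, it is not what
GCC emits, and it assumes the stores through data never modify sp->data itself.

```c
#include <assert.h>
#include <stddef.h>

struct struct_t {
    int *data;
};

/* Hand-written equivalent of copying the loop header: the "i < len" test
   is duplicated in front of the loop, so the load of sp->data happens once,
   and only on paths where the loop body runs at least once. */
static void testAddrCopiedHeader(struct struct_t *sp, int len)
{
    int i = 0;
    if (i < len) {
        int *p = sp->data;   /* invariant load, now outside the loop */
        do {
            p[i] = 0;
            i++;
        } while (i < len);
    }
}
```

Unlike the variant that loads sp->data unconditionally before the loop, this
form does not dereference sp when len <= 0, which matches the behaviour of the
original loop. The price is the duplicated comparison, which is the code-size
cost the threshold is meant to bound.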


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026



[Bug rtl-optimization/20070] If-conversion can't match equivalent code, and cross-jumping only works for literal matches

2009-09-04 Thread rahul at icerasemi dot com


--- Comment #29 from rahul at icerasemi dot com  2009-09-04 14:51 ---
I am testing Steven's Crossjumping patch attached here. With CoreMark we see a
1% increase in performance when using Os. Other proprietary tests show ~0.5%
decrease in code size.

The patch, however, does not fix PR30905.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070



[Bug tree-optimization/41026] invariant address load inside loop with -Os.

2009-09-11 Thread rahul at icerasemi dot com


--- Comment #6 from rahul at icerasemi dot com  2009-09-11 10:03 ---
An interesting regression results as a side effect of loop header copying (this
occurs even in vanilla O2). If I modify my original test case to

struct struct_t {
  int* data;
};

void testAddr (struct struct_t* sp, int len)
{
  short i;
  for (i = 0; i < len; i++)
    {
      sp->data[len-i-1] = 0;
    }
}

The index is now a short, and I have purposefully added an int to form the
final induction variable.

With gcc -S -O2 -fdump-tree-all, I get the following SSA

  short int i;
  int * D.1220;
  long unsigned int D.1219;
  long unsigned int D.1218;
  long unsigned int D.1217;
  int D.1216;
  int D.1215;
  int * D.1214;

:
  goto ;

:
  D.1214_6 = sp_5(D)->data;
  D.1215_7 = (int) i_1;
  D.1216_8 = len_4(D) - D.1215_7;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 ={v} 0;
  i_13 = i_1 + 1;

:
  # i_1 = PHI <0(2), i_13(3)>
  D.1215_3 = (int) i_1;
  if (D.1215_3 < len_4(D))
goto ;
  else
goto ;

:
  return;

The following copy propagation and/or FRE passes identify D.1215_7 as a copy of
D.1215_3 and we get

:
  D.1214_6 = sp_5(D)->data;
  D.1216_8 = len_4(D) - D.1215_3;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  i_13 = i_1 + 1;

Loop header copying introduces a PHI for D.1215

:
  D.1215_19 = 0;
  if (D.1215_19 < len_4(D))
goto ;
  else
goto ;

:
  # i_20 = PHI 
  # D.1215_21 = PHI 
  D.1214_6 = sp_5(D)->data;
  D.1216_8 = len_4(D) - D.1215_21;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  i_13 = i_20 + 1;
  D.1215_3 = (int) i_13;
  if (D.1215_3 < len_4(D))
goto ;
  else
goto ;

This causes IVOpts (dump below) and all subsequent optimisations to fall over.

:
  D.1214_6 = sp_5(D)->data;
  D.1238_7 = (unsigned int) len_4(D);
  D.1239_1 = D.1238_7 + 0x0;
  __builtin_loop_start (1, D.1239_1);
  D.1241_24 = (unsigned int) len_4(D);

:
  # D.1215_21 = PHI <0(3), D.1215_3(5)>
  # ivtmp.13_14 = PHI <0(3), ivtmp.13_18(5)>
  __builtin_loop_iteration (1);
  D.1216_8 = len_4(D) - D.1215_21;
  D.1217_9 = (long unsigned int) D.1216_8;
  D.1218_10 = D.1217_9 + -1;
  D.1219_11 = D.1218_10 * 4;
  D.1220_12 = D.1214_6 + D.1219_11;
  *D.1220_12 = 0;
  D.1240_19 = ivtmp.13_14 + 1;
  D.1215_23 = (int) D.1240_19;
  D.1215_3 = D.1215_23;
  ivtmp.13_18 = ivtmp.13_14 + 1;
  if (ivtmp.13_18 != D.1241_24)
goto ;
  else
goto ;

On this test, using -fno-tree-copy-prop -fno-tree-pre results in better
optimizations, which implies that copy propagating (across blocks) or FREing
potential induction variables is undesirable. A less ideal alternative is to
disable loop header copying when dealing with type-promoted loop indices.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41026



[Bug tree-optimization/23821] [4.3/4.4/4.5 Regression] DOM and VRP creating harder to optimize code

2009-09-25 Thread rahul at icerasemi dot com


--- Comment #25 from rahul at icerasemi dot com  2009-09-25 14:26 ---
Do the fixes in comment #11 and #24 alone solve the missed induction variable
problem?

I'm using the 4.4.1 release branch and it doesn't seem to work for me.
After DOM I get

# i_10 = PHI 
i_5 = i_10 + 1;

and PHI propagation turns this into

i_5 = x_4 + 1;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23821



[Bug tree-optimization/23821] [4.3/4.4/4.5 Regression] DOM and VRP creating harder to optimize code

2009-09-25 Thread rahul at icerasemi dot com


--- Comment #28 from rahul at icerasemi dot com  2009-09-25 17:10 ---
Sorry, I also had changes to move loop header copying before FRE from
http://gcc.gnu.org/ml/gcc/2009-09/msg00434.html.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23821



[Bug tree-optimization/41488] New: IVOpts cannot coalesce multiple induction variables

2009-09-28 Thread rahul at icerasemi dot com
Using GCC 4.4.1 release version and compiling the following test with

gcc -O2 -fdump-tree-all

struct struct_t {
  int* data;
};

void testAutoIncStruct (struct struct_t* sp, int start, int end) {
  int i;
  for (i = 0; i+start < end; i++)
    {
      sp->data[i+start] = 0;
    }
}

The IVOpts dump shows that the induction variables (start and ivtmp.32) are not
coalesced:

testAutoIncStruct (struct struct_t * sp, int start, int end) {
  unsigned int D.1283;
  unsigned int D.1284;
  int D.1282;
  unsigned int ivtmp.32;
  int * pretmp.17;
  int i;
  int * D.1245;
  unsigned int D.1244;
  unsigned int D.1243;

:
  if (start_3(D) < end_5(D))
goto ;
  else
goto ;

:
  pretmp.17_22 = sp_6(D)->data;
  D.1282_23 = start_3(D) + 1;
  ivtmp.32_25 = (unsigned int) D.1282_23;
  D.1283_27 = (unsigned int) end_5(D);
  D.1284_28 = D.1283_27 + 1;

:
  # start_20 = PHI 
  # ivtmp.32_7 = PHI 
  D.1243_9 = (unsigned int) start_20;
  D.1244_10 = D.1243_9 * 4;
  D.1245_11 = pretmp.17_22 + D.1244_10;
  *D.1245_11 = 0;
  start_26 = (int) ivtmp.32_7;
  start_4 = start_26;
  ivtmp.32_24 = ivtmp.32_7 + 1;
  if (ivtmp.32_24 != D.1284_28)
goto ;
  else
goto ;

:
  goto ;

:
  return;

}

The problem arises from the expression "i + start" being identified as a common
expression between the header and the latch. This seems to create an extra
induction variable and a PHI in the latch. If we disable tree FRE and tree copy
propagation with

gcc -O2 -fno-tree-fre -fno-tree-copy-prop

We get

:
  pretmp.17_23 = sp_6(D)->data;
  D.1287_27 = (unsigned int) end_5(D);
  D.1288_28 = (unsigned int) start_3(D);
  D.1289_29 = D.1287_27 - D.1288_28;
  D.1290_30 = (int) D.1289_29;

:
  # i_20 = PHI 
  D.1241_7 = pretmp.17_23;
  D.1284_26 = (unsigned int) start_3(D);
  D.1285_25 = (unsigned int) i_20;
  D.1286_24 = D.1284_26 + D.1285_25;
  MEM[base: pretmp.17_23, index: D.1286_24, step: 4] = 0;
  i_12 = i_20 + 1;
  if (i_12 != D.1290_30)
goto ;
  else
goto ;

The induction variable and the memory reference are now correctly identified.
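
For comparison, the coalescing one would hope for can be written by hand at the
source level. The sketch below assumes that i + start does not overflow and that
the stores do not alter sp->data; the name testAutoIncStructCoalesced is
invented for this illustration.

```c
#include <assert.h>

struct struct_t {
    int *data;
};

/* Original form: two values (i and i + start) advance in lock step. */
static void testAutoIncStruct(struct struct_t *sp, int start, int end)
{
    int i;
    for (i = 0; i + start < end; i++)
        sp->data[i + start] = 0;
}

/* Hand-coalesced form: a single index j stands for i + start. */
static void testAutoIncStructCoalesced(struct struct_t *sp, int start, int end)
{
    int j;
    for (j = start; j < end; j++)
        sp->data[j] = 0;
}
```

Both functions clear the elements in the half-open range [start, end), so a
single induction variable is sufficient to describe the loop.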


-- 
   Summary: IVOpts cannot coalesce multiple induction variables
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
    ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488



[Bug tree-optimization/41488] IVOpts cannot coalesce multiple induction variables

2009-09-28 Thread rahul at icerasemi dot com


--- Comment #1 from rahul at icerasemi dot com  2009-09-28 12:45 ---
See http://gcc.gnu.org/ml/gcc/2009-09/msg00432.html for some followup.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41488



[Bug tree-optimization/41834] New: Missed "may be uninitialized warning" on array reference

2009-10-26 Thread rahul at icerasemi dot com
Using GCC 4.4.1 with the following command on the test below:

gcc -O2 -Wall -Wextra

#include <stdio.h>

int foo (int b)
{
  int a[10], c, i;

  for (i = 0; i < b; i++)
    {
      a[i] = b;
      c = b;
    }

  if (a[2] == 5 && c == 5)
    {
      printf("hello world\n");
    }
  return 0;
}

testWarn.c: In function 'foo': testWarn.c:5: warning: 'c' may be used
uninitialized in this function 

However, a warning for a[2] being possibly uninitialized is missing.

If I understand right, this should be handled by the late warning pass, which
runs just after DCE. Looking at the post-DCE dump

foo (int b)
{
  unsigned int D.1282;
  int i;
  int c;
  int a[10];
  _Bool D.1243;
  _Bool D.1242;
  _Bool D.1241;
  int D.1240;

:
  if (b_5(D) > 0)
goto ;
  else
goto ;

:
  # i_21 = PHI <0(2), i_8(3)>
  D.1282_25 = (unsigned int) i_21;
  MEM[base: &a, index: D.1282_25, step: 4] = b_5(D);
  i_8 = i_21 + 1;
  if (i_8 != b_5(D))
goto ;
  else
goto ;

:
  # c_17 = PHI 
  D.1240_9 = a[2];
  D.1241_10 = D.1240_9 == 5;
  D.1242_11 = c_17 == 5;
  D.1243_12 = D.1242_11 & D.1241_10;
  if (D.1243_12 != 0)
goto ;
  else
goto ;

:
  __builtin_puts (&"hello world"[0]);

:
  return 0;

}

There is a path to bb 4 that does not initialize a. Why do we not generate a
warning? Is it due to a missing PHI for a?


-- 
   Summary: Missed "may be uninitialized warning" on array reference
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: tree-optimization
    AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41834



[Bug tree-optimization/47059] New: compiler fails to coalesce loads/stores

2010-12-24 Thread rahul at icerasemi dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059

   Summary: compiler fails to coalesce loads/stores
   Product: gcc
   Version: 4.5.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ra...@icerasemi.com
CC: sdkteam-...@icerasemi.com
  Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
 Build: i686-pc-linux-gnu


Consider the following test case compiled with GCC4.5.1 (x86) and the following
command:

gcc -S -Os test.c

struct struct1
{
  void *data;
  unsigned short f1;
  unsigned short f2;
};
typedef struct struct1 S1;

struct struct2
{
  int f3;
  S1 f4;
};
typedef struct struct2 S2;


extern void foo (S1 *ptr);
extern S2 gstruct2_var;
extern S1 gstruct1_var;

static S1 bar (const S1 *ptr) __attribute__ ((always_inline));

static S1
bar (const S1 *ptr)
{
  S1 ls_var = *ptr;
  foo (&ls_var);
  return ls_var;
}

int
main ()
{
  S2 *ps_var;

  ps_var = &gstruct2_var;
  ps_var->f4 = bar (&gstruct1_var);

  return 0;
}

We get:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $32, %esp
        movl    gstruct1_var, %eax
        movl    gstruct1_var+4, %edx
        movl    %eax, -16(%ebp)
        leal    -16(%ebp), %eax
        pushl   %eax
        movl    %edx, -12(%ebp)
        call    foo
        movl    -16(%ebp), %eax
        movl    -4(%ebp), %ecx
        movl    %eax, gstruct2_var+4
        movl    -12(%ebp), %eax         <-- load1   [ebp - 12] @ 4 bytes
        movw    %ax, gstruct2_var+8     <-- store1  [gstruct2_var + 8] @ 2 bytes
        movw    -10(%ebp), %ax          <-- load2   [ebp - 10] @ 2 bytes
        movw    %ax, gstruct2_var+10    <-- store2  [gstruct2_var + 10] @ 2 bytes
        xorl    %eax, %eax
        leave
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.5.1"
        .section        .note.GNU-stack,"",@progbits


With GCC4.4.1 we get:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $32, %esp
        movl    gstruct1_var, %eax
        movl    gstruct1_var+4, %edx
        movl    %eax, -16(%ebp)
        leal    -16(%ebp), %eax
        movl    %edx, -12(%ebp)
        pushl   %eax
        call    foo
        movl    -12(%ebp), %eax         <-- Load1 [ebp - 12] @ 4 bytes
        movl    -4(%ebp), %ecx
        movl    %eax, gstruct2_var+8    <-- Store1 [gstruct2_var + 8] @ 4 bytes
        movl    -16(%ebp), %eax
        movl    %eax, gstruct2_var+4
        xorl    %eax, %eax
        leave
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.4.1"
        .section        .note.GNU-stack,"",@progbits


The extra loads and stores appear to be the result of a change to SRA, which now
fully scalarizes structure members f1 and f2. With GCC4.4.1 the access to these
fields is done using a BIT_FIELD_REF, which combines the two loads and stores.

Talking to MartinJ on IRC, I was told that the changes to SRA scalarize
aggregates aggressively. In the past there was some functionality to try
and combine appropriate components into BIT_FIELD_REFs so as to reduce the
number of loads/stores. This has been removed from 4.5 in favour of simplicity
of the Gimple IR and working towards generic MEM_REFs. The plan is to introduce
new IR constructs to load/store individual bits and in a separate gimple pass
decide how to combine them together. However, this will only be available in
4.7+.

We also have the exact same issue on our port, where it causes a significant
performance regression on our software.
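
The difference between the two code shapes can be mimicked at the source level.
The following is only a sketch of the idea, not of what SRA does internally;
the helper names are invented, and it assumes f1 and f2 are adjacent with no
padding (checked at compile time below).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct struct1 {
    void *data;
    unsigned short f1;
    unsigned short f2;
};
typedef struct struct1 S1;

/* Assumption: f2 immediately follows f1 (no padding). The offsets in the
   assembly above are consistent with this, but the C standard does not
   guarantee it, hence the compile-time check. */
_Static_assert(offsetof(S1, f2) == offsetof(S1, f1) + sizeof(unsigned short),
               "f1 and f2 are expected to be adjacent");

/* Field-by-field copy: one narrow load and store per field. */
static void copy_fields_separately(S1 *dst, const S1 *src)
{
    dst->f1 = src->f1;
    dst->f2 = src->f2;
}

/* Combined copy: one access covering both adjacent fields. */
static void copy_fields_combined(S1 *dst, const S1 *src)
{
    memcpy((char *)dst + offsetof(S1, f1),
           (const char *)src + offsetof(S1, f1),
           2 * sizeof(unsigned short));
}
```

Both helpers leave dst->f1 and dst->f2 with the same values; the second form
corresponds to the single wider access that 4.4.1 generated.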


[Bug tree-optimization/47059] compiler fails to coalesce loads/stores

2011-01-15 Thread rahul at icerasemi dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059

--- Comment #1 from Rahul Kharche 2011-01-15 12:32:01 UTC ---
Created attachment 22974
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22974
Patch Vs 4.5.2 Rev 167088


[Bug tree-optimization/47059] compiler fails to coalesce loads/stores

2011-01-15 Thread rahul at icerasemi dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47059

--- Comment #2 from Rahul Kharche 2011-01-15 12:43:27 UTC ---
This issue also exists on the trunk. I am in the process of bootstrap testing
this for i686-pc-linux-gnu. I will send out this patch once it checks out.
The attached patch is Vs 4.5.2 Rev 167088.


[Bug rtl-optimization/43515] New: Basic block re-ordering unconditionally disabled for Os

2010-03-25 Thread rahul at icerasemi dot com
Basic block re-ordering appears to be unconditionally disabled when optimizing
for size, irrespective of whether -freorder-blocks was specified on command
line. This is applicable to all versions 4.4.1 - 4.5.

As suggested in the following discussion this is incorrect behaviour
http://gcc.gnu.org/ml/gcc/2010-03/msg00365.html


-- 
   Summary: Basic block re-ordering unconditionally disabled for Os
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43515



[Bug rtl-optimization/43515] Basic block re-ordering unconditionally disabled for Os

2010-03-26 Thread rahul at icerasemi dot com


--- Comment #3 from rahul at icerasemi dot com  2010-03-26 12:25 ---
The following test in 'rest_of_handle_reorder_blocks'

if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
 && optimize_function_for_speed_p (cfun))
{ ... }

suggests that when optimize_size is true, reordering would not run, even if I
were to use -freorder-blocks as a command line option or a function attribute?
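
The shape of that condition can be modelled in isolation. In the sketch below
the three parameters are plain booleans standing in for flag_reorder_blocks,
flag_reorder_blocks_and_partition and the result of
optimize_function_for_speed_p (cfun); the function reorder_runs is hypothetical
and is not GCC code. It also assumes the speed predicate is false when
optimizing for size, as the report suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the GCC flags and predicate named in the test above;
   these are plain parameters here, not the real GCC globals. */
static bool reorder_runs(bool flag_reorder_blocks,
                         bool flag_reorder_blocks_and_partition,
                         bool optimize_for_speed)
{
    return (flag_reorder_blocks || flag_reorder_blocks_and_partition)
           && optimize_for_speed;
}
```

With the speed predicate false, the result is false whatever the two flags
say, which is why an explicit -freorder-blocks would have no effect.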

I also just noticed PR41396 is related.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43515



[Bug tree-optimization/42614] New: FRE optimizes away valid code after IPA inlining

2010-01-04 Thread rahul at icerasemi dot com
On the following test case compiled with GCC 4.4.1 release version and the
following command line

gcc -S -O2 -finline-functions-called-once -fdump-tree-all-details
-fdump-ipa-all fail.c

typedef struct SEntry
{
  unsigned char num;
} TEntry;

typedef struct STable
{
  TEntry data[2];
} TTable;


TTable *init ();
int fake_expect (int, int);
void fake_assert (int);

void
expect_func (int a, unsigned char *b) __attribute__ ((noinline));

static inline void
inlined_wrong (TEntry *entry_p, int flag);

void
inlined_wrong (TEntry *entry_p, int flag)
{
  unsigned char index;
  entry_p->num = 0;

  if (!flag)
  fake_assert (0);

  for (index = 0; index < 1; index++)
entry_p->num++;

  asm ("before");

  if (entry_p->num)
{
  fake_assert(0);
  asm ("#here");
}
}

void
expect_func (int a, unsigned char *b)
{
  if (fake_expect ((a == 0), 0))
fake_assert (0);
  if (fake_expect ((b == 0), 0))
fake_assert (0);
}

void
broken ()
{
  unsigned char index = 0;
  TTable *table_p = init();

  inlined_wrong (&(table_p->data[1]), 1);
  expect_func (0, &index);
  inlined_wrong ((TEntry *)0xf00f, 1);

  LocalFreeMemory (&table_p);
}


we get after FRE:

broken ()
{
  unsigned char index;
  unsigned char D.1321;
  unsigned char D.1320;
  unsigned char index;
  unsigned char D.1316;
  unsigned char D.1315;
  struct TTable * table_p;
  unsigned char index;
  struct TEntry * D.1281;
  struct TTable * table_p.1;
  struct TTable * table_p.0;

:
  index = 0;
  table_p.0_1 = init ();
  table_p = table_p.0_1;
  table_p.1_2 = table_p.0_1;
  D.1281_3 = &table_p.1_2->data[1];
  table_p.1_2->data[1].num = 0;
  goto ;

:
  D.1315_4 = D.1281_3->num;
  D.1316_5 = D.1315_4 + 1;
  D.1281_3->num = D.1316_5;
  index_7 = index_6 + 1;

:
  # index_6 = PHI <0(2), index_7(3)>
  if (index_6 == 0)
goto ;
  else
goto ;

:
  __asm__ __volatile__("before");
  D.1315_8 = 0;
  expect_func (0, &index);
  61455B->num ={v} 0;
  goto ;

:
  D.1320_10 ={v} 61455B->num;
  D.1321_11 = D.1320_10 + 1;
  61455B->num ={v} D.1321_11;
  index_13 = index_12 + 1;

:
  # index_12 = PHI <0(5), index_13(6)>
  if (index_12 == 0)
goto ;
  else
goto ;

:
  __asm__ __volatile__("before");
  D.1320_14 ={v} 61455B->num;
  if (D.1320_14 != 0)
goto ;
  else
goto ;

:
  fake_assert (0);
  __asm__ __volatile__("#here");

:
  LocalFreeMemory (&table_p);
  return;

}


Note that the check "if (entry_p->num)" and its associated block are completely
eliminated. The dumps indicate:

Replaced table_p with table_p.0_1 in table_p.1_2 = table_p;
Replaced table_p.1_2->data[1].num with 0 in D.1315_8 =
table_p.1_2->data[1].num;
Removing basic block 6
;; basic block 6, loop depth 0, count 0
;; prev block 5, next block 7
;; pred:   5 [39.0%]  (true,exec)
;; succ:   7 [100.0%]  (fallthru,exec)
:
fake_assert (0);
__asm__ __volatile__("#here");


If the same code is compiled with the function "inlined_wrong" declared as

static inline void
inlined_wrong (TEntry *entry_p, int flag) __attribute__ ((always_inline));

The generated code is correct with the check in place, suggesting ipa-inline is
troublesome while early inlining works okay?
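
The expected behaviour can be pinned down with a small executable model of
inlined_wrong. This is a sketch only: the asm statements are dropped, the
external fake_assert is replaced by a local counter, and the name
inlined_wrong_model is made up for the illustration.

```c
#include <assert.h>

typedef struct SEntry {
    unsigned char num;
} TEntry;

/* Counts how often the check fires; stands in for the external
   fake_assert of the test case. */
static int fake_assert_calls;

static void fake_assert(int cond)
{
    (void)cond;
    fake_assert_calls++;
}

/* Same data flow as inlined_wrong, minus the asm statements. */
static void inlined_wrong_model(TEntry *entry_p, int flag)
{
    unsigned char index;
    entry_p->num = 0;

    if (!flag)
        fake_assert(0);

    for (index = 0; index < 1; index++)
        entry_p->num++;

    if (entry_p->num)       /* num is 1 here, so this must be taken */
        fake_assert(0);
}
```

Each call with a non-zero flag leaves num at 1 and reaches fake_assert once, so
the guarded block must survive in every inlined copy.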


-- 
   Summary: FRE optimizes away valid code after IPA inlining
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
          Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42614



[Bug tree-optimization/42620] New: FRE optimizes away valid code after IPA inlining

2010-01-05 Thread rahul at icerasemi dot com


-- 
   Summary: FRE optimizes away valid code after IPA inlining
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
          Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42620



[Bug tree-optimization/42614] FRE optimizes away valid code after IPA inlining

2010-01-05 Thread rahul at icerasemi dot com


--- Comment #3 from rahul at icerasemi dot com  2010-01-05 11:30 ---
*** Bug 42620 has been marked as a duplicate of this bug. ***


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42614



[Bug tree-optimization/42620] FRE optimizes away valid code after IPA inlining

2010-01-05 Thread rahul at icerasemi dot com


--- Comment #1 from rahul at icerasemi dot com  2010-01-05 11:30 ---
Accidentally added due to browser refresh. Bug is duplicate of PR42614.

*** This bug has been marked as a duplicate of 42614 ***


-- 

rahul at icerasemi dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42620



[Bug rtl-optimization/20070] If-conversion can't match equivalent code, and cross-jumping only works for literal matches

2010-01-11 Thread rahul at icerasemi dot com


--- Comment #32 from rahul at icerasemi dot com  2010-01-11 12:34 ---
I will re-test on our port and report my findings, cheers!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20070



[Bug tree-optimization/45195] New: incorrect "array subscript above bounds" warning

2010-08-05 Thread rahul at icerasemi dot com
Using GCC 4.4.1 and the following command, the test below generates an "array
subscript is above array bounds" warning.

gcc -S -Os test.c -Wall

void foo (int b[2][6])
{
  int i = 0;
  for (i = 0; i < 6; i++)
    {
      int *pb = &b[1][i];
      *pb = 0;
    }
}


Output from VRP looks like

foo (int[6] * b)
{
  int i;
  unsigned int D.1240;
  unsigned int i.0;

:
  goto ;

:
  # i_16 = PHI 
  i.0_6 = (unsigned int) i_16;
  D.1240_7 = i.0_6 + 6;
  (*b_4(D))[D.1240_7] = 0;   <-- warning generated here
  i_10 = i_16 + 1;

:
  # i_1 = PHI 
  if (i_1 <= 5)
goto ;
  else
goto ;

:
  return;

:
  # i_14 = PHI <0(2)>
  goto ;

}

In the statement (*b_4(D))[D.1240_7] = 0, range of b_4 appears to be [0 5]
while the range of index D.1240_7 is [6 11].
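
The [6 11] range is what one gets by flattening &b[1][i] into an offset from
the start of the whole 2 x 6 object, which is still inside that object even
though it lies outside a single row. A small sketch of the arithmetic (the
helper flat_index and the ROWS/COLS constants are illustrative names, not taken
from GCC):

```c
#include <assert.h>

enum { ROWS = 2, COLS = 6 };

/* Flattened element index of b[row][col] within the whole ROWS x COLS
   object, counted in ints from &b[0][0]. */
static unsigned flat_index(unsigned row, unsigned col)
{
    return row * COLS + col;
}

/* The loop from the report: clears row 1 only. */
static void foo(int b[ROWS][COLS])
{
    int i;
    for (i = 0; i < COLS; i++) {
        int *pb = &b[1][i];
        *pb = 0;
    }
}
```

Every access made by foo stays within row 1 of b, so the source itself never
indexes a row out of bounds; the out-of-range index only appears once the
address computation has been folded into a single subscript on the first row.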


-- 
   Summary: incorrect "array subscript above bounds" warning
   Product: gcc
   Version: 4.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rahul at icerasemi dot com
 GCC build triplet: i686-pc-linux
  GCC host triplet: i686-pc-linux
GCC target triplet: i686-pc-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45195



[Bug tree-optimization/45195] incorrect "array subscript above bounds" warning

2010-08-06 Thread rahul at icerasemi dot com


--- Comment #2 from rahul at icerasemi dot com  2010-08-06 08:01 ---
Confirmed: the fix for PR41317 avoids forwarding ARRAY_REFs to their use and
fixes this issue. Does this fix hinder any optimizations?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45195