[Bug tree-optimization/51049] New: A regression caused by "Improve handling of conditional-branches on targets with high branch costs"

2011-11-08 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51049

           Bug #: 51049
         Summary: A regression caused by "Improve handling of
                  conditional-branches on targets with high branch costs"
  Classification: Unclassified
         Product: gcc
         Version: 4.7.0
          Status: UNCONFIRMED
        Severity: normal
        Priority: P3
       Component: tree-optimization
      AssignedTo: unassig...@gcc.gnu.org
      ReportedBy: liujiangn...@gcc.gnu.org


int f(char *i, int j)
{
  if (*i && j!=2)
    return *i;
  else
    return j;
}

Before the check-in r180109, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto ; else goto ;
  :
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  :
  D.4716 = j;
  return D.4716;

After check-in r180109, we have

  D.4711 = *i;
  if (D.4711 != 0) goto ; else goto ;
  :
  if (j != 2) goto ; else goto ;
  :
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  :
  D.4714 = j;
  return D.4714;

The code below, in function fold_truth_andor, makes the difference:

  /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) into (A OR B).
     For sequence point consistency, we need to check for trapping,
     and side-effects.  */
  else if (code == icode && simple_operand_p_2 (arg0)
           && simple_operand_p_2 (arg1))
    return fold_build2_loc (loc, ncode, type, arg0, arg1);

For "*i != 0", simple_operand_p(*i) returns false. Originally the code did not
perform this check.

Please refer to http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02445.html for
discussion details.

This change accidentally improved some benchmarks significantly, for unrelated
reasons, but Michael commented as follows.

==Michael's comment==

It's nice that it caused a benchmark to improve significantly, but that should
be done via a proper analysis and patch, not as a side effect of a supposed
non-change.

==End of Michael's comment==

The potential impact is that performance in other scenarios is hurt.

The key point is that, in the small test case I gave, the RHS has no side
effects at all, so changing the AND-IF to AND doesn't violate the C
specification.
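
To make the point concrete, here is a minimal sketch of the two forms of the
test case. The function names f_short_circuit and f_merged are hypothetical
and only for illustration; they are not from GCC. Because reading j has no
side effects and cannot trap, evaluating both operands unconditionally gives
the same result as the short-circuit form.

```c
/* Short-circuit form, as written in the test case:
   j != 2 is evaluated only when *i is non-zero.  */
int f_short_circuit(char *i, int j)
{
    if (*i && j != 2)
        return *i;
    else
        return j;
}

/* Merged form, mirroring the GIMPLE seen before r180109: both comparisons
   are evaluated and combined with a bitwise AND, leaving a single branch.  */
int f_merged(char *i, int j)
{
    int lhs = (*i != 0);
    int rhs = (j != 2);   /* no side effects, cannot trap */

    if (lhs & rhs)
        return *i;
    else
        return j;
}
```

Both functions return the same value for every valid pointer i and every j,
which is why the merge is safe for this particular input.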

==Kai's comment==

As for the case that left-hand side has side-effects but right-hand not, we
aren't allowed to do this AND/OR merge.  For example 'if ((f = foo ()) != 0 &&
f < 24)' we aren't allowed to make this transformation.

This shouldn't be that hard.  We need to provide to simple_operand_p_2 an
additional argument for checking trapping or not.

==End of Kai's comment==
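
As I read it, the trapping check matters when the right-hand operand is only
safe to evaluate because the left-hand one succeeded. A minimal sketch, using
a hypothetical helper first_char_nonzero that is not part of GCC:

```c
#include <stddef.h>

/* The dereference *p is guarded by the null check.  Rewriting "&&" as "&"
   would evaluate *p even when p is NULL, so this AND-IF must stay
   short-circuit.  */
int first_char_nonzero(const char *p)
{
    return p != NULL && *p != 0;
}
```

In the test case from this report the situation is the other way round: the
operand that may trap (*i) is on the left and is always evaluated, while the
right-hand operand (j != 2) cannot trap.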

This change is blocking some other optimizations I am working on in the back
ends. For example, the problem I described at
http://gcc.gnu.org/ml/gcc/2011-09/msg00175.html has disappeared, but that is
not the proper behavior.

Thanks,
-Jiangning


[Bug tree-optimization/51049] A regression caused by "Improve handling of conditional-branches on targets with high branch costs"

2011-11-08 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51049

Jiangning Liu  changed:

   What     |Removed |Added
   ---------+--------+--------------------------------
   Keywords |        |missed-optimization
   Priority |P3      |P2
   CC       |        |jiangning.liu at arm dot com,
            |        |liujiangning at gcc dot gnu.org


[Bug tree-optimization/50569] [4.6/4.7 regression] unaligned memory accesses generated for memcpy

2012-06-12 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50569

--- Comment #16 from Jiangning Liu 2012-06-12 09:24:21 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:24:11 2012
New Revision: 188431

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188431
Log:
2011-06-12  Jiangning Liu

Backport r182252 from mainline
2011-12-12  Eric Botcazou  

PR tree-optimization/50569
* tree-sra.c (build_ref_for_model): Replicate a chain of COMPONENT_REFs
in the expression of MODEL instead of just the last one.

2011-06-12  Jiangning Liu

Backport r182252 from mainline
2011-12-12  Eric Botcazou  

PR tree-optimization/50569
* gcc.c-torture/execute/20111212-1.c: New test.


Added:
branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.c-torture/execute/20111212-1.c
Modified:
branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/tree-sra.c


[Bug tree-optimization/51070] [4.6/4.7 Regression] ICE verify_gimple failed

2012-06-12 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51070

--- Comment #10 from Jiangning Liu 2012-06-12 09:44:28 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:44:24 2012
New Revision: 188432

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188432
Log:
2011-06-12  Jiangning Liu

Backport r182839 from mainline
2012-01-03  Richard Guenther  

PR tree-optimization/51070
* tree-loop-distribution.c (generate_builtin): Do not replace
the loop with a builtin if the partition contains statements which
results are used outside of the loop.
(stmt_has_scalar_dependences_outside_loop): Properly handle calls.

2011-06-12  Jiangning Liu

Backport r182839 from mainline
2012-01-03  Richard Guenther  

PR tree-optimization/51070
* gcc.dg/torture/pr51070.c: New testcase.
* gcc.dg/torture/pr51070-2.c: Likewise.


Added:
branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51070-2.c
branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51070.c
Modified:
branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/tree-loop-distribution.c


[Bug tree-optimization/51042] [4.5 Regression] endless recursion in phi_translate

2012-06-12 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51042

--- Comment #9 from Jiangning Liu 2012-06-12 09:53:57 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:53:53 2012
New Revision: 188433

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188433
Log:
2011-06-12  Jiangning Liu

Backport r181256 from mainline
2011-11-10  Richard Guenther  

PR tree-optimization/51042
* tree-ssa-pre.c (phi_translate_1): Avoid recursing on
self-referential expressions.  Refactor code to avoid duplication.

2011-06-12  Jiangning Liu

Backport r181256 from mainline
2011-11-10  Richard Guenther  


Added:
branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51042.c
Modified:
branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/tree-ssa-pre.c


[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1

2012-03-18 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

Jiangning Liu  changed:

   What     |Removed |Added
   ---------+--------+--------------------------------
   CC       |        |liujiangning at gcc dot gnu.org

--- Comment #4 from Jiangning Liu 2012-03-19 04:10:36 UTC ---
Hi,

  /* In general,
     (TYPE) (BASE + STEP * i) = (TYPE) BASE + (TYPE -- sign extend) STEP * i,
     but we must check some assumptions.

     1) If [BASE, +, STEP] wraps, the equation is not valid when precision
        of CT is smaller than the precision of TYPE.  For example, when we
        cast unsigned char [254, +, 1] to unsigned, the values on left side
        are 254, 255, 0, 1, ..., but those on the right side are
        254, 255, 256, 257, ...
     2) In case that we must also preserve the fact that signed ivs do not
        overflow, we must additionally check that the new iv does not wrap.
        For example, unsigned char [125, +, 1] casted to signed char could
        become a wrapping variable with values 125, 126, 127, -128, -127, ...,
        which would confuse optimizers that assume that this does not
        happen.  */
  must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);

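The code above, in function convert_affine_scev, sets must_check_src_overflow
differently for 32-bit and 64-bit code. Going by the assignment as quoted, it
is false for 32-bit code, where CT and TYPE have the same precision, and true
for 64-bit code, where a 32-bit index is converted to a 64-bit type.

This means 64-bit mode fails to unfold the address expression for the array
element because of case 1) listed in the comment above.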
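
Case 1) can be reproduced with ordinary C arithmetic. The sketch below is only
an illustration: the function names are hypothetical, and it assumes an 8-bit
unsigned char. It computes the value of the iv unsigned char [254, +, 1] after
n steps in two ways, matching the two sides of the equation in the comment.

```c
/* Left side: step in the narrow type, then widen.  The narrow iv wraps
   from 255 back to 0.  */
unsigned widen_after_step(unsigned n)
{
    unsigned char iv = 254;
    unsigned k;

    for (k = 0; k < n; k++)
        iv = (unsigned char)(iv + 1);
    return (unsigned)iv;
}

/* Right side: widen BASE and STEP first, then step.  No wrap occurs.  */
unsigned widen_before_step(unsigned n)
{
    return 254u + 1u * n;
}
```

For n = 0 and n = 1 both functions agree (254, 255); from n = 2 onwards the
first yields 0, 1, ... while the second yields 256, 257, ..., which is exactly
the mismatch the check guards against.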

For the address of array element a[i], "&a + unitsize * i" has a different
representation in 32-bit and 64-bit mode. For 32-bit, it is "(32-bit pointer) +
(32-bit integer)", while for 64-bit, it is "(64-bit pointer) + (32-bit
integer)".
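
Written out by hand, the address computation looks like the sketch below. The
helper element_address is hypothetical and only illustrates the arithmetic; it
is not how GCC represents the expression internally. On an LP64 target the int
index has to be widened to the pointer-sized offset type before the addition,
and that widening is the conversion convert_affine_scev has to reason about.

```c
#include <stddef.h>

int a[1000];

/* Hand-written form of "&a + unitsize * i": the base address plus the
   element size times the index, computed in bytes.  The cast to size_t
   is the widening step (32 -> 64 bits on LP64, no change on ILP32).  */
int *element_address(int i)
{
    return (int *)((char *)a + sizeof(int) * (size_t)i);
}
```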

If you try the test case scev-5.c below, you will find that it passes.

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int *a_p;
int a[1000];

/* Explicit return type added; the posted version relied on implicit int.  */
void f(int k)
{
  long long i;  /* 64-bit index, so no widening is needed on lp64 */

  for (i=k; i<1000; i+=k) {
    a_p = &a[i];
    *a_p = 100;
  }
}

/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Any ideas on how to fix this problem?

Thanks,
-Jiangning


[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1

2012-03-19 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #6 from Jiangning Liu 2012-03-20 02:32:12 UTC ---
> We cannot fix it without relaxing the POINTER_PLUS_EXPR constraints.
> I was working on that, but as usual the TYPE_IS_SIZETYPE removal
> has priority.

Do you mean you are also working on removing TYPE_IS_SIZETYPE?

> 
> Please consider fixing/XFAILing the testcases as they still FAIL and you
> are responsible for this.  You can open a new enhancement PR covering
> this.
> 

I think 64-bit mode should also have this optimization enabled. XFAIL implies
that the missing optimization is correct behavior, which is not what I expect,
so I don't think we should add XFAIL for this case. Instead, I want to add a
new test case, scev-5.c, to cover 64-bit testing.

Thanks,
-Jiangning


[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1

2012-03-22 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #8 from Jiangning Liu 2012-03-22 09:17:51 UTC ---
Author: liujiangning
Date: Thu Mar 22 09:17:45 2012
New Revision: 185678

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185678
Log:
2012-03-22  Jiangning Liu

PR tree-optimization/52563
* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-5.c: New.


Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
Modified:
trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c


[Bug tree-optimization/50272] A case that PRE optimization hurts performance

2012-03-29 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50272

Jiangning Liu  changed:

   What     |Removed |Added
   ---------+--------+--------------------------------
   CC       |        |liujiangning at gcc dot gnu.org

--- Comment #5 from Jiangning Liu 2012-03-30 03:41:42 UTC ---
(In reply to comment #4)
> This is still a problem in version 4.7.0 20120225.

http://gcc.gnu.org/ml/gcc/2011-09/msg00342.html

Jeff already commented there. The proposed solution is path-sensitive
optimization, so this problem seems hard to solve in a short time.


[Bug rtl-optimization/38644] [4.4/4.5/4.6 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code

2011-11-16 Thread liujiangning at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644

--- Comment #61 from Jiangning Liu 2011-11-16 09:47:01 UTC ---
Author: liujiangning
Date: Wed Nov 16 09:46:58 2011
New Revision: 181406

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181406
Log:
2011-11-16  Jiangning Liu  

Backport r180964 from mainline
2011-11-04  Jiangning Liu  

PR rtl-optimization/38644
* config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
for epilogue having stack adjustment.

testsuite:

2011-11-16  Jiangning Liu  

Backport r180964 from mainline
2011-11-04  Jiangning Liu  

PR rtl-optimization/38644
* gcc.target/arm/stack-red-zone.c: New.

Added:
branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/stack-red-zone.c
Modified:
branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.c
branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm