[Bug tree-optimization/51049] New: A regression caused by "Improve handling of conditional-branches on targets with high branch costs"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51049

             Bug #: 51049
           Summary: A regression caused by "Improve handling of conditional-branches on targets with high branch costs"
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: liujiangn...@gcc.gnu.org

int f(char *i, int j)
{
    if (*i && j!=2)
        return *i;
    else
        return j;
}

Before the check-in r180109, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto ; else goto ;
  :
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  :
  D.4716 = j;
  return D.4716;

After check-in r180109, we have

  D.4711 = *i;
  if (D.4711 != 0) goto ; else goto ;
  :
  if (j != 2) goto ; else goto ;
  :
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  :
  D.4714 = j;
  return D.4714;

The code below in function fold_truth_andor makes the difference:

  /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B) into (A OR B).
     For sequence point consistancy, we need to check for trapping,
     and side-effects.  */
  else if (code == icode && simple_operand_p_2 (arg0)
           && simple_operand_p_2 (arg1))
    return fold_build2_loc (loc, ncode, type, arg0, arg1);

For "*i != 0", simple_operand_p(*i) returns false. Originally the code did not check this. Please refer to http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02445.html for the details of the discussion.

This change accidentally improved some benchmarks significantly, for other reasons, but Michael gave the comment below.

==Michael's comment==
It's nice that it caused a benchmark to improve significantly, but that should be done via a proper analysis and patch, not as a side effect of a supposed non-change.
==End of Michael's comment==

The potential impact is that it hurts performance in other scenarios. The key point is that, in the small case I gave, the RHS doesn't have any side effects at all, so changing it to AND doesn't violate the C specification.
==Kai's comment==
As for the case where the left-hand side has side effects but the right-hand side does not, we aren't allowed to do this AND/OR merge. For example, for 'if ((f = foo ()) != 0 && f < 24)' we aren't allowed to make this transformation.

This shouldn't be that hard. We need to give simple_operand_p_2 an additional argument that says whether to check for trapping.
==End of Kai's comment==

This optimization change is blocking some other optimizations I am working on in back-ends. For example, the problem I described at http://gcc.gnu.org/ml/gcc/2011-09/msg00175.html disappeared. But that is not the proper behavior.

Thanks,
-Jiangning
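As an illustration of the point about the small test case, the following is a minimal sketch that puts the short-circuit form next to the merged form. The names f_andif and f_and are hypothetical and are not GCC code; the sketch only assumes, as the report argues, that "j != 2" cannot trap and has no side effects, so evaluating it unconditionally does not change the result.

```c
/* Sketch only: f_andif and f_and are hypothetical names used to
   compare the two shapes discussed in the report. */

/* Short-circuit form (AND-IF): j != 2 is evaluated only when
   *i is non-zero. */
int f_andif(char *i, int j)
{
    if (*i && j != 2)
        return *i;
    else
        return j;
}

/* Merged form (AND): both comparisons are always evaluated and then
   combined with a bitwise AND. The load *i still happens first, and
   j != 2 has no side effects and cannot trap. */
int f_and(char *i, int j)
{
    int a = *i != 0;
    int b = j != 2;

    if (a & b)
        return *i;
    else
        return j;
}
```

Both functions return the same value for every valid pointer i. Kai's counterexample, 'if ((f = foo ()) != 0 && f < 24)', is different in kind: there the left-hand side has a side effect that the right-hand side depends on, so the order of evaluation must be kept.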
[Bug tree-optimization/51049] A regression caused by "Improve handling of conditional-branches on targets with high branch costs"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51049

Jiangning Liu changed:

           What    |Removed                     |Added
           Keywords|                            |missed-optimization
           Priority|P3                          |P2
                 CC|                            |jiangning.liu at arm dot
                   |                            |com, liujiangning at gcc
                   |                            |dot gnu.org
[Bug tree-optimization/50569] [4.6/4.7 regression] unaligned memory accesses generated for memcpy
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50569

--- Comment #16 from Jiangning Liu 2012-06-12 09:24:21 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:24:11 2012
New Revision: 188431

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188431
Log:
2011-06-12  Jiangning Liu

    Backport r182252 from mainline
    2011-12-12  Eric Botcazou

    PR tree-optimization/50569
    * tree-sra.c (build_ref_for_model): Replicate a chain of
    COMPONENT_REFs in the expression of MODEL instead of just the
    last one.

2011-06-12  Jiangning Liu

    Backport r182252 from mainline
    2011-12-12  Eric Botcazou

    PR tree-optimization/50569
    * gcc.c-torture/execute/20111212-1.c: New test.

Added:
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.c-torture/execute/20111212-1.c
Modified:
    branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/tree-sra.c
[Bug tree-optimization/51070] [4.6/4.7 Regression] ICE verify_gimple failed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51070

--- Comment #10 from Jiangning Liu 2012-06-12 09:44:28 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:44:24 2012
New Revision: 188432

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188432
Log:
2011-06-12  Jiangning Liu

    Backport r182839 from mainline
    2012-01-03  Richard Guenther

    PR tree-optimization/51070
    * tree-loop-distribution.c (generate_builtin): Do not replace
    the loop with a builtin if the partition contains statements which
    results are used outside of the loop.
    (stmt_has_scalar_dependences_outside_loop): Properly handle calls.

2011-06-12  Jiangning Liu

    Backport r182839 from mainline
    2012-01-03  Richard Guenther

    PR tree-optimization/51070
    * gcc.dg/torture/pr51070.c: New testcase.
    * gcc.dg/torture/pr51070-2.c: Likewise.

Added:
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51070-2.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51070.c
Modified:
    branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/tree-loop-distribution.c
[Bug tree-optimization/51042] [4.5 Regression] endless recursion in phi_translate
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51042

--- Comment #9 from Jiangning Liu 2012-06-12 09:53:57 UTC ---
Author: liujiangning
Date: Tue Jun 12 09:53:53 2012
New Revision: 188433

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188433
Log:
2011-06-12  Jiangning Liu

    Backport r181256 from mainline
    2011-11-10  Richard Guenther

    PR tree-optimization/51042
    * tree-ssa-pre.c (phi_translate_1): Avoid recursing on
    self-referential expressions.  Refactor code to avoid duplication.

2011-06-12  Jiangning Liu

    Backport r181256 from mainline
    2011-11-10  Richard Guenther

Added:
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.dg/torture/pr51042.c
Modified:
    branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/tree-ssa-pre.c
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

Jiangning Liu changed:

           What    |Removed                     |Added
                 CC|                            |liujiangning at gcc dot
                   |                            |gnu.org

--- Comment #4 from Jiangning Liu 2012-03-19 04:10:36 UTC ---
Hi,

  /* In general,
     (TYPE) (BASE + STEP * i) = (TYPE) BASE + (TYPE -- sign extend) STEP * i,
     but we must check some assumptions.

     1) If [BASE, +, STEP] wraps, the equation is not valid when precision
        of CT is smaller than the precision of TYPE.  For example, when we
        cast unsigned char [254, +, 1] to unsigned, the values on left side
        are 254, 255, 0, 1, ..., but those on the right side are
        254, 255, 256, 257, ...
     2) In case that we must also preserve the fact that signed ivs do not
        overflow, we must additionally check that the new iv does not wrap.
        For example, unsigned char [125, +, 1] casted to signed char could
        become a wrapping variable with values 125, 126, 127, -128, -127, ...,
        which would confuse optimizers that assume that this does not
        happen.  */
  must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);

The code above, in function convert_affine_scev, sets must_check_src_overflow to false for 32-bit code but to true for 64-bit code. This means 64-bit mode fails to unfold the address expression for an array element because of case 1) in the comment above.

For the address of array element a[i], "&a + unitsize * i" has a different representation in 32-bit and 64-bit mode. For 32-bit it is "(32-bit pointer) + (32-bit integer)", while for 64-bit it is "(64-bit pointer) + (32-bit integer)".

If you try the case scev-5.c below, you will find that it passes.

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int *a_p;
int a[1000];

f(int k)
{
    long long i;

    for (i=k; i<1000; i+=k) {
        a_p = &a[i];
        *a_p = 100;
    }
}

/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Any idea how to fix this problem?

Thanks,
-Jiangning
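Case 1) of the quoted comment can be reproduced directly in C. The following is a minimal sketch, not GCC code: widen_after_step and step_after_widen are hypothetical names for the left and right side of the equation, and the sketch assumes an 8-bit unsigned char.

```c
/* Sketch only: hypothetical helpers illustrating case 1) of the
   comment, assuming unsigned char is 8 bits wide. */

/* Left side, (TYPE) (BASE + STEP * i) with STEP = 1: the narrow
   induction variable wraps modulo 256 and is widened afterwards. */
unsigned widen_after_step(unsigned char base, unsigned i)
{
    unsigned char iv = (unsigned char)(base + i);

    return (unsigned)iv;
}

/* Right side, (TYPE) BASE + (TYPE) STEP * i with STEP = 1: the base
   is widened first, so the sum does not wrap at 256. */
unsigned step_after_widen(unsigned char base, unsigned i)
{
    return (unsigned)base + i;
}
```

For base 254 the two functions agree for i = 0 and i = 1 and differ from i = 2 onwards (0 versus 256), which is why the equation cannot be applied when the narrow induction variable may wrap.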
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #6 from Jiangning Liu 2012-03-20 02:32:12 UTC ---
> We cannot fix it without relaxing the POINTER_PLUS_EXPR constraints.
> I was working on that, but as usual the TYPE_IS_SIZETYPE removal
> has priority.

Do you mean you are also working on removing TYPE_IS_SIZETYPE?

>
> Please consider fixing/XFAILing the testcases as they still FAIL and you
> are responsible for this. You can open a new enhancement PR covering
> this.
>

I think 64-bit mode should also have this optimization enabled. XFAIL implies that missing this optimization is correct behavior, which is not what I expect. So I don't think we should add XFAIL for this case. Instead I want to add a new test case, scev-5.c, to cover 64-bit testing.

Thanks,
-Jiangning
[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #8 from Jiangning Liu 2012-03-22 09:17:51 UTC ---
Author: liujiangning
Date: Thu Mar 22 09:17:45 2012
New Revision: 185678

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185678
Log:
2012-03-22  Jiangning Liu

    PR tree-optimization/52563
    * gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
    * gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
    * gcc.dg/tree-ssa/scev-5.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
Modified:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
[Bug tree-optimization/50272] A case that PRE optimization hurts performance
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50272

Jiangning Liu changed:

           What    |Removed                     |Added
                 CC|                            |liujiangning at gcc dot
                   |                            |gnu.org

--- Comment #5 from Jiangning Liu 2012-03-30 03:41:42 UTC ---
(In reply to comment #4)
> This is still a problem in version 4.7.0 20120225.

http://gcc.gnu.org/ml/gcc/2011-09/msg00342.html

Jeff already gave comments here. The proposed solution is path-sensitive optimization. It seems hard to solve this problem in a short time.
[Bug rtl-optimization/38644] [4.4/4.5/4.6 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644

--- Comment #61 from Jiangning Liu 2011-11-16 09:47:01 UTC ---
Author: liujiangning
Date: Wed Nov 16 09:46:58 2011
New Revision: 181406

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181406
Log:
2011-11-16  Jiangning Liu

    Backport r180964 from mainline
    2011-11-04  Jiangning Liu

    PR rtl-optimization/38644
    * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier
    for epilogue having stack adjustment.

testsuite:

2011-11-16  Jiangning Liu

    Backport r180964 from mainline
    2011-11-04  Jiangning Liu

    PR rtl-optimization/38644
    * gcc.target/arm/stack-red-zone.c: New.

Added:
    branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/stack-red-zone.c
Modified:
    branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm
    branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.c
    branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm