[Bug c/50272] New: A case that PRE optimization hurts performance

2011-09-01 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50272

 Bug #: 50272
   Summary: A case that PRE optimization hurts performance
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: jiangning@arm.com


For the following simple test case, the PRE optimization hoists the computation
(s != 1) into the default branch of the switch statement, which ends up causing
very poor code generation. The problem occurs on both X86 and ARM, and I
believe it also affects other targets.

int f(char *t) {
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            if (*t == '-')
                s = 1;
            break;
        }
        t++;
    }

    return s;
}
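One way to picture the hoisting described above is to apply it by hand at the source level. The sketch below is only an illustration, not GCC's actual GIMPLE: the names f_pre_sketch, not_done and check_versions_agree are hypothetical. The value of (s != 1) is carried in a variable that each switch arm updates. On the "case 0" and "case 2" paths it is a known constant, so the default path is the only one where the comparison must really be computed.

```c
#include <assert.h>
#include <stddef.h>

/* Original test case from the report. */
int f(char *t)
{
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:  s = 2; break;
        case 2:  s = 1; break;
        default: if (*t == '-') s = 1; break;
        }
        t++;
    }
    return s;
}

/* Hand-written illustration (an assumption, not compiler output):
   not_done always holds the value of (s != 1) for the current s. */
int f_pre_sketch(char *t)
{
    int s = 0;
    int not_done = 1;              /* s == 0, so (s != 1) is true */

    while (*t && not_done) {
        switch (s) {
        case 0:
            s = 2;
            not_done = 1;          /* constant on this path */
            break;
        case 2:
            s = 1;
            not_done = 0;          /* constant on this path */
            break;
        default:
            if (*t == '-')
                s = 1;
            not_done = (s != 1);   /* the computation placed here */
            break;
        }
        t++;
    }
    return s;
}

/* Aborts via assert if the two versions disagree on any sample. */
void check_versions_agree(void)
{
    char samples[][8] = { "", "a", "-", "ab", "a-b", "--", "abc-" };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        assert(f(samples[i]) == f_pre_sketch(samples[i]));
}
```

Calling check_versions_agree() from a small main should pass silently. The point of the sketch is only that both forms compute the same result; the extra bookkeeping is what shows up as additional instructions.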

Taking X86 as an example, with option "-O2" 52 instructions are generated,
as shown below,

 :
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   57                      push   %edi
   6:   56                      push   %esi
   7:   53                      push   %ebx
   8:   8b 55 08                mov    0x8(%ebp),%edx
   b:   0f b6 0a                movzbl (%edx),%ecx
   e:   84 c9                   test   %cl,%cl
  10:   74 50                   je     62
  12:   83 c2 01                add    $0x1,%edx
  15:   85 c0                   test   %eax,%eax
  17:   75 23                   jne    3c
  19:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  20:   0f b6 0a                movzbl (%edx),%ecx
  23:   84 c9                   test   %cl,%cl
  25:   0f 95 c0                setne  %al
  28:   89 c7                   mov    %eax,%edi
  2a:   b8 02 00 00 00          mov    $0x2,%eax
  2f:   89 fb                   mov    %edi,%ebx
  31:   83 c2 01                add    $0x1,%edx
  34:   84 db                   test   %bl,%bl
  36:   74 2a                   je     62
  38:   85 c0                   test   %eax,%eax
  3a:   74 e4                   je     20
  3c:   83 f8 02                cmp    $0x2,%eax
  3f:   74 1f                   je     60
  41:   80 f9 2d                cmp    $0x2d,%cl
  44:   74 22                   je     68
  46:   0f b6 0a                movzbl (%edx),%ecx
  49:   83 f8 01                cmp    $0x1,%eax
  4c:   0f 95 c3                setne  %bl
  4f:   89 df                   mov    %ebx,%edi
  51:   84 c9                   test   %cl,%cl
  53:   0f 95 c3                setne  %bl
  56:   89 de                   mov    %ebx,%esi
  58:   21 f7                   and    %esi,%edi
  5a:   eb d3                   jmp    2f
  5c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  60:   b0 01                   mov    $0x1,%al
  62:   5b                      pop    %ebx
  63:   5e                      pop    %esi
  64:   5f                      pop    %edi
  65:   5d                      pop    %ebp
  66:   c3                      ret
  67:   90                      nop
  68:   b8 01 00 00 00          mov    $0x1,%eax
  6d:   5b                      pop    %ebx
  6e:   5e                      pop    %esi
  6f:   5f                      pop    %edi
  70:   5d                      pop    %ebp
  71:   c3                      ret

But with command line option "-O2 -fno-tree-pre", only 12 instructions are
generated, and the code is much cleaner, as shown below,

 :
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   8b 55 08                mov    0x8(%ebp),%edx
   8:   80 3a 00                cmpb   $0x0,(%edx)
   b:   74 0e                   je     1b
   d:   80 7a 01 00             cmpb   $0x0,0x1(%edx)
  11:   b0 02                   mov    $0x2,%al
  13:   ba 01 00 00 00          mov    $0x1,%edx
  18:   0f 45 c2                cmovne %edx,%eax
  1b:   5d                      pop    %ebp
  1c:   c3                      ret
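
Reading the 12-instruction version back into C gives roughly the closed form below. This is a sketch derived from the listing above, and the names f_reduced and check_reduced_form are hypothetical. Starting from s = 0, the state only ever moves 0 -> 2 -> 1, so the result depends on nothing but whether the first and second characters are NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Original test case from the report. */
int f(char *t)
{
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:  s = 2; break;
        case 2:  s = 1; break;
        default: if (*t == '-') s = 1; break;
        }
        t++;
    }
    return s;
}

/* Closed form matching the short listing: two byte compares and a
   conditional move (hand-derived, so treat it as an assumption). */
int f_reduced(const char *t)
{
    if (t[0] == '\0')
        return 0;                  /* loop body never runs */
    return t[1] != '\0' ? 1 : 2;   /* cmovne picks 1, otherwise 2 */
}

/* Aborts via assert if the closed form disagrees with f on a sample. */
void check_reduced_form(void)
{
    char samples[][8] = { "", "a", "-", "ab", "a-", "--", "abc-" };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        assert(f(samples[i]) == f_reduced(samples[i]));
}
```

With -fno-tree-pre the compiler effectively finds this form by itself, which is why the listing has no loop left at all.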


[Bug c/50272] A case that PRE optimization hurts performance

2011-09-01 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50272

--- Comment #1 from Jiangning Liu 2011-09-02 05:11:38 UTC ---
Richard gave some analysis at http://gcc.gnu.org/ml/gcc/2011-08/msg00037.html


[Bug rtl-optimization/38644] [4.4/4.5/4.6/4.7 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code

2011-10-31 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644

--- Comment #56 from Jiangning Liu 2011-10-31 07:48:25 UTC ---
(In reply to comment #54)
> I tested with GCC 4.6.2 and the patch provided by Mikael Pettersson.  It works
> for -march=armv4t and -march=armv5t, but not for -march=armv5te:
> 

Sebastian,

Actually you may try this,

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index aed748c..8269c1a 100755
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22273,6 +22273,8 @@ thumb1_expand_epilogue (void)
   gcc_assert (amount >= 0);
   if (amount)
     {
+      emit_insn (gen_blockage ());
+
       if (amount < 512)
         emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
                                GEN_INT (amount)));

Thanks,
-Jiangning


[Bug middle-end/39976] [4.5/4.6/4.7 Regression] Big sixtrack degradation on powerpc 32/64 after revision r146817

2012-02-24 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39976

--- Comment #44 from Jiangning Liu 2012-02-24 08:09:25 UTC ---
I'm not sure whether this kind of bug has been completely fixed, so I posted a
question to the mailing list at http://gcc.gnu.org/ml/gcc/2012-02/msg00415.html .


[Bug tree-optimization/52424] dom prematurely pops entries from const_and_copies stack

2012-02-28 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52424

--- Comment #1 from Jiangning Liu 2012-02-29 03:23:46 UTC ---
> I've attached a proposed fix.  Jiangning, can you please apply this and see if
> your performance problem is resolved?

Bill,

Confirmed. I think your patch works for my big case, and I do see that the
redundant copies are removed from the final binary code. Benchmark performance
improves accordingly as well, although there might still be other potential
problems.

Thanks a lot for your quick patch. Are you going to check it in to trunk soon
for 4.7? It would also be better if you could add a test case.

-Jiangning


[Bug tree-optimization/52424] dom prematurely pops entries from const_and_copies stack

2012-02-28 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52424

--- Comment #2 from Jiangning Liu 2012-02-29 03:42:17 UTC ---
> Jiangning Liu reported that the following C code has recently experienced
> degraded performance on trunk.  (Jiangning, please fill in the
> Host/Target/Build fields for your configuration, or tell me what they are if
> you don't have access.  The problem is not related to specific targets in any
> event.)
> 

Seems I don't have access.

Host: arm*-*-*
Target: arm*-*-*
Build: arm*-*-*


[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "&a" 1

2012-03-13 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #3 from Jiangning Liu 2012-03-13 08:11:40 UTC ---
First, I tried gcc 4.4.3 on x86-64, and it works for this test case, so it is
to some extent a GCC regression.

Second, I tried trunk, and I can reproduce the failure for this test case. It
means there are still some other bugs hidden after my scalar evolution
improvement for the address of an array element.

Third, loop ivopt should be able to find that &a is the base object of a
selected IV candidate after the analysis from simple_iv. After loop ivopt, I
see the following dump,

Selected IV set:
candidate 5 (important)
  depends on 1
  var_before i_13
  var_after i_6
  original biv
  type int
  base k_2(D)
  step k_2(D)
candidate 6
  depends on 1
  var_before ivtmp.11_9
  var_after ivtmp.11_1
  incremented before exit test
  type unsigned long
  base (unsigned long) ((int *) &a + (sizetype) k_2(D) * 4)
  step (unsigned long) ((sizetype) k_2(D) * 4)
  base object (void *) &a

So &a is already identified as an IV base object, but somehow only one &a is
hoisted out of the loop, while the other one isn't.

:
  D.1728_15 = (sizetype) k_2(D);
  D.1729_16 = D.1728_15 * 4;
  D.1730_17 = (sizetype) k_2(D);
  D.1731_18 = D.1730_17 * 4;
  D.1732_19 = &a + D.1731_18;
  ivtmp.11_20 = (unsigned long) D.1732_19;

:
  # i_13 = PHI 
  # ivtmp.11_9 = PHI 
  a_p.0_4 = (int *) ivtmp.11_9;
  MEM[(int *)&a][i_13] = 100;
  i_6 = i_13 + k_2(D);
  ivtmp.11_1 = ivtmp.11_9 + D.1729_16;
  if (i_6 <= 999)
    goto ;
  else
    goto ;

This statement,

  MEM[(int *)&a][i_13] = 100;

is expected to be like,

  D.4086_21 = (void *) ivtmp.11_9;
  MEM[base: D.4086_21, offset: 0B] = 100;

after loop ivopt.
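
At the source level, the difference between the two forms can be sketched as follows. The loop shape is reconstructed from the dump above (i starts at k, steps by k, and exits once it passes 999); the test's real source is not quoted in this comment, so the function names, the second array b and the helper same_result are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 1000

int a[N];
int b[N];

/* Indexed store, like MEM[(int *)&a][i_13] = 100 in the dump.
   Assumes k > 0, otherwise the loop would not terminate. */
void fill_indexed(int k)
{
    for (int i = k; i < N; i += k)
        a[i] = 100;
}

/* Shape expected after ivopts: a second induction variable carries the
   byte offset (base k * 4, step k * 4 when int is 4 bytes wide), and
   the store goes through it instead of recomputing the address from i. */
void fill_strength_reduced(int k)
{
    size_t step = (size_t)k * sizeof(int);
    size_t ivtmp = step;                     /* offset of b[k] */

    for (int i = k; i < N; i += k) {
        *(int *)((char *)b + ivtmp) = 100;
        ivtmp += step;
    }
}

/* Returns 1 if both loops write exactly the same elements for this k. */
int same_result(int k)
{
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_indexed(k);
    fill_strength_reduced(k);
    return memcmp(a, b, sizeof a) == 0;
}
```

The failing scan in the test looks for a single "&a" in the optimized dump, which corresponds to the second form: the address of the array is taken once before the loop and never again inside it.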


[Bug rtl-optimization/38644] [4.3/4.4/4.5/4.6/4.7 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code

2011-04-26 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644

Jiangning Liu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jiangning.liu at arm dot
                   |                            |com

--- Comment #35 from Jiangning Liu 2011-04-26 15:13:41 UTC ---
I verified that the two patches in #38644 (back end) and #30282 (middle end)
both work for the attached cases. Here are my two cents,

1) The red zone concept controls whether instructions may write memory below
the current stack frame, and it is supported only by the ABIs of some
particular ISAs, so it shouldn't be enabled in the middle end by default for
all targets. From this point of view, the middle end should be fixed so that
it avoids doing things that are unwanted in general for all targets.

2) The red zone is orthogonal to the prologue/epilogue, so it is not good to
fix this issue in the prologue/epilogue back end code. From this point of
view, we shouldn't simply fix it in the back end by adding a barrier that
implicitly disables the red zone. Instead, some hooks should be abstracted in
the middle end to support it in scheduling dependence (middle end code). A
back end like X86-64 should enable it through those hooks by itself.

The key here is that the red zone should be a clean feature supported in the
middle end. Exposing this kind of thing to the back end through hooks can
improve code quality in the middle end and avoid bringing the bugs into the
back end.
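
As a rough sketch of the hook idea, the decision could be modelled like this. Nothing here is GCC's real interface: struct target_hooks, red_zone_size and access_valid_after_sp_adjust are hypothetical names. The 128-byte value is the red zone of the x86-64 System V ABI, and a target whose ABI defines no red zone is modelled with size 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-target description; not GCC's actual hook table. */
struct target_hooks {
    long red_zone_size;   /* bytes below SP that stay valid, 0 if none */
};

/* Middle-end style query: may a stack access at offset_from_new_sp
   (relative to SP after the adjustment) be scheduled after the
   instruction that adjusts SP?  Negative offsets lie below the new SP. */
bool access_valid_after_sp_adjust(const struct target_hooks *target,
                                  long offset_from_new_sp)
{
    if (offset_from_new_sp >= 0)
        return true;                           /* still inside the frame */
    return -offset_from_new_sp <= target->red_zone_size;
}

/* Example targets, used only for this illustration. */
static const struct target_hooks no_red_zone = { 0 };
static const struct target_hooks x86_64_sysv = { 128 };

/* Aborts via assert if the model behaves unexpectedly. */
void check_red_zone_model(void)
{
    assert(!access_valid_after_sp_adjust(&no_red_zone, -4));
    assert(access_valid_after_sp_adjust(&x86_64_sysv, -4));
    assert(access_valid_after_sp_adjust(&x86_64_sysv, -128));
    assert(!access_valid_after_sp_adjust(&x86_64_sysv, -129));
}
```

With a query of this kind, the scheduler would keep a dependence between the stack pointer adjustment and any access that falls below the new SP on targets without a red zone, without each back end having to insert its own barrier.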

This bug has a long history, and it is being exposed now, or has been exposed
in the past, on ARM, POWER and X86 (with some option combinations). Fixing it
in the middle end is not only a bug fix, but also a simple infrastructure
improvement. Given the long duration and the extensive impact on different
targets, I don't see a good reason not to fix it in mainline ASAP.


[Bug rtl-optimization/38644] [4.4/4.5/4.6/4.7 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code

2011-08-08 Thread jiangning.liu at arm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644

--- Comment #41 from Jiangning Liu 2011-08-09 02:04:52 UTC ---

> Yes, this is from the libstdc++ sources (4.6.1 20110627,
> libstdc++-v3/libsupc++/new_opnt.cc).  You need a non-EABI ARM variant of GCC
> since this bug manifestation will only show up in the SJLJ version.

I tried it, and my local patch works on this case. As you can see below, it is
fixed!

        add     r0, r0, #12
        bl      _Unwind_SjLj_Unregister
        ldr     r0, [r7, #8]
        mov     sp, r7
        add     sp, sp, #68
        @ sp needed for prologue
        pop     {r2, r3, r4, r5}

Thanks,
-Jiangning