I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to 
have improved significantly. For example, it now seems much better at using 
ldp/stp and it seems to have stopped gratuitous use of the SIMD registers.

However, I still have a few whinges :-)

See attached copy.c / copy.s (this is a performance-critical function from 
OpenJDK).

pd_disjoint_words:
         cmp     x2, 8         <<< (1)
         sub     sp, sp, #64   <<< (2)
         bhi     .L2
         cmp     w2, 8         <<< (1)
         bls     .L15
.L2:
         add     sp, sp, 64    <<< (2)

(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 
32 bit unsigned.

Agreed.  This could probably be done by the mid-end based on value range
propagation.  Please can you file a report in gcc bugzilla?
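
To make point (1) concrete, here is a small C sketch of the two compares. The helper names are made up for illustration and are not taken from copy.c. It only checks a handful of sample values, so it is an illustration of the implication rather than a proof.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors "cmp x2, 8": the full 64-bit unsigned compare. */
static int in_range_64(uint64_t count) { return count <= 8; }

/* Mirrors "cmp w2, 8": the same compare on the low 32 bits only. */
static int in_range_32(uint64_t count) { return (uint32_t)count <= 8; }

/* Returns 1 if, for every sampled value, passing the 64-bit check
 * also means passing the 32-bit check (so the second compare on the
 * fall-through path can never fail). */
static int second_compare_is_redundant(void) {
    const uint64_t samples[] = {
        0, 1, 7, 8, 9,
        0xFFFFFFFFULL, 0x100000000ULL, 0x100000008ULL, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (in_range_64(samples[i]) && !in_range_32(samples[i]))
            return 0;
    }
    return 1;
}
```

Note that the converse does not hold: 0x100000008 truncates to 8, so it passes the 32-bit check but not the 64-bit one. Only the second compare is redundant, not the first.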

I am not sure how we can do this in VRP. It seems that this is generated at RTL expansion time, so maybe it has to be done during expansion. The optimized tree looks like:


;; Function pd_disjoint_words (pd_disjoint_words, funcdef_no=0, decl_uid=2763, cgraph_uid=0, symbol_order=0)

Removing basic block 13
pd_disjoint_words (HeapWord * from, HeapWord * to, size_t count)
{
  long int t$b;
  long int t$a;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  struct unit t;
  long int _5;

  <bb 2>:
  switch (count_2(D)) <default: <L16>, case 0: <L18>, case 1: <L1>, case 2: <L2>, case 3: <L4>, case 6: <L10>, case 4: <L6>, case 5: <L8>, case 7: <L12>, case 8: <L14>>

<L1>:
  _5 = *from_4(D);
  *to_6(D) = _5;
  goto <bb 12> (<L18>);

<L2>:
  t$a_8 = MEM[(struct unit *)from_4(D)];
  t$b_9 = MEM[(struct unit *)from_4(D) + 8B];
  MEM[(struct unit *)to_6(D)] = t$a_8;
  MEM[(struct unit *)to_6(D) + 8B] = t$b_9;
  goto <bb 12> (<L18>);

<L4>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L6>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L8>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L10>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L12>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L14>:
  t = MEM[(struct unit *)from_4(D)];
  MEM[(struct unit *)to_6(D)] = t;
  t ={v} {CLOBBER};
  goto <bb 12> (<L18>);

<L16>:
  _Copy_disjoint_words (from_4(D), to_6(D), count_2(D)); [tail call]

<L18>:
  return;

}
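
For readers without the attachment, the dump above is consistent with C source of roughly the following shape. This is a guess reconstructed from the dump, not the actual copy.c: the HeapWord typedef, the COPY_UNIT macro, the layout of struct unit, and the loop body of _Copy_disjoint_words are all assumptions (in the dump the latter is simply an external function reached by a tail call).

```c
#include <stddef.h>

typedef long HeapWord;  /* assumption: one 64-bit word */

/* Stand-in for the out-of-line fallback; the real one is external. */
static void _Copy_disjoint_words(HeapWord *from, HeapWord *to, size_t count) {
    for (size_t i = 0; i < count; i++)
        to[i] = from[i];
}

/* Hypothetical helper: copy n words through a temporary struct, which
 * matches the "t = MEM[...]; MEM[...] = t;" pairs in the dump. */
#define COPY_UNIT(n)                                   \
    do {                                               \
        struct unit { HeapWord w[n]; } t;              \
        t = *(const struct unit *)from;                \
        *(struct unit *)to = t;                        \
    } while (0)

static void pd_disjoint_words(HeapWord *from, HeapWord *to, size_t count) {
    switch (count) {
    case 0: break;                      /* nothing to copy */
    case 1: to[0] = from[0]; break;     /* single word, no struct */
    case 2: COPY_UNIT(2); break;
    case 3: COPY_UNIT(3); break;
    case 4: COPY_UNIT(4); break;
    case 5: COPY_UNIT(5); break;
    case 6: COPY_UNIT(6); break;
    case 7: COPY_UNIT(7); break;
    case 8: COPY_UNIT(8); break;
    default:                            /* count > 8: call the fallback */
        _Copy_disjoint_words(from, to, count);
        break;
    }
}
```

The switch on a size_t with dense cases 0..8 plus a default is what produces the range check in question: anything above 8 goes to the default, and everything else is dispatched through the case table.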



Thanks,
Kugan

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
