What are the optimizations that contribute to ~70% improvement on SPEC06 hmmer benchmark?

2019-05-23 Thread a b
Recently I happened to notice a performance improvement of more than 70% on the
SPEC06 hmmer benchmark going from Linaro GCC 5.2-2015.11-2 to GCC 10.0 on ARM
platforms.

I did some quick searching and think that loop distribution and vectorization
contribute 25% and 30%, respectively. But they still don't add up to 70%. Can
you explain what else is helping here?
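
For readers who have not met the transformation, here is a minimal sketch of
what loop distribution does and why it can enable vectorization. It is not the
hmmer source: the function names fused and distributed and the arrays a, b, c
are hypothetical, and the sketch assumes the three arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: one loop mixes an independent statement with a
   loop-carried recurrence, so the loop as a whole is hard to vectorize. */
void fused(int *restrict a, int *restrict b, const int *restrict c, size_t n)
{
    assert(n >= 1);                 /* index 0 is used as the recurrence seed */
    for (size_t i = 1; i < n; i++) {
        a[i] = c[i] + 1;            /* independent across iterations */
        b[i] = b[i - 1] + a[i];     /* depends on the previous iteration */
    }
}

/* Distributed form: the same work split into two loops. The first loop has
   no cross-iteration dependence and is a candidate for vectorization; the
   second keeps the recurrence and stays scalar. */
void distributed(int *restrict a, int *restrict b, const int *restrict c,
                 size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; i++)
        a[i] = c[i] + 1;
    for (size_t i = 1; i < n; i++)
        b[i] = b[i - 1] + a[i];
}
```

Both functions compute the same values. In GCC the two steps are controlled by
-ftree-loop-distribution and -ftree-vectorize; whether either fires on a given
loop depends on the compiler version and target.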

Thanks


incorrect redundancy of loading subreg of physical regs

2012-07-16 Thread a b

Hello,
 
I cannot figure out how the entire mechanism works in my case. I work on
GCC 4.5.2.
 
For example, I have 64 registers. Sometimes I need two kinds of instructions
(X and Y) to fill these 64-bit registers: X fills the low 32 bits and Y fills
the high 32 bits. I try to use subreg in the RTL patterns of X and Y.
 
insn1 (set (subreg:SI (reg:DI) 0) (reg:SI))   <X rtl, setting the low 32 bits>
insn2 (set (subreg:SI (reg:DI) 4) (reg:SI))   <Y rtl, setting the high 32 bits>
 
The problem is that register allocation seems to allocate the same register
to both insns, and after register allocation (or, more precisely, after
postreload) I lose the subreg info in the RTL. This makes the later phases
think that insn2 kills insn1, and then insn1 is deleted.
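
To make the intended semantics concrete, here is a small C model of the two
instructions. It is only a sketch of the behaviour I want, not GCC code: the
helper names set_low32, set_high32 and fill_both are hypothetical, and it
assumes that byte offset 0 selects the low half, as in the patterns above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of insn1 (X): write only the low 32 bits, keep the high half. */
uint64_t set_low32(uint64_t reg, uint32_t val)
{
    return (reg & 0xFFFFFFFF00000000ULL) | (uint64_t)val;
}

/* Model of insn2 (Y): write only the high 32 bits, keep the low half. */
uint64_t set_high32(uint64_t reg, uint32_t val)
{
    return (reg & 0x00000000FFFFFFFFULL) | ((uint64_t)val << 32);
}

/* Both partial writes target the same 64-bit register and both must
   survive. If insn2 were treated as a full 64-bit store, the write done
   by insn1 would look dead and the low half would be lost. */
uint64_t fill_both(uint32_t lo, uint32_t hi)
{
    uint64_t reg = 0;
    reg = set_low32(reg, lo);    /* insn1 */
    reg = set_high32(reg, hi);   /* insn2 must not kill insn1 */
    assert((uint32_t)reg == lo);
    return reg;
}
```

The point of the model is that each write is a read-modify-write of the full
register, which is the information that seems to be lost once the subreg
disappears.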
 
Can you give me some suggestions on how to solve this problem? I can think of
many possible solutions, but I don't know which one is best.
 
How can I let the compiler know that insn2 and insn1 write to different parts
of the same reg, so that insn1 is retained?

The GCC internals manual says that the use of subreg for physical regs is not
encouraged. Why? Is that the reason subreg is removed after IRA?

If I use subreg of a physical reg, will the later phases avoid the incorrect
code elimination problem? Or do I need some extra work to teach the later
phases to work on subreg of physical regs?

Is there any example of supporting subreg of physical regs?
 
thanks

  


POINTER_PLUS_EXPR Vs. PLUS_EXPR

2012-08-07 Thread a b

I hit a problem with the two operands of an addr-plus instruction. My
instruction is special because it is not commutative: it requires the address
to be the 2nd operand and the offset to be the 3rd. But my port generates
PLUS_EXPR instead of POINTER_PLUS_EXPR and ends up mistakenly switching the
order of the address and the offset.
The relevant code is in fold-const.c:
  switch (code)
    {
    case POINTER_PLUS_EXPR:
      ...
      /* INT +p INT -> (PTR)(INT + INT).  Stripping types allows for this. */
      if (INTEGRAL_TYPE_P (TREE_TYPE (arg1))
          && INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
        return fold_convert_loc (loc, type,
                                 fold_build2_loc (loc, PLUS_EXPR, uintptrtype,
                                                  fold_convert_loc (loc, uintptrtype,
                                                                    arg1),
                                                  fold_convert_loc (loc, uintptrtype,
                                                                    arg0)));

The code seems to force gcc to generate a PLUS_EXPR instead of a
POINTER_PLUS_EXPR, and I am not clear about the context of this piece of code.
The code was checked in by Andrew in 2007
(http://gcc.gnu.org/ml/fortran/2007-06/msg00163.html).
Could anybody kindly explain to me what is going on in that piece of code?
More specifically:
1. What does INT +p INT mean? Shouldn't POINTER_PLUS_EXPR always be better than
PLUS_EXPR? Why change it to PLUS_EXPR?
2. In general, when should we use POINTER_PLUS_EXPR and when PLUS_EXPR? I
understand the benefit of POINTER_PLUS_EXPR in my case, but it seems that it is
not strictly enforced that address+offset is always represented by
POINTER_PLUS_EXPR. Can anybody tell me the rule that determines when a
POINTER_PLUS_EXPR should be, or happens to be, used?
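
To illustrate the distinction I am asking about, here is a source-level sketch
in C. It reflects my understanding only: I am assuming that pointer arithmetic
such as base + off is what reaches the middle end as POINTER_PLUS_EXPR, and
that an addition of two integers whose result is cast to a pointer matches the
(PTR)(INT + INT) form in the comment above. The function names ptr_plus and
int_plus are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer + offset: the operands have distinct roles (base address first,
   byte offset second), which is what a non-commutative addr-plus
   instruction needs to see. */
char *ptr_plus(char *base, size_t off)
{
    assert(base != NULL);
    return base + off;
}

/* Integer + integer, converted to a pointer afterwards. Both operands are
   plain integers, so the addition is commutative and nothing records which
   operand was the address. */
char *int_plus(uintptr_t x, uintptr_t y)
{
    return (char *)(x + y);
}
```

Calling int_plus with the arguments swapped gives the same address, which is
exactly why the operand order cannot be relied on once the expression has
become a PLUS_EXPR.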

Another, unrelated question: where can I find the regression test results for
gcc releases such as 4.5/4.7? I am trying to find out which dejagnu tests have
known failures.

thanks