Make minmax detection more flexible in tree-ssa-phiopt.c

2011-02-23 Thread Lu, John
Hi,

I'm trying to improve the asm code generated for C code like:

  long f(long a, long b) {
    _int64 s;

    s = (((long long) a) + ((long long) b));

    s = (s > 0x7fffL ? (long) 0x7fffL :
         (s < -0x8000L ? (long) -0x8000L :
          s));

    return((long) s);
  }

A key step is minmax detection in tree-ssa-phiopt.c.  However, in my test cases 
sometimes minmax detection fails because of input like:

  if (D.5591_11 <= 2147483647)
    goto ;
  else
    goto ;

  :
D.5594_19 = MAX_EXPR ;
iftmp.0_20 = (long int) D.5594_19;

  :
# iftmp.0_1 = PHI 


Minmax detection expects the middle block to have one statement, but in this 
case there is an additional cast.  Minmax would be detected if the cast
was moved after the middle block:

  ...
  :
D.5594_19 = MAX_EXPR ;

  :
# s_1 = PHI 
iftmp.0_20 = (long int) s_1;
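
At the source level, a similar shape can be approximated by doing the whole
clamp in the wide type and narrowing only once at the end, so that no cast
sits between the comparison and the MAX.  The following is a minimal sketch
under that assumption.  The names sat_add and sat_add_examples are
hypothetical, the bounds are the ones from the example above, and I have not
verified that any particular GCC version then detects the min/max pair.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: add in a wide type, clamp to
   [-0x8000, 0x7fff], and narrow exactly once at the end.  Assumes the
   sum of a and b fits in long long. */
long sat_add(long a, long b)
{
    long long s = (long long) a + (long long) b;

    /* Both arms of each conditional already have type long long, so no
       conversion appears inside the conditional itself. */
    s = (s > 0x7fffLL) ? 0x7fffLL : s;       /* MIN (s, 0x7fff)  */
    s = (s < -0x8000LL) ? -0x8000LL : s;     /* MAX (s, -0x8000) */

    return (long) s;                         /* single narrowing cast */
}

/* Small usage example covering both bounds and the pass-through case. */
void sat_add_examples(void)
{
    assert(sat_add(0x7000, 0x7000) == 0x7fff);
    assert(sat_add(-0x7000, -0x7000) == -0x8000);
    assert(sat_add(100, -50) == 50);
}
```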

The limitation occurs around line 725 in tree-ssa-phiopt.c in GCC 4.5.2:

  /* Recognize the following case, assuming d <= u:

       if (a <= u)
         b = MAX (a, d);
       x = PHI <b, u>

     This is equivalent to

       b = MAX (a, d);
       x = MIN (b, u);  */

  gimple assign = last_and_only_stmt (middle_bb);
  tree lhs, op0, op1, bound;
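
To make the equivalence stated in that comment concrete, here is a small C
sketch that compares the branchy form with the straight-line MIN/MAX form
over a range of values, only for d <= u as the comment requires.  All
function names are hypothetical and exist only for illustration; this is not
code from tree-ssa-phiopt.c.

```c
#include <assert.h>

static int max_int(int x, int y) { return x > y ? x : y; }
static int min_int(int x, int y) { return x < y ? x : y; }

/* Branchy form: if (a <= u) x = MAX (a, d); else x = u; */
int clamp_branch(int a, int d, int u)
{
    int x = u;
    if (a <= u)
        x = max_int(a, d);
    return x;
}

/* Straight-line form: b = MAX (a, d); x = MIN (b, u); */
int clamp_minmax(int a, int d, int u)
{
    int b = max_int(a, d);
    return min_int(b, u);
}

/* Returns 1 if both forms agree for every a, d, u in [lo, hi] with
   d <= u, and 0 as soon as a mismatch is found. */
int forms_agree(int lo, int hi)
{
    int a, d, u;

    assert(lo <= hi);
    for (d = lo; d <= hi; d++)
        for (u = d; u <= hi; u++)          /* enforce d <= u */
            for (a = lo; a <= hi; a++)
                if (clamp_branch(a, d, u) != clamp_minmax(a, d, u))
                    return 0;
    return 1;
}
```

The reasoning behind the check: when a <= u, both a and d are at most u, so
MIN (MAX (a, d), u) is just MAX (a, d); when a > u, MAX (a, d) exceeds u and
the MIN yields u, which is the value the PHI takes on the other edge.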

I was wondering if anyone could give me guidance on how to add flexibility
to minmax detection in order to handle this case.

Thanks,
John Lu





clz pattern

2011-06-29 Thread Lu, John
Hi,

I'm trying to utilize the clz pattern:

  (define_insn "clzhi2"
[(set (match_operand:HI 0 "register_operand" "=r")
(clz:HI (match_operand:HI 1 "register_operand" "r")))]
""
"cntlz %0 %1")

I can build a compiler successfully with this pattern, but I
can't find any C source that makes GCC emit it.  I was
wondering how GCC utilizes this pattern (and others like it)
whose functionality does not map straightforwardly to
any C operator.
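
For reference, the one source-level construct I know of that asks for this
operation directly is GCC's documented __builtin_clz family.  The sketch
below shows its use next to a portable loop.  It is an assumption on my part
that the builtin would reach clzhi2 on this port: that would require
unsigned int to be HImode there.  The wrapper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper around the GCC builtin.  __builtin_clz takes an
   unsigned int and its result is undefined for 0, hence the assert. */
int leading_zeros(unsigned int x)
{
    assert(x != 0);
    return __builtin_clz(x);
}

/* Portable reference: count zero bits from the most significant bit
   down to the first set bit. */
int leading_zeros_ref(unsigned int x)
{
    int n = 0;
    unsigned int mask = 1u << (sizeof(unsigned int) * CHAR_BIT - 1);

    assert(x != 0);
    while ((x & mask) == 0) {
        n++;
        mask >>= 1;
    }
    return n;
}
```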

Thanks,
John Lu




LIM/Alias Analysis performance issue

2010-04-16 Thread Lu, John
Hi,

I've encountered a performance issue in a port of GCC I'm working on,
where the behavior of LIM is affected by the ordering of fields in a
structure.  I've been able to re-create it with a GCC 4.3.1
Linux X86 compiler.  With that compiler, when I compile:

struct foo {
  int *p;
  int  t;
} T;

void o() {
  unsigned int  i;
  
  for (i = 0; i < 256; i++) {
    T.p[i]=0;
  }
}

with the command:

gcc -S -O2 -fdump-tree-all good.c

the file good.c.095t.lim shows T.p being moved outside the loop:

:
  pretmp.10_8 = T.p;

:
  # i_14 = PHI 
  D.1556_4 = (long unsigned int) i_14;
  D.1557_5 = D.1556_4 * 4;
  D.1558_6 = pretmp.10_8 + D.1557_5;
  *D.1558_6 = 0;
  i_7 = i_14 + 1;
  if (i_7 <= 255)
    goto ;
  else
    goto ;

:
  goto ;

If the fields of the structure are reversed:

struct foo {
  int  t;
  int *p;
} T;

T.p is kept inside the loop:

:
  # i_21 = PHI 
  D.1555_3 = T.p;
  D.1556_4 = (long unsigned int) i_21;
  D.1557_5 = D.1556_4 * 4;
  D.1558_6 = D.1555_3 + D.1557_5;
  *D.1558_6 = 0;
  i_7 = i_21 + 1;
  if (i_7 <= 255)
    goto ;
  else
    goto ;

:
  goto ;

On my port, this causes a large performance degradation, and I suspect
the cause is ultimately in the alias analysis pass.  I was wondering if
there is a way to configure GCC to avoid this issue.
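
The only source-level workaround I can think of is to hoist the load by
hand, as in the sketch below.  The name o_hoisted is hypothetical.  Note that
this differs from the original loop if a store through the pointer could
modify T.p itself, which I am assuming does not happen.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
  int  t;
  int *p;
} T;

/* Hypothetical variant of o(): T.p is loaded once into a local, so the
   loop body no longer depends on LIM to move the load. */
void o_hoisted(void) {
  int *p = T.p;            /* single load, outside the loop */
  unsigned int i;

  assert(p != NULL);
  for (i = 0; i < 256; i++) {
    p[i] = 0;
  }
}
```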

Thanks,
John Lu


Combine pass with reused sources

2013-08-12 Thread Lu, John
Hi,

I'm working on a compiler for an architecture with a multiply instruction that
takes two 32-bit factors, sign-extends both factors to 64 bits, does a
64-bit multiplication, and stores the result to a destination register.  The
combine pass successfully generates the pattern (mulhizi3) for this instruction
twice for the following function.


long long res0;
long long res1;

long f1(long a, long b, long c, long d) {
  res0=((long long) a)*((long long) b);
  res1=((long long) c)*((long long) d);
}

The generated RTL from combine looks like:

(insn 10 9 11 2 g.c:5 (set (reg:ZI 176)
        (mult:ZI (sign_extend:ZI (reg:HI 9 r6 [ b ]))
            (sign_extend:ZI (reg:HI 6 r4 [ a ])))) 262 {*mulhizi3} (nil))

However, if I modify the function so that one of the factors is reused,

long f1(long a, long b, long c) {
  res0=((long long) a)*((long long) b);
  res1=((long long) c)*((long long) b);
}

combine will not fuse the reused sign-extension result to generate the
mulhizi3 pattern.  

I am wondering if anyone else has hit this issue or if I have done something 
wrong in my port.  Any help would be greatly appreciated.

Thanks,
John Lu