Make minmax detection more flexible in tree-ssa-phiopt.c
Hi,

I'm trying to improve the asm code generated for C code like:

    long f(long a, long b)
    {
      long long s;
      s = ((long long) a) + ((long long) b);
      s = (s > 0x7fffL ? (long) 0x7fffL : (s < -0x8000L ? (long) -0x8000L : s));
      return (long) s;
    }

A key step is minmax detection in tree-ssa-phiopt.c. However, in my test cases minmax detection sometimes fails on input like:

    if (D.5591_11 <= 2147483647)
      goto ;
    else
      goto ;

    :
    D.5594_19 = MAX_EXPR ;
    iftmp.0_20 = (long int) D.5594_19;

    :
    # iftmp.0_1 = PHI

Minmax detection expects the middle block to contain exactly one statement, but in this case there is an additional cast. Minmax would be detected if the cast were moved below the middle block:

    ...
    :
    D.5594_19 = MAX_EXPR ;

    :
    # s_1 = PHI
    iftmp.0_20 = (long int) s_1;

The limitation occurs around line 725 of tree-ssa-phiopt.c in GCC 4.5.2:

    /* Recognize the following case, assuming d <= u:

       if (a <= u)
         b = MAX (a, d);
       x = PHI <b, u>

       This is equivalent to

       b = MAX (a, d);
       x = MIN (b, u);  */

    gimple assign = last_and_only_stmt (middle_bb);
    tree lhs, op0, op1, bound;

I was wondering if anyone could give me guidance on how to make minmax detection flexible enough to handle this case.

Thanks,
John Lu
clz pattern
Hi,

I'm trying to utilize the clz pattern:

    (define_insn "clzhi2"
      [(set (match_operand:HI 0 "register_operand" "=r")
            (clz:HI (match_operand:HI 1 "register_operand" "r")))]
      ""
      "cntlz %0 %1")

I can build a compiler successfully with this pattern, but I can't find any C source that will utilize it. I was wondering how GCC makes use of these patterns (and others like it) whose functionality does not map straightforwardly onto any C operator.

Thanks,
John Lu
LIM/Alias Analysis performance issue
Hi,

I've encountered a performance issue in a port of GCC I'm working on, where the behavior of LIM is affected by the ordering of fields in a structure. I've been able to reproduce it with a GCC 4.3.1 Linux X86 compiler. When I compile:

    struct foo {
      int *p;
      int t;
    } T;

    void o()
    {
      unsigned int i;
      for (i = 0; i < 256; i++) {
        T.p[i] = 0;
      }
    }

with the command:

    gcc -S -O2 -fdump-tree-all good.c

the file good.c.095t.lim shows T.p being moved outside the loop:

    :
    pretmp.10_8 = T.p;

    :
    # i_14 = PHI
    D.1556_4 = (long unsigned int) i_14;
    D.1557_5 = D.1556_4 * 4;
    D.1558_6 = pretmp.10_8 + D.1557_5;
    *D.1558_6 = 0;
    i_7 = i_14 + 1;
    if (i_7 <= 255)
      goto ;
    else
      goto ;

    :
    goto ;

If the fields of the structure are reversed:

    struct foo {
      int t;
      int *p;
    } T;

T.p is kept inside the loop:

    :
    # i_21 = PHI
    D.1555_3 = T.p;
    D.1556_4 = (long unsigned int) i_21;
    D.1557_5 = D.1556_4 * 4;
    D.1558_6 = D.1555_3 + D.1557_5;
    *D.1558_6 = 0;
    i_7 = i_21 + 1;
    if (i_7 <= 255)
      goto ;
    else
      goto ;

    :
    goto ;

On my port this causes a large performance degradation, and I suspect the root cause is in the alias analysis pass. I was wondering if there is a way to configure GCC to avoid this issue.

Thanks,
John Lu
Combine pass with reused sources
Hi,

I'm working on a compiler for an architecture with a multiply instruction that takes two 32-bit factors, sign-extends both to 64 bits, performs a 64-bit multiplication, and stores the result to a destination register. The combine pass successfully generates the pattern (mulhizi3) for this instruction twice for the following function:

    long long res0;
    long long res1;

    long f1(long a, long b, long c, long d)
    {
      res0 = ((long long) a) * ((long long) b);
      res1 = ((long long) c) * ((long long) d);
    }

The generated RTL from combine looks like:

    (insn 10 9 11 2 g.c:5 (set (reg:ZI 176)
            (mult:ZI (sign_extend:ZI (reg:HI 9 r6 [ b ]))
                (sign_extend:ZI (reg:HI 6 r4 [ a ])))) 262 {*mulhizi3} (nil))

However, if I modify the function so that one of the factors is reused:

    long f1(long a, long b, long c)
    {
      res0 = ((long long) a) * ((long long) b);
      res1 = ((long long) c) * ((long long) b);
    }

combine will not fuse the reused sign-extension result to generate the mulhizi3 pattern. I am wondering if anyone else has hit this issue or if I have done something wrong in my port. Any help would be greatly appreciated.

Thanks,
John Lu