https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110089
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #7 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> powerpc could use subf. and jgt?  This is for the downward iterating IV for
> partial-vector-usage=2 loop vectorization.

Currently subf. exploitation only happens for CCmode and <MODE>mode == Pmode
(patterns *subf<mode>3_dot and *subf<mode>3_dot2). Since for the record form
of all fixed-point insns "the first three bits of CR Field 0 are set by
**signed** comparison of the result to zero", it can't be used for CCUNSmode
and <MODE>mode == Pmode. But it looks like we can extend it for unsigned int
on 64-bit.

Even when I hacked in one more pattern, *subf<mode>3_dot3, for CCUNSmode +
SImode + Pmode == DImode, it still failed to match in combine. The insns
look like:

(insn 24 23 25 4 (set (reg/v:DI 127 [ n ])
        (reg/v:DI 131 [ n ])) 682 {*movdi_internal64}
     (nil))
(insn 25 24 26 4 (set (reg:SI 142)
        (minus:SI (subreg/s/v:SI (reg/v:DI 131 [ n ]) 0)
            (subreg/s/v:SI (reg/v:DI 132 [ s ]) 0))) "test1.c":20:9 94 {*subfsi3}
     (expr_list:REG_DEAD (reg/v:DI 131 [ n ])
        (nil)))
(insn 26 25 28 4 (set (reg/v:DI 131 [ n ])
        (zero_extend:DI (reg:SI 142))) "test1.c":20:9 16 {zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SI 142)
        (nil)))
(insn 28 26 29 4 (set (reg:CCUNS 143)
        (compare:CCUNS (subreg/s/v:SI (reg/v:DI 127 [ n ]) 0)
            (subreg/s/v:SI (reg/v:DI 132 [ s ]) 0))) "test1.c":22:15 discrim 1 805 {*cmpsi_unsigned}
     (expr_list:REG_DEAD (reg/v:DI 127 [ n ])
        (nil)))

-------

At expand, we have

  # n_9 = PHI <n_12(D)(2), n_19(3)>
  # sum_10 = PHI <0(2), sum_18(3)>
  _1 = MIN_EXPR <n_9, s_13(D)>;
  len_14 = (int) _1;
  _2 = (long unsigned int) len_14;
  _3 = _2 * 4;
  _4 = a_15(D) + _3;
  _5 = *_4;
  _6 = b_17(D) + _3;
  _7 = *_6;
  _8 = _5 + _7;
  sum_18 = _8 + sum_10;
  n_46 = n_9;
  n_19 = n_9 - s_13(D);   // different operand here and below,
  if (n_46 > s_13(D))     // we don't need this n_46?
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]
;;    succ:       3
;;                4

---------

BTW, I tried a constant value for s, such as 16; with that, it can exploit
the hardware loop (bdnz) on Power:

#define TYPE unsigned int
#define MIN(a,b) ((a>b)?b:a)

int foo (int *__restrict a, int *__restrict b, TYPE n)
{
  if (n <= 0)
    return 0;

  int len;
  int sum = 0;
  TYPE oldn;
  do {
    len = MIN (n, 16);
    sum += a[len] + b[len];
    oldn = n;
    n = n - 16;
  } while (oldn > 16);

  return sum;
}

.L3:
        cmplwi 0,5,16
        addi 8,5,-16
        isel 9,7,5,1
        rldicl 5,8,0,32
        rldic 9,9,2,30
        lwzx 8,3,9
        lwzx 9,4,9
        add 10,8,10
        add 10,10,9
        bdnz .L3
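
To make the signed-versus-unsigned point about subf. concrete, here is a minimal C sketch. It is only a model of the arithmetic, not GCC or rs6000 code: the helper names (gt_unsigned32, gt_signed_diff32, gt_signed_diff64, count_mismatches, self_check) are made up for illustration, and the "greater than" bit of CR0 is modelled simply as "the result, read as a signed value, is greater than zero". Under the assumption that both unsigned int operands sit zero-extended in 64-bit registers, the true difference lies strictly between -2^32 and 2^32, so the sign of the 64-bit result agrees with the unsigned 32-bit comparison. The same test on the 32-bit result does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The comparison the loop exit actually needs: unsigned 32-bit n > s. */
int gt_unsigned32(uint32_t n, uint32_t s)
{
    return n > s;
}

/* Model (assumption): signed compare-with-zero of the 32-bit difference. */
int gt_signed_diff32(uint32_t n, uint32_t s)
{
    uint32_t d = n - s;                 /* wraps modulo 2^32 */
    return d != 0 && (d >> 31) == 0;    /* "d > 0" when d is read as signed */
}

/* Model (assumption): operands zero-extended to 64 bits, then a signed
   compare-with-zero of the 64-bit difference. */
int gt_signed_diff64(uint32_t n, uint32_t s)
{
    uint64_t d = (uint64_t)n - (uint64_t)s;   /* wraps modulo 2^64 */
    return d != 0 && (d >> 63) == 0;          /* "d > 0" when read as signed */
}

/* Boundary values around 0, the step 16 from the test case, and the
   32-bit sign bit. */
static const uint32_t samples[] = {
    0u, 1u, 2u, 15u, 16u, 17u,
    0x7FFFFFFFu, 0x80000000u, 0x80000001u,
    0xFFFFFFFEu, 0xFFFFFFFFu
};

/* Count sample pairs where a candidate test disagrees with n > s. */
int count_mismatches(int (*candidate)(uint32_t, uint32_t))
{
    size_t count = sizeof samples / sizeof samples[0];
    int mismatches = 0;

    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < count; j++)
            if (candidate(samples[i], samples[j])
                != gt_unsigned32(samples[i], samples[j]))
                mismatches++;
    return mismatches;
}

/* The 32-bit signed test is wrong for some pairs; the 64-bit one is not. */
void self_check(void)
{
    assert(count_mismatches(gt_signed_diff32) > 0);
    assert(count_mismatches(gt_signed_diff64) == 0);
}
```

For example, with n = 0xFFFFFFFF and s = 1 the 32-bit difference has its sign bit set although n > s holds, while the 64-bit difference of the zero-extended values is positive. Whether combine can be taught to use this is a separate question; the sketch only checks the arithmetic.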