microblaze unroll loops optimization

David Holsgrove Mon, 07 Jan 2013 21:59:41 -0800

Loop unrolling (-funroll-loops) for microblaze is ineffectual on the gcc
4.6/4.7/4.8 branches.
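
For readers less familiar with the transformation, the sketch below shows by
hand what unrolling a simple counted loop by four looks like in C. It is not
gcc output: the function name scale_row_unrolled4 and the factor of four are
assumptions made purely for illustration.

```c
/* Hand-unrolled sketch (hypothetical helper, not compiler output):
   c[j] = a[j] * val for j in [0, n), four elements per main-loop trip. */
static void scale_row_unrolled4(int n, int *c, const short *a, short val)
{
    int j = 0;

    /* Main body: one compare-and-branch per four elements. */
    for (; j < n - 3; j += 4) {
        c[j]     = (int)a[j]     * (int)val;
        c[j + 1] = (int)a[j + 1] * (int)val;
        c[j + 2] = (int)a[j + 2] * (int)val;
        c[j + 3] = (int)a[j + 3] * (int)val;
    }

    /* Remainder: at most three leftover elements. */
    for (; j < n; j++)
        c[j] = (int)a[j] * (int)val;
}
```

To do this automatically the compiler has to work out how the loop counter
evolves and how it is compared against its bound.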


Unrolling previously worked on an out-of-tree gcc 4.1.2 port. I believe the
relevant difference is the use of UNSPEC_CMP and UNSPEC_CMPU to create two
distinct instructions, signed_compare and unsigned_compare, in microblaze's
machine description. As a result, iv_analyze_expr in loop-iv.c is unable to
understand the expression when it analyzes the compare instruction.

Details follow below,

thanks,
David



Looking at the dump file (-fdump-rtl-loop2_unroll-slim) produced by compiling
the following small example (extracted from the larger CoreMark benchmark);


void matrix_mul_const(signed int N, signed int *C,
                      signed short *A, signed short val) {
        signed int i,j;
        for (i=0; i<N; i++) {
                for (j=0; j<N; j++) {
                        C[i*N+j]=(signed int)A[i*N+j] * (signed int)val;
                }
        }
}


we get the following analysis of the insns from gcc/loop-unroll.c and
gcc/loop-iv.c;


Analyzing operand (reg:SI 99) of insn (jump_insn 29 28 47 4 (set (pc)
        (if_then_else (lt:SI (reg:SI 99)
                (const_int 0 [0]))
            (label_ref:SI 47)
            (pc))) core_matrix_min.c:6 69 {branch_zero}
     (expr_list:REG_DEAD (reg:SI 99)
        (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil)))
 -> 47)
Analyzing def of (reg:SI 99) in insn (insn 28 26 29 4 (set (reg:SI 99)
        (unspec [
                (reg/v:SI 87 [ N ])
                (reg/v:SI 56 [ j ])
            ] 104)) core_matrix_min.c:6 66 {signed_compare}
     (expr_list:REG_DEAD (reg/v:SI 87 [ N ])
        (nil)))
(reg:SI 99) in insn (insn 28 26 29 4 (set (reg:SI 99)
        (unspec [
                (reg/v:SI 87 [ N ])
                (reg/v:SI 56 [ j ])
            ] 104)) core_matrix_min.c:6 66 {signed_compare}
     (expr_list:REG_DEAD (reg/v:SI 87 [ N ])
        (nil)))
  is not simple
Loop 2 is not simple.


where signed_compare is defined in microblaze.md as;


(define_insn "signed_compare"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (unspec
                [(match_operand:SI 1 "register_operand" "d")
                 (match_operand:SI 2 "register_operand" "d")] UNSPEC_CMP))]
  ""
  "cmp\t%0,%1,%2"
  [(set_attr "type"     "arith")
  (set_attr "mode"      "SI")
  (set_attr "length"    "4")])


During iv_analyze_expr in loop-iv.c, the expression RHS of this insn is
determined to be;


(gdb) call debug_rtx(rhs)
(unspec [
        (reg/v:SI 87 [ N ])
        (reg/v:SI 56 [ j ])
    ] 104)


meaning that GET_CODE (rhs) will be UNSPEC and no further analysis of the insn
is carried out.
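
The effect can be pictured with a toy C model of a dispatch on the expression
code. This is an illustration only, not the source of iv_analyze_expr: the enum
toy_code and the function toy_is_analyzable are made up here, and the set of
accepted codes is an assumption chosen to show the shape of the check.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few RTL expression codes. */
enum toy_code {
    TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MINUS, TOY_MULT, TOY_UNSPEC
};

/* Returns true if this toy analysis knows how to interpret the code.
   An opaque code such as TOY_UNSPEC falls through to the default case. */
static bool toy_is_analyzable(enum toy_code code)
{
    switch (code) {
    case TOY_REG:
    case TOY_CONST_INT:
    case TOY_PLUS:
    case TOY_MINUS:
    case TOY_MULT:
        return true;
    default:
        return false;   /* semantics unknown: give up ("is not simple") */
    }
}
```

An UNSPEC carries no semantics that a generic pass can rely on, so any analysis
structured this way has nothing to work with once it meets one.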

Adjusting the signed_compare insn to the following;


(define_insn "signed_compare"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (compare:SI (match_operand:SI 1 "register_operand" "d")
                 (match_operand:SI 2 "register_operand" "d")))]
  ""
  "cmp\t%0,%1,%2"
  [(set_attr "type"     "arith")
  (set_attr "mode"      "SI")
  (set_attr "length"    "4")])


I get much further through the unrolling analysis;


Analyzing operand (reg/v:SI 87 [ N ]) of insn (insn 28 26 29 4 (set (reg:SI 99)
        (compare:SI (reg/v:SI 87 [ N ])
            (reg/v:SI 56 [ j ]))) core_matrix_min.c:6 66 {signed_compare}
     (expr_list:REG_DEAD (reg/v:SI 87 [ N ])
        (nil)))
  invariant (reg/v:SI 87 [ N ]) (in SI)
Analyzing operand (reg/v:SI 56 [ j ]) of insn (insn 28 26 29 4 (set (reg:SI 99)
        (compare:SI (reg/v:SI 87 [ N ])
            (reg/v:SI 56 [ j ]))) core_matrix_min.c:6 66 {signed_compare}
     (expr_list:REG_DEAD (reg/v:SI 87 [ N ])
        (nil)))
Analyzing def of (reg/v:SI 56 [ j ]) in insn (insn 26 25 28 4 (set (reg/v:SI 56 [ j ])
        (plus:SI (reg/v:SI 56 [ j ])
            (const_int 1 [0x1]))) core_matrix_min.c:6 10 {addsi3}
     (nil))
Analyzing operand (reg/v:SI 56 [ j ]) of insn (insn 26 25 28 4 (set (reg/v:SI 56 [ j ])
        (plus:SI (reg/v:SI 56 [ j ])
            (const_int 1 [0x1]))) core_matrix_min.c:6 10 {addsi3}
     (nil))
Analyzing (reg/v:SI 56 [ j ]) for bivness.
  (reg/v:SI 56 [ j ]) + (const_int 1 [0x1]) * iteration (in SI)
Analyzing operand (const_int 1 [0x1]) of insn (insn 26 25 28 4 (set (reg/v:SI 56 [ j ])
        (plus:SI (reg/v:SI 56 [ j ])
            (const_int 1 [0x1]))) core_matrix_min.c:6 10 {addsi3}
     (nil))
  invariant (const_int 1 [0x1]) (in VOID)
(reg/v:SI 56 [ j ]) in insn (insn 26 25 28 4 (set (reg/v:SI 56 [ j ])
        (plus:SI (reg/v:SI 56 [ j ])
            (const_int 1 [0x1]))) core_matrix_min.c:6 10 {addsi3}
     (nil))
  is (plus:SI (reg/v:SI 56 [ j ])
    (const_int 1 [0x1])) + (const_int 1 [0x1]) * iteration (in SI)
Loop 2 is not simple.
;; Unable to prove that the loop rolls exactly once


The remaining issue appears to be that the loop is classified as being of type
'while (i-- < 10)', so iv_number_of_iterations takes its 'fail' path.
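
To make the 'while (i-- < 10)' shape concrete, here is a toy C model of a
step-direction check on a condition of the form iv0 < iv1. It is a sketch, not
code from loop-iv.c: the type toy_iv and the function toy_counts_towards_bound
are hypothetical, and feeding it N < j (N invariant, j stepping by +1) assumes
the operands reach the analysis in the order the compare:SI dump prints them.

```c
#include <stdbool.h>

/* Toy induction variable: value = base + step * iteration.
   Hypothetical type; only the step matters for this check. */
struct toy_iv {
    long step;   /* 0 means loop-invariant */
};

/* Toy check for a loop condition "iv0 < iv1".
   Returns true when the loop counts towards its bound, false for the
   'while (i-- < 10)' shape where it moves away from it.
   Simplification: the case where both sides move is not modelled. */
static bool toy_counts_towards_bound(struct toy_iv iv0, struct toy_iv iv1)
{
    if (iv0.step == 0 && iv1.step == 0)
        return false;   /* nothing changes between iterations */

    long step_val = (iv0.step == 0) ? -iv1.step : iv0.step;
    return step_val >= 0;
}
```

With steps (0, +1), as for N < j, this check rejects the loop; with steps
(+1, 0), as for j < N, it accepts it.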


A previous out-of-tree microblaze port based on gcc 4.1.2 successfully unrolled
this loop. Its machine description had a very large branch_compare instruction
which used a switch on all the codes of the comparison operator (GTU, LEU, GEU,
etc.) to output the compare and branch asm instructions directly. Would that
have made the loop analysis easier?


My question is: does this appear to be the correct approach to allowing
microblaze to unroll loops? (The UNSPEC in signed_compare / unsigned_compare
appears to have been used to make the two instructions unique, but both could
be handled by a single instruction which examines the operand to determine
whether cmp or cmpu asm should be output.)
