I'm porting gcc to a new VLIW architecture. The chip has 11 function units, 4 of which are DSPs. I'm now designing the SIMD instruction patterns, and I would prefer not to rely on built-in functions to support them.
If I write instruction patterns that involve many V4QI packing/unpacking/arithmetic operations, can gcc select them automatically and intelligently? (Note that I have not written any define_expand/define_split that generates V4QI operations myself.)
For example:

1. My packing instruction pattern ('D' means DSP register):

    (define_insn "*packqi_from_mem"
      [(set (vec_select:QI (match_operand:V4QI 0 "register_operand" "D")
                           (parallel [(match_operand:SI 2 "const_int_operand" "i")]))
            (match_operand:QI 1 "memory_operand" "m"))]
      ""
      "ldub.b%2\\t%0, %1"
    )

2. My V4QI + V4QI SIMD operation:

    (define_insn "*SIMD_addqi3"
      [(set (match_operand:V4QI 0 "register_operand" "=D")
            (plus:V4QI (match_operand:V4QI 1 "register_operand" "%D")
                       (match_operand:V4QI 2 "register_operand" "D")))]
      ""
      "add.ub\\t%0, %1, %2"
    )

Is it possible for gcc, on its own, to load four QImode values into one register using the "*packqi_from_mem" pattern and then perform the V4QI + V4QI SIMD add using the "*SIMD_addqi3" pattern?
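
To make the question concrete, here is a minimal sketch of the kind of scalar source code it is about. The function name add_bytes and its signature are hypothetical and only for illustration. It also assumes that "add.ub" is a wrapping (modulo 256) add, which is what plus:V4QI means in RTL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: element-wise add of unsigned bytes.
   Each iteration is one QImode load from a, one from b, a QImode add,
   and a QImode store.  Four consecutive iterations together correspond
   to one V4QI + V4QI operation.  */
void add_bytes(uint8_t *restrict dst,
               const uint8_t *restrict a,
               const uint8_t *restrict b,
               size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* The cast makes the wrap-around (modulo 256) explicit.  */
        dst[i] = (uint8_t)(a[i] + b[i]);
    }
}
```

The hope is that, for a loop like this, gcc would group four byte elements into one DSP register and emit a single add.ub, without the source having to call a built-in.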