http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-01 09:40:14 UTC ---
The code below seems to optimize v[0]-v[1] and v[1]+v[0]. It doesn't recognize
v[0]+v[1], but that would not be too hard to add, I guess. Compared to the real
hadd insn, I removed the set_attr "type" "sseadd" because it crashed the
compiler (maybe in the cost computation). Apart from the leftovers in here that
may not make sense, I don't know whether a peephole would be more appropriate.
Maybe the insn helps more if I want to recognize dot products (dppd) later on?
At least, thanks to it, {v[0]-v[1],w[0]-w[1]} is now recognized as an hsub
(although it doesn't work if v==w, because vec_duplicate doesn't match
vec_concat).
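For reference, the lane arithmetic of the two SSE3 instructions can be modelled in scalar C with GCC's vector extensions. This is only an illustrative sketch: `ref_hsubpd`, `ref_haddpd` and `check_hsub_form` are hypothetical helper names (not GCC internals), and the lane layout is the one Intel documents for HADDPD/HSUBPD (low lane from the first operand, high lane from the second). The last helper checks that the element-wise form {v[0]-v[1],w[0]-w[1]} mentioned above computes exactly one hsub of (v, w).

```c
#include <assert.h>

/* GCC vector extension: two packed doubles, the same shape as V2DFmode. */
typedef double v2df __attribute__((vector_size(16)));

/* Scalar reference model of SSE3 HSUBPD (Intel operand order: dst, src).
   Low lane:  dst[0] - dst[1]
   High lane: src[0] - src[1] */
static v2df ref_hsubpd(v2df dst, v2df src)
{
    v2df r = { dst[0] - dst[1], src[0] - src[1] };
    return r;
}

/* Reference model of SSE3 HADDPD: same lane layout, sums instead of differences. */
static v2df ref_haddpd(v2df dst, v2df src)
{
    v2df r = { dst[0] + dst[1], src[0] + src[1] };
    return r;
}

/* The element-wise source form {v[0]-v[1], w[0]-w[1]} computes exactly
   what a single HSUBPD with operands (v, w) would; assert it lane by lane. */
static void check_hsub_form(v2df v, v2df w)
{
    v2df by_hand = { v[0] - v[1], w[0] - w[1] };
    v2df model = ref_hsubpd(v, w);
    assert(by_hand[0] == model[0]);
    assert(by_hand[1] == model[1]);
}
```

With v = {5, 3} and w = {10, 4}, ref_hsubpd gives {2, 6} and ref_haddpd gives {8, 14}.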
(define_insn "*sse3_h<plusminus_insn>v2df3_low_MARC"
  [(set (match_operand:DF 0 "register_operand" "=x,x")
        (plusminus:DF
          (vec_select:DF
            (match_operand:V2DF 1 "register_operand" "0,x")
            (parallel [(const_int 0)]))
          (vec_select:DF
            (match_dup 1)
            (parallel [(const_int 1)]))))]
  "TARGET_SSE3"
  "@
   h<plusminus_mnemonic>pd\t{%0, %0|%0, %0}
   vh<plusminus_mnemonic>pd\t{%1, %1, %0|%0, %1, %1}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "V2DF")])
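A set of C inputs one might compile (for example with `gcc -O2 -msse3 -S`) to see which shapes the pattern above catches. The function names are hypothetical, and the comments only restate what this comment reports for the experimental pattern; they are not verified compiler output, and the generated code will depend on the GCC revision.

```c
#include <assert.h>

/* GCC vector extension type with the layout of V2DFmode. */
typedef double v2df __attribute__((vector_size(16)));

/* (minus (vec_select v [0]) (vec_select v [1])): reported as matched (hsubpd). */
double low_sub(v2df v)
{
    return v[0] - v[1];
}

/* v[1] + v[0]: reported as matched (haddpd). */
double low_add_10(v2df v)
{
    return v[1] + v[0];
}

/* v[0] + v[1]: reported as not yet matched. */
double low_add_01(v2df v)
{
    return v[0] + v[1];
}

/* Two different vectors: reported as now recognized as a full hsubpd. */
v2df both_sub(v2df v, v2df w)
{
    v2df r = { v[0] - v[1], w[0] - w[1] };
    return r;
}

/* Same vector twice: reported as not matched
   (vec_duplicate does not match vec_concat). */
v2df dup_sub(v2df v)
{
    v2df r = { v[0] - v[1], v[0] - v[1] };
    return r;
}

/* For non-NaN inputs both orderings give the identical double (IEEE
   addition is commutative), so the missed v[0]+v[1] case is a pattern
   matching gap, not a semantic difference. */
void check_same_sum(v2df v)
{
    assert(low_add_10(v) == low_add_01(v));
}
```

Comparing the assembly generated for low_add_10 and low_add_01 is a quick way to observe the operand-order gap described above.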