http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54400
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-09-01 09:40:14 UTC ---

The code below seems to optimize v[0]-v[1] and v[1]+v[0]. It doesn't recognize v[0]+v[1], but that would not be too hard to add, I guess. Compared to the true hadd insn, I removed the (set_attr "type" "sseadd") because it crashed the compiler (in cost computation maybe).

Apart from the things left in here that may not make sense, I don't know whether a peephole would be more relevant. Maybe the insn helps more if I want to recognize dot products (dppd) later on? At least, thanks to it, {v[0]-v[1],w[0]-w[1]} is now recognized as a hsub (although it doesn't work if v==w, because vec_duplicate doesn't match vec_concat).

(define_insn "*sse3_h<plusminus_insn>v2df3_low_MARC"
  [(set (match_operand:DF 0 "register_operand" "=x,x")
        (plusminus:DF
          (vec_select:DF
            (match_operand:V2DF 1 "register_operand" "0,x")
            (parallel [(const_int 0)]))
          (vec_select:DF
            (match_dup 1)
            (parallel [(const_int 1)]))))]
  "TARGET_SSE3"
  "@
   h<plusminus_mnemonic>pd\t{%0, %0|%0, %0}
   vh<plusminus_mnemonic>pd\t{%1, %1, %0|%0, %1, %1}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "V2DF")])
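
For reference, the source-level expressions discussed in this comment might look like the following C sketch, written with GCC's vector extensions. The type name v2df and all function names are made up for illustration and do not come from the bug report. Whether a given function actually compiles to hsubpd/haddpd depends on the patch, the GCC version and the flags (for instance -O2 -msse3), so no particular assembly output is claimed here.

```c
/* Illustrative only: v2df and the function names are assumptions,
   not taken from the bug report. Build with e.g. gcc -O2 -msse3. */
typedef double v2df __attribute__((vector_size(16)));

/* v[0]-v[1]: low element minus high element (the minus case). */
double hsub_low(v2df v)
{
    return v[0] - v[1];
}

/* v[1]+v[0]: the operand order the comment reports as recognized. */
double hadd_high_first(v2df v)
{
    return v[1] + v[0];
}

/* v[0]+v[1]: same value, but the comment reports that this operand
   order is not recognized by the pattern as posted. */
double hadd_low_first(v2df v)
{
    return v[0] + v[1];
}

/* {v[0]-v[1], w[0]-w[1]}: the two-vector form the comment reports as
   recognized as a hsub. Calling it with v == w is the case the comment
   says does not match (vec_duplicate versus vec_concat). */
v2df hsub_pair(v2df v, v2df w)
{
    v2df r = { v[0] - v[1], w[0] - w[1] };
    return r;
}
```

Inspecting the output of gcc -S for each function, with and without the pattern, is one way to check which forms are matched.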