Paulo Matos <pma...@broadcom.com> writes: >> -----Original Message----- >> From: Richard Sandiford [mailto:rdsandif...@googlemail.com] >> Sent: 27 January 2014 16:06 >> To: Paulo Matos >> Cc: gcc@gcc.gnu.org >> Subject: Re: Mode change for bswap pattern expansion >> >> Paulo Matos <pma...@broadcom.com> writes: >> > On a vector processor we can do a bswapsi with two instructions, by first >> rotating half-words (16 bits) by 8 and then rotating full words by 16. >> > However, this means expanding: >> > (set (match_operand:SI 0 "register_operand" "") >> > (bswap:SI (match_operand:SI 1 "register_operand" ""))) >> > >> > to: >> > (set (match_dup:V2HI 0) >> > (rotate:V2HI (match_dup:V2HI 1) >> > (const_int 8))) >> > (set (match_dup:SI 0) >> > (rotate:SI (match_dup:SI 0) >> > (const_int 16))) >> > >> > This is obviously not correct, because match_dup cannot set the mode. The >> > point >> I am trying to make is that I can't find a good way to deal with the mode >> changes. I don't think GCC is too happy if I change the modes of the >> same operand >> from one instruction to the other right? The only other way is to emit >> paradoxical subregs. So something along these lines: >> > (set (subreg:V2HI (match_dup 0) 0) >> > (rotate:V2HI (subreg:V2HI (match_dup 1) 0) >> > (const_int 8))) >> > (set (match_dup 0) >> > (rotate:SI (match_dup 0) >> > (const_int 16))) >> >> It's usually better not to hard-code the subregs in the pattern. >> Instead you could use C code to create the subregs, e.g.: >> >> [(set (match_dup 3) >> (rotate:V2HI (match_dup 2) >> (const_int 8))) >> (set (match_dup 0) >> (rotate:SI (match_dup 4) >> (const_int 16)))] >> "" >> { >> operands[2] = gen_lowpart (V2HImode, operands[1]); >> operands[3] = gen_reg_rtx (V2HImode); >> operands[4] = gen_lowpart (SImode, operands[3]); >> } >> >> so that any hard regs are correctly handled. Or it might be easier to code >> it using emit_insn (gen_* (...))s instead. >> >> BTW, paradoxical subregs are where the outer mode is strictly larger >> than the inner mode. >> > > That's right. My mis-understanding. > >> MIPS uses essentially the same sequence, except that it has a special >> instruction to do the first rotate (WSBH), rather than it being an instance >> of a general vector rotate. For MIPS we just model it as an unspec SImode >> operation. Maybe that would be easier here too. >> > > I will look at how MIPS is doing it. > > However, the unspec SI has severe performance penalties on my port since > it is able to issue more that one instruction per cycle, therefore > having each instruction separately allows the scheduler to issue each of > the bswapsi parts into different slots with other instructions.
Sorry, I meant we use an unspec for the first ("V2HI") rotate. I.e. rather than: (set (subreg:V2HI (match_dup 2) 0) (rotate:V2HI (subreg:V2HI (match_dup 1) 0) (const_int 8))) (set (match_dup 0) (rotate:SI (match_dup 2) (const_int 16))) we have: (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO)) (set (match_dup 0) (rotate:SI (match_dup 2) (const_int 16))) In your case the define_insn for the UNSPEC_FOO pattern would have the same attributes as a V2HI rotate, so it should get scheduled in the same way. Thanks, Richard