Hi,

I forgot to mention that I didn't include a test case.  Carl's upcoming patch
will cause this code to be exercised thoroughly by the existing test suite, so
I don't think a separate test is needed.  Let me know if you disagree.
Thanks,
Bill

> On Aug 15, 2017, at 4:14 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
>
> Hi,
>
> One of Carl Love's proposed built-in function patches exposed a bug in the
> Power code that recognizes specific permute control vector patterns for a
> permute, and changes the permute to a more specific and more efficient
> instruction.  The patterns for p8_vmrgew_v4si and p8_vmrgow are generated
> regardless of endianness, leading to problems on the little-endian port.
>
> The normal way that would cause us to generate these patterns is via the
> vec_widen_[su]mult_{even,odd}_<mode> interfaces, which are not yet
> instantiated for Power; hence it appears that we've gotten lucky not to run
> into this before.  Carl's proposed patch instantiated these interfaces,
> triggering the discovery of the problem.
>
> This patch simply changes the handling for p8_vmrg[eo]w to match how it's
> done for all of the other common pack/merge/etc. patterns.
>
> In altivec.md, we already had a p8_vmrgew_v4sf_direct insn that does what we
> want.  I generalized this for both V4SF and V4SI modes.  I then added a
> similar p8_vmrgow_<mode>_direct define_insn.
>
> The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than
> p8_vmrgew_v4si_direct, is arbitrary.  The existing code already handles
> converting (for free) a V4SI operand to a V4SF one, so there's no need to
> specify the mode directly; and it would actually complicate the code to
> extract the mode so the "proper" pattern would match.  I think what I have
> here is better, but if you disagree I can change it.
>
> Bootstrapped and tested on powerpc64le-linux-gnu (P8 64-bit) and on
> powerpc64-linux-gnu (P7 32- and 64-bit) with no regressions.  Is this okay
> for trunk?
>
> Thanks,
> Bill
>
>
> 2017-08-15  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
> 	* config/rs6000/altivec.md (UNSPEC_VMRGOW_DIRECT): New constant.
> 	(p8_vmrgew_v4sf_direct): Generalize to p8_vmrgew_<mode>_direct.
> 	(p8_vmrgow_<mode>_direct): New define_insn.
> 	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Properly
> 	handle endianness for vmrgew and vmrgow permute patterns.
>
>
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md	(revision 250965)
> +++ gcc/config/rs6000/altivec.md	(working copy)
> @@ -148,6 +148,7 @@
>     UNSPEC_VMRGL_DIRECT
>     UNSPEC_VSPLT_DIRECT
>     UNSPEC_VMRGEW_DIRECT
> +   UNSPEC_VMRGOW_DIRECT
>     UNSPEC_VSUMSWS_DIRECT
>     UNSPEC_VADDCUQ
>     UNSPEC_VADDEUQM
> @@ -1357,15 +1358,24 @@
>  }
>    [(set_attr "type" "vecperm")])
>
> -(define_insn "p8_vmrgew_v4sf_direct"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v")
> -	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
> -		      (match_operand:V4SF 2 "register_operand" "v")]
> +(define_insn "p8_vmrgew_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +		       (match_operand:VSX_W 2 "register_operand" "v")]
> 		     UNSPEC_VMRGEW_DIRECT))]
>    "TARGET_P8_VECTOR"
>    "vmrgew %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>
> +(define_insn "p8_vmrgow_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +	(unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +		       (match_operand:VSX_W 2 "register_operand" "v")]
> +		      UNSPEC_VMRGOW_DIRECT))]
> +  "TARGET_P8_VECTOR"
> +  "vmrgow %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
>  (define_expand "vec_widen_umult_even_v16qi"
>    [(use (match_operand:V8HI 0 "register_operand" ""))
>     (use (match_operand:V16QI 1 "register_operand" ""))
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c	(revision 250965)
> +++ gcc/config/rs6000/rs6000.c	(working copy)
> @@ -35209,9 +35209,13 @@ altivec_expand_vec_perm_const (rtx operands[4])
>       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
>        : CODE_FOR_altivec_vmrghw_direct),
>       { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> +       : CODE_FOR_p8_vmrgow_v4sf_direct),
>       { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> +       : CODE_FOR_p8_vmrgew_v4sf_direct),
>       { 4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
>    };
>
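For anyone reading the rs6000.c hunk above cold, here is a small standalone
sketch (illustration only, not part of the patch; the variable names are made
up) of why the even/odd merge opcodes swap roles on little-endian.  GCC's
vec_perm element numbering counts from the low-address end of the vector,
while vmrgew/vmrgow number words big-endian style, so on LE the "even" GCC
elements land on the odd hardware words and the expander has to pick the
opposite instruction:

/* Illustration only.  On little-endian, GCC element i of a V4SI is
   hardware word (3 - i), so a merge of GCC elements {0, 2} touches
   hardware words {3, 1} -- the odd words -- and vmrgow, not vmrgew,
   implements the permute (and vice versa).  */
#include <stdio.h>

int
main (void)
{
  int bytes_big_endian = 0;	/* Pretend we are expanding for LE.  */

  for (int gcc_elt = 0; gcc_elt < 4; gcc_elt++)
    {
      int hw_word = bytes_big_endian ? gcc_elt : 3 - gcc_elt;
      printf ("GCC element %d -> hardware word %d (%s)\n",
	      gcc_elt, hw_word, (hw_word & 1) ? "odd" : "even");
    }
  return 0;
}

This is the same endianness reasoning the table already applies to the
vmrghw/vmrglw entries just above the changed ones.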