Hi,

I forgot to mention that I didn't include a test case.  Carl's upcoming patch
will cause this to be well tested by the existing test suite, so I don't think
one is needed.  Let me know if you disagree.

Thanks,
Bill

> On Aug 15, 2017, at 4:14 PM, Bill Schmidt <wschm...@linux.vnet.ibm.com> wrote:
> 
> Hi,
> 
> One of Carl Love's proposed built-in function patches exposed a bug in the
> Power code that recognizes specific permute control vector patterns and
> replaces the permute with a more specific, more efficient instruction.  The
> patterns for p8_vmrgew_v4si and p8_vmrgow are generated regardless of
> endianness, leading to problems on the little-endian port.
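> 
> As a rough illustration (not part of the patch, and using a made-up
> hw_vmrgew helper to model the instruction's semantics), here is a small
> standalone C sketch of why the even and odd merges trade places on little
> endian.  The vmrgew/vmrgow instructions number words 0..3 from the
> most-significant end, while GCC on little endian numbers elements from the
> least-significant end, so GCC element i lands in hardware word 3 - i, and a
> merge of the even GCC elements is actually performed by vmrgow (and vice
> versa):
> 
>   #include <stdio.h>
> 
>   /* Hardware view of vmrgew: words numbered 0..3 from the
>      most-significant end; the even words of the two inputs are
>      interleaved into the result.  */
>   static void
>   hw_vmrgew (int vt[4], const int va[4], const int vb[4])
>   {
>     vt[0] = va[0]; vt[1] = vb[0]; vt[2] = va[2]; vt[3] = vb[2];
>   }
> 
>   int
>   main (void)
>   {
>     int a_gcc[4] = { 10, 11, 12, 13 };  /* GCC elements 0..3 of input A.  */
>     int b_gcc[4] = { 20, 21, 22, 23 };  /* GCC elements 0..3 of input B.  */
>     int a_hw[4], b_hw[4], t_hw[4];
> 
>     /* On little endian, GCC element i lives in hardware word 3 - i.  */
>     for (int i = 0; i < 4; i++)
>       {
>         a_hw[3 - i] = a_gcc[i];
>         b_hw[3 - i] = b_gcc[i];
>       }
> 
>     hw_vmrgew (t_hw, a_hw, b_hw);
> 
>     /* Reading the result back in GCC element order prints 21 11 23 13:
>        the odd GCC elements of the inputs (with the inputs interleaved
>        in the other order).  So the "merge even" the middle end asks for
>        on little endian has to be emitted as vmrgow, and vice versa.  */
>     for (int i = 0; i < 4; i++)
>       printf ("%d ", t_hw[3 - i]);
>     printf ("\n");
>     return 0;
>   }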
> 
> The normal way that would cause us to generate these patterns is via the
> vec_widen_[su]mult_{even,odd}_<mode> interfaces, which are not yet
> instantiated for Power; hence it appears that we've gotten lucky and not run
> into this before.  Carl's proposed patch instantiates these interfaces,
> triggering the discovery of the problem.
> 
> This patch simply changes the handling for p8_vmrg[eo]w to match how it's done
> for all of the other common pack/merge/etc. patterns.
> 
> In altivec.md, we already had a p8_vmrgew_v4sf_direct insn that does what we
> want.  I generalized this for both the V4SF and V4SI modes.  I then added a
> similar p8_vmrgow_<mode>_direct define_insn.
> 
> The use in rs6000.c of p8_vmrgew_v4sf_direct, rather than
> p8_vmrgew_v4si_direct, is arbitrary.  The existing code already handles
> converting (for free) a V4SI operand to a V4SF one, so there's no need to
> specify the mode directly; and it would actually complicate the code to
> extract the mode so the "proper" pattern would match.  I think what I have
> here is better, but if you disagree I can change it.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu (P8 64-bit) and on
> powerpc64-linux-gnu (P7 32- and 64-bit) with no regressions.  Is this okay for
> trunk?
> 
> Thanks,
> Bill
> 
> 
> 2017-08-15  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
>       * config/rs6000/altivec.md (UNSPEC_VMRGOW_DIRECT): New constant.
>       (p8_vmrgew_v4sf_direct): Generalize to p8_vmrgew_<mode>_direct.
>       (p8_vmrgow_<mode>_direct): New define_insn.
>       * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Properly
>       handle endianness for vmrgew and vmrgow permute patterns.
> 
> 
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md      (revision 250965)
> +++ gcc/config/rs6000/altivec.md      (working copy)
> @@ -148,6 +148,7 @@
>    UNSPEC_VMRGL_DIRECT
>    UNSPEC_VSPLT_DIRECT
>    UNSPEC_VMRGEW_DIRECT
> +   UNSPEC_VMRGOW_DIRECT
>    UNSPEC_VSUMSWS_DIRECT
>    UNSPEC_VADDCUQ
>    UNSPEC_VADDEUQM
> @@ -1357,15 +1358,24 @@
> }
>   [(set_attr "type" "vecperm")])
> 
> -(define_insn "p8_vmrgew_v4sf_direct"
> -  [(set (match_operand:V4SF 0 "register_operand" "=v")
> -     (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
> -                   (match_operand:V4SF 2 "register_operand" "v")]
> +(define_insn "p8_vmrgew_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +     (unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +                    (match_operand:VSX_W 2 "register_operand" "v")]
>                    UNSPEC_VMRGEW_DIRECT))]
>   "TARGET_P8_VECTOR"
>   "vmrgew %0,%1,%2"
>   [(set_attr "type" "vecperm")])
> 
> +(define_insn "p8_vmrgow_<mode>_direct"
> +  [(set (match_operand:VSX_W 0 "register_operand" "=v")
> +     (unspec:VSX_W [(match_operand:VSX_W 1 "register_operand" "v")
> +                    (match_operand:VSX_W 2 "register_operand" "v")]
> +                  UNSPEC_VMRGOW_DIRECT))]
> +  "TARGET_P8_VECTOR"
> +  "vmrgow %0,%1,%2"
> +  [(set_attr "type" "vecperm")])
> +
> (define_expand "vec_widen_umult_even_v16qi"
>   [(use (match_operand:V8HI 0 "register_operand" ""))
>    (use (match_operand:V16QI 1 "register_operand" ""))
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c        (revision 250965)
> +++ gcc/config/rs6000/rs6000.c        (working copy)
> @@ -35209,9 +35209,13 @@ altivec_expand_vec_perm_const (rtx operands[4])
>       (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct
>        : CODE_FOR_altivec_vmrghw_direct),
>       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew_v4si,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> +       : CODE_FOR_p8_vmrgow_v4sf_direct),
>       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
> -    { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgow,
> +    { OPTION_MASK_P8_VECTOR,
> +      (BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> +       : CODE_FOR_p8_vmrgew_v4sf_direct),
>       {  4,  5,  6,  7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31 } }
>   };
> 
> 
