On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <[email protected]> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/7/13 下午5:35, Richard Biener wrote:
> > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> When I added the support for Power10 newly introduced multiply
> >> highpart instrutions, I noticed that currently vectorizer
> >> doesn't try to vectorize multiply highpart pattern, I hope
> >> this isn't intentional?
> >>
> >> This patch is to extend the existing pattern mulhs handlings
> >> to cover multiply highpart. Another alternative seems to
> >> recog mul_highpart operation in a general place applied for
> >> scalar code when the target supports the optab for the scalar
> >> operation, it's based on the assumption that one target which
> >> supports vector version of multiply highpart should have the
> >> scalar version. I noticed that the function can_mult_highpart_p
> >> can check/handle mult_highpart well even without mul_highpart
> >> optab support, I think to recog this pattern in vectorizer
> >> is better. Is it on the right track?
> >
> > I think it's on the right track, using IFN_LAST is a bit awkward
> > in case yet another case pops up so maybe you can use
> > a code_helper instance instead which unifies tree_code,
> > builtin_code and internal_fn?
> >
>
> If there is one new requirement which doesn't have/introduce IFN
> stuffs but have one existing tree_code, can we add one more field
> with type tree_code, then for the IFN_LAST path we can check the
> different requirements under the guard with that tree_code variable?
>
> > I also notice that can_mult_highpart_p will return true if
> > only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> > but then the result might be less optimal (or even not
> > handled later)?
> >
>
> I think it will be handled always? The expander calls
>
> rtx
> expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
> rtx target, bool uns_p)
>
> which will further check with can_mult_highpart_p.
>
> For the below case,
>
> #define SHT_CNT 16
>
> __attribute__ ((noipa)) void
> test ()
> {
> for (int i = 0; i < N; i++)
> sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
> }
>
> Without this patch, it use widen_mult like below:
>
> vect__1.5_19 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.18_24
> * 1];
> vect__3.8_14 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.18_24
> * 1];
> vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR <vect__3.8_14, vect__1.5_19>;
> vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR <vect__3.8_14, vect__1.5_19>;
> vect__6.10_25 = vect_patt_22.9_13 >> 16;
> vect__6.10_26 = vect_patt_22.9_9 >> 16;
> vect__7.11_27 = VEC_PACK_TRUNC_EXPR <vect__6.10_25, vect__6.10_26>;
> MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.18_24 * 1] =
> vect__7.11_27;
>
> .L2:
> lxvx 33,7,9
> lxvx 32,8,9
> vmulosh 13,0,1 // widen mult
> vmulesh 0,0,1
> xxmrglw 33,32,45 // merge
> xxmrghw 32,32,45
> vsraw 1,1,12 // shift
> vsraw 0,0,12
> vpkuwum 0,0,1 // pack
> stxvx 32,10,9
> addi 9,9,16
> bdnz .L2
>
>
> With this patch, it ends up with:
>
> vect__1.5_14 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.17_24
> * 1];
> vect__3.8_8 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.17_24 *
> 1];
> vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
> MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.17_24 * 1] =
> vect_patt_21.9_25;
Yes, so I'm curious what it ends up with/without the patch on x86_64 which
can do vec_widen_[us]mult_{even,odd} but not [us]mul_highpart.
Richard.
> .L2:
> lxvx 32,8,9
> lxvx 33,10,9
> vmulosh 13,0,1 // widen mult
> vmulesh 0,0,1
> vperm 0,0,13,12 // perm on widen mults
> stxvx 32,7,9
> addi 9,9,16
> bdnz .L2
>
>
> > That is, what about adding optab internal functions
> > for [us]mul_highpart instead, much like the existing
> > ones for MULH{R,}S?
> >
>
> OK, I was thinking the IFN way at the beginning, but was worried
> that it's easy to be blamed saying it's not necessary since there
> is one existing tree_code. :-) Will update it with IFN way.
>
> BR,
> Kewen
>
> > Richard.
> >
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu.
> >>
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
> >> recog normal multiply highpart.
> >>
>
>