On Fri, Jul 16, 2021 at 7:33 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2021/7/15 7:58 PM, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> on 2021/7/15 4:04 PM, Kewen.Lin via Gcc-patches wrote:
> >>> Hi Uros,
> >>>
> >>> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>>
> >>>>> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>>>>> on 2021/7/14 2:38 PM, Richard Biener wrote:
> >>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>>>>>
> >>>>>>>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> >>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>>>>
> >>>>>>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
> >>>>>>>
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks for confirming!  The related patch v2 is attached and the
> >>>>>> testing is ongoing.
> >>>>>>
> >>>>>
> >>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> >>>>> aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes
> >>>>> as below:
> >>>>>
> >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
> >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
> >>>>
> >>>> These XFAILs should be removed after your patch.
> >>>>
> >>> I'm curious whether it's intentional not to specify -fno-vect-cost-model
> >>> for this test case.  As noted above, this case is sensitive to how we
> >>> cost mult_highpart.  Without cost modeling, the XFAILs can be removed
> >>> with just this mul_highpart pattern support, no matter how we model it
> >>> (whether or not the x86 part of this patch exists).
> >>>
> >>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch
> >>>> is actually not needed.
> >>>>
> >>>
> >>> Thanks for the information!  The justification for the x86 part is this:
> >>> IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart optab
> >>> support, and the i386 port already customizes costing for
> >>> MULT_HIGHPART_EXPR (which should include the case with mul_highpart
> >>> optab support).  If we don't do the same for IFN_MULH, I'm worried that
> >>> we may cost IFN_MULH wrongly.  If treating IFN_MULH as a normal stmt is
> >>> the right thing (that is, we shouldn't cost it specially), it at least
> >>> means we have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR
> >>> when it has direct mul_highpart optab support; I think they should be
> >>> costed consistently.  Does that sound reasonable?
> >>>
> >>
> >> Hi Richard(s),
> >>
> >> This possibly inconsistent handling seems like a counterexample to the
> >> idea that a new IFN is better than the existing tree_code: it seems hard
> >> to maintain (we have to remember to keep the two handled consistently).  ;)
> >> From this perspective, maybe it's better to go back to using the tree_code
> >> and guard it under can_mult_highpart_p == 1 (just like the IFN, and
> >> avoiding the costing issue Richi pointed out before)?
> >>
> >> What do you think?
> >
> > No, whenever we want to do code generation based on machine
> > capabilities the canonical way to test for those is to look at optabs
> > and then it's most natural to keep that 1:1 relation and emit
> > internal function calls which directly map to supported optabs
> > instead of going back to some tree codes.
> >
> > When targets "lie" and provide expanders for something they can
> > only emulate then they have to compensate in their costing.
> > But as I understand this isn't the case for x86 here.
> >
> > Now, in this case we already have the MULT_HIGHPART_EXPR tree,
> > so yes, it might make sense to use that instead of introducing an
> > alternate way via the direct internal function.  Somebody decided
> > that MULT_HIGHPART is generic enough to warrant this - but I
> > see that expand_mult_highpart can fail unless can_mult_highpart_p
> > and this is exactly one of the cases we want to avoid - either
> > we can handle something generally in which case it can be a
> > tree code or we can't, then it should be 1:1 tied to optabs at best
> > (mult_highpart has scalar support only for the direct optab,
> > vector support also for widen_mult).
> >
>
> Thanks for the detailed explanation!  The attached v4 follows the
> preferred IFN way like v3, just with extra test case updates.
>
> Bootstrapped & regtested again on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         PR tree-optimization/100696
>         * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
>         * internal-fn.def (IFN_MULH): New internal function.
>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>         recog normal multiply highpart as IFN_MULH.
>         * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
>         function CFN_MULH.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/100696
>         * gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.
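
For readers unfamiliar with the operation discussed in this thread, the
following is a minimal sketch of what a "multiply highpart" on 16-bit
elements computes: multiply in a wider type, then keep only the upper half
of the product.  It is an illustration only and is not taken from the patch
or from pr100637-3w.c; the function names are hypothetical.  A loop of this
shape is the kind of scalar code a vectorizer could map to a highpart
multiply such as pmulhw (signed) or pmulhuw (unsigned) on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed 16-bit multiply highpart: upper 16 bits of the 32-bit product.
   Assumption: >> on a negative int32_t is an arithmetic shift, which is
   implementation-defined in ISO C but is what GCC does.  */
static int16_t smul_highpart16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 16);
}

/* Unsigned 16-bit multiply highpart.  The casts to uint32_t keep the
   multiplication out of signed int, where it could overflow.  */
static uint16_t umul_highpart16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Hypothetical loop applying the signed form element by element.  */
static void smulh_loop(int16_t *dst, const int16_t *a, const int16_t *b,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = smul_highpart16(a[i], b[i]);
}
```

Whether a given compiler actually emits a highpart multiply for such a loop
depends on the target, the options and the cost model, as the discussion
above shows.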
