On Fri, Jul 16, 2021 at 7:33 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > on 2021/7/15 下午7:58, Richard Biener wrote: > > On Thu, Jul 15, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> on 2021/7/15 下午4:04, Kewen.Lin via Gcc-patches wrote: > >>> Hi Uros, > >>> > >>> on 2021/7/15 下午3:17, Uros Bizjak wrote: > >>>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >>>>> > >>>>> on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote: > >>>>>> on 2021/7/14 下午2:38, Richard Biener wrote: > >>>>>>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <li...@linux.ibm.com> wrote: > >>>>>>>> > >>>>>>>> on 2021/7/13 下午8:42, Richard Biener wrote: > >>>>>>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <li...@linux.ibm.com> > >>>>>>>>> wrote: > >>>>>>> > >>>>>>>> I guess the proposed IFN would be directly mapped for > >>>>>>>> [us]mul_highpart? > >>>>>>> > >>>>>>> Yes. > >>>>>>> > >>>>>> > >>>>>> Thanks for confirming! The related patch v2 is attached and the > >>>>>> testing > >>>>>> is ongoing. > >>>>>> > >>>>> > >>>>> It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and > >>>>> aarch64-linux-gnu. But on x86_64-redhat-linux there are XPASSes as > >>>>> below: > >>>>> > >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw > >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >>>>> XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw > >>>> > >>>> These XFAILs should be removed after your patch. > >>>> > >>> I'm curious whether it's intentional not to specify -fno-vect-cost-model > >>> for this test case. As noted above, this case is sensitive on how we > >>> cost mult_highpart. Without cost modeling, the XFAILs can be removed > >>> only with this mul_highpart pattern support, no matter how we model it > >>> (x86 part of this patch exists or not). > >>> > >>>> This is PR100696 [1], we want PMULH.W here, so x86 part of the patch > >>>> is actually not needed. > >>>> > >>> > >>> Thanks for the information! The justification for the x86 part is that: > >>> the IFN_MULH essentially covers MULT_HIGHPART_EXPR with mul_highpart > >>> optab support, i386 port has already customized costing for > >>> MULT_HIGHPART_EXPR (should mean/involve the case with mul_highpart optab > >>> support), if we don't follow the same way for IFN_MULH, I'm worried that > >>> we may cost the IFN_MULH wrongly. If taking IFN_MULH as normal stmt is > >>> a right thing (we shouldn't cost it specially), it at least means we > >>> have to adjust ix86_multiplication_cost for MULT_HIGHPART_EXPR when it > >>> has direct mul_highpart optab support, I think they should be costed > >>> consistently. Does it sound reasonable? > >>> > >> > >> Hi Richard(s), > >> > >> This possibly inconsistent handling problem seems like a counter example > >> better to use a new IFN rather than the existing tree_code, it seems hard > >> to maintain (should remember to keep consistent for its handlings). ;) > >> From this perspective, maybe it's better to move backward to use tree_code > >> and guard it under can_mult_highpart_p == 1 (just like IFN and avoid > >> costing issue Richi pointed out before)? > >> > >> What do you think? > > > > No, whenever we want to do code generation based on machine > > capabilities the canonical way to test for those is to look at optabs > > and then it's most natural to keep that 1:1 relation and emit > > internal function calls which directly map to supported optabs > > instead of going back to some tree codes. > > > > When targets "lie" and provide expanders for something they can > > only emulate then they have to compensate in their costing. > > But as I understand this isn't the case for x86 here. > > > > Now, in this case we already have the MULT_HIGHPART_EXPR tree, > > so yes, it might make sense to use that instead of introducing an > > alternate way via the direct internal function. Somebody decided > > that MULT_HIGHPART is generic enough to warrant this - but I > > see that expand_mult_highpart can fail unless can_mult_highpart_p > > and this is exactly one of the cases we want to avoid - either > > we can handle something generally in which case it can be a > > tree code or we can't, then it should be 1:1 tied to optabs at best > > (mult_highpart has scalar support only for the direct optab, > > vector support also for widen_mult). > > > > Thanks for the detailed explanation! The attached v4 follows the > preferred IFN way like v3, just with extra test case updates. > > Bootstrapped & regtested again on powerpc64le-linux-gnu P9, > x86_64-redhat-linux and aarch64-linux-gnu. > > Is it ok for trunk?
OK. Thanks, Richard. > BR, > Kewen > ----- > gcc/ChangeLog: > > PR tree-optimization/100696 > * internal-fn.c (first_commutative_argument): Add info for IFN_MULH. > * internal-fn.def (IFN_MULH): New internal function. > * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to > recog normal multiply highpart as IFN_MULH. > * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined > function CFN_MULH. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/100696 > * gcc.target/i386/pr100637-3w.c: Adjust for mul_highpart recog.