Re: Help with PR97872

Prathamesh Kulkarni via Gcc Tue, 08 Dec 2020 01:07:14 -0800

On Mon, 7 Dec 2020 at 17:37, Hongtao Liu <[email protected]> wrote:
>
> On Mon, Dec 7, 2020 at 7:11 PM Prathamesh Kulkarni
> <[email protected]> wrote:
> >
> > On Mon, 7 Dec 2020 at 16:15, Hongtao Liu <[email protected]> wrote:
> > >
> > > On Mon, Dec 7, 2020 at 5:47 PM Richard Biener <[email protected]> wrote:
> > > >
> > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Mon, 7 Dec 2020 at 13:01, Richard Biener <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, 7 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Fri, 4 Dec 2020 at 17:18, Richard Biener <[email protected]> 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Fri, 4 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > > > On Thu, 3 Dec 2020 at 16:35, Richard Biener 
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, 3 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Tue, 1 Dec 2020 at 16:39, Richard Biener 
> > > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 1 Dec 2020, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > For the test mentioned in PR, I was trying to see if 
> > > > > > > > > > > > > we could do
> > > > > > > > > > > > > specialized expansion for vcond in target when 
> > > > > > > > > > > > > operands are -1 and 0.
> > > > > > > > > > > > > arm_expand_vcond gets the following operands:
> > > > > > > > > > > > > (reg:V8QI 113 [ _2 ])
> > > > > > > > > > > > > (reg:V8QI 117)
> > > > > > > > > > > > > (reg:V8QI 118)
> > > > > > > > > > > > > (lt (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > >     (reg/v:V8QI 116 [ b ]))
> > > > > > > > > > > > > (reg/v:V8QI 115 [ a ])
> > > > > > > > > > > > > (reg/v:V8QI 116 [ b ])
> > > > > > > > > > > > >
> > > > > > > > > > > > > where r117 and r118 are set to vector constants -1 
> > > > > > > > > > > > > and 0 respectively.
> > > > > > > > > > > > > However, I am not sure if there's a way to check if 
> > > > > > > > > > > > > the register is
> > > > > > > > > > > > > constant during expansion time (since we don't have 
> > > > > > > > > > > > > df analysis yet) ?
> > >
> > > It seems to me that all you need to do is relax the predicates of op1
> > > and op2 in vcondmn to accept const0_rtx and constm1_rtx. I haven't
> > > debugged it, but I see that vcondmn in neon.md only accepts
> > > s_register_operand.
> > >
> > > (define_expand "vcond<mode><mode>"
> > >   [(set (match_operand:VDQW 0 "s_register_operand")
> > >         (if_then_else:VDQW
> > >           (match_operator 3 "comparison_operator"
> > >             [(match_operand:VDQW 4 "s_register_operand")
> > >              (match_operand:VDQW 5 "reg_or_zero_operand")])
> > >           (match_operand:VDQW 1 "s_register_operand")
> > >           (match_operand:VDQW 2 "s_register_operand")))]
> > >   "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> > > {
> > >   arm_expand_vcond (operands, <V_cmp_result>mode);
> > >   DONE;
> > > })
> > >
> > > in sse.md it's defined as
> > > (define_expand "vcondu<V_512:mode><VI_AVX512BW:mode>"
> > >   [(set (match_operand:V_512 0 "register_operand")
> > >         (if_then_else:V_512
> > >           (match_operator 3 ""
> > >             [(match_operand:VI_AVX512BW 4 "nonimmediate_operand")
> > >              (match_operand:VI_AVX512BW 5 "nonimmediate_operand")])
> > >           (match_operand:V_512 1 "general_operand")
> > >           (match_operand:V_512 2 "general_operand")))]
> > >   "TARGET_AVX512F
> > >    && (GET_MODE_NUNITS (<V_512:MODE>mode)
> > >        == GET_MODE_NUNITS (<VI_AVX512BW:MODE>mode))"
> > > {
> > >   bool ok = ix86_expand_int_vcond (operands);
> > >   gcc_assert (ok);
> > >   DONE;
> > > })
> > >
> > > then we can get operands[1] and operands[2] as
> > >
> > > (gdb) p debug_rtx (operands[1])
> > >  (const_vector:V16QI [
> > >         (const_int -1 [0xffffffffffffffff]) repeated x16
> > >     ])
> > > (gdb) p debug_rtx (operands[2])
> > > (reg:V16QI 82 [ _2 ])
> > > (const_vector:V16QI [
> > >         (const_int 0 [0]) repeated x16
> > >     ])
> > Hi Hongtao,
> > Thanks for the suggestions!
> > However IIUC from vector extensions doc page, the result of vector
> > comparison is defined to be 0
> > or -1, so would it be better to canonicalize
> > x cmp y ? -1 : 0 to x cmp y, on GIMPLE itself during gimple-isel and
> > adjust targets if required ?
>
> Yes, it would be more straightforward to handle it in gimple isel, I
> would adjust the backend and testcase after you check in the patch.
Thanks! I have committed the attached patch in
3a6e3ad38a17a03ee0139b49a0946e7b9ded1eb1.
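
For illustration only (this is not part of the committed patch), here is a
minimal sketch of why "x cmp y ? -1 : 0" can be reduced to "x cmp y" at the
source level. It relies on the documented behaviour of the GCC vector
extensions, where each lane of a vector comparison is -1 for true and 0 for
false. The names v8qi, cmp_lt, select_lt and forms_agree are hypothetical and
chosen just for this example; the select form is written as a scalar loop to
keep the example in plain C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8 lanes of signed 8-bit integers, mirroring the V8QI operands above.
   The typedef name is an assumption made for this sketch.  */
typedef int8_t v8qi __attribute__ ((vector_size (8)));

/* Plain vector comparison: each lane is -1 if a[i] < b[i], else 0.  */
static v8qi
cmp_lt (v8qi a, v8qi b)
{
  return (v8qi) (a < b);
}

/* Lane-by-lane reference for "a < b ? -1 : 0".  */
static v8qi
select_lt (v8qi a, v8qi b)
{
  v8qi r = { 0 };
  for (int i = 0; i < 8; i++)
    r[i] = a[i] < b[i] ? -1 : 0;
  return r;
}

/* Returns 1 when both forms agree on every lane of a sample input.  */
static int
forms_agree (void)
{
  v8qi a = { -128, -1, 0, 1, 5, 127, 7, -7 };
  v8qi b = { 127, -1, 1, 0, 5, -128, 8, -8 };
  v8qi x = cmp_lt (a, b);
  v8qi y = select_lt (a, b);
  return memcmp (&x, &y, sizeof x) == 0;
}
```

Calling forms_agree () should return 1, since the select with -1 and 0 arms
adds nothing beyond the comparison itself.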


Regards,
Prathamesh
>
> > Alternatively, I could try fixing this in backend as you suggest above.
> >
> > Thanks,
> > Prathamesh
> > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alternatively, should we add a target hook that 
> > > > > > > > > > > > > returns true if the
> > > > > > > > > > > > > result of vector comparison is set to all-ones or 
> > > > > > > > > > > > > all-zeros, and then
> > > > > > > > > > > > > use this hook in gimple ISEL to effectively turn 
> > > > > > > > > > > > > VEC_COND_EXPR into nop ?
> > > > > > > > > > > >
> > > > > > > > > > > > Would everything match-up for a .VEC_CMP IFN producing 
> > > > > > > > > > > > a non-mask
> > > > > > > > > > > > vector type?  ISEL could special case the a ? -1 : 0 
> > > > > > > > > > > > case this way.
> > > > > > > > > > > I think the vec_cmp pattern matches but it produces a 
> > > > > > > > > > > masked vector type.
> > > > > > > > > > > In the attached patch, I simply replaced:
> > > > > > > > > > > _1 = a < b
> > > > > > > > > > > x = _1 ? -1 : 0
> > > > > > > > > > > with
> > > > > > > > > > > x = view_convert_expr<_1>
> > > > > > > > > > >
> > > > > > > > > > > For the test-case, isel generates:
> > > > > > > > > > >   vector(8) <signed-boolean:8> _1;
> > > > > > > > > > >   vector(8) signed char _2;
> > > > > > > > > > >   uint8x8_t _5;
> > > > > > > > > > >
> > > > > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > > > > >   _1 = a_3(D) < b_4(D);
> > > > > > > > > > >   _2 = VIEW_CONVERT_EXPR<vector(8) signed char>(_1);
> > > > > > > > > > >   _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_2);
> > > > > > > > > > >   return _5;
> > > > > > > > > > >
> > > > > > > > > > > and results in desired code-gen:
> > > > > > > > > > > f1:
> > > > > > > > > > >         vcgt.s8 d0, d1, d0
> > > > > > > > > > >         bx      lr
> > > > > > > > > > >
> > > > > > > > > > > > Although I guess we should remove the redundant conversions
> > > > > > > > > > > > during isel itself? That would result in:
> > > > > > > > > > > _1 = a_3(D) < b_4(D)
> > > > > > > > > > > _5 = VIEW_CONVERT_EXPR<uint8x8_t>(_1)
> > > > > > > > > > >
> > > > > > > > > > > (Patch is lightly tested with only vect.exp)
> > > > > > > > > >
> > > > > > > > > > +  /* For targets where result of comparison is all-ones or 
> > > > > > > > > > all-zeros,
> > > > > > > > > > +     a < b ? -1 : 0 can be reduced to a < b.  */
> > > > > > > > > > +
> > > > > > > > > > +  if (integer_minus_onep (op1) && integer_zerop (op2))
> > > > > > > > > > +    {
> > > > > > > > > >
> > > > > > > > > > So this really belongs here:
> > > > > > > > > >
> > > > > > > > > >           tree op0_type = TREE_TYPE (op0);
> > > > > > > > > >           tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > >
> > > > > > > > > > <---
> > > > > > > > > >
> > > > > > > > > >           if (used_vec_cond_exprs >= 2
> > > > > > > > > >               && (get_vcond_mask_icode (mode, TYPE_MODE 
> > > > > > > > > > (op0_type))
> > > > > > > > > >                   != CODE_FOR_nothing)
> > > > > > > > > >               && expand_vec_cmp_expr_p (op0a_type, 
> > > > > > > > > > op0_type, tcode))
> > > > > > > > > >             {
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > +      gassign *def_stmt = dyn_cast<gassign *> 
> > > > > > > > > > (SSA_NAME_DEF_STMT (op0));
> > > > > > > > > > +      tree op0a = gimple_assign_rhs1 (def_stmt);
> > > > > > > > > > +      tree op0_type = TREE_TYPE (op0);
> > > > > > > > > > +      tree op0a_type = TREE_TYPE (op0a);
> > > > > > > > > > +      enum tree_code tcode = gimple_assign_rhs_code 
> > > > > > > > > > (def_stmt);
> > > > > > > > > > +
> > > > > > > > > > +      if (expand_vec_cmp_expr_p (op0a_type, op0_type, 
> > > > > > > > > > tcode))
> > > > > > > > > > +       {
> > > > > > > > > > +         tree conv_op = build1 (VIEW_CONVERT_EXPR, 
> > > > > > > > > > TREE_TYPE (lhs), op0);
> > > > > > > > > > +         gassign *new_stmt = gimple_build_assign (lhs, 
> > > > > > > > > > conv_op);
> > > > > > > > > > +         gsi_replace (gsi, new_stmt, true);
> > > > > > > > > >
> > > > > > > > > > and you need to verify that the mode of the lhs and the 
> > > > > > > > > > mode of op0
> > > > > > > > > > agree and that the target can actually expand_vec_cmp_expr_p
> > > > > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > > > > Sorry, I am not sure how to check if target can actually 
> > > > > > > > > expand vec_cmp ?
> > > > > > > > > I assume that since expand_vec_cmp_expr_p queries optab and 
> > > > > > > > > if it gets
> > > > > > > > > a valid cmp icode, that
> > > > > > > > > should be sufficient ?
> > > > > > > >
> > > > > > > > Yes
> > > > > > > Hi Richard,
> > > > > > > I tested the patch, and it shows one regression for pr78102.c, 
> > > > > > > because
> > > > > > > of extra pcmpeqq in code-gen for x != y on x86.
> > > > > > > For the test-case:
> > > > > > > __v2di
> > > > > > > baz (const __v2di x, const __v2di y)
> > > > > > > {
> > > > > > >   return x != y;
> > > > > > > }
> > > > > > >
> > > > > > > Before patch:
> > > > > > > baz:
> > > > > > >         pcmpeqq %xmm1, %xmm0
> > > > > > >         pcmpeqd %xmm1, %xmm1
> > > > > > >         pandn   %xmm1, %xmm0
> > > > > > >         ret
> > > > > > >
> > > > > > > After patch,
> > > > > > > Before ISEL:
> > > > > > >   vector(2) <signed-boolean:64> _1;
> > > > > > >   __v2di _4;
> > > > > > >
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   _1 = x_2(D) != y_3(D);
> > > > > > >   _4 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
> > > > > > >   return _4;
> > > > > > >
> > > > > > > After ISEL:
> > > > > > >   vector(2) <signed-boolean:64> _1;
> > > > > > >   __v2di _4;
> > > > > > >
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   _1 = x_2(D) != y_3(D);
> > > > > > >   _4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > > > >   return _4;
> > > > > > >
> > > > > > > which results in:
> > > > > > >         pcmpeqq %xmm1, %xmm0
> > > > > > >         pxor    %xmm1, %xmm1
> > > > > > >         pcmpeqq %xmm1, %xmm0
> > > > > > >         ret
> seems better to be
>
>          pcmpeqq %xmm1, %xmm0
>          pxor    %xmm1, %xmm1
>          pxor %xmm1, %xmm0
>          ret
>
> Anyway, it needs backend adjustment.
>
> > > > > > > IIUC, the new code-gen is essentially comparing two args for 
> > > > > > > equality, and then
> > > > > > > comparing the result against zero to invert it, so it looks 
> > > > > > > correct ?
> > > > > > > > I am not sure which of the above two sequences is better though ?
> > > > > > > If the new code-gen is OK, would it be OK to adjust the test-case 
> > > > > > > ?
> > > > > >
> > > > > > In case pcmpeqq is double-issue the first variant might be faster 
> > > > > > while
> > > > > > the second variant has the advantage of the "free" pxor, but 
> > > > > > back-to-back
> > > > > > pcmpeqq might have an issue.
> > > > > >
> > > > > > I think on GIMPLE the new code is preferable and adjustments are
> > > > > > target business.  I wouldn't be surprised if the x86 backend
> > > > > > special-cases vcond to {-1,-1}, {0,0} already to arrive at the first
> > > > > > variant.
> > > > > >
> > > > > > Did you check how
> > > > > >
> > > > > > a = x != y ? { -1, -1 } : {0, 0 };
> > > > > > b = x != y ? { 1, 2 } : { 3, 4 };
> > > > > >
> > > > > > is handled before/after your patch?  That is, make the comparison
> > > > > > CSEd between two VEC_COND_EXPRs?
> > > > > For the test-case:
> > > > > __v2di f(__v2di, __v2di);
> > > > >
> > > > > __v2di
> > > > > baz (const __v2di x, const __v2di y)
> > > > > {
> > > > >   __v2di a = (x != y);
> > > > >   __v2di b = (x != y) ? (__v2di) {1, 2} : (__v2di) {3, 4};
> > > > >   return f (a, b);
> > > > > }
> > > > >
> > > > > Before patch, isel converts both to .vcondeq:
> > > > >   __v2di b;
> > > > >   __v2di a;
> > > > >   __v2di _8;
> > > > >
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   a_4 = .VCONDEQ (x_2(D), y_3(D), { -1, -1 }, { 0, 0 }, 114);
> > > > >   b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > >   _8 = f (a_4, b_5); [tail call]
> > > > >   return _8;
> > > > >
> > > > > and results in following code-gen:
> > > > > _Z3bazDv2_xS_:
> > > > > .LFB5666:
> > > > >         pcmpeqq %xmm1, %xmm0
> > > > >         pcmpeqd %xmm1, %xmm1
> > > > >         movdqa  %xmm0, %xmm2
> > > > >         pandn   %xmm1, %xmm2
> > > > >         movdqa  .LC0(%rip), %xmm1
> > > > >         pblendvb        %xmm0, .LC1(%rip), %xmm1
> > > > >         movdqa  %xmm2, %xmm0
> > > > >         jmp     _Z1fDv2_xS_
> > > > >
> > > > > With patch, isel converts a = (x != y) ? {-1, -1} : {0, 0} to
> > > > > view_convert_expr and the other
> > > > > to vcondeq:
> > > > >   __v2di b;
> > > > >   __v2di a;
> > > > >   vector(2) <signed-boolean:64> _1;
> > > > >   __v2di _8;
> > > > >
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   _1 = x_2(D) != y_3(D);
> > > > >   a_4 = VIEW_CONVERT_EXPR<__v2di>(_1);
> > > > >   b_5 = .VCONDEQ (x_2(D), y_3(D), { 1, 2 }, { 3, 4 }, 114);
> > > > >   _8 = f (a_4, b_5); [tail call]
> > > > >   return _8;
> > > > >
> > > > > which results in following code-gen:
> > > > > _Z3bazDv2_xS_:
> > > > > .LFB5666:
> > > > >         pcmpeqq %xmm1, %xmm0
> > > > >         pxor    %xmm2, %xmm2
> > > > >         movdqa  .LC0(%rip), %xmm1
> > > > >         pblendvb        %xmm0, .LC1(%rip), %xmm1
> > > > >         pcmpeqq %xmm0, %xmm2
> > > > >         movdqa  %xmm2, %xmm0
> > > > >         jmp     _Z1fDv2_xS_
> > > >
> > > > > Ok, thanks for checking.  I think the patch is OK but please give
> > > > > Hongtao the chance to comment.
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prathamesh
> > > > > > > > > >
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prathamesh
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prathamesh
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <[email protected]>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 
> > > > > > > > > > > > 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <[email protected]>
> > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 
> > > > > > > > > > 90409 Nuernberg,
> > > > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <[email protected]>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 
> > > > > > > > Nuernberg,
> > > > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <[email protected]>
> > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 
> > > > > > Nuernberg,
> > > > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <[email protected]>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao
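
As a side note on the two x86 sequences compared in the quoted discussion, the
following sketch (an illustration under assumptions, not taken from the patch)
expresses both shapes with GCC vector extensions: inverting the equality mask
with an and-not against all-ones (the "before patch" shape), and comparing the
equality mask against zero (the "after patch" shape). The function and typedef
names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Two 64-bit lanes, similar in spirit to __v2di; the name is an assumption.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* "Before patch" shape: x == y, then AND its complement with all-ones.  */
static v2di
ne_via_andnot (v2di x, v2di y)
{
  v2di eq = (v2di) (x == y);
  v2di ones = { -1, -1 };
  return ~eq & ones;
}

/* "After patch" shape: x == y, then compare that mask against zero.  */
static v2di
ne_via_cmp_zero (v2di x, v2di y)
{
  v2di eq = (v2di) (x == y);
  v2di zero = { 0, 0 };
  return (v2di) (eq == zero);
}

/* Returns 1 when both shapes match x != y on a sample input.  */
static int
sequences_agree (void)
{
  v2di x = { 1, 42 };
  v2di y = { 1, -42 };
  v2di a = ne_via_andnot (x, y);
  v2di b = ne_via_cmp_zero (x, y);
  v2di ref = (v2di) (x != y);
  return memcmp (&a, &ref, sizeof a) == 0
	 && memcmp (&b, &ref, sizeof b) == 0;
}
```

Both helpers compute the same lanes as x != y, which matches the reading in
the thread that the new code-gen is correct and that choosing between the two
sequences is a target tuning question.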

pr97872-3.diff
Description: Binary data
