https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> 
> --- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to rguent...@suse.de from comment #24)
> > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > 
> > > --- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > (In reply to rguent...@suse.de from comment #22)
> > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > 
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > 
> > > > > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > (In reply to rguent...@suse.de from comment #20)
> > > > > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > > > > 
> > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > > > > 
> > > > > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > > > > (In reply to Richard Biener from comment #18)
> > > > > > > > With RVV you have intrinsic calls in GIMPLE so nothing to 
> > > > > > > > optimize:
> > > > > > > > 
> > > > > > > > vbool8_t fn ()
> > > > > > > > {
> > > > > > > >   vbool8_t vmask;
> > > > > > > >   vuint8m1_t vand_m;
> > > > > > > >   vuint8m1_t varr;
> > > > > > > >   uint8_t arr[32];
> > > > > > > > 
> > > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > > >   arr =
> > > > > > > > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > > > > > > > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot optimization]
> > > > > > > >   <retval> = vmask_5;
> > > > > > > >   arr ={v} {CLOBBER(eol)};
> > > > > > > >   return <retval>;
> > > > > > > > 
> > > > > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything 
> > > > > > > > with those.
> > > > > > > > 
> > > > > > > > This is what Andrew said already.
> > > > > > > 
> > > > > > > Ok. I wonder why this issue is gone when I change it into:
> > > > > > > 
> > > > > > > arr as static
> > > > > > > 
> > > > > > > https://godbolt.org/z/Tdoshdfr6
> > > > > > 
> > > > > > Because the stack initialization isn't required then.
> > > > > 
> > > > > I have experimented with a simplified pattern:
> > > > > 
> > > > > 
> > > > > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> > > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > > >                     (const_vector:RVVMF8BI repeat [
> > > > >                             (const_int 1 [0x1])
> > > > >                         ])
> > > > >                     (reg:DI 143)
> > > > >                     (const_int 2 [0x2]) repeated x2
> > > > >                     (const_int 0 [0])
> > > > >                     (reg:SI 66 vl)
> > > > >                     (reg:SI 67 vtype)
> > > > >                 ] UNSPEC_VPREDICATE)
> > > > >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> > > > >             (const_vector:RVVM1QI repeat [
> > > > >                     (const_int 0 [0])
> > > > >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> > > > >      (nil))
> > > > > (insn 15 14 16 2 (set (reg:DI 144)
> > > > >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> > > > >      (nil))
> > > > > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> > > > >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> > > > >                     (const_vector:RVVMF8BI repeat [
> > > > >                             (const_int 1 [0x1])
> > > > >                         ])
> > > > >                     (reg:DI 144)
> > > > >                     (const_int 0 [0])
> > > > >                     (reg:SI 66 vl)
> > > > >                     (reg:SI 67 vtype)
> > > > >                 ] UNSPEC_VPREDICATE)
> > > > >             (reg/v:RVVM1QI 134 [ varr ])
> > > > >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8]))) "rvv.c":6:5 1592 {pred_storervvm1qi}
> > > > >      (nil))
> > > > > 
> > > > > You can see there is only one UNSPEC now. It still has the redundant
> > > > > stack transfer, though.
> > > > > 
> > > > > Is it because the pattern is too complicated?
> > > > 
> > > > It's because it has an UNSPEC in it - that makes it have target
> > > > specific (unknown to the middle-end) behavior so nothing can
> > > > be optimized here.
> > > > 
> > > > Specifically passes likely refuse to replace MEM operands in
> > > > such a construct.
> > > 
> > > I saw that the ARM SVE load/store intrinsics also have an UNSPEC.
> > > They don't have such issues.
> > > 
> > > https://godbolt.org/z/fsW6Ko93z
> > > 
> > > But their patterns are much simpler than the RVV patterns.
> > > 
> > > I am still trying to find a way to optimize the RVV patterns for that.
> > > However, it seems very difficult, since we merge the intrinsics for each
> > > type into a single pattern to avoid an explosion of insn-output.cc
> > > and insn-emit.cc.
> > 
> > They also expose the semantics to GIMPLE instead of keeping
> > builtin function calls:
> > 
> > void fn (svbool_t pg, uint8_t * out)
> > {
> >   svuint8_t varr;
> >   static uint8_t arr[32] = 
> > "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01
> > \x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > 
> >   <bb 2> [local count: 1073741824]:
> >   varr_3 = .MASK_LOAD (&arr, 8B, pg_2(D));
> >   .MASK_STORE (out_4(D), 8B, pg_2(D), varr_3); [tail call]
> >   return;
> 
> Yeah, I noticed, but the autovectorization patterns don't match the RVV
> intrinsics, so I can't fold them into MASK_LEN_LOAD... since the RVV
> intrinsics are more complicated.
> 
> It seems we can't fix this in the middle-end.
> Maybe we should add a RISC-V-specific pass to optimize it?

You can look at what combine tries to recognize, maybe it needs some
helper patterns.  Other than that, what's the issue with GIMPLE
and the intrinsics?
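(For context, a reconstructed sketch of the kind of simplified testcase the
RTL dump above corresponds to: a unit-stride RVV load followed by a store.
The original source wasn't posted, so the function name and signature here
are illustrative, not taken from the bug report.)

```c
#include <stdint.h>
#include <riscv_vector.h>

/* Illustrative reconstruction: load 32 bytes through the RVV
   intrinsic and store them out again, matching the pred_mov /
   pred_store insns in the RTL dump above.  */
void fn (uint8_t *out, const uint8_t *in)
{
  vuint8m1_t varr = __riscv_vle8_v_u8m1 (in, 32);
  __riscv_vse8_v_u8m1 (out, varr, 32);
}
```

This sketch needs an RVV-enabled toolchain (e.g. -march=rv64gcv) to compile.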
