https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720

--- Comment #23 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to rguent...@suse.de from comment #22)
> On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > 
> > --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to rguent...@suse.de from comment #20)
> > > On Thu, 19 Oct 2023, juzhe.zhong at rivai dot ai wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111720
> > > > 
> > > > --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > > > (In reply to Richard Biener from comment #18)
> > > > > With RVV you have intrinsic calls in GIMPLE so nothing to optimize:
> > > > > 
> > > > > vbool8_t fn ()
> > > > > {
> > > > >   vbool8_t vmask;
> > > > >   vuint8m1_t vand_m;
> > > > >   vuint8m1_t varr;
> > > > >   uint8_t arr[32];
> > > > > 
> > > > >   <bb 2> [local count: 1073741824]:
> > > > >   arr = "\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t\x01\x02\x07\x01\x03\x04\x05\x03\x01\x00\x01\x02\x04\x04\t\t";
> > > > >   varr_3 = __riscv_vle8_v_u8m1 (&arr, 32); [return slot optimization]
> > > > >   vand_m_4 = __riscv_vand_vx_u8m1 (varr_3, 1, 32); [return slot optimization]
> > > > >   vmask_5 = __riscv_vreinterpret_v_u8m1_b8 (vand_m_4); [return slot optimization]
> > > > >   <retval> = vmask_5;
> > > > >   arr ={v} {CLOBBER(eol)};
> > > > >   return <retval>;
> > > > > 
> > > > > and on RTL I see lots of UNSPECs, RTL opts cannot do anything with 
> > > > > those.
> > > > > 
> > > > > This is what Andrew said already.
> > > > 
> > > > OK. I wonder why this issue goes away when I declare
> > > > 
> > > > arr as static:
> > > > 
> > > > https://godbolt.org/z/Tdoshdfr6
> > > 
> > > Because the stack initialization isn't required then.
> > 
> > I have experimented with a simplified pattern:
> > 
> > 
> > (insn 14 13 15 2 (set (reg/v:RVVM1QI 134 [ varr ])
> >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> >                     (const_vector:RVVMF8BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])
> >                     (reg:DI 143)
> >                     (const_int 2 [0x2]) repeated x2
> >                     (const_int 0 [0])
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (mem:RVVM1QI (reg:DI 142) [0  S[16, 16] A8])
> >             (const_vector:RVVM1QI repeat [
> >                     (const_int 0 [0])
> >                 ]))) "rvv.c":5:23 1476 {*pred_movrvvm1qi}
> >      (nil))
> > (insn 15 14 16 2 (set (reg:DI 144)
> >         (const_int 32 [0x20])) "rvv.c":6:5 206 {*movdi_64bit}
> >      (nil))
> > (insn 16 15 0 2 (set (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8])
> >         (if_then_else:RVVM1QI (unspec:RVVMF8BI [
> >                     (const_vector:RVVMF8BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])
> >                     (reg:DI 144)
> >                     (const_int 0 [0])
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (reg/v:RVVM1QI 134 [ varr ])
> >             (mem:RVVM1QI (reg/v/f:DI 135 [ out ]) [0  S[16, 16] A8]))) "rvv.c":6:5 1592 {pred_storervvm1qi}
> >      (nil))
> > 
> > You can see there is only one UNSPEC now, but the redundant stack
> > transfer is still there.
> > 
> > Is it because the pattern is too complicated?
> 
> It's because it has an UNSPEC in it - that makes it have target
> specific (unknown to the middle-end) behavior so nothing can
> be optimized here.
> 
> Specifically passes likely refuse to replace MEM operands in
> such a construct.

I see that the ARM SVE load/store intrinsics also use UNSPEC,
yet they don't have this issue.

https://godbolt.org/z/fsW6Ko93z

But their patterns are much simpler than the RVV patterns.

I am still trying to find a way to optimize the RVV pattern for that.
However, it seems to be very difficult, since we are trying to merge the
intrinsics of every type into one single pattern to avoid an explosion of
insn-output.cc and insn-emit.cc.
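
For reference, here is a rough sketch of the two variants being compared. It is reconstructed from the GIMPLE dump quoted from comment #18 and is not the exact source behind the godbolt links. The names fn_stack, fn_static, ref_vand_1 and ARR_INIT are hypothetical and only for illustration. The RVV part only builds when the vector extension is enabled; ref_vand_1 is a plain scalar model of the vand step alone, i.e. what could be folded at compile time if the intrinsic were not opaque to the middle-end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same 32 bytes as the string constant in the GIMPLE dump ('\t' is 9). */
#define ARR_INIT { 1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9, \
                   1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9 }

/* Scalar model of the vand.vx step only: out[i] = in[i] & 1.
   It does not model the reinterpretation to a mask type.  */
static void
ref_vand_1 (const uint8_t *in, uint8_t *out, size_t n)
{
  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] & 1;
}

#if defined(__riscv_vector)
#include <riscv_vector.h>

/* Automatic array: the constant is first copied to the stack, then
   loaded back through the intrinsic.  */
vbool8_t
fn_stack (void)
{
  uint8_t arr[32] = ARR_INIT;
  vuint8m1_t varr = __riscv_vle8_v_u8m1 (arr, 32);
  vuint8m1_t vand_m = __riscv_vand_vx_u8m1 (varr, 1, 32);
  return __riscv_vreinterpret_v_u8m1_b8 (vand_m);
}

/* Static array: no stack initialization is needed, the load reads
   the object in static storage directly.  */
vbool8_t
fn_static (void)
{
  static uint8_t arr[32] = ARR_INIT;
  vuint8m1_t varr = __riscv_vle8_v_u8m1 (arr, 32);
  vuint8m1_t vand_m = __riscv_vand_vx_u8m1 (varr, 1, 32);
  return __riscv_vreinterpret_v_u8m1_b8 (vand_m);
}
#endif
```

Comparing the assembly of fn_stack and fn_static (for example at -O3 with the vector extension enabled) should show whether the extra stack store and reload is present.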
