https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945

--- Comment #10 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #9)
> 
> I think we should consider many more different situations and consider them
> carefully. Like:
> 
> vsetvli ... e8,mf8 ta ma (demand ratio)
> ...
> vsetvli zero zero e32 mf2 tu ma (demand ratio)
> ...
> vsetvli zero zero e64 m1 ta ma (demand SEW and LMUL)
> ...
> vsetvli zero zero e64 m1 ta mu (demand ratio)
> ...
> vsetvli zero zero e16 mf4 tu mu (demand ratio)
> ...
> vsetvli zero zero e32 mf2 ta ma (demand ratio)
> ...
> vsetvli zero zero e8 mf8 ta ma (demand ratio)
> 
> In current strategy, 7 "vsetvli" will be fused into 1 single "vsetvli":
> 
> vsetvli ... e64 m1 tu mu
> 
> However, if you just keep agnostic and do not allow it to fuse, you will end
> up with 6 more "vsetvli"s. I don't think this codegen can be better in any
> micro-architecture design.

While the original test was too simple and contrived, this one is too complex
and contrived :-)  I'd argue that if there's such toggling of tail and mask
policies, then yeah, it's fine to have so many vsetvls.

We all agree this will be a CPU tune to retain the existing behavior while
providing the new behavior as an opt-in for uarches that deem it fit.
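
For reference, the fusion described in the example from comment #9 can be
modelled with a small toy sketch. This is not the code in GCC's vsetvl pass;
the names (CONFIGS, ratio, fuse) are hypothetical, and the rule that an
agnostic policy can be satisfied by an undisturbed one is taken from the
"current strategy" described in the quote. It only checks that all seven
configurations share the same SEW/LMUL ratio and that they collapse to
e64 m1 tu mu.

```python
from fractions import Fraction

# Hypothetical encoding of the seven vsetvli demands from comment #9:
# (SEW, LMUL, tail policy, mask policy, demands exact SEW and LMUL?)
CONFIGS = [
    (8,  Fraction(1, 8), "ta", "ma", False),
    (32, Fraction(1, 2), "tu", "ma", False),
    (64, Fraction(1),    "ta", "ma", True),
    (64, Fraction(1),    "ta", "mu", False),
    (16, Fraction(1, 4), "tu", "mu", False),
    (32, Fraction(1, 2), "ta", "ma", False),
    (8,  Fraction(1, 8), "ta", "ma", False),
]


def ratio(sew, lmul):
    """SEW/LMUL ratio, the only thing a 'demand ratio' user cares about."""
    return Fraction(sew) / lmul


def fuse(configs):
    """Toy fusion: return one (sew, lmul, tail, mask) or None if impossible.

    Assumption: agnostic (ta/ma) is compatible with undisturbed (tu/mu),
    so a single undisturbed demand forces the fused policy to undisturbed.
    """
    if len({ratio(sew, lmul) for sew, lmul, *_ in configs}) != 1:
        return None  # differing ratios cannot share one vsetvli
    exact = {(sew, lmul) for sew, lmul, _, _, strict in configs if strict}
    if len(exact) > 1:
        return None  # conflicting exact SEW/LMUL demands
    sew, lmul = next(iter(exact)) if exact else configs[0][:2]
    tail = "tu" if any(c[2] == "tu" for c in configs) else "ta"
    mask = "mu" if any(c[3] == "mu" for c in configs) else "ma"
    return sew, lmul, tail, mask


fused = fuse(CONFIGS)
print(fused)
```

Under these assumptions every configuration has ratio 64, so the only
constraints left are the single exact e64 m1 demand and the tu/mu policies,
which is why the fused result carries undisturbed policies even for the
instructions that asked for agnostic.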
