https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118945
--- Comment #10 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #9)
>
> I think we should consider many more different situation and consider it
> carefully. Like:
>
> vsetvli ... e8,mf8 ta ma (demand ratio)
> ...
> vsetvli zero zero e32 mf2 tu ma (demand ratio)
> ...
> vsetvli zero zero e64 m1 ta ma (demand SEW and LMUL)
> ...
> vsetvli zero zero e64 m1 ta mu (demand ratio)
> ...
> vsetvli zero zero e16 mf4 tu mu (demand ratio)
> ...
> vsetvli zero zero e32 mf2 ta ma (demand ratio)
> ...
> vsetvli zero zero e8 mf8 ta ma (demand ratio)
>
> In current strategy, 7 "vsetvli" will be fused into 1 single "vsetvli":
>
> vsetvli ... e64 m1 tu mu
>
> However, if you just keep agnostic not allow to fuse it, you will end up
> with 6 more "vsetvli"s. I don't think this codegen can better in any
> micro-architecture design.

While the original test was too simple and contrived, this one is too complex
and contrived :-)
I'd argue that if there's such toggling of tail and mask policies, then yeah,
it's fine to have so many vsetvlis.

We all agree this will be a CPU tune that retains the existing behavior while
providing the new behavior as an opt-in for uarches that deem it fit.
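
For readers following the quoted example, here is a toy Python model of why the seven demands can collapse to a single `e64 m1 tu mu` under the existing strategy. It is only a sketch, not GCC's vsetvl pass: `Demand`, `fuse`, and `asm` are hypothetical names, and the model assumes just two rules, that demands are compatible when their SEW/LMUL ratio matches (so VLMAX is unchanged) and that an agnostic tail or mask demand is also satisfied by the undisturbed policy.

```python
from dataclasses import dataclass
from fractions import Fraction

# LMUL encodings as exact fractions so SEW/LMUL ratios compare exactly.
LMUL = {"mf8": Fraction(1, 8), "mf4": Fraction(1, 4), "mf2": Fraction(1, 2),
        "m1": Fraction(1), "m2": Fraction(2), "m4": Fraction(4), "m8": Fraction(8)}


@dataclass(frozen=True)
class Demand:
    """Hypothetical record of what one vsetvli in the example demands."""
    sew: int
    lmul: str
    ta: bool             # True = tail agnostic, False = tail undisturbed
    ma: bool             # True = mask agnostic, False = mask undisturbed
    exact: bool = False  # True = demands this SEW and LMUL, not just the ratio

    @property
    def ratio(self):
        return Fraction(self.sew) / LMUL[self.lmul]


def fuse(demands):
    """Return one Demand satisfying all inputs, or None if they conflict."""
    if len({d.ratio for d in demands}) != 1:
        return None  # different SEW/LMUL ratios cannot share one vtype here
    exact = {(d.sew, d.lmul) for d in demands if d.exact}
    if len(exact) > 1:
        return None  # two different exact SEW/LMUL demands conflict
    sew, lmul = next(iter(exact)) if exact else (demands[0].sew, demands[0].lmul)
    # Assumption: undisturbed satisfies agnostic, so any tu/mu demand wins.
    return Demand(sew, lmul,
                  ta=all(d.ta for d in demands),
                  ma=all(d.ma for d in demands),
                  exact=bool(exact))


def asm(d):
    """Format a Demand as a vtype operand string."""
    return f"e{d.sew},{d.lmul},{'ta' if d.ta else 'tu'},{'ma' if d.ma else 'mu'}"


# The seven demands from the quoted comment, in order.
SEQUENCE = [
    Demand(8, "mf8", ta=True, ma=True),
    Demand(32, "mf2", ta=False, ma=True),
    Demand(64, "m1", ta=True, ma=True, exact=True),
    Demand(64, "m1", ta=True, ma=False),
    Demand(16, "mf4", ta=False, ma=False),
    Demand(32, "mf2", ta=True, ma=True),
    Demand(8, "mf8", ta=True, ma=True),
]

print(asm(fuse(SEQUENCE)))  # e64,m1,tu,mu
```

All seven entries have a SEW/LMUL ratio of 64, which is why the model can fuse them. The opt-in behavior discussed above would amount to dropping the "undisturbed satisfies agnostic" assumption for a given tuning, at which point policy changes in the sequence would no longer merge.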