[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core

crazylht at gmail dot com via Gcc-bugs Sun, 30 Jul 2023 21:58:12 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832


Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> (In reply to Richard Biener from comment #6)
> > Do we know whether we could in theory improve the sanitizing by optimization
> > without -funsafe-math-optimizations (I think -fno-trapping-math,
> > -ffinite-math-only -fno-signalling-nans should be a better guard?)?
> 
> Regarding the sanitizing, we can remove all sanitizing MOVQ instructions
> between trapping instructions (IOW, the result of ADDPS is guaranteed to
> have zeros in the high part outside V2SF, so MOVQ is unnecessary in front of
> a follow-up MULPS).
> 
> I think that some instruction back-walking pass on the RTL insn stream would
> be able to identify these unnecessary instructions and remove them.
> 

A V2SFmode operand can be produced either by a direct pattern or through a SUBREG.
I'm thinking about sanitizing V2SFmode operations only when there is a
SUBREG in a source operand, and making sure that every other pattern that sets a
V2SFmode dest clears the upper bits (including
mov<mode>_internal, vec_concatv2sf_sse4_1, sse_storehps, sse_storehps, *vec_concatv2sf_sse).
For mov<mode>_internal, we can just set alternative (v,v) to mode DI so that it
uses vmovq; for the other alternatives that set sse_regs, the instructions
already clear the upper bits.

For vec_concatv2sf_sse4_1/sse_storehps/sse_storehps/*vec_concatv2sf_sse, we can
change them into define_insn_and_split, splitting into a V4SF instruction (as
we did for those V2SFmode patterns), and either use a SUBREG for the dest or
explicitly sanitize the dest.
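
To make the invariant above concrete, here is a minimal sketch in plain C. It is
only a model, not GCC code: xmm_t, sanitize_v2sf, addps, mulps and upper_is_zero
are hypothetical names made up for illustration. It assumes MOVQ xmm, xmm keeps
the low 64 bits and zeroes the upper 64 bits, and that ADDPS/MULPS operate on all
four lanes. Under those assumptions, once every producer of a V2SF value leaves
the upper half zeroed, a chain of operations keeps it zeroed without any further
sanitizing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a 128-bit SSE register viewed as four floats.
   A V2SF value lives in lanes 0-1; lanes 2-3 are the "upper bits". */
typedef struct { float lane[4]; } xmm_t;

/* Models MOVQ xmm, xmm: keep the low 64 bits, zero the upper 64 bits. */
xmm_t sanitize_v2sf(xmm_t x)
{
    x.lane[2] = 0.0f;
    x.lane[3] = 0.0f;
    return x;
}

/* Models ADDPS: lane-wise addition over all four lanes. */
xmm_t addps(xmm_t a, xmm_t b)
{
    xmm_t r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Models MULPS: lane-wise multiplication over all four lanes. */
xmm_t mulps(xmm_t a, xmm_t b)
{
    xmm_t r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* True when the upper 64 bits (lanes 2-3) are all-zero bit patterns. */
bool upper_is_zero(xmm_t x)
{
    static const float zero[2] = { 0.0f, 0.0f };
    return memcmp(&x.lane[2], zero, sizeof zero) == 0;
}
```

In this model a value that arrives through a SUBREG may carry arbitrary upper
lanes and needs one sanitize_v2sf; the result of addps on two sanitized inputs
already satisfies upper_is_zero, so a second sanitize before mulps adds nothing.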


BTW, it looks like *vec_concatv2df_sse4_1 can be merged into *vec_concatv2sf_sse.
