https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832
Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> (In reply to Richard Biener from comment #6)
> > Do we know whether we could in theory improve the sanitizing by optimization
> > without -funsafe-math-optimizations (I think -fno-trapping-math,
> > -ffinite-math-only -fno-signalling-nans should be a better guard?)?
>
> Regarding the sanitizing, we can remove all sanitizing MOVQ instructions
> between trapping instructions (IOW, the result of ADDPS is guaranteed to
> have zeros in the high part outside V2SF, so MOVQ is unnecessary in front of
> a follow-up MULPS).
>
> I think that some instruction back-walking pass on the RTL insn stream would
> be able to identify these unnecessary instructions and remove them.
>

A V2SFmode operand can be produced by direct patterns or by a SUBREG. I'm
thinking about sanitizing V2SFmode operations only when there's a SUBREG in
the source operand, and making sure that every other pattern which sets a
V2SFmode dest clears the upper bits (including mov<mode>_internal,
vec_concatv2sf_sse4_1, sse_storehps, sse_storehps, *vec_concatv2sf_sse).

For mov<mode>_internal, we can just set alternative (v,v) to mode DI, so that
it uses vmovq; for the other alternatives which set sse_regs, the
instructions already clear the upper bits.

For vec_concatv2sf_sse4_1/sse_storehps/sse_storehps/*vec_concatv2sf_sse, we
can change them into define_insn_and_split, splitting into a V4SF instruction
(like we did for those V2SFmode patterns), and either use a SUBREG for the
dest or explicitly sanitize the dest.

BTW, it looks like *vec_concatv2df_sse4_1 can be merged into
*vec_concatv2sf_sse.
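
To illustrate the point from comment #8 that a MOVQ between ADDPS and MULPS is
redundant once the inputs have been sanitized, here is a minimal portable C
sketch. It is only a software model, not GCC code and not real SSE: the v4sf
struct and the model_movq/model_addps/model_mulps helpers are hypothetical
names made up for this example, and the model assumes the upper two lanes are
exactly +0.0 after sanitizing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a 128-bit SSE register holding four floats.
   Lanes 0-1 are the V2SF payload, lanes 2-3 are the "upper bits". */
typedef struct { float lane[4]; } v4sf;

/* Models the sanitizing MOVQ: keep lanes 0-1, zero lanes 2-3. */
static v4sf model_movq(v4sf x)
{
    x.lane[2] = 0.0f;
    x.lane[3] = 0.0f;
    return x;
}

/* Models ADDPS: lane-wise add over all four lanes. */
static v4sf model_addps(v4sf a, v4sf b)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Models MULPS: lane-wise multiply over all four lanes. */
static v4sf model_mulps(v4sf a, v4sf b)
{
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Returns 1 if lanes 2 and 3 are all-zero bit patterns (+0.0). */
static int upper_is_zero(v4sf x)
{
    static const float zeros[2] = { 0.0f, 0.0f };
    return memcmp(&x.lane[2], zeros, sizeof zeros) == 0;
}

/* (a + b) * c with every operand sanitized before every operation. */
static v4sf add_mul_fully_sanitized(v4sf a, v4sf b, v4sf c)
{
    v4sf sum = model_addps(model_movq(a), model_movq(b));
    return model_mulps(model_movq(sum), model_movq(c));
}

/* Same computation, but the MOVQ on the ADDPS result is dropped:
   its upper lanes are already 0.0 + 0.0 = 0.0. */
static v4sf add_mul_skip_redundant(v4sf a, v4sf b, v4sf c)
{
    v4sf sum = model_addps(model_movq(a), model_movq(b));
    return model_mulps(sum, model_movq(c));
}
```

Calling both functions on inputs with arbitrary upper lanes should give
bit-identical results, which is the property a back-walking pass would rely on
in this model. The sketch covers only the add/multiply chain quoted above.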