https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #16)
> Created attachment 55636 [details]
> Proposed patch
>
> Proposed patch clears the upper half of a V4SFmode operand register before
> all potentially trapping instructions. The testcase from comment #12 now
> compiles to:
>
>         movq    %xmm1, %xmm1    # 9     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm0, %xmm0    # 10    [c=4 l=4]  *vec_concatv4sf_0
>         addps   %xmm1, %xmm0    # 11    [c=12 l=3]  *addv4sf3/0
>
> This approach addresses issues with traps (Comment #0), as well as with
> denormal/invalid values (Comment #14). An obvious exception to the rule is a
> division, where the value != 0.0 should be loaded into the upper half of the
> denominator.
>
> The patch effectively tightens the solution from PR95046 by clearing upper
> halves of all operand registers before every potentially trapping
> instruction. The testcase:
>
> --cut here--
> typedef float __attribute__((vector_size(8))) v2sf;
>
> v2sf test (v2sf a, v2sf b, v2sf c)
> {
>   return a * b - c;
> }
> --cut here--
>
> compiles to:
>
>         movq    %xmm1, %xmm1    # 8     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm0, %xmm0    # 9     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm2, %xmm2    # 12    [c=4 l=4]  *vec_concatv4sf_0
>         mulps   %xmm1, %xmm0    # 10    [c=16 l=3]  *mulv4sf3/0
>         movq    %xmm0, %xmm0    # 13    [c=4 l=4]  *vec_concatv4sf_0

so this one is obviously redundant - I suppose at the RTL level we have no
chance of noticing this. I hope for integer vector operations we avoid
these ops? I think this will make epilog vectorization with V2SFmode a
bad idea, we'd need to appropriately disqualify this in the costing hooks.

I wonder if combine could for example combine a v2sf load with the
upper half zeroing for the next use? Likewise for arithmetics.

>         subps   %xmm2, %xmm0    # 14    [c=12 l=3]  *subv4sf3/0
>
> The implementation simply calls V4SFmode operation, so we can remove all
> "emulated" SSE2 V2SFmode instructions and SSE2 V2SFmode alternatives from
> 3dNOW! insn patterns.
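
For readers following along, the zeroing scheme discussed above can be sketched by hand with SSE2 intrinsics. This is only an illustration, not the patch itself: the helper names clear_upper_half and v2sf_mul_sub are hypothetical, and it assumes that _mm_move_epi64 is emitted as the register-to-register movq seen in the listings. It also shows why the movq after mulps looks redundant: with both inputs zeroed, the upper lanes of the product are already 0.0 * 0.0 = 0.0.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE (xmmintrin.h) ones */

/* Hypothetical helper: keep lanes 0 and 1 and zero lanes 2 and 3.
   This models the "movq %xmmN, %xmmN" step from the listings. */
static __m128 clear_upper_half(__m128 v)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(v)));
}

/* Hypothetical helper: a * b - c on the low two floats, carried out as
   full-width V4SF operations on operands whose upper halves are zero. */
static __m128 v2sf_mul_sub(__m128 a, __m128 b, __m128 c)
{
    a = clear_upper_half(a);
    b = clear_upper_half(b);
    c = clear_upper_half(c);

    /* Upper lanes compute 0.0 * 0.0 = 0.0, so no trap or denormal there
       and no second clearing of the product is needed. */
    __m128 prod = _mm_mul_ps(a, b);

    /* Upper lanes compute 0.0 - 0.0 = 0.0. */
    return _mm_sub_ps(prod, c);
}
```

Whatever garbage the caller leaves in lanes 2 and 3 of the inputs never reaches mulps or subps in this sketch, which is the property the proposed patch aims for. As the quoted comment notes, division would need a nonzero value in the upper half of the denominator instead.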