https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #16)
> Created attachment 55636 [details]
> Proposed patch
> 
> Proposed patch clears the upper half of a V4SFmode operand register before
> all potentially trapping instructions. The testcase from comment #12 now
> compiles to:
> 
>         movq    %xmm1, %xmm1    # 9     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm0, %xmm0    # 10    [c=4 l=4]  *vec_concatv4sf_0
>         addps   %xmm1, %xmm0    # 11    [c=12 l=3]  *addv4sf3/0
> 
> This approach addresses issues with traps (Comment #0), as well as with
> denormal/invalid values (Comment #14). An obvious exception to the rule is a
> division, where the value != 0.0 should be loaded into the upper half of the
> denominator.
> 
> The patch effectively tightens the solution from PR95046 by clearing upper
> halves of all operand registers before every potentially trapping
> instruction. The testcase:
> 
> --cut here--
> typedef float __attribute__((vector_size(8))) v2sf;
> 
> v2sf test (v2sf a, v2sf b, v2sf c)
> {
>   return a * b - c;
> }
> --cut here--
> 
> compiles to:
> 
>         movq    %xmm1, %xmm1    # 8     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm0, %xmm0    # 9     [c=4 l=4]  *vec_concatv4sf_0
>         movq    %xmm2, %xmm2    # 12    [c=4 l=4]  *vec_concatv4sf_0
>         mulps   %xmm1, %xmm0    # 10    [c=16 l=3]  *mulv4sf3/0
>         movq    %xmm0, %xmm0    # 13    [c=4 l=4]  *vec_concatv4sf_0

So this one is obviously redundant: both mulps inputs already have their
upper halves cleared, so the upper half of the product is zero as well.
I suppose at the RTL level we have no chance of noticing this.  I hope we
avoid these ops for integer vector operations?  I think this will make
epilogue vectorization with V2SFmode a bad idea; we'd need to disqualify
it appropriately in the costing hooks.

I wonder if combine could, for example, merge a V2SF load with the
upper-half zeroing for its next use?  Likewise for arithmetic.

>         subps   %xmm2, %xmm0    # 14    [c=12 l=3]  *subv4sf3/0
> 
> The implementation simply calls V4SFmode operation, so we can remove all
> "emulated" SSE2 V2SFmode instructions and SSE2 V2SFmode alternatives from
> 3dNOW! insn patterns.
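
To make the trade-off concrete, here is a minimal sketch (an illustration
only, not code from the attached patch) that emulates a V2SF operation with
a V4SF one through SSE intrinsics and then reads the MXCSR exception flags.
The helper names clear_upper, ones_in_upper, flags_of_add and flags_of_div
are hypothetical.  _mm_move_epi64 is used on the assumption that it
corresponds to the "movq %xmmN, %xmmN" shown above, and 1.0 is just one
possible non-zero value for the upper half of the denominator.

```c
#include <emmintrin.h>  /* SSE2: _mm_move_epi64 and the ps/si128 casts */
#include <xmmintrin.h>  /* SSE: _mm_add_ps, _mm_div_ps, MXCSR state macros */
#include <float.h>      /* FLT_MAX, used as "junk" in the upper lanes */

/* Keep lanes 0-1 and zero lanes 2-3 (assumed equivalent of movq %xmm, %xmm). */
static __m128 clear_upper(__m128 v)
{
    return _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(v)));
}

/* Keep lanes 0-1 and put 1.0 into lanes 2-3; hypothetical helper for a
   denominator, so that the unused upper lanes never divide by zero. */
static __m128 ones_in_upper(__m128 v)
{
    return _mm_movelh_ps(v, _mm_set1_ps(1.0f));
}

/* Perform a 4-lane add and return the MXCSR exception flags it raised.
   The volatile copies keep the compiler from folding the operation away
   or moving it across the MXCSR accesses. */
static unsigned flags_of_add(__m128 a, __m128 b)
{
    volatile __m128 va = a, vb = b;
    volatile __m128 sink;

    _MM_SET_EXCEPTION_STATE(0);      /* clear the sticky exception flags */
    sink = _mm_add_ps(va, vb);       /* the emulated V2SF add */
    return _MM_GET_EXCEPTION_STATE();
}

/* Same as flags_of_add, but for a 4-lane division a / b. */
static unsigned flags_of_div(__m128 a, __m128 b)
{
    volatile __m128 va = a, vb = b;
    volatile __m128 sink;

    _MM_SET_EXCEPTION_STATE(0);
    sink = _mm_div_ps(va, vb);
    return _MM_GET_EXCEPTION_STATE();
}
```

With junk such as FLT_MAX in lanes 2-3 of both operands, flags_of_add
should report _MM_EXCEPT_OVERFLOW even though the two low lanes add
exactly; after clear_upper on both operands no flag should be set.  For
division, clearing both operands leaves 0/0 in the upper lanes and should
raise _MM_EXCEPT_INVALID, which is why the denominator wants something like
ones_in_upper instead.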
