https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949

Vineet Gupta <vineetg at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #4)
> Also slightly better test so avoid cpp/installed headers and use bare cc1
> 
> void func(const float *a, const float *b, float *c)
> {
>     for (long i = 0; i < 1024; ++i) {
>         float a_l = __builtin_lround(a[i]);
>         float b_l = __builtin_lround(b[i]);
>         c[i] = a_l + b_l;
>     }
> }

Interestingly, the codegen for this test differs between a bare cc1 and one
from a glibc toolchain build.

In the toolchain version, cc1 is able to transform 
   float output = (float) long __builtin_lround ((double) float input )
to
   float output = (float) long __builtin_lroundf (float input )

This is due to the toolchain build having
   TARGET_LIBC_HAS_FUNCTION linux_libc_has_function

So with the bare cc1, what we see is:

  W/ gdc0dea98c96e02c  |  Revert gdc0dea98c96e02c 
                       |
  fsrmi 4              |  fsrmi 4
  vfwcvt.f.f.v  v2,v4  |  vfwcvt.f.f.v  v2,v4
  vfwcvt.f.f.v  v4,v1  |  vfwcvt.f.f.v  v4,v1
                       | 
  vfcvt.x.f.v   v2,v2  |  vfcvt.x.f.v   v2,v2
                       |  vfcvt.x.f.v   v4,v4
  fsrm  a5             | 
  vfncvt.f.x.w  v2,v2  |  fsrm  a4
                       |  vfncvt.f.x.w  v2,v2
  fsrmi 4              |  vfncvt.f.x.w  v4,v4
  vfcvt.x.f.v   v4,v4  |
                       |
  fsrm  a5             |
  vfncvt.f.x.w  v4,v4  |


Please note that for this test, I don't see any tree dump delta.
The first diff is in expand, and even that is just the insn from
floatrvvm2dirvvm1sf2 getting emitted earlier vs. getting emitted later,
leading to the alternation of VFCVT and VFNCVT vs. them being lumped together.
