https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118949
Vineet Gupta <vineetg at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #6 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #4)
> Also slightly better test so avoid cpp/installed headers and use bare cc1
>
> void func(const float *a, const float *b, float *c)
> {
>   for (long i = 0; i < 1024; ++i) {
>     float a_l = __builtin_lround(a[i]);
>     float b_l = __builtin_lround(b[i]);
>     c[i] = a_l + b_l;
>   }
> }

Interestingly, the codegen for this test changes with a bare cc1 vs. one from a
glibc toolchain build. In the toolchain version, cc1 is able to transform

  float output = (float) long __builtin_lround ((double) float input)

to

  float output = (float) long __builtin_lroundf (float input)

This is due to TARGET_LIBC_HAS_FUNCTION (linux_libc_has_function).

So with cc1 what we see is:

W/ gdc0dea98c96e02c       | Revert gdc0dea98c96e02c
                          |
  fsrmi 4                 |   fsrmi 4
  vfwcvt.f.f.v v2,v4      |   vfwcvt.f.f.v v2,v4
  vfwcvt.f.f.v v4,v1      |   vfwcvt.f.f.v v4,v1
                          |
  vfcvt.x.f.v v2,v2       |   vfcvt.x.f.v v2,v2
                          |   vfcvt.x.f.v v4,v4
  fsrm a5                 |
  vfncvt.f.x.w v2,v2      |   fsrm a4
                          |   vfncvt.f.x.w v2,v2
  fsrmi 4                 |   vfncvt.f.x.w v4,v4
  vfcvt.x.f.v v4,v4       |
                          |
  fsrm a5                 |
  vfncvt.f.x.w v4,v4      |

Please note that for this test, I don't see any tree dump delta. The first diff
is in expand, and even that is only the insn from floatrvvm2dirvvm1sf2 getting
emitted earlier vs. later, leading to the alternation of VFCVT and VFNCVT
vs. them being lumped together.