[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

Andy Kaylor via Phabricator via cfe-commits Thu, 10 Mar 2022 14:53:13 -0800

andrew.w.kaylor added a comment.

This example illustrates the problem this patch intends to fix: 
https://godbolt.org/z/j445sxPMc


For Intel microarchitectures before Skylake, the LLVM cost model treats vector 
fsqrt as slow, so when fast-math is enabled we use a reciprocal square root 
estimate plus refinement rather than the vsqrtps instruction when vectorizing 
a call to sqrtf(). If the code is compiled with -march=skylake or 
-mtune=skylake, we choose vsqrtps. With any earlier base target, however, we 
choose the approximation even inside a cpu_specific(skylake) implementation 
in the source code.

For example,

  __attribute__((cpu_specific(skylake))) void foo(void) {
    for (int i = 0; i < 8; ++i)
      x[i] = sqrtf(y[i]);
  }

compiles to

  foo.b:
          vmovaps ymm0, ymmword ptr [rip + y]
          vrsqrtps        ymm1, ymm0
          vmulps  ymm2, ymm0, ymm1
          vbroadcastss    ymm3, dword ptr [rip + .LCPI2_0] # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
          vfmadd231ps     ymm3, ymm2, ymm1        # ymm3 = (ymm2 * ymm1) + ymm3
          vbroadcastss    ymm1, dword ptr [rip + .LCPI2_1] # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
          vmulps  ymm1, ymm2, ymm1
          vmulps  ymm1, ymm1, ymm3
          vbroadcastss    ymm2, dword ptr [rip + .LCPI2_2] # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
          vandps  ymm0, ymm0, ymm2
          vbroadcastss    ymm2, dword ptr [rip + .LCPI2_3] # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
          vcmpleps        ymm0, ymm2, ymm0
          vandps  ymm0, ymm0, ymm1
          vmovaps ymmword ptr [rip + x], ymm0
          vzeroupper
          ret

but it should compile to

  foo.b:
          vsqrtps ymm0, ymmword ptr [rip + y]
          vmovaps ymmword ptr [rip + x], ymm0
          vzeroupper
          ret


https://reviews.llvm.org/D121410
