from:"Kevin B. Smith via Phabricator via cfe\-commits"

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

2021-07-28 Thread Kevin B. Smith via Phabricator via cfe-commits

kbsmith1 created this revision.
kbsmith1 added reviewers: craig.topper, aaron.ballman, erichkeane, 
andrew.w.kaylor.
kbsmith1 requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This change uses #pragma float_control and
floating point fast-math reassociation flags for IA intrinsics
that happen to be implemented using LLVM IR's regular FP
operations.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D107004

Files:
  clang/lib/Headers/emmintrin.h
  clang/lib/Headers/immintrin.h
  clang/lib/Headers/x86intrin.h
  clang/lib/Headers/xmmintrin.h
  clang/test/Headers/emmintrin.c
  clang/test/Headers/immintrin.c
  clang/test/Headers/x86intrin-3.c
  clang/test/Headers/xmmintrin.c

Index: clang/test/Headers/xmmintrin.c
===
--- clang/test/Headers/xmmintrin.c
+++ clang/test/Headers/xmmintrin.c
@@ -1,5 +1,8 @@
 // RUN: %clang_cc1 %s -ffreestanding -triple x86_64-apple-macosx10.9.0 -emit-llvm -o - | FileCheck %s
 //
+// RUN: %clang_cc1 %s -ffreestanding -ffast-math -triple x86_64-apple-macosx10.9.0 -emit-llvm -o - \
+// RUN: | FileCheck -check-prefix=CKFMATH %s
+//
 // RUN: rm -rf %t
 // RUN: %clang_cc1 %s -ffreestanding -triple x86_64-apple-macosx10.9.0 -emit-llvm -o - \
 // RUN: -fmodules -fimplicit-module-maps -fmodules-cache-path=%t -isystem %S/Inputs/include \
@@ -26,6 +29,21 @@
   return _mm_add_sd(__a, __b);
 }
 
+// Make sure that the llvm IR for _mm_add_ps doesn't have fast attribute and
+// doesn't have reassoc set.
+// CKFMATH: define{{.*}} <4 x float> @test_xmmintrin_no_reassoc
+// CKFMATH: fadd nnan ninf nsz arcp afn <4 x float>
+__m128 test_xmmintrin_no_reassoc(__m128 __a, __m128 __b) {
+  return _mm_add_ps(__a, __b);
+}
+
+// Make sure that all fast flags were restored outside of the include file.
+// CKFMATH: define{{.*}} double @test_fast
+// CKFMATH: fadd reassoc nnan ninf nsz arcp afn double
+double test_fast(double __a, double __b) {
+  return __a + __b;
+}
+
 #if __STDC_HOSTED__
 // Make sure stdlib.h symbols are accessible.
 void *p = NULL;
Index: clang/test/Headers/x86intrin-3.c
===
--- /dev/null
+++ clang/test/Headers/x86intrin-3.c
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 %s -ffreestanding -ffast-math -triple i386-unknown-unknown -emit-llvm -o - \
+// RUN: | FileCheck -check-prefix=CKFMATH %s
+//
+// RUN: %clang_cc1 %s -ffreestanding -ffast-math -triple x86_64-unknown-unknown -emit-llvm -o - \
+// RUN: | FileCheck -check-prefix=CKFMATH %s
+
+// Include the metaheader that includes all x86 intrinsic headers.
+#include 
+
+// Make sure that the llvm IR for _mm_add_ps doesn't have fast attribute and
+// doesn't have reassoc set.
+// CKFMATH: define{{.*}} <4 x float> @test_xmmintrin_no_reassoc
+// CKFMATH: fadd nnan ninf nsz arcp afn <4 x float>
+__m128 __attribute__((__target__("sse"))) test_xmmintrin_no_reassoc(__m128 __a, __m128 __b) {
+  return _mm_add_ps(__a, __b);
+}
+
+// Make sure that the llvm IR for _mm_add_pd doesn't have fast attribute and
+// doesn't have reassoc set.
+// CKFMATH: define{{.*}} <2 x double> @test_emmintrin_no_reassoc
+// CKFMATH: fadd nnan ninf nsz arcp afn <2 x double>
+__m128d __attribute__((__target__("sse2"))) test_emmintrin_no_reassoc(__m128d __a, __m128d __b) {
+  return _mm_add_pd(__a, __b);
+}
+
+// Make sure that the llvm IR for _mm256_add_ps doesn't have fast attribute and
+// doesn't have reassoc set.
+// This intrinsic comes from avxintrin.h, and so is checking that
+// changes in immintrin.h affect the files it includes as well.
+// CKFMATH: define{{.*}} <8 x float> @test_mm256intrin_no_reassoc
+// CKFMATH: fadd nnan ninf nsz arcp afn <8 x float>
+__m256 __attribute__((__target__(("avx" test_mm256intrin_no_reassoc(__m256 __a, __m256 __b) {
+  return _mm256_add_ps(__a, __b);
+}
+
+// Make sure that the llvm IR for _mm512_add_ps doesn't have fast attribute and
+// doesn't have reassoc set.
+// This intrinsic comes from avxintrin.h
+// CKFMATH: define{{.*}} <16 x float> @test_mm512intrin_no_reassoc
+// CKFMATH: fadd nnan ninf nsz arcp afn <16 x float>
+__m512 __attribute__((__target__(("avx512f" test_mm512intrin_no_reassoc(__m512 __a, __m512 __b) {
+  return _mm512_add_ps(__a, __b);
+}
+
+// Make sure that all fast flags were restored outside of the include file.
+// CKFMATH: define{{.*}} double @test_fast
+// CKFMATH: fadd reassoc nnan ninf nsz arcp afn double
+double test_fast(double __a, double __b) {
+  return __a + __b;
+}
Index: clang/test/Headers/immintrin.c
===
--- /dev/null
+++ clang/test/Headers/immintrin.c
@@ -0,0 +1,50 @@
+// RUN: %clang_cc1 %s -ffreestanding -ffast-math -triple i386-unknown-unknown -emit-llvm -o - \
+// RUN: | FileCheck -check-prefix=CKFMATH %s
+//
+// RUN: %clang_cc1 %s -ffreestanding -ffast-math -triple x86_64-unknown-unknown -emit-llvm -o - \
+//

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

2021-07-28 Thread Kevin B. Smith via Phabricator via cfe-commits

kbsmith1 added a comment.

In D107004#2911654 , @lebedev.ri 
wrote:

> Patch description states what the change is, but not why the change is what 
> it is.
> I don't think this is a good direction to second-guess the user's intentions.

I added the reason for the change to the description.  I think when a user 
programs
t = _mm_add_ps(a, b);
t = _mm_sub_ps(t, b);

that they truly did want those intrinsics evaluated like they were written, 
even with
fast math flags.  This change to the intrinsic headers makes programming with 
intrinsics
more accurately reflect what the user wrote, even when fast math flags are used.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107004/new/

https://reviews.llvm.org/D107004

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

2021-08-02 Thread Kevin B. Smith via Phabricator via cfe-commits

kbsmith1 abandoned this revision.
kbsmith1 added a comment.

Doesn't seem to have community support, abandoning.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107004/new/

https://reviews.llvm.org/D107004

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

2021-04-06 Thread Kevin B. Smith via Phabricator via cfe-commits

kbsmith1 added a comment.

In D99675#2671924 , @efriedma wrote:

>> The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen 
>> before “+ c” and FMA guarantees that, but to prevent later optimizations 
>> from unpacking the FMA the correct transformation needs to be:
>>
>> llvm.arith.fence(a * b) + c  →  llvm.arith.fence(FMA(a, b, c))
>
> Does this actually block later transforms from unpacking the FMA?  Maybe if 
> the FMA isn't marked "fast"...

I think we could define llvm.arith.fence to be such that this FMA contraction 
isn't legal/correct, or it could be left as is.  In the implementation that was 
used for the Intel compiler FMA contraction did not occur across an an __fence 
boundary.  It is unclear whether that was intended as the semantic, or if we 
just never bothered to implement that contraction.
Not allowing the FMA contraction across the llvm.arith.fence would make 
unpacking an FMA allowed under the same circumstances that LLVM currently 
allows that.

> 
>
> How is llvm.arith.fence() different from using "freeze" on a floating-point 
> value?  The goal isn't really the same, sure, but the effects seem similar at 
> first glance.

They are similar.  However, fence is a no-op if the operand can be proven not 
to be undef or poison, and in such circumstances could be removed by an 
optimizer.  llvm.arith.fence cannot be removed by an optimizer, because doing 
so might allow instructions that were "outside" the fence from being 
reassociated/distrbuted with the instructions/operands that were inside the 
fence.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99675/new/

https://reviews.llvm.org/D99675

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

[PATCH] D107004: Turn off fp reassociation in IA intrinsic header files.

[PATCH] D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

4 matches

Site Navigation

Mail list logo

Footer information