from:"Phoebe Wang via Phabricator via cfe-commits"

[PATCH] D132742: [X86][BF16] Add type mangling for Windows

2022-08-29 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rGa845d8fc57b6: [X86][BF16] Add type mangling for Windows 
(authored by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132742/new/

https://reviews.llvm.org/D132742

Files:
  clang/lib/AST/MicrosoftMangle.cpp
  clang/test/CodeGen/X86/bfloat-mangle.cpp


Index: clang/test/CodeGen/X86/bfloat-mangle.cpp
===
--- clang/test/CodeGen/X86/bfloat-mangle.cpp
+++ clang/test/CodeGen/X86/bfloat-mangle.cpp
@@ -1,5 +1,8 @@
-// RUN: %clang_cc1 -triple i386-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple i386-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX
+// RUN: %clang_cc1 -triple i386-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS
+// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS
 
-// CHECK: define {{.*}}void @_Z3foou6__bf16(bfloat noundef %b)
+// LINUX: define {{.*}}void @_Z3foou6__bf16(bfloat noundef %b)
+// WINDOWS: define {{.*}}void @"?foo@@YAXU__bf16@__clang@@@Z"(bfloat noundef %b)
 void foo(__bf16 b) {}
Index: clang/lib/AST/MicrosoftMangle.cpp
===
--- clang/lib/AST/MicrosoftMangle.cpp
+++ clang/lib/AST/MicrosoftMangle.cpp
@@ -2469,6 +2469,10 @@
   Out << "$halff@";
 break;
 
+  case BuiltinType::BFloat16:
+mangleArtificialTagType(TTK_Struct, "__bf16", {"__clang"});
+break;
+
 #define SVE_TYPE(Name, Id, SingletonId) \
   case BuiltinType::Id:
 #include "clang/Basic/AArch64SVEACLETypes.def"
@@ -2501,7 +2505,6 @@
   case BuiltinType::SatUShortFract:
   case BuiltinType::SatUFract:
   case BuiltinType::SatULongFract:
-  case BuiltinType::BFloat16:
   case BuiltinType::Ibm128:
   case BuiltinType::Float128: {
 DiagnosticsEngine &Diags = Context.getDiags();
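
The WINDOWS check line above expects `foo(__bf16)` to mangle as if its parameter were a struct named `__bf16` inside a `__clang` namespace. A minimal sketch of how that string is put together follows. The helpers `toyMangleStructType` and `toyMangleVoidFunction` are hypothetical names made up for illustration and are not part of clang; the sketch only covers a global `void` function using the default calling convention with at least one parameter, and it ignores details such as back-references.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: mangles a struct-like tag type as
// "U<name>@<namespace>@...@@". Namespaces are listed innermost first.
std::string toyMangleStructType(const std::string &Name,
                                const std::vector<std::string> &Namespaces) {
  assert(!Name.empty() && "a tag type needs a name");
  std::string Out = "U" + Name + "@"; // 'U' marks a struct
  for (const std::string &NS : Namespaces)
    Out += NS + "@";
  return Out + "@"; // terminates the qualified name
}

// Hypothetical helper: mangles a global `void f(params...)`.
// Assumes at least one parameter; an empty list is encoded differently.
std::string toyMangleVoidFunction(const std::string &Name,
                                  const std::vector<std::string> &ParamTypes) {
  assert(!ParamTypes.empty() && "sketch does not handle f(void)");
  std::string Out = "?" + Name + "@@"; // name, then end of the (empty) scope
  Out += "Y";                          // free function
  Out += "A";                          // __cdecl
  Out += "X";                          // returns void
  for (const std::string &P : ParamTypes)
    Out += P;
  return Out + "@Z"; // end of parameter list, no exception specification
}
```

With these helpers, passing the result of `toyMangleStructType("__bf16", {"__clang"})` as the only parameter of `toyMangleVoidFunction("foo", ...)` reproduces the name in the WINDOWS check line.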
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D132742: [X86][BF16] Add type mangling for Windows

2022-08-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @FreddyYe


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132742/new/

https://reviews.llvm.org/D132742


[PATCH] D133920: [X86][fastcall] Move capability check before free register update

2022-09-15 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei created this revision.
pengfei added a reviewer: rnk.
Herald added a project: All.
pengfei requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Fixes: #57737


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D133920

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/mangle-windows.c


Index: clang/test/CodeGen/mangle-windows.c
===
--- clang/test/CodeGen/mangle-windows.c
+++ clang/test/CodeGen/mangle-windows.c
@@ -47,7 +47,7 @@
 // X64: define dso_local void @f8(
 
 void __fastcall f9(long long a, char b, char c, short d) {}
-// CHECK: define dso_local x86_fastcallcc void @"\01@f9@20"(i64 noundef %a, i8 noundef signext %b, i8 noundef signext %c, i16 noundef signext %d)
+// CHECK: define dso_local x86_fastcallcc void @"\01@f9@20"(i64 noundef %a, i8 inreg noundef signext %b, i8 inreg noundef signext %c, i16 noundef signext %d)
 // X64: define dso_local void @f9(
 
 void f12(void) {}
Index: clang/lib/CodeGen/TargetInfo.cpp
===
--- clang/lib/CodeGen/TargetInfo.cpp
+++ clang/lib/CodeGen/TargetInfo.cpp
@@ -1771,21 +1771,25 @@
 }
 
 bool X86_32ABIInfo::shouldPrimitiveUseInReg(QualType Ty, CCState &State) const {
-  if (!updateFreeRegs(Ty, State))
+  auto CanUseInReg = [this](QualType Ty, CCState &State) {
+if (State.CC == llvm::CallingConv::X86_FastCall ||
+State.CC == llvm::CallingConv::X86_VectorCall ||
+State.CC == llvm::CallingConv::X86_RegCall) {
+  if (getContext().getTypeSize(Ty) > 32)
+return false;
+
+  return (Ty->isIntegralOrEnumerationType() || Ty->isPointerType() ||
+  Ty->isReferenceType());
+}
+return true;
+  };
+
+  if (!CanUseInReg(Ty, State) || !updateFreeRegs(Ty, State))
 return false;
 
   if (IsMCUABI)
 return false;
 
-  if (State.CC == llvm::CallingConv::X86_FastCall ||
-  State.CC == llvm::CallingConv::X86_VectorCall ||
-  State.CC == llvm::CallingConv::X86_RegCall) {
-if (getContext().getTypeSize(Ty) > 32)
-  return false;
-
-return (Ty->isIntegralOrEnumerationType() || Ty->isPointerType() ||
-Ty->isReferenceType());
-  }
 
   return true;
 }
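
The effect of moving the check can be seen in a small stand-alone model of the `f9` test case. This is a simplified sketch, not clang's code: `ToyState`, `toyUpdateFreeRegs`, `toyCanUseInReg` and `toyClassify` are hypothetical names, types are reduced to their bit widths, and the model assumes that fastcall starts with two free 32-bit registers and that reserving registers happens as a side effect even when the argument is later rejected.

```cpp
#include <vector>

// Hypothetical stand-in for the per-call state: number of free registers.
struct ToyState {
  unsigned FreeRegs;
};

// Toy register reservation: consumes ceil(Bits / 32) registers if enough
// remain; otherwise marks every register as used and fails.
bool toyUpdateFreeRegs(unsigned Bits, ToyState &S) {
  unsigned Needed = (Bits + 31) / 32;
  if (Needed == 0)
    return false;
  if (Needed > S.FreeRegs) {
    S.FreeRegs = 0;
    return false;
  }
  S.FreeRegs -= Needed;
  return true;
}

// Toy capability check: only values of at most 32 bits may go in a register.
bool toyCanUseInReg(unsigned Bits) { return Bits <= 32; }

// Returns, per argument, whether it would be marked inreg.
// CheckFirst selects the order introduced by the patch.
std::vector<bool> toyClassify(const std::vector<unsigned> &ArgBits,
                              bool CheckFirst) {
  ToyState S{2}; // assumption: two free registers at the start of the call
  std::vector<bool> InReg;
  for (unsigned Bits : ArgBits) {
    bool Use = CheckFirst
                   ? (toyCanUseInReg(Bits) && toyUpdateFreeRegs(Bits, S))
                   : (toyUpdateFreeRegs(Bits, S) && toyCanUseInReg(Bits));
    InReg.push_back(Use);
  }
  return InReg;
}
```

For the argument widths of `f9` (64, 8, 8, 16), the old order lets the 64-bit argument consume both registers before it is rejected, so no argument is marked `inreg`. With the check first, the 64-bit argument is rejected without touching the registers, and the two `char` arguments get `inreg`, which is what the updated CHECK line expects.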

[PATCH] D86855: Convert __m64 intrinsics to unconditionally use SSE2 instead of MMX instructions.

2023-11-19 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Reverse ping. Any progress or plan for this patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86855/new/

https://reviews.llvm.org/D86855


[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-27 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rGf5d781d6273c: [X86] Support `_Float16` on SSE2 and up 
(authored by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp

Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-complex.c
===
--- clang/test/CodeGen/X86/Float16-complex.c
+++ clang/test/CodeGen/X86/Float16-complex.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s --check-prefix=X86
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s --check-prefix=X86
 
 _Float16 _Complex add_half_rr(_Float16 a, _Float16 b) {
   // X86-LABEL: @add_half_rr(
Index: clang/test/CodeGen/X86/Float16-arithmetic.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/Float16-arithmetic.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -triple  x86_64-unknown-unknown \
+// RUN: -emit-llvm -o - %s  | FileCheck %s --check-prefixes=CHECK
+
+// CHECK-NOT: fpext
+// CHECK-NOT: fptrunc
+
+_Float16 add1(_Float16 a, _Float16 b) {
+  return a + b;
+}
+
+_Float16 add2(_Float16 a, _Float16 b, _Float16 c) {
+  return a + b + c;
+}
+
+_Float16 div(_Float16 a, _Float16 b) {
+  return a / b;
+}
+
+_Float16 mul(_Float16 a, _Float16 b) {
+  return a * b;
+}
+
+_Float16 add_and_mul1(_Float16 a, _Float16 b, _Float16 c, _Float16 d) {
+  return a * b + c * d;
+}
+
+_Float16 add_and_mul2(_Float16 a, _Float16 b, _Float16 c, _Float16 d) {
+  return (a - 6 * b) + c;
+}
Index: clang/lib/Basic/Targets/X86.cpp
===
--- clang/lib/Basic/Targets/X86.cpp
+++ clang/lib/Basic/Targets/X86.cpp
@@ -239,7 +239,6 @@
   HasAVX512ER = true;
 } else if (Feature == "+avx512fp16") {
   HasAVX512FP16 = true;
-  HasFloat16 = true;
 } else if (Feature == "+avx512pf") {
   HasAVX512PF = true;
 } else if (Feature == "+avx512dq") {
@@ -355,6 +354,9 @@
.Default(NoSSE);
 SSELevel = std::max(SSELevel, Level);
 
+// Turn on _float16 for x86 (feature sse2)
+HasFloat16 = SSELevel >= SSE2;
+
 MMX3DNowEnum ThreeDNowLevel = llvm::StringSwitch<MMX3DNowEnum>(Feature)
   .Case("+3dnowa", AMD3DNowAthlon)
   .Case("+3dnow", AMD3DNow)
Index: clang/docs/ReleaseNotes.rst
=
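
The X86.cpp hunk above ties `HasFloat16` to the SSE level instead of to `+avx512fp16` alone. A rough model of that rule is sketched below. `ToySSELevel`, `toyLevelFor` and `toyHasFloat16` are hypothetical names, and the enum lists only a few levels for illustration; the only property the sketch relies on is that the levels are ordered, so that a higher level implies SSE2.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, shortened list of ordered SSE levels.
enum ToySSELevel { NoSSE, SSE1, SSE2, SSE3, AVX };

// Maps a single feature string to the level it implies (NoSSE if unrelated).
ToySSELevel toyLevelFor(const std::string &Feature) {
  if (Feature == "+avx")
    return AVX;
  if (Feature == "+sse3")
    return SSE3;
  if (Feature == "+sse2")
    return SSE2;
  if (Feature == "+sse")
    return SSE1;
  return NoSSE;
}

// _Float16 is available once the highest requested level reaches SSE2.
bool toyHasFloat16(const std::vector<std::string> &Features) {
  ToySSELevel Level = NoSSE;
  for (const std::string &F : Features)
    Level = std::max(Level, toyLevelFor(F));
  return Level >= SSE2;
}
```

This matches the updated RUN lines: the i686 run without `+sse2` does not define `HAVE`, the i686 run with `+sse2` does, and the x86_64 run does too, since x86_64 targets have SSE2 enabled by default.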

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-27 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG527ef8ca981e: Reland "[X86] Support `_Float16` on SSE2 
and up" (authored by pengfei).
Herald added subscribers: Sanitizers, Enna1, mgorny.
Herald added a project: Sanitizers.

Changed prior to commit:
  https://reviews.llvm.org/D128571?vs=440199&id=440490#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp
  compiler-rt/test/builtins/CMakeLists.txt

Index: compiler-rt/test/builtins/CMakeLists.txt
===
--- compiler-rt/test/builtins/CMakeLists.txt
+++ compiler-rt/test/builtins/CMakeLists.txt
@@ -44,7 +44,7 @@
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
 
-  if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
+  if (${arch} MATCHES "arm|aarch64|arm64|i?86|x86_64|AMD64" AND COMPILER_RT_HAS_FLOAT16)
 list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-complex.c
===
--- clang/test/CodeGen/X86/Float16-complex.c
+++ clang/test/CodeGen/X86/Float16-complex.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s --check-prefix=X86
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s --check-prefix=X86
 
 _Float16 _Complex add_half_rr(_Float16 a, _Float16 b) {
   // X86-LABEL: @add_half_rr(
Index: clang/test/CodeGen/X86/Float16-arithmetic.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/Float16-arithmetic.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -triple  x86_64-unknown-unknown \
+// RUN: -emit-llvm -o - %s  | FileCheck %s --check-prefixes=CHECK
+
+// CHECK-NOT: fpext
+// CHECK-NOT: fptrunc
+
+_Float16 add1(_Float16 a, _Float16 b) {
+  return a + b;
+}
+
+_Float16 add2(_Float16 a, _Float16 b, _Float16 c) {
+  return a + b + c;
+}
+
+_Float16 div(_Float16 a, _Float16 b) {
+  return a / b;
+}
+
+_Float16 mul(_Float16 a, _Float16 b) {
+  return a * b;
+}
+
+_Float16 add_and_mul1(_Float16 a, _Float16 b, _Float16 c, _Float16 d) {
+  return a * b + c * d;
+}
+
+_Float16 add_and_mul2(_Float16 a, _Float16 b, _Float16 c, _Float16 d) {
+  return (a - 6 * b) + c;
+}
Index: clang/lib/Basic/Targe

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-27 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a subscriber: vitalybuka.
pengfei added a comment.

Thanks @vitalybuka! I believe the failure was caused by
`COMPILER_RT_HAS_FLOAT16` not being defined for these tests. Relanded.
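
For reference, the reland defines `COMPILER_RT_HAS_FLOAT16` for the builtins tests by widening the architecture pattern in compiler-rt/test/builtins/CMakeLists.txt. The sketch below imitates that check in C++, using `std::regex_search` as a stand-in for CMake's `MATCHES`. This is an assumption made for illustration: the two regex dialects differ in general, but both treat this particular pattern as an unanchored alternation. `toyArchMatches` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for `if (${arch} MATCHES "<pattern>")`:
// true when the pattern matches anywhere inside the architecture name.
bool toyArchMatches(const std::string &Arch, const std::string &Pattern) {
  return std::regex_search(Arch, std::regex(Pattern));
}

// The pattern before and after the reland, copied from the diff.
const std::string OldPattern = "arm|aarch64|arm64";
const std::string NewPattern = "arm|aarch64|arm64|i?86|x86_64|AMD64";
```

Under the old pattern an `x86_64` or `i386` build never received the define; under the new one both do, while an unrelated name such as `riscv64` is still excluded.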


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571


[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @benlangmuir for the revert. The problem seems to be that Darwin already
supports the `_Float16` type, but with a different ABI. I have no idea how to
solve the problem at the moment. I posted a question on Discourse:
https://discourse.llvm.org/t/compiler-rt-tests-fail-on-darwin-stage1-build-after-the-abi-change-of-half-type-on-x86/63508


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571


[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D128571#3619265, @alexfh wrote:

> @pengfei could you fix the Darwin tests as well? And a general comment 
> regarding the ongoing `_Float16` effort: I think that this change should have 
> been a part of https://reviews.llvm.org/D107082 to make it possible to build 
> a consistently working toolchain. Thus, if this commit can't be landed in a 
> reasonable time, I'd suggest reverting https://reviews.llvm.org/D107082.

@alexfh I'm working on that. I'm asking for suggestions on how to solve it in a
better way, but at least we can disable the test for Darwin (maybe just for
stage1 if possible), since the failure is expected due to the ABI change.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571


[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 441222.
pengfei added a comment.

Disable the `extendhfsf2`/`truncsfhf2` tests on Darwin to avoid the failure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp
  compiler-rt/test/builtins/CMakeLists.txt
  compiler-rt/test/builtins/Unit/extendhfsf2_test.c
  compiler-rt/test/builtins/Unit/truncdfhf2_test.c
  compiler-rt/test/builtins/Unit/truncsfhf2_test.c

Index: compiler-rt/test/builtins/Unit/truncsfhf2_test.c
===
--- compiler-rt/test/builtins/Unit/truncsfhf2_test.c
+++ compiler-rt/test/builtins/Unit/truncsfhf2_test.c
@@ -1,4 +1,7 @@
 // RUN: %clang_builtins %s %librt -o %t && %run %t
+// FIXME: Darwin used a different ABI for FP16 type. Disable the test to avoid
+// it fails on stage1 build.
+// UNSUPPORTED: darwin
 // REQUIRES: librt_has_truncsfhf2
 
 #include 
Index: compiler-rt/test/builtins/Unit/truncdfhf2_test.c
===
--- compiler-rt/test/builtins/Unit/truncdfhf2_test.c
+++ compiler-rt/test/builtins/Unit/truncdfhf2_test.c
@@ -1,4 +1,7 @@
 // RUN: %clang_builtins %s %librt -o %t && %run %t
+// FIXME: Darwin used a different ABI for FP16 type. Disable the test to avoid
+// it fails on stage1 build.
+// UNSUPPORTED: darwin
 // REQUIRES: librt_has_truncdfhf2
 
 #include 
Index: compiler-rt/test/builtins/Unit/extendhfsf2_test.c
===
--- compiler-rt/test/builtins/Unit/extendhfsf2_test.c
+++ compiler-rt/test/builtins/Unit/extendhfsf2_test.c
@@ -1,4 +1,7 @@
 // RUN: %clang_builtins %s %librt -o %t && %run %t
+// FIXME: Darwin used a different ABI for FP16 type. Disable the test to avoid
+// it fails on stage1 build.
+// UNSUPPORTED: darwin
 // REQUIRES: librt_has_extendhfsf2
 
 #include 
Index: compiler-rt/test/builtins/CMakeLists.txt
===
--- compiler-rt/test/builtins/CMakeLists.txt
+++ compiler-rt/test/builtins/CMakeLists.txt
@@ -44,7 +44,7 @@
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
 
-  if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
+  if (${arch} MATCHES "arm|aarch64|arm64|i?86|x86_64|AMD64" AND COMPILER_RT_HAS_FLOAT16)
 list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D128571#3619438, @pengfei wrote:

> In D128571#3619265, @alexfh wrote:
>
>> @pengfei could you fix the Darwin tests as well? And a general comment 
>> regarding the ongoing `_Float16` effort: I think that this change should 
>> have been a part of https://reviews.llvm.org/D107082 to make it possible to 
>> build a consistently working toolchain. Thus, if this commit can't be landed 
>> in a reasonable time, I'd suggest reverting https://reviews.llvm.org/D107082.
>
> @alexfh I'm working on that. I'm asking for suggestions on how to solve it in
> a better way, but at least we can disable the test for Darwin (maybe just for
> stage1 if possible), since the failure is expected due to the ABI change.

I've disabled these tests for Darwin. I'll reland the patch in a day if there
are no objections.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571


[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 441236.
pengfei added a comment.

Exclude the ABI change on the Darwin platform. It will be enabled in a follow-up.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp
  compiler-rt/test/builtins/CMakeLists.txt

Index: compiler-rt/test/builtins/CMakeLists.txt
===
--- compiler-rt/test/builtins/CMakeLists.txt
+++ compiler-rt/test/builtins/CMakeLists.txt
@@ -44,9 +44,17 @@
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
 
-  if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
-list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
-string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+  if(APPLE)
+# TODO: Support the new ABI on Apple platforms.
+if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
+  else()
+if (${arch} MATCHES "arm|aarch64|arm64|i?86|x86_64|AMD64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
   endif()
 
   if(COMPILER_RT_ENABLE_CET)
Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-complex.c
===
--- clang/test/CodeGen/X86/Float16-complex.c
+++ clang/test/CodeGen/X86/Float16-complex.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s --check-prefix=X86
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s --check-prefix=X86
 
 _Float16 _Complex add_half_rr(_Float16 a, _Float16 b) {
   // X86-LABEL: @add_half_rr(
Index: clang/test/CodeGen/X86/Float16-arithmetic.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/Float16-arithmetic.c
@@ -0,0 +1,29 @@
+// RUN: %clang_cc1 -triple  x86_64-unknown-unknown \
+// RUN: -emit-llvm -o - %s  | FileCheck %s --check-prefixes=CHECK
+
+// CHECK-NOT: fpext
+// CHECK-NOT: fptrunc
+
+_Float16 add1(_Float16 a, _Float16 b) {
+  return a + b;
+}
+
+_Float16 add2(_Float16 a, _Float16 b, _Float16 c) {
+  return a + b + c;
+}
+
+_Float16 div(_Float16

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 441272.
pengfei added a comment.

Address review comments. Thanks @MaskRay!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp
  compiler-rt/test/builtins/CMakeLists.txt

Index: compiler-rt/test/builtins/CMakeLists.txt
===
--- compiler-rt/test/builtins/CMakeLists.txt
+++ compiler-rt/test/builtins/CMakeLists.txt
@@ -44,9 +44,17 @@
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
 
-  if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
-list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
-string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+  if(APPLE)
+# TODO: Support the new ABI on Apple platforms.
+if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
+  else()
+if (${arch} MATCHES "arm|aarch64|arm64|i?86|x86_64|AMD64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
   endif()
 
   if(COMPILER_RT_ENABLE_CET)
Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-complex.c
===
--- clang/test/CodeGen/X86/Float16-complex.c
+++ clang/test/CodeGen/X86/Float16-complex.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s --check-prefix=X86
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s --check-prefix=X86
 
 _Float16 _Complex add_half_rr(_Float16 a, _Float16 b) {
   // X86-LABEL: @add_half_rr(
Index: clang/test/CodeGen/X86/Float16-arithmetic.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/Float16-arithmetic.c
@@ -0,0 +1,112 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm -o - %s | FileCheck %s
+
+
+// CHECK-LABEL: @add1(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[A_ADDR:%.*]] = alloca half, align 2
+// CHECK-NEXT:[[B_ADDR:%.*]] = alloca half, align 2
+// CHECK-NEXT:store half [[A

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Basic/Targets/X86.cpp:357
 
+// Turn on _float16 for x86 (feature sse2)
+HasFloat16 = SSELevel >= SSE2;

MaskRay wrote:
> MaskRay wrote:
> > `_Float16`
> > 
> > `for x86` conveys no extra information since this file is for x86.
> Thinking again: The comment just repeats what the code does. So it can be 
> deleted.
Yeah, I had the same feeling when updating. Will delete, thanks! :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D128571: [X86] Support `_Float16` on SSE2 and up

2022-06-30 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGabeeae570eff: [X86] Support `_Float16` on SSE2 and up 
(authored by pengfei).

Changed prior to commit:
  https://reviews.llvm.org/D128571?vs=441272&id=441315#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128571/new/

https://reviews.llvm.org/D128571

Files:
  clang/docs/LanguageExtensions.rst
  clang/docs/ReleaseNotes.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/test/CodeGen/X86/Float16-arithmetic.c
  clang/test/CodeGen/X86/Float16-complex.c
  clang/test/CodeGen/X86/avx512fp16-complex.c
  clang/test/Sema/Float16.c
  clang/test/Sema/conversion-target-dep.c
  clang/test/SemaCXX/Float16.cpp
  compiler-rt/test/builtins/CMakeLists.txt

Index: compiler-rt/test/builtins/CMakeLists.txt
===
--- compiler-rt/test/builtins/CMakeLists.txt
+++ compiler-rt/test/builtins/CMakeLists.txt
@@ -44,9 +44,17 @@
 string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
   endif()
 
-  if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
-list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
-string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+  if(APPLE)
+# TODO: Support the new ABI on Apple platforms.
+if (${arch} MATCHES "arm|aarch64|arm64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
+  else()
+if (${arch} MATCHES "arm|aarch64|arm64|i?86|x86_64|AMD64" AND COMPILER_RT_HAS_FLOAT16)
+  list(APPEND BUILTINS_TEST_TARGET_CFLAGS -DCOMPILER_RT_HAS_FLOAT16)
+  string(REPLACE ";" " " BUILTINS_TEST_TARGET_CFLAGS "${BUILTINS_TEST_TARGET_CFLAGS}")
+endif()
   endif()
 
   if(COMPILER_RT_ENABLE_CET)
Index: clang/test/SemaCXX/Float16.cpp
===
--- clang/test/SemaCXX/Float16.cpp
+++ clang/test/SemaCXX/Float16.cpp
@@ -1,4 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/Sema/conversion-target-dep.c
===
--- clang/test/Sema/conversion-target-dep.c
+++ clang/test/Sema/conversion-target-dep.c
@@ -6,7 +6,7 @@
 
 long double ld;
 double d;
-_Float16 f16; // x86-error {{_Float16 is not supported on this target}}
+_Float16 f16;
 
 int main(void) {
   ld = d; // x86-warning {{implicit conversion increases floating-point precision: 'double' to 'long double'}}
Index: clang/test/Sema/Float16.c
===
--- clang/test/Sema/Float16.c
+++ clang/test/Sema/Float16.c
@@ -1,5 +1,6 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s
-// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc -target-feature +avx512fp16 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple i686-linux-pc -target-feature +sse2 %s -DHAVE
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64-linux-pc %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple spir-unknown-unknown %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple armv7a-linux-gnu %s -DHAVE
 // RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64-linux-gnu %s -DHAVE
Index: clang/test/CodeGen/X86/Float16-complex.c
===
--- clang/test/CodeGen/X86/Float16-complex.c
+++ clang/test/CodeGen/X86/Float16-complex.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -target-feature +avx512fp16 -o - | FileCheck %s --check-prefix=X86
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s --check-prefix=X86
 
 _Float16 _Complex add_half_rr(_Float16 a, _Float16 b) {
   // X86-LABEL: @add_half_rr(
Index: clang/test/CodeGen/X86/Float16-arithmetic.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/Float16-arithmetic.c
@@ -0,0 +1,112 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -emit-llvm -o - %s | FileCheck %s
+
+

[PATCH] D113107: Support of expression granularity for _Float16.

2022-06-30 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> Am I understanding correctly? @pengfei you are interested in the 
> -fexcess-precision=16 part of this right? @rjmccall what do you think?

I agree with @rjmccall , we just need to disable what we do here for 
`-fexcess-precision=16`.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D107082#3626632, @sylvestre.ledru wrote:

> Same as in https://reviews.llvm.org/D114099
> It breaks the build on ubuntu bionic, Hirsute, etc on amd64:
>
>   
> "/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/build-llvm/./bin/clang"
>  --target=x86_64-pc-linux-gnu -DVISIBILITY_HIDDEN  -fstack-protector-strong 
> -Wformat -Werror=format-security -Wno-unused-command-line-argument 
> -Wdate-time -D_FORTIFY_SOURCE=2 -O3 -DNDEBUG -m32 -DCOMPILER_RT_HAS_FLOAT16 
> -std=c11 -fPIC -fno-builtin -fvisibility=hidden -fomit-frame-pointer -MD -MT 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -MF 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o.d -o 
> CMakeFiles/clang_rt.builtins-i386.dir/extendhfsf2.c.o -c 
> '/build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c'
>   In file included from 
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/extendhfsf2.c:11:
>   In file included from 
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend_impl.inc:38:
>   
> /build/llvm-toolchain-snapshot-15~++20220702091600+23ee84f43201/compiler-rt/lib/builtins/fp_extend.h:44:9:
>  error: _Float16 is not supported on this target
>   typedef _Float16 src_t;
>   ^
>   1 error generated.

Hi @sylvestre.ledru , thanks for reporting this issue.

This looks to me like a configuration (or option mismatch) problem in
compiler-rt. We support the `_Float16` type on targets that have SSE2 or later
features. A 32-bit target doesn't enable SSE2 by default. This should be fine,
because the compiler-rt CMake first detects whether `_Float16` is buildable and
sets `COMPILER_RT_HAS_FLOAT16` accordingly. So it looks to me as if the
`_Float16` detection passed with an SSE2-enabled option, but compiler-rt was
then built with a different option (SSE2 disabled).

I'd suggest adding an extra `-msse2` when building it, if possible. Otherwise,
don't let `-DCOMPILER_RT_HAS_FLOAT16` be passed here.
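
To illustrate the mismatch described above, here is a minimal sketch of a
source-level guard that only uses `_Float16` on x86 when SSE2 is enabled for
the current compilation. The macro `SKETCH_USE_FLOAT16` is hypothetical and
this is not what compiler-rt does. Falling back to `uint16_t` would also change
the ABI of the builtin, so the sketch only shows where the configure-time
macro and the compile-time target feature can disagree.

```cpp
#include <cstdint>

// On x86, _Float16 needs SSE2 in *this* compilation, no matter what the
// configure step detected. Elsewhere, trust the configure-time macro.
#if defined(__i386__) || defined(__x86_64__)
#  if defined(COMPILER_RT_HAS_FLOAT16) && defined(__SSE2__)
#    define SKETCH_USE_FLOAT16 1  // hypothetical helper macro
#  endif
#elif defined(COMPILER_RT_HAS_FLOAT16)
#  define SKETCH_USE_FLOAT16 1
#endif

#ifdef SKETCH_USE_FLOAT16
typedef _Float16 src_t;
#else
typedef uint16_t src_t;  // fallback: raw 16-bit storage
#endif

// Either way the source type must be exactly 16 bits wide.
static_assert(sizeof(src_t) == 2, "src_t must be 16 bits wide");
```

With `-m32 -DCOMPILER_RT_HAS_FLOAT16` and no `-msse2`, `__SSE2__` is not
defined, so this sketch would take the fallback instead of reporting that
`_Float16` is not supported on the target.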

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

FYI, `COMPILER_RT_HAS_FLOAT16` is set according to 
https://github.com/llvm/llvm-project/blob/main/compiler-rt/cmake/builtin-config-ix.cmake#L25-L31
 and 
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/CMakeLists.txt#L699


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D114099: Enable `_Float16` type support on X86 without the avx512fp16 flag

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei commandeered this revision.
pengfei edited reviewers, added: zahiraam; removed: pengfei.
pengfei added a comment.

This patch was replaced by D128571. Let me commandeer and abandon it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114099/new/

https://reviews.llvm.org/D114099

[PATCH] D113107: Support of expression granularity for _Float16.

2022-07-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> Then we need to add the option -fexcess-precision. I am not sure for now 
> where and what values the _FLT_EVAL_METHOD should have when excess precision 
> is enabled/disabled.

I'm fine with a follow-up patch to enable this option. Please note that LLVM 15
will branch on July 26. I hope we can enable the option before then so that it
is usable in LLVM 15.




Comment at: clang/docs/ReleaseNotes.rst:537
 
+- Support for ``_Float16`` type has been added.
+

We don't need it anymore. It's in line 523 now.



Comment at: clang/lib/Basic/Targets/X86.cpp:374-375
   }
+  // Turn on _float16 for x86 (feature sse2)
+  HasFloat16 = SSELevel >= SSE2;
 

It's in line 358.



Comment at: clang/test/CodeGen/X86/Float16-arithmetic.c:5-14
+  // CHECK-LABEL: @add1
+  // CHECK: [[A:%.*]] = alloca half
+  // CHECK-NEXT: [[B:%.*]] = alloca half
+  // CHECK: [[A_LOAD:%.*]] = load half, ptr [[A]]
+  // CHECK-NEXT: [[A_EXT:%.*]] = fpext half [[A_LOAD]] to float
+  // CHECK-NEXT: [[B_LOAD:%.*]] = load half, ptr [[B]]
+  // CHECK-NEXT: [[B_EXT:%.*]] = fpext half [[B_LOAD]] to float

Were these checks written manually? They can be regenerated with the command
`llvm/utils/update_cc_test_checks.py
clang/test/CodeGen/X86/Float16-arithmetic.c`.



Comment at: clang/test/CodeGen/X86/Float16-complex.c:5-19
+  // CHECK-LABEL: @add_half_rr(
+  // CHECK: [[A:%.*]] = alloca half
+  // CHECK-NEXT: [[B:%.*]] = alloca half
+  // CHECK: [[A_LOAD:%.*]] = load half, ptr [[A]]
+
+  // AVX-NEXT: [[B_LOAD:%.*]] = load half, ptr [[B]]
+  // AVX-NEXT: [[AB_ADD:%.*]] = fadd half [[A_LOAD]], [[B_LOAD]]

These can be generated by the script too.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D107082#3628120, @sylvestre.ledru wrote:

> @pengfei I am not convinced it is an issue on my side. I don't have anything 
> particular in this area and using a stage2 build system.
>
> Anyway, this patch fixes the issue on my side:
> https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/snapshot/debian/patches/force-sse2-compiler-rt.diff

I don't have much experience with compiler-rt or multi-stage builds, so I may
be wrong. It looks to me like an existing problem that was merely exposed by
this patch. The diff is further evidence.
The build command tells us it's a 32-bit build, but the change for `x86_64`
fixes it, which confirms my previous guess: you are using one CMake
configuration (probably 64-bit) but building for a 32-bit target.
Although the diff works, it doesn't look like a clean solution to me. But I
don't have a better suggestion either.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Hi @jeanPerier, yes, you are right. This patch changes the calling convention
of fp16 from GPRs to XMMs, so you need to update the runtime. If you are using
compiler-rt, you can simply rebuild it with trunk code, or at least with a
revision after rGabeeae57. If you are using your own runtime, you can solve the
problem in the way described in
https://github.com/llvm/llvm-project/issues/56156


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-06 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks for confirming it! I don't have much experience with compiler-rt. But I
think the clang version matters a lot to compiler-rt, particularly in
ABI-changing cases like this :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D107082: [X86][RFC] Enable `_Float16` type support on X86 following the psABI

2022-07-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @clementval for reporting it and for the reproducer. I put up a patch,
D129294, to address it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107082/new/

https://reviews.llvm.org/D107082

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-24 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Ping? We have an internal request for this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-27 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 418448.
pengfei marked 2 inline comments as done.
pengfei added a comment.

Address Yuanke's comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/X86/x86_64-arguments.c
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqtbx2_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @tes

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-27 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:590
+  /// Log 2 of the maximum vector width.
+  unsigned MaxVectorWidth : 4;
+

LuoYuanke wrote:
> I notice some code would indicate it is log 2 size with Log2 suffix in the 
> variable name. Do you think it is more readable to add Log2 suffix?
I don't think it matters. We have a getter and a setter for the variable, so
people won't access it directly.
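
A minimal sketch of why the accessors hide the encoding, assuming a 4-bit
field like the one in the hunk above. The class name, the field name and the
"log2 plus one, with 0 meaning no vector" encoding are assumptions made for
illustration, not necessarily what `CGFunctionInfo` does.

```cpp
#include <cassert>

// Hypothetical stand-in for the class that owns the bit-field.
class FunctionInfoSketch {
  // Stores log2(width) + 1 so that 0 can mean "no vector seen".
  unsigned MaxVectorWidthLog2 : 4;

public:
  FunctionInfoSketch() : MaxVectorWidthLog2(0) {}

  // Returns the width in bits, or 0 if no width was recorded.
  unsigned getMaxVectorWidth() const {
    return MaxVectorWidthLog2 ? 1u << (MaxVectorWidthLog2 - 1) : 0;
  }

  // Records a width in bits; the width must be a power of two.
  void setMaxVectorWidth(unsigned Width) {
    assert(Width != 0 && (Width & (Width - 1)) == 0 && "power of two expected");
    unsigned Log2 = 0;
    while ((1u << Log2) != Width)
      ++Log2;
    MaxVectorWidthLog2 = Log2 + 1;
  }
};
```

Callers only ever see widths in bits, e.g. `setMaxVectorWidth(512)` followed by
`getMaxVectorWidth()`, so the name of the stored field has little effect on
readability at the use sites.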



Comment at: clang/lib/CodeGen/CGCall.cpp:5238
+  for (unsigned i = 0; i < IRCallArgs.size(); ++i)
+LargestVectorWidth = std::max(LargestVectorWidth,
+  getMaxVectorWidth(IRCallArgs[i]->getType()));

LuoYuanke wrote:
> Does this also affect other calling convention besides fastcall?
I don't think so. The change here covers the missing cases like `[2 x <4 x
double>]` or `{ <2 x double>, <4 x double> }`, which should also set
`min-legal-vector-width` to the maximum vector length.
There are several reasons why other calling conventions won't be affected.
1. If a target is able to pass arguments like `[2 x <4 x double>]`, it must
also be able to pass `<4 x double>` and must already set
`min-legal-vector-width` to 256 when passing it. So it makes more sense to set
`min-legal-vector-width` to 256 for `[2 x <4 x double>]` rather than keeping it
at 0;
2. AFAIK, targets other than X86 simply ignore `min-legal-vector-width`, so the
change won't affect them;
3. On X86, calling conventions other than regcall don't allow argument sizes
larger than 512 bits, see `if (!IsRegCall && Size > 512)`. Such arguments are
turned into pointers, so they won't be affected by this change;
4. Arguments no larger than 512 bits that contain only a single vector element
have already been turned into pure vectors, so they already set
`min-legal-vector-width` to the correct value;
5. For arguments with more than one vector element, Clang has a bug where it
doesn't match GCC and ICC. I have filed a bug here
https://github.com/llvm/llvm-project/issues/54582
6. Thus, only regcall can generate argument types like `[2 x <4 x double>]` on
X86, so only it will be affected by this.
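
The idea of taking the maximum vector width over nested aggregates can be
sketched with a toy type model. `ToyType` and `maxVectorWidth` below are
hypothetical names used only for illustration; they are not LLVM classes and
this is not the code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy type model, only for illustration.
struct ToyType {
  enum Kind { Scalar, Vector, Array, Struct } K;
  unsigned Bits;                  // total bit width, used for Vector only
  std::vector<ToyType> Elements;  // element type for Array, fields for Struct
};

// Returns the widest vector (in bits) found anywhere inside Ty, or 0 if
// Ty contains no vector at all.
unsigned maxVectorWidth(const ToyType &Ty) {
  if (Ty.K == ToyType::Vector) {
    assert(Ty.Bits > 0 && "a vector must have a width");
    return Ty.Bits;
  }
  unsigned Max = 0;
  for (const ToyType &E : Ty.Elements)
    Max = std::max(Max, maxVectorWidth(E));
  return Max;
}
```

In this model `[2 x <4 x double>]` is an `Array` whose element is a 256-bit
`Vector`, so the function yields 256 rather than 0, and
`{ <2 x double>, <4 x double> }` is a `Struct` that also yields 256.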



Comment at: clang/lib/CodeGen/TargetInfo.cpp:3031
 // than eight eightbytes, ..., it has class MEMORY.
-if (Size > 512)
+if (!IsRegCall && Size > 512)
   return;

LuoYuanke wrote:
> Would you add a test for non regcall? Pass 1024 bit vector parameter and 
> check if it is well handled both with regcall and without regcall.
> Would you add comments to depict why regcall accept the size which is more 
> than 512?
Added one for non-regcall. The regcall specification doesn't say how to handle
a 1024-bit vector. I'd take it as UB, so we don't need such a test.
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/c-c-calling-conventions.html



Comment at: clang/test/CodeGen/aarch64-neon-tbl.c:45
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16

LuoYuanke wrote:
> I'm curious why aarch64 test cases are affected by the patch.
As I explained above, `[2 x <16 x i8>]` should have the same value of 
`min-legal-vector-width` as `<16 x i8>`. The difference between `#0` and `#1` 
is the value of `min-legal-vector-width`.



Comment at: clang/test/CodeGen/regcall2.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin

LuoYuanke wrote:
> Add test case for target that has no avx512 feature?
I'd take using a 512-bit vector without the avx512 feature as UB for Clang and
GCC. ICC always promotes to avx512 when it finds a 512-bit vector. So we don't
need such tests.



Comment at: clang/test/CodeGen/regcall2.c:9
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;

LuoYuanke wrote:
> May add a test case to show what's the max register we can pass with regcall.
We have tests for it in `clang/test/CodeGen/regcall.c`. This patch doesn't 
affect the capability of regcall.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-28 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rGcd26190a10fc: [X86][regcall] Support passing / returning 
structures (authored by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/X86/x86_64-arguments.c
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqtbx2_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:5238
+  for (unsigned i = 0; i < IRCallArgs.size(); ++i)
+LargestVectorWidth = std::max(LargestVectorWidth,
+  getMaxVectorWidth(IRCallArgs[i]->getType()));

pengfei wrote:
> LuoYuanke wrote:
> > Does this also affect other calling conventions besides fastcall?
> I don't think so. The change here covers the missing cases like
> `[2 x <4 x double>]` or `{ <2 x double>, <4 x double> }`, which should also
> set `min-legal-vector-width` to the maximum vector width.
> There are several reasons why other calling conventions won't be affected:
> 1. If a target can pass arguments like `[2 x <4 x double>]`, it must also be
> able to pass `<4 x double>` and will have set `min-legal-vector-width` to 256
> when doing so. So it makes more sense to set `min-legal-vector-width` to 256
> for `[2 x <4 x double>]` rather than leaving it at 0;
> 2. AFAIK, targets other than X86 simply ignore `min-legal-vector-width`, so
> the change won't affect them;
> 3. On X86, calling conventions other than regcall don't allow arguments
> larger than 512 bits; see `if (!IsRegCall && Size > 512)`. Such arguments are
> turned into pointers, so they won't be affected by this change;
> 4. Arguments no larger than 512 bits that contain only a single vector
> element have already been turned into pure vectors, so they already set
> `min-legal-vector-width` to the correct value;
> 5. For arguments that have more than one vector element, Clang has a bug and
> doesn't match GCC and ICC. I have filed a bug here:
> https://github.com/llvm/llvm-project/issues/54582
> 6. Thus, only regcall can generate argument types like `[2 x <4 x double>]`
> on X86, so only it will be affected by this.
Update for bullet 5: the handling of multiple vector elements is correct. They
are also passed / returned in memory, so they are still not affected by this
change.
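
The rule discussed above (take the widest vector found anywhere inside an aggregate argument) can be sketched roughly as follows. This is only a toy model: `ToyType`, `makeVector`, `makeArray`, `makeStruct` and `maxVectorWidth` are hypothetical names for illustration and are not the Clang/LLVM types or the `getMaxVectorWidth` helper from the patch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for an IR type; not an LLVM class.
struct ToyType {
  enum Kind { Scalar, Vector, Array, Struct };
  Kind kind = Scalar;
  unsigned bits = 0;              // total width in bits (Scalar and Vector only)
  std::vector<ToyType> elements;  // Array: one element type; Struct: the fields
};

// makeVector(4, 64) models <4 x double>, i.e. a 256-bit vector.
ToyType makeVector(unsigned count, unsigned elemBits) {
  ToyType t;
  t.kind = ToyType::Vector;
  t.bits = count * elemBits;
  return t;
}

// Models [N x elem]; N is irrelevant for the maximum width, so it is omitted.
ToyType makeArray(const ToyType &elem) {
  ToyType t;
  t.kind = ToyType::Array;
  t.elements.push_back(elem);
  return t;
}

// Models { field0, field1, ... }.
ToyType makeStruct(const std::vector<ToyType> &fields) {
  ToyType t;
  t.kind = ToyType::Struct;
  t.elements = fields;
  return t;
}

// Widest vector found anywhere inside the type, or 0 if there is none.
unsigned maxVectorWidth(const ToyType &t) {
  if (t.kind == ToyType::Vector)
    return t.bits;
  unsigned widest = 0;
  for (const ToyType &e : t.elements)
    widest = std::max(widest, maxVectorWidth(e));
  return widest;
}
```

Under this model, `[2 x <4 x double>]` and `{ <2 x double>, <4 x double> }` both yield 256, which is the value the comment argues `min-legal-vector-width` should carry instead of 0.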


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-03-31 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789


[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-31 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @erichkeane @aaron.ballman! Yeah, I didn't receive a buildbot notice
about that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104


[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-04-01 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D122789#3423846, @MaskRay wrote:

> Is this a problem with D105462? Should -msse4.2 imply -mcrc32?

-msse4.2 implies -mcrc32: https://godbolt.org/z/xaPccrKx3


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789


[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-04-01 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

So it's interesting; it shouldn't fail that way: https://godbolt.org/z/jcqx5x9j7


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789


[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-04-01 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D122789#3424226, @MaskRay wrote:

> In D122789#3424213, @pengfei wrote:
>
>> In D122789#3423846, @MaskRay wrote:
>>
>>> Is this a problem with D105462? Should -msse4.2 imply -mcrc32?
>>
>> -msse4.2 implies -mcrc32: https://godbolt.org/z/xaPccrKx3
>
> That is my understanding of clang/lib/Basic/Targets/X86.cpp:159, but the
> error suggested that D105462 might change something in an interesting way.
> I am on a trip so do not spend time investigating the root cause.

I somewhat suspect that -msse4.2 isn't specified (correctly). The message in
https://bugs.gentoo.org/835870 is a bit confusing: the options in the first
comment contain -msse4.2, but the following ones say "-msse4.2 [disabled]" or
"-mno-sse4 [enabled]". If sse4.2 is disabled, then no doubt we should add
-mcrc32 instead.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-07-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei created this revision.
pengfei added a reviewer: nickdesaulniers.
Herald added a subscriber: hiraditya.
Herald added a project: All.
pengfei requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, MaskRay.
Herald added projects: clang, LLVM.

This addresses the feature request from
https://github.com/ClangBuiltLinux/linux/issues/1665


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D130754

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGen/X86/indirect-branch-cs-prefix.c
  llvm/lib/Target/X86/X86MCInstLower.cpp
  llvm/lib/Target/X86/X86ReturnThunks.cpp
  llvm/test/CodeGen/X86/attr-function-return.ll
  llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll

Index: llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
===
--- llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
+++ llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
@@ -22,18 +22,22 @@
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movl %[[x]], %edi
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_reg:
 ; X64FAST:   callq bar
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   callq bar
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 @global_fp = external dso_local global ptr
@@ -50,16 +54,20 @@
 ; X64-LABEL: icall_global_fp:
 ; X64-DAG:   movl %edi, %[[x:[^ ]*]]
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_global_fp:
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 %struct.Foo = type { ptr }
@@ -79,14 +87,18 @@
 ; X64:   movq (%rdi), %[[vptr:[^ ]*]]
 ; X64:   movq 8(%[[vptr]]), %[[fp:[^ ]*]]
 ; X64:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movq %[[obj]], %rdi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: vcall:
-; X64FAST:   callq __llvm_lvi_thunk_r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 declare dso_local void @direct_callee()
@@ -113,14 +125,18 @@
 ; X64-LABEL: nonlazybind_caller:
 ; X64:   movq nonlazybind_callee@GOTPCREL(%rip), %[[REG:.*]]
 ; X64:   movq %[[REG]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movq %[[REG]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 ; X64FAST-LABEL: nonlazybind_caller:
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 ; Check that a switch gets lowered using a jump table
@@ -278,3 +294,7 @@
 ; X64-NEXT:  jmpq *%r11
 
 attributes #1 = { nonlazybind }
+
+!llvm.module.flags = !{!0}
+
+!0 = !{i32 4, !"indirect_branch_cs_prefix", i32 1}
Index: llvm/test/CodeGen/X86/attr-function-return.ll
===
--- llvm/test/CodeGen/X86/attr-function-return.ll
+++ llvm/test/CodeGen/X86/attr-function-return.ll
@@ -6,6 +6,11 @@
 define void @x() fn_ret_thunk_extern {
 ; CHECK-LABEL: x:
 ; CHECK:   # %bb.0:
+; CHECK-NEXT:cs
 ; CHECK-NEXT:jmp __x86_return_thunk
   ret void
 }
+
+!llvm.module.flags = !{!0}
+
+!0 = !{i32 4, !"indirect_branch_cs_prefix", i

[PATCH] D130964: [X86][BF16] Enable __bf16 for x86 targets.

2022-08-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Add to `ReleaseNotes.rst` as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130964/new/

https://reviews.llvm.org/D130964


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 449275.
pengfei marked an inline comment as done.
pengfei added a comment.

Add CC1 option test.

> When a module with "`indirect_branch_cs_prefix`" and another without the
> module flag are merged, what should the result be? If 0, we should use `Min`
> instead of `Override`.

I think `Override` is correct. This option is used for Linux kernel builds, so
when modules are merged, the flag should end up set to 1 for all of them.
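
To make the question concrete, here is a toy model of the two merge policies being compared. It follows the framing of the review question (under `Min`, a module without the flag behaves like 0) and is not LLVM's linker code; `MergeBehavior` and `mergeFlag` are hypothetical names, and conflicting values under `Override` are deliberately not modeled.

```cpp
#include <algorithm>
#include <optional>

// Toy model of two merge policies; names are hypothetical, not LLVM's API.
enum class MergeBehavior { Override, Min };

// Merge one integer flag from two modules; std::nullopt means "flag absent".
std::optional<int> mergeFlag(std::optional<int> lhs, std::optional<int> rhs,
                             MergeBehavior behavior) {
  if (!lhs && !rhs)
    return std::nullopt;  // neither module carries the flag
  if (behavior == MergeBehavior::Min) {
    // Assumption taken from the review question: a missing flag counts as 0.
    return std::min(lhs.value_or(0), rhs.value_or(0));
  }
  // Override (simplified): the value from whichever module has the flag wins.
  return lhs ? lhs : rhs;
}
```

In this sketch, merging a module that sets the flag to 1 with one that lacks it gives 1 under `Override` and 0 under `Min`, which is the difference the reply above is choosing between.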


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGen/X86/indirect-branch-cs-prefix.c
  clang/test/Driver/x86_features.c
  llvm/lib/Target/X86/X86MCInstLower.cpp
  llvm/lib/Target/X86/X86ReturnThunks.cpp
  llvm/test/CodeGen/X86/attr-function-return.ll
  llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll

Index: llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
===
--- llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
+++ llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
@@ -22,18 +22,22 @@
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movl %[[x]], %edi
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_reg:
 ; X64FAST:   callq bar
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   callq bar
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 @global_fp = external dso_local global ptr
@@ -50,16 +54,20 @@
 ; X64-LABEL: icall_global_fp:
 ; X64-DAG:   movl %edi, %[[x:[^ ]*]]
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_global_fp:
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 %struct.Foo = type { ptr }
@@ -79,14 +87,18 @@
 ; X64:   movq (%rdi), %[[vptr:[^ ]*]]
 ; X64:   movq 8(%[[vptr]]), %[[fp:[^ ]*]]
 ; X64:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movq %[[obj]], %rdi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: vcall:
-; X64FAST:   callq __llvm_lvi_thunk_r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 declare dso_local void @direct_callee()
@@ -113,14 +125,18 @@
 ; X64-LABEL: nonlazybind_caller:
 ; X64:   movq nonlazybind_callee@GOTPCREL(%rip), %[[REG:.*]]
 ; X64:   movq %[[REG]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movq %[[REG]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 ; X64FAST-LABEL: nonlazybind_caller:
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 ; Check that a switch gets lowered using a jump table
@@ -278,3 +294,7 @@
 ; X64-NEXT:  jmpq *%r11
 
 attributes #1 = { nonlazybind }
+
+!llvm.module.flags = !{!0}
+
+!0 = !{i32 4, !"indirect_branch_cs_prefix", i32 1}
Index: llvm/test/CodeGen/X86/attr-function-return.ll
===
--- llvm/test/CodeGen/X86/attr-function-return.ll
+++ llvm/test/CodeGen/X86/attr-function-return.ll
@@ -6,6 +6,11 @@
 define void @x() fn_ret_thunk_extern {
 ; CHECK-LABEL: x:
 ; CHECK:   # %bb.0:

[PATCH] D130964: [X86][BF16] Enable __bf16 for x86 targets.

2022-08-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D130964#3694473, @rjmccall wrote:

> How are you actually implementing `__bf16` on these targets?  There isn't 
> even hardware support for conversions.

We support `float` -> `bf16` conversion in `AVX512BF16`:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#avx512techs=AVX512_BF16
We also found some problems with how `bf16` types are represented in
intrinsics. For example, we currently define `__bfloat16` as `unsigned short`,
so we cannot stop a user from, e.g., adding two `__bfloat16` values in C code
and getting the wrong result. That is why we want to introduce the type on X86.
For more information, please see the discussions in D120395.
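
A small sketch of the problem with an integer storage type, assuming `float` is IEEE-754 binary32 and treating bf16 as its upper 16 bits. The helper names are hypothetical, and the conversion here truncates rather than rounds, so it only illustrates the idea and is not how the hardware or the intrinsics convert.

```cpp
#include <cstdint>
#include <cstring>

// Keep the upper 16 bits of a binary32 value. This truncates; real
// conversions round, so treat it as an illustration only.
std::uint16_t floatToBF16Bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widen a bf16 bit pattern back to float by appending 16 zero bits.
float bf16BitsToFloat(std::uint16_t h) {
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// What `a + b` means when the storage type is a plain unsigned short:
// the raw bit patterns are added as integers.
std::uint16_t addAsIntegers(std::uint16_t a, std::uint16_t b) {
  return static_cast<std::uint16_t>(a + b);
}

// What the user most likely meant: widen, add as float, narrow again.
std::uint16_t addAsBF16(std::uint16_t a, std::uint16_t b) {
  return floatToBF16Bits(bf16BitsToFloat(a) + bf16BitsToFloat(b));
}
```

With 1.0 encoded as `0x3F80`, the integer addition of two such values gives `0x7F00`, while the floating-point addition gives `0x4000` (2.0). A distinct `__bf16` type lets the compiler tell these two operations apart.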

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130964/new/

https://reviews.llvm.org/D130964


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-03 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6350
 
+  if (Args.hasArg(options::OPT_mindirect_branch_cs_prefix))
+CmdArgs.push_back("-mindirect-branch-cs-prefix");

MaskRay wrote:
> This is not needed with the TableGen CC1Option change.
Do you have an example of how to change the CC1Option? Simply removing the code
no longer generates the expected module flag.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
pengfei marked an inline comment as done.
Closed by commit rG6f867f910283: [X86] Support ``-mindirect-branch-cs-prefix`` 
for call and jmp to indirect thunk (authored by pengfei).

Changed prior to commit:
  https://reviews.llvm.org/D130754?vs=449275&id=449890#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/CodeGenOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGen/X86/indirect-branch-cs-prefix.c
  clang/test/Driver/x86_features.c
  llvm/lib/Target/X86/X86MCInstLower.cpp
  llvm/lib/Target/X86/X86ReturnThunks.cpp
  llvm/test/CodeGen/X86/attr-function-return.ll
  llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll

Index: llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
===
--- llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
+++ llvm/test/CodeGen/X86/lvi-hardening-indirectbr.ll
@@ -22,18 +22,22 @@
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movl %[[x]], %edi
 ; X64:   callq bar
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_reg:
 ; X64FAST:   callq bar
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   callq bar
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 @global_fp = external dso_local global ptr
@@ -50,16 +54,20 @@
 ; X64-LABEL: icall_global_fp:
 ; X64-DAG:   movl %edi, %[[x:[^ ]*]]
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movl %[[x]], %edi
 ; X64-DAG:   movq global_fp(%rip), %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: icall_global_fp:
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq global_fp(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 %struct.Foo = type { ptr }
@@ -79,14 +87,18 @@
 ; X64:   movq (%rdi), %[[vptr:[^ ]*]]
 ; X64:   movq 8(%[[vptr]]), %[[fp:[^ ]*]]
 ; X64:   movq %[[fp]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64-DAG:   movq %[[obj]], %rdi
 ; X64-DAG:   movq %[[fp]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 ; X64FAST-LABEL: vcall:
-; X64FAST:   callq __llvm_lvi_thunk_r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 declare dso_local void @direct_callee()
@@ -113,14 +125,18 @@
 ; X64-LABEL: nonlazybind_caller:
 ; X64:   movq nonlazybind_callee@GOTPCREL(%rip), %[[REG:.*]]
 ; X64:   movq %[[REG]], %r11
-; X64:   callq __llvm_lvi_thunk_r11
+; X64:   cs
+; X64-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64:   movq %[[REG]], %r11
-; X64:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64:   cs
+; X64-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 ; X64FAST-LABEL: nonlazybind_caller:
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   callq __llvm_lvi_thunk_r11
+; X64FAST:   cs
+; X64FAST-NEXT:  callq __llvm_lvi_thunk_r11
 ; X64FAST:   movq nonlazybind_callee@GOTPCREL(%rip), %r11
-; X64FAST:   jmp __llvm_lvi_thunk_r11 # TAILCALL
+; X64FAST:   cs
+; X64FAST-NEXT:  jmp __llvm_lvi_thunk_r11 # TAILCALL
 
 
 ; Check that a switch gets lowered using a jump table
@@ -278,3 +294,7 @@
 ; X64-NEXT:  jmpq *%r11
 
 attributes #1 = { nonlazybind }
+
+!llvm.module.flags = !{!0}
+
+!0 = !{i32 4, !"indirect_branch_cs_prefix", i32 1}
Index: llvm/test/CodeGen/X86/attr-function-return.ll
===
--- llvm/test/CodeGen/X86/attr-function-return.ll
+++ llvm/test/CodeGen/X86/attr-function-return.ll
@@ -6,6 +6,11 @@
 define void @x() fn_ret_thunk_extern {
 ; CHECK-LABEL: x:
 ; CHECK:   # %bb.0:
+; CHECK-NEXT:cs
 ; CHECK-NEXT:

[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks for the review!




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6350
 
+  if (Args.hasArg(options::OPT_mindirect_branch_cs_prefix))
+CmdArgs.push_back("-mindirect-branch-cs-prefix");

MaskRay wrote:
> pengfei wrote:
> > MaskRay wrote:
> > > This is not needed with the TableGen CC1Option change.
> > Do you have an example of how to change the CC1Option? Simply removing the
> > code no longer generates the expected module flag.
> OK, sorry. A statement is needed: `Args.AddLastArg(CmdArgs,
> options::OPT_mindirect_branch_cs_prefix);`
Good to know about this usage. Thank you!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754


[PATCH] D131134: [X86] Report error if the amx enabled on the non-64-bits target

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a subscriber: aaron.ballman.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM, but maybe wait one day or two for other FE folks' opinions. @aaron.ballman


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131134/new/

https://reviews.llvm.org/D131134


[PATCH] D131172: [clang][llvm][doc] Add more information for the ABI change in FP16

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei created this revision.
pengfei added reviewers: kparzysz, thieta, abdulras, tstellar.
Herald added a project: All.
pengfei requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D131172

Files:
  clang/docs/ReleaseNotes.rst
  llvm/docs/ReleaseNotes.rst


Index: llvm/docs/ReleaseNotes.rst
===
--- llvm/docs/ReleaseNotes.rst
+++ llvm/docs/ReleaseNotes.rst
@@ -186,9 +186,25 @@
 Changes to the X86 Backend
 --
 
-* Support ``half`` type on SSE2 and above targets.
+* Support ``half`` type on SSE2 and above targets following X86 psABI.
 * Support ``rdpru`` instruction on Zen2 and above targets.
 
+During this release, ``half`` type has an ABI breaking change to provide the
+support for the ABI of ``_Float16`` type on SSE2 and above following X86 psABI.
+(`D107082 `_)
+
+The change may affect the current use of ``half`` includes (but is not limited
+to):
+
+* Frontends generating ``half`` type in function passing and/or returning
+arguments.
+* Downstream runtimes providing any ``half`` conversion builtins assuming the
+old ABI.
+* Projects built with LLVM 15.0 but using early versions of compiler-rt.
+
+When you find failures with ``half`` type, check the calling conversion of the
+code and switch it to the new ABI.
+
 Changes to the OCaml bindings
 -
 
Index: clang/docs/ReleaseNotes.rst
===
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -619,6 +619,19 @@
   will be used by Linux kernel mitigations for RETBLEED. The corresponding flag
   ``-mfunction-return=keep`` may be appended to disable the feature.
 
+The ``_Float16`` type requires SSE2 feature and above due to the instruction
+limitations. When using it on i386 targets, you need to specify ``-msse2``
+explicitly.
+
+For targets without F16C feature or above, please make sure:
+
+- Use GCC 12.0 and above if you are using libgcc.
+- If you are using compiler-rt, use the same version with the compiler.
+Early versions provided FP16 builtins in a different ABI. A workaround is to use
+a small code snippet to check the ABI if you cannot make sure of it.
+- If you are using downstream runtimes that provide FP16 conversions, update
+them with the new ABI.
+
 DWARF Support in Clang
 --
 



[PATCH] D131172: [clang][llvm][doc] Add more information for the ABI change in FP16

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

This is for the LLVM 15.0 release, per #56854.

Please forgive my poor English. Suggestions are welcome.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131172/new/

https://reviews.llvm.org/D131172


[PATCH] D131172: [clang][llvm][doc] Add more information for the ABI change in FP16

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei closed this revision.
pengfei added a comment.

In D131172#3699950, @tstellar wrote:

> @pengfei You can commit this directly to the release/15.x branch whenever you 
> are ready.

I see. Done. Thanks all!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131172/new/

https://reviews.llvm.org/D131172


[PATCH] D130964: [X86][BF16] Enable __bf16 for x86 targets.

2022-08-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130964/new/

https://reviews.llvm.org/D130964


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-05 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D130754#3701837, @nikic wrote:

> This change caused a significant compile-time regression for `O0` builds 
> (about 1%): 
> http://llvm-compile-time-tracker.com/compare.php?from=45bae1be90472c696f6ba3bb4f8fabee76040fa9&to=6f867f9102838ebe314c1f3661fdf95700386e5a&stat=instructions
>
> At a guess, fetching a module flag for every single instruction is slow?

Thanks for reporting it. I guess so. I'll try to move it outside :)
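
The suspected cost pattern, and the idea of moving the query out of the per-instruction path, can be sketched in isolation. This is only a minimal illustration: `Module`, `hasFlag`, the two counting functions, and the flag key are hypothetical stand-ins, not the LLVM API and not the code in D130754.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a module carrying named integer flags.
struct Module {
  std::unordered_map<std::string, int> flags;
  mutable std::size_t lookups = 0; // how many times a flag was queried

  bool hasFlag(const std::string &name) const {
    ++lookups;
    auto it = flags.find(name);
    return it != flags.end() && it->second != 0;
  }
};

constexpr int kIndirectCall = 1;           // illustrative opcode value
const char *const kFlag = "cs-prefix-flag"; // illustrative flag key

// Slow shape: the flag is looked up once per instruction.
std::size_t countPrefixedPerInstruction(const Module &m,
                                        const std::vector<int> &opcodes) {
  std::size_t n = 0;
  for (int opcode : opcodes) {
    bool enabled = m.hasFlag(kFlag);
    if (enabled && opcode == kIndirectCall)
      ++n;
  }
  return n;
}

// Hoisted shape: the flag is looked up once, before the loop.
std::size_t countPrefixedHoisted(const Module &m,
                                 const std::vector<int> &opcodes) {
  const bool enabled = m.hasFlag(kFlag);
  std::size_t n = 0;
  for (int opcode : opcodes) {
    if (enabled && opcode == kIndirectCall)
      ++n;
  }
  return n;
}
```

Both functions return the same count; only the number of flag lookups differs (one per instruction versus one in total).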


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754


[PATCH] D130754: [X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk

2022-08-05 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D130754#3701858, @pengfei wrote:

> In D130754#3701837, @nikic wrote:
>
>> This change caused a significant compile-time regression for `O0` builds 
>> (about 1%): 
>> http://llvm-compile-time-tracker.com/compare.php?from=45bae1be90472c696f6ba3bb4f8fabee76040fa9&to=6f867f9102838ebe314c1f3661fdf95700386e5a&stat=instructions
>>
>> At a guess, fetching a module flag for every single instruction is slow?
>
> Thanks for reporting it. I guess so. I'll try to move it outside :)

D131245 should help with it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130754/new/

https://reviews.llvm.org/D130754


[PATCH] D131468: [WIP][BPF]: Force sign/zero extension for arguments in callee and return values in caller

2022-08-08 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

D124435 is going to change the assumption :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131468/new/

https://reviews.llvm.org/D131468


[PATCH] D130964: [X86][BF16] Enable __bf16 for x86 targets.

2022-08-09 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGe4888a37d367: [X86][BF16] Enable __bf16 for x86 targets. 
(authored by FreddyYe, committed by pengfei).

Changed prior to commit:
  https://reviews.llvm.org/D130964?vs=449187&id=451331#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130964/new/

https://reviews.llvm.org/D130964

Files:
  clang/docs/LanguageExtensions.rst
  clang/lib/Basic/Targets/X86.cpp
  clang/lib/Basic/Targets/X86.h
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/X86/bfloat-abi.c
  clang/test/CodeGen/X86/bfloat-half-abi.c
  clang/test/CodeGen/X86/bfloat-mangle.cpp
  clang/test/Sema/vector-decl-crash.c
  llvm/include/llvm/IR/Type.h

Index: llvm/include/llvm/IR/Type.h
===
--- llvm/include/llvm/IR/Type.h
+++ llvm/include/llvm/IR/Type.h
@@ -144,6 +144,11 @@
   /// Return true if this is 'bfloat', a 16-bit bfloat type.
   bool isBFloatTy() const { return getTypeID() == BFloatTyID; }
 
+  /// Return true if this is a 16-bit float type.
+  bool is16bitFPTy() const {
+return getTypeID() == BFloatTyID || getTypeID() == HalfTyID;
+  }
+
   /// Return true if this is 'float', a 32-bit IEEE fp type.
   bool isFloatTy() const { return getTypeID() == FloatTyID; }
 
Index: clang/test/Sema/vector-decl-crash.c
===
--- clang/test/Sema/vector-decl-crash.c
+++ clang/test/Sema/vector-decl-crash.c
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-unknown
+// RUN: %clang_cc1 %s -fsyntax-only -verify -triple riscv64-unknown-unknown
 
 // GH50171
 // This would previously crash when __bf16 was not a supported type.
Index: clang/test/CodeGen/X86/bfloat-mangle.cpp
===
--- /dev/null
+++ clang/test/CodeGen/X86/bfloat-mangle.cpp
@@ -0,0 +1,5 @@
+// RUN: %clang_cc1 -triple i386-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: define {{.*}}void @_Z3foou6__bf16(bfloat noundef %b)
+void foo(__bf16 b) {}
Index: clang/test/CodeGen/X86/bfloat-half-abi.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/bfloat-half-abi.c
@@ -0,0 +1,149 @@
+// RUN: %clang_cc1 -triple x86_64-linux -emit-llvm  -target-feature +sse2 < %s | FileCheck %s --check-prefixes=CHECK
+
+struct bfloat1 {
+  __bf16 a;
+};
+
+struct bfloat1 h1(__bf16 a) {
+  // CHECK: define{{.*}}bfloat @
+  struct bfloat1 x;
+  x.a = a;
+  return x;
+}
+
+struct bfloat2 {
+  __bf16 a;
+  __bf16 b;
+};
+
+struct bfloat2 h2(__bf16 a, __bf16 b) {
+  // CHECK: define{{.*}}<2 x bfloat> @
+  struct bfloat2 x;
+  x.a = a;
+  x.b = b;
+  return x;
+}
+
+struct bfloat3 {
+  __bf16 a;
+  __bf16 b;
+  __bf16 c;
+};
+
+struct bfloat3 h3(__bf16 a, __bf16 b, __bf16 c) {
+  // CHECK: define{{.*}}<4 x bfloat> @
+  struct bfloat3 x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  return x;
+}
+
+struct bfloat4 {
+  __bf16 a;
+  __bf16 b;
+  __bf16 c;
+  __bf16 d;
+};
+
+struct bfloat4 h4(__bf16 a, __bf16 b, __bf16 c, __bf16 d) {
+  // CHECK: define{{.*}}<4 x bfloat> @
+  struct bfloat4 x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  x.d = d;
+  return x;
+}
+
+struct floatbfloat {
+  float a;
+  __bf16 b;
+};
+
+struct floatbfloat fh(float a, __bf16 b) {
+  // CHECK: define{{.*}}<4 x half> @
+  struct floatbfloat x;
+  x.a = a;
+  x.b = b;
+  return x;
+}
+
+struct floatbfloat2 {
+  float a;
+  __bf16 b;
+  __bf16 c;
+};
+
+struct floatbfloat2 fh2(float a, __bf16 b, __bf16 c) {
+  // CHECK: define{{.*}}<4 x half> @
+  struct floatbfloat2 x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  return x;
+}
+
+struct bfloatfloat {
+  __bf16 a;
+  float b;
+};
+
+struct bfloatfloat hf(__bf16 a, float b) {
+  // CHECK: define{{.*}}<4 x half> @
+  struct bfloatfloat x;
+  x.a = a;
+  x.b = b;
+  return x;
+}
+
+struct bfloat2float {
+  __bf16 a;
+  __bf16 b;
+  float c;
+};
+
+struct bfloat2float h2f(__bf16 a, __bf16 b, float c) {
+  // CHECK: define{{.*}}<4 x bfloat> @
+  struct bfloat2float x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  return x;
+}
+
+struct floatbfloat3 {
+  float a;
+  __bf16 b;
+  __bf16 c;
+  __bf16 d;
+};
+
+struct floatbfloat3 fh3(float a, __bf16 b, __bf16 c, __bf16 d) {
+  // CHECK: define{{.*}}{ <4 x half>, bfloat } @
+  struct floatbfloat3 x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  x.d = d;
+  return x;
+}
+
+struct bfloat5 {
+  __bf16 a;
+  __bf16 b;
+  __bf16 c;
+  __bf16 d;
+  __bf16 e;
+};
+
+struct bfloat5 h5(__bf16 a, __bf16 b, __bf16 c, __bf16 d, __bf16 e) {
+  // CHECK: define{{.*}}{ <4 x bfloat>, bfloat } @
+  struct bfloat5 x;
+  x.a = a;
+  x.b = b;
+  x.c = c;
+  x.d = d;
+  x.e = e;
+  return x;
+}
Index: clan

[PATCH] D131134: [X86] Report error if the amx enabled on the non-64-bits target

2022-08-10 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

I have the same impression. I checked the ISE, which says AMX instructions are
`N.E.` in 32-bit mode. It also seems the Linux kernel only enables AMX on 64-bit:
https://lwn.net/ml/linux-kernel/20210730145957.7927-22-chang.seok@intel.com/#t


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131134/new/

https://reviews.llvm.org/D131134


[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-09 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Ping @jyu2


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141


[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-09 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 414276.
pengfei added a comment.

Address review comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/Sema/asm.c

Index: clang/test/Sema/asm.c
===
--- clang/test/Sema/asm.c
+++ clang/test/Sema/asm.c
@@ -313,3 +313,52 @@
   asm ("jne %l0":::);
   asm goto ("jne %l0"lab);
 }
+
+typedef struct _st_size64 {
+  int a;
+  char b;
+} st_size64;
+
+typedef struct _st_size96 {
+  int a;
+  int b;
+  int c;
+} st_size96;
+
+typedef struct _st_size16 {
+  char a;
+  char b;
+} st_size16;
+
+typedef struct _st_size32 {
+  char a;
+  char b;
+  char c;
+  char d;
+} st_size32;
+
+typedef struct _st_size128 {
+  int a;
+  int b;
+  int c;
+  int d;
+} st_size128;
+
+void test19(long long x)
+{
+  st_size64 a;
+  st_size96 b;
+  st_size16 c;
+  st_size32 d;
+  st_size128 e;
+  asm ("" : "=rm" (a): "0" (1)); // no-error
+  asm ("" : "=rm" (d): "0" (1)); // no-error
+  asm ("" : "=rm" (c): "0" (x)); // no-error
+  asm ("" : "=rm" (x): "0" (a)); // no-error
+  asm ("" : "=rm" (a): "0" (d)); // no-error
+  // Check the output size is pow of 2
+  asm ("" : "=rm" (b): "0" (1)); // expected-error {{impossible constraint in asm: can't store value into a register}}
+  // Check the output size is <= 64
+  asm ("" : "=rm" (e): "0" (1)); // no-error
+  asm ("" : "=rm" (x): "0" (e)); // no-error
+}
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -618,14 +618,16 @@
   AD_Int, AD_FP, AD_Other
 } InputDomain, OutputDomain;
 
-if (InTy->isIntegerType() || InTy->isPointerType())
+if (InTy->isIntegerType() || InTy->isPointerType() ||
+InTy->isStructureType() || InTy->isConstantArrayType())
   InputDomain = AD_Int;
 else if (InTy->isRealFloatingType())
   InputDomain = AD_FP;
 else
   InputDomain = AD_Other;
 
-if (OutTy->isIntegerType() || OutTy->isPointerType())
+if (OutTy->isIntegerType() || OutTy->isPointerType() ||
+OutTy->isStructureType() || OutTy->isConstantArrayType())
   OutputDomain = AD_Int;
 else if (OutTy->isRealFloatingType())
   OutputDomain = AD_FP;
@@ -667,8 +669,15 @@
 // output was a register, just extend the shorter one to the size of the
 // larger one.
 if (!SmallerValueMentioned && InputDomain != AD_Other &&
-OutputConstraintInfos[TiedTo].allowsRegister())
+OutputConstraintInfos[TiedTo].allowsRegister()) {
+  // FIXME: GCC supports the OutSize to be 128 at maximum. Currently codegen
+  // crash when the size larger than the register size. So we limit it here.
+  if (OutputDomain == AD_Int &&
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull())
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+
   continue;
+}
 
 // Either both of the operands were mentioned or the smaller one was
 // mentioned.  One more special case that we'll allow: if the tied input is
Index: clang/lib/CodeGen/CGStmt.cpp
===
--- clang/lib/CodeGen/CGStmt.cpp
+++ clang/lib/CodeGen/CGStmt.cpp
@@ -2725,9 +2725,8 @@
   QualType Ty = getContext().getIntTypeForBitwidth(Size, /*Signed*/ false);
   if (Ty.isNull()) {
 const Expr *OutExpr = S.getOutputExpr(i);
-CGM.Error(
-OutExpr->getExprLoc(),
-"impossible constraint in asm: can't store value into a register");
+CGM.getDiags().Report(OutExpr->getExprLoc(),
+  diag::err_store_value_to_reg);
 return;
   }
   Dest = MakeAddrLValue(A, Ty);
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8763,6 +8763,8 @@
 " in asm %select{input|output}1 with a memory constraint '%2'">;
   def err_asm_input_duplicate_match : Error<
 "more than one input constraint matches the same output '%0'">;
+  def err_store_value_to_reg : Error<
+"impossible constraint in asm: can't store value into a register">;
 
   def warn_asm_label_on_auto_decl : Warning<
 "ignored asm label '%0' on automatic variable">;

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-09 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Sema/SemaStmtAsm.cpp:679
+  !llvm::isPowerOf2_32(OutSize))
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+

jyu2 wrote:
> Error message is not very clear to me.  I think we should create more 
> specified error message there.  Like power of two, or size <  8 or > pointer 
> size?
> 
> Using error message selector.
Switched to `getIntTypeForBitwidth` instead, so we don't need to check each case separately.
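
The idea of relying on a single width query, rather than separate power-of-two and size-limit checks, can be sketched as follows. The helper is a simplified, hypothetical stand-in for what such a query answers; the accepted widths are an assumption chosen to agree with the `test19` expectations in this revision, not Clang's actual implementation.

```cpp
// Hypothetical stand-in: is there an integer type of exactly `bits` bits?
// `targetHasInt128` models whether the target provides a 128-bit integer.
bool hasIntTypeForBitwidth(unsigned bits, bool targetHasInt128) {
  switch (bits) {
  case 8:
  case 16:
  case 32:
  case 64:
    return true;
  case 128:
    return targetHasInt128;
  default:
    // Covers non-power-of-two sizes (e.g. 96) and anything too large.
    return false;
  }
}

// An output of this size can be tied to a register only if a matching
// integer type exists; otherwise a diagnostic would be emitted.
bool canStoreIntoRegister(unsigned bits, bool targetHasInt128 = true) {
  return hasIntTypeForBitwidth(bits, targetHasInt128);
}
```

Under these assumptions a 96-bit struct is rejected, while 16-, 32-, 64-, and (with a 128-bit integer type) 128-bit structs are accepted, which is the shape of the expectations in the test.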


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141


[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-09 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 414277.
pengfei added a comment.

Remove outdated comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/Sema/asm.c

Index: clang/test/Sema/asm.c
===
--- clang/test/Sema/asm.c
+++ clang/test/Sema/asm.c
@@ -313,3 +313,51 @@
   asm ("jne %l0":::);
   asm goto ("jne %l0"lab);
 }
+
+typedef struct _st_size64 {
+  int a;
+  char b;
+} st_size64;
+
+typedef struct _st_size96 {
+  int a;
+  int b;
+  int c;
+} st_size96;
+
+typedef struct _st_size16 {
+  char a;
+  char b;
+} st_size16;
+
+typedef struct _st_size32 {
+  char a;
+  char b;
+  char c;
+  char d;
+} st_size32;
+
+typedef struct _st_size128 {
+  int a;
+  int b;
+  int c;
+  int d;
+} st_size128;
+
+void test19(long long x)
+{
+  st_size64 a;
+  st_size96 b;
+  st_size16 c;
+  st_size32 d;
+  st_size128 e;
+  asm ("" : "=rm" (a): "0" (1)); // no-error
+  asm ("" : "=rm" (d): "0" (1)); // no-error
+  asm ("" : "=rm" (c): "0" (x)); // no-error
+  asm ("" : "=rm" (x): "0" (a)); // no-error
+  asm ("" : "=rm" (a): "0" (d)); // no-error
+  // Check the output size is pow of 2
+  asm ("" : "=rm" (b): "0" (1)); // expected-error {{impossible constraint in asm: can't store value into a register}}
+  asm ("" : "=rm" (e): "0" (1)); // no-error
+  asm ("" : "=rm" (x): "0" (e)); // no-error
+}
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -618,14 +618,16 @@
   AD_Int, AD_FP, AD_Other
 } InputDomain, OutputDomain;
 
-if (InTy->isIntegerType() || InTy->isPointerType())
+if (InTy->isIntegerType() || InTy->isPointerType() ||
+InTy->isStructureType() || InTy->isConstantArrayType())
   InputDomain = AD_Int;
 else if (InTy->isRealFloatingType())
   InputDomain = AD_FP;
 else
   InputDomain = AD_Other;
 
-if (OutTy->isIntegerType() || OutTy->isPointerType())
+if (OutTy->isIntegerType() || OutTy->isPointerType() ||
+OutTy->isStructureType() || OutTy->isConstantArrayType())
   OutputDomain = AD_Int;
 else if (OutTy->isRealFloatingType())
   OutputDomain = AD_FP;
@@ -667,8 +669,15 @@
 // output was a register, just extend the shorter one to the size of the
 // larger one.
 if (!SmallerValueMentioned && InputDomain != AD_Other &&
-OutputConstraintInfos[TiedTo].allowsRegister())
+OutputConstraintInfos[TiedTo].allowsRegister()) {
+  // FIXME: GCC supports the OutSize to be 128 at maximum. Currently codegen
+  // crash when the size larger than the register size. So we limit it here.
+  if (OutputDomain == AD_Int &&
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull())
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+
   continue;
+}
 
 // Either both of the operands were mentioned or the smaller one was
 // mentioned.  One more special case that we'll allow: if the tied input is
Index: clang/lib/CodeGen/CGStmt.cpp
===
--- clang/lib/CodeGen/CGStmt.cpp
+++ clang/lib/CodeGen/CGStmt.cpp
@@ -2725,9 +2725,8 @@
   QualType Ty = getContext().getIntTypeForBitwidth(Size, /*Signed*/ false);
   if (Ty.isNull()) {
 const Expr *OutExpr = S.getOutputExpr(i);
-CGM.Error(
-OutExpr->getExprLoc(),
-"impossible constraint in asm: can't store value into a register");
+CGM.getDiags().Report(OutExpr->getExprLoc(),
+  diag::err_store_value_to_reg);
 return;
   }
   Dest = MakeAddrLValue(A, Ty);
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8763,6 +8763,8 @@
 " in asm %select{input|output}1 with a memory constraint '%2'">;
   def err_asm_input_duplicate_match : Error<
 "more than one input constraint matches the same output '%0'">;
+  def err_store_value_to_reg : Error<
+"impossible constraint in asm: can't store value into a register">;
 
   def warn_asm_label_on_auto_decl : Warning<
 "ignored asm label '%0' on automatic variable">;

[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Typos in `wiht different feature lists` and `In the even that`.




Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

erichkeane wrote:
> andrew.w.kaylor wrote:
> > Unfortunately, I don't think it's this easy. The list of names used for 
> > cpu_specific doesn't come from the same place as the list of names used by 
> > "tune-cpu". For one thing, the cpu_specific names can't contain the '-' 
> > character, so we have names like "skylake_avx512" in cpu_specific that 
> > would need to be translated to "skylake-avx512" for "tune-cpu". I believe 
> > the list of valid names for "tune-cpu" comes from here: 
> > https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294
> > 
> > Also, some of the aliases supported by cpu_specific don't have any 
> > corresponding "tune-cpu" name. You happen to have picked one of these for 
> > the test. I believe "core_4th_gen_avx" should map to "haswell".
> Hmm... this is unfortunate.  I wonder if we add some 'translation' type field 
> to the X86TargetParser.def entries?  Any idea who the right one to populate 
> said list would be?
> I believe the list of valid names for "tune-cpu" comes from ...

I think it's here 
https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Target/X86/X86.td#L1408

So, back to Andy's problem: where did the compiler previously consume the
cpu_specific names, e.g., to map them to different targets? Or is that done by
external libraries such as compiler-rt?

I think I have the same requirement of mapping between `-` and `_` for
"tune-cpu" in https://github.com/llvm/llvm-project/issues/50125, where the
preprocessor defines use `_` as well.
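
The translation discussed above might look roughly like this. It is a sketch under stated assumptions: `toTuneCpuName` and its alias table are hypothetical, and the single alias entry reflects only the mapping suggested in the quoted comment ("core_4th_gen_avx" to "haswell"), not a verified list.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Hypothetical helper: turn a cpu_specific name into a "tune-cpu" name.
std::string toTuneCpuName(std::string name) {
  // Aliases with no direct "tune-cpu" spelling need an explicit entry.
  // Only the example from the review discussion is listed here.
  static const std::unordered_map<std::string, std::string> aliases = {
      {"core_4th_gen_avx", "haswell"},
  };
  auto it = aliases.find(name);
  if (it != aliases.end())
    return it->second;

  // Otherwise assume the names differ only in '_' versus '-'.
  std::replace(name.begin(), name.end(), '_', '-');
  return name;
}
```

For example, `toTuneCpuName("skylake_avx512")` yields `"skylake-avx512"`. A table-driven field in the `.def` entries, as proposed in the thread, would avoid relying on the character substitution at all.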


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410


[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 414614.
pengfei added a comment.

Address review comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/Sema/asm.c

Index: clang/test/Sema/asm.c
===
--- clang/test/Sema/asm.c
+++ clang/test/Sema/asm.c
@@ -313,3 +313,51 @@
   asm ("jne %l0":::);
   asm goto ("jne %l0"lab);
 }
+
+typedef struct _st_size64 {
+  int a;
+  char b;
+} st_size64;
+
+typedef struct _st_size96 {
+  int a;
+  int b;
+  int c;
+} st_size96;
+
+typedef struct _st_size16 {
+  char a;
+  char b;
+} st_size16;
+
+typedef struct _st_size32 {
+  char a;
+  char b;
+  char c;
+  char d;
+} st_size32;
+
+typedef struct _st_size128 {
+  int a;
+  int b;
+  int c;
+  int d;
+} st_size128;
+
+void test19(long long x)
+{
+  st_size64 a;
+  st_size96 b;
+  st_size16 c;
+  st_size32 d;
+  st_size128 e;
+  asm ("" : "=rm" (a): "0" (1)); // no-error
+  asm ("" : "=rm" (d): "0" (1)); // no-error
+  asm ("" : "=rm" (c): "0" (x)); // no-error
+  asm ("" : "=rm" (x): "0" (a)); // no-error
+  asm ("" : "=rm" (a): "0" (d)); // no-error
+  // Check the output size is pow of 2
+  asm ("" : "=rm" (b): "0" (1)); // expected-error {{impossible constraint in asm: can't store value into a register}}
+  asm ("" : "=rm" (e): "0" (1)); // no-error
+  asm ("" : "=rm" (x): "0" (e)); // no-error
+}
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -618,14 +618,16 @@
   AD_Int, AD_FP, AD_Other
 } InputDomain, OutputDomain;
 
-if (InTy->isIntegerType() || InTy->isPointerType())
+if (InTy->isIntegerType() || InTy->isPointerType() ||
+InTy->isStructureType())
   InputDomain = AD_Int;
 else if (InTy->isRealFloatingType())
   InputDomain = AD_FP;
 else
   InputDomain = AD_Other;
 
-if (OutTy->isIntegerType() || OutTy->isPointerType())
+if (OutTy->isIntegerType() || OutTy->isPointerType() ||
+OutTy->isStructureType())
   OutputDomain = AD_Int;
 else if (OutTy->isRealFloatingType())
   OutputDomain = AD_FP;
@@ -667,8 +669,17 @@
 // output was a register, just extend the shorter one to the size of the
 // larger one.
 if (!SmallerValueMentioned && InputDomain != AD_Other &&
-OutputConstraintInfos[TiedTo].allowsRegister())
+OutputConstraintInfos[TiedTo].allowsRegister()) {
+  // FIXME: GCC supports the OutSize to be 128 at maximum. Currently codegen
+  // crash when the size larger than the register size. So we limit it here.
+  if (OutputDomain == AD_Int &&
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull()) {
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+return NS;
+  }
+
   continue;
+}
 
 // Either both of the operands were mentioned or the smaller one was
 // mentioned.  One more special case that we'll allow: if the tied input is
Index: clang/lib/CodeGen/CGStmt.cpp
===
--- clang/lib/CodeGen/CGStmt.cpp
+++ clang/lib/CodeGen/CGStmt.cpp
@@ -2725,9 +2725,8 @@
   QualType Ty = getContext().getIntTypeForBitwidth(Size, /*Signed*/ false);
   if (Ty.isNull()) {
 const Expr *OutExpr = S.getOutputExpr(i);
-CGM.Error(
-OutExpr->getExprLoc(),
-"impossible constraint in asm: can't store value into a register");
+CGM.getDiags().Report(OutExpr->getExprLoc(),
+  diag::err_store_value_to_reg);
 return;
   }
   Dest = MakeAddrLValue(A, Ty);
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8763,6 +8763,8 @@
 " in asm %select{input|output}1 with a memory constraint '%2'">;
   def err_asm_input_duplicate_match : Error<
 "more than one input constraint matches the same output '%0'">;
+  def err_store_value_to_reg : Error<
+"impossible constraint in asm: can't store value into a register">;
 
   def warn_asm_label_on_auto_decl : Warning<
 "ignored asm label '%0' on automatic variable">;

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Sema/SemaStmtAsm.cpp:622
+if (InTy->isIntegerType() || InTy->isPointerType() ||
+InTy->isStructureType() || InTy->isConstantArrayType())
   InputDomain = AD_Int;

jyu2 wrote:
> Are you sure you want to change the Input/output Domain?  Since you changed 
> this, could you add both codegen and sema check tests for  struct type(you 
> already has sema check for struct type, but I don't see any array type) and 
> array type.  
> 
> Thanks.
> Jennifer
The input/output domains are used only for the Sema check here. I don't think
this changes codegen behavior.



Comment at: clang/lib/Sema/SemaStmtAsm.cpp:677
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull())
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+

jyu2 wrote:
> Do you need return NS after diagnostic?
Good catch!
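
The point about returning after the diagnostic can be illustrated with a small stand-alone sketch. Everything here is hypothetical (`CheckResult`, `checkOutputSizes`, and the accepted widths); it only shows the control-flow difference between reporting and continuing versus reporting and stopping.

```cpp
#include <string>
#include <vector>

// Hypothetical result of validating a list of tied output sizes.
struct CheckResult {
  bool ok = true;
  std::vector<std::string> diagnostics;
};

// Stops at the first unsupported size, mirroring "return after diagnostic".
CheckResult checkOutputSizes(const std::vector<unsigned> &sizesInBits) {
  CheckResult result;
  for (unsigned bits : sizesInBits) {
    // Assumed set of supported widths, for illustration only.
    bool supported = bits == 8 || bits == 16 || bits == 32 || bits == 64 ||
                     bits == 128;
    if (!supported) {
      result.diagnostics.push_back(
          "impossible constraint in asm: can't store value into a register");
      result.ok = false;
      return result; // do not keep checking an already-invalid statement
    }
  }
  return result;
}
```

Without the early return, later operands of the same invalid statement would still be processed and could produce follow-on diagnostics.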


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141


[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.

LGTM.




Comment at: clang/lib/Basic/Targets/X86.cpp:1133
+#include "llvm/Support/X86TargetParser.def"
+.Default("");
+}

clang-format.



Comment at: llvm/include/llvm/Support/X86TargetParser.def:236
+CPU_SPECIFIC("core_i7_sse4_2", "nehalem", 'P', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("core_aes_pclmulqdq", westmere", 'Q', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+popcnt")
+CPU_SPECIFIC("atom_sse4_2_movbe", "silvermont", 'd', 
"+cmov,+mmx,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+movbe,+popcnt")

Is the opening `"` missing?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410


[PATCH] D121815: [X86] Use the unaligned vector typedefs for the lddqu intrinsics pointer arguments (PR20670)

2022-03-16 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121815/new/

https://reviews.llvm.org/D121815


[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-20 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei created this revision.
pengfei added reviewers: erichkeane, craig.topper, LiuChen3, LuoYuanke.
Herald added a project: All.
pengfei requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Currently, the regcall calling convention in Clang doesn't match
ICC when passing / returning structures: https://godbolt.org/z/axxKMKrW7

This patch tries to fix that so Clang matches ICC.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqt

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-20 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Ping @jyu2


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-21 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 416945.
pengfei added a comment.

Address Yuanke's comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqtbx2_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-21 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 417179.
pengfei marked 2 inline comments as done.
pengfei added a comment.
Herald added a subscriber: StephenFan.

Address review comments. Thanks Craig!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqtbx2_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-21 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:744
+  void setMaxVectorWidth(unsigned Width) {
+MaxVectorWidth = llvm::Log2_32(Width) + 1;
+  }

craig.topper wrote:
> Are you assuming Width is a power of 2? Should we assert that?
Good question! I assumed so, but I found that Clang does not enforce it, 
although ICC and GCC both report an error for such widths. 
Maybe we should diagnose it too? Anyway, I added an assert for it.
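
To make the power-of-2 assumption concrete, here is a minimal standalone sketch of the encoding. The names `log2Floor`, `encodeMaxVectorWidth` and `decodeMaxVectorWidth` are hypothetical and exist only for illustration; `log2Floor` stands in for `llvm::Log2_32`, and the decode direction is an assumption about how the stored value would be read back, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::Log2_32: floor(log2(Value)) for Value > 0.
static unsigned log2Floor(uint32_t Value) {
  unsigned Result = 0;
  while (Value >>= 1)
    ++Result;
  return Result;
}

// True only for 1, 2, 4, 8, ...
static bool isPowerOf2(uint32_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}

// Store log2(Width) + 1 so that 0 can mean "no vector width recorded".
static unsigned encodeMaxVectorWidth(unsigned Width) {
  assert(isPowerOf2(Width) && "expected a power-of-2 vector width");
  return log2Floor(Width) + 1;
}

// Assumed inverse of the encoding above.
static unsigned decodeMaxVectorWidth(unsigned Encoded) {
  return Encoded ? 1U << (Encoded - 1) : 0;
}
```

With this scheme a width of 512 is stored as 10 and decodes back to 512. For a non-power-of-2 width such as 384, the floor of log2 is 8, so the round trip would give 256 instead, which is why the assert seems worthwhile.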


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-22 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 417303.
pengfei added a comment.

Address review comment.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/Sema/asm.c

Index: clang/test/Sema/asm.c
===
--- clang/test/Sema/asm.c
+++ clang/test/Sema/asm.c
@@ -313,3 +313,54 @@
   asm ("jne %l0":::);
   asm goto ("jne %l0"lab);
 }
+
+typedef struct _st_size64 {
+  int a;
+  char b;
+} st_size64;
+
+typedef struct _st_size96 {
+  int a;
+  int b;
+  int c;
+} st_size96;
+
+typedef struct _st_size16 {
+  char a;
+  char b;
+} st_size16;
+
+typedef struct _st_size32 {
+  char a;
+  char b;
+  char c;
+  char d;
+} st_size32;
+
+typedef struct _st_size128 {
+  int a;
+  int b;
+  int c;
+  int d;
+} st_size128;
+
+void test19(long long x)
+{
+  st_size64 a;
+  st_size96 b;
+  st_size16 c;
+  st_size32 d;
+  st_size128 e;
+  asm ("" : "=rm" (a): "0" (1)); // no-error
+  asm ("" : "=rm" (d): "0" (1)); // no-error
+  asm ("" : "=rm" (c): "0" (x)); // no-error
+  // FIXME: This case is actually supported by codegen.
+  asm ("" : "=rm" (x): "0" (a)); // expected-error {{unsupported inline asm: input with type 'st_size64' (aka 'struct _st_size64') matching output with type 'long long'}}
+  // FIXME: This case is actually supported by codegen.
+  asm ("" : "=rm" (a): "0" (d)); // expected-error {{unsupported inline asm: input with type 'st_size32' (aka 'struct _st_size32') matching output with type 'st_size64' (aka 'struct _st_size64')}}
+  asm ("" : "=rm" (b): "0" (1)); // expected-error {{impossible constraint in asm: can't store value into a register}}
+  // FIXME: This case should be supported by codegen, but it fails now.
+  asm ("" : "=rm" (e): "0" (1)); // no-error
+  // FIXME: This case should be supported by codegen, but it fails now.
+  asm ("" : "=rm" (x): "0" (e)); // expected-error {{unsupported inline asm: input with type 'st_size128' (aka 'struct _st_size128') matching output with type 'long long'}}
+}
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -667,8 +667,17 @@
 // output was a register, just extend the shorter one to the size of the
 // larger one.
 if (!SmallerValueMentioned && InputDomain != AD_Other &&
-OutputConstraintInfos[TiedTo].allowsRegister())
+OutputConstraintInfos[TiedTo].allowsRegister()) {
+  // FIXME: GCC supports the OutSize to be 128 at maximum. Currently codegen
+  // crash when the size larger than the register size. So we limit it here.
+  if (OutTy->isStructureType() &&
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull()) {
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+return NS;
+  }
+
   continue;
+}
 
 // Either both of the operands were mentioned or the smaller one was
 // mentioned.  One more special case that we'll allow: if the tied input is
Index: clang/lib/CodeGen/CGStmt.cpp
===
--- clang/lib/CodeGen/CGStmt.cpp
+++ clang/lib/CodeGen/CGStmt.cpp
@@ -2725,9 +2725,8 @@
   QualType Ty = getContext().getIntTypeForBitwidth(Size, /*Signed*/ false);
   if (Ty.isNull()) {
 const Expr *OutExpr = S.getOutputExpr(i);
-CGM.Error(
-OutExpr->getExprLoc(),
-"impossible constraint in asm: can't store value into a register");
+CGM.getDiags().Report(OutExpr->getExprLoc(),
+  diag::err_store_value_to_reg);
 return;
   }
   Dest = MakeAddrLValue(A, Ty);
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8763,6 +8763,8 @@
 " in asm %select{input|output}1 with a memory constraint '%2'">;
   def err_asm_input_duplicate_match : Error<
 "more than one input constraint matches the same output '%0'">;
+  def err_store_value_to_reg : Error<
+"impossible constraint in asm: can't store value into a register">;
 
   def warn_asm_label_on_auto_decl : Warning<
 "ignored asm label '%0' on automatic variable">;

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-22 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Sema/SemaStmtAsm.cpp:622
+if (InTy->isIntegerType() || InTy->isPointerType() ||
+InTy->isStructureType() || InTy->isConstantArrayType())
   InputDomain = AD_Int;

jyu2 wrote:
> pengfei wrote:
> > jyu2 wrote:
> > > Are you sure you want to change the input/output domain? Since you 
> > > changed this, could you add both codegen and sema check tests for struct 
> > > types (you already have a sema check for struct types, but I don't see 
> > > any for array types) and array types?
> > > 
> > > Thanks.
> > > Jennifer
> > The input/output domains are only used for the sema check here. I don't 
> > think this changes codegen behavior.
> Yes, for the sema check, once an error is emitted, the compiler stops at 
> CodeGen. However, if no error is emitted, the compiler will go through CodeGen.
> 
> Since pointer/struct and constantArray are allowed for small type sizes, 
> would you please add a codegen test for that?
> 
> In your test, the part where no error should be emitted...
> 
> asm ("" : "=rm" (a): "0" (1)); // no-error
> 
> Thanks.
> Jennifer
Thanks Jennifer! I see your point. In this patch I'll only add the sema check 
(rather than loosening it in some way).
The codegen part is complicated: some cases pass but others still fail. I 
have left comments in the tests.
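
As a rough illustration of what the added sema check accepts, here is a minimal sketch that models it outside of Clang. The helpers `hasIntTypeForBitwidth` and `structOutputFitsRegister` are hypothetical, and the set of widths is an assumption for a typical x86-64 target; the real check queries `Context.getIntTypeForBitwidth` on the actual target.

```cpp
// Hypothetical model: bit widths for which an integer type is assumed to
// exist on a typical x86-64 target (char, short, int, long long, __int128).
static bool hasIntTypeForBitwidth(unsigned Bits) {
  switch (Bits) {
  case 8:
  case 16:
  case 32:
  case 64:
  case 128:
    return true;
  default:
    return false;
  }
}

// Models the new condition: a struct output with a register-capable
// constraint is accepted only if some integer type has exactly its size.
static bool structOutputFitsRegister(unsigned StructSizeInBits) {
  return hasIntTypeForBitwidth(StructSizeInBits);
}
```

Under these assumptions the 64-bit and 128-bit structs from the test pass the check, while the 96-bit struct is rejected with `err_store_value_to_reg`, matching the expectations in `asm.c`.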


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-22 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @jyu2 !


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

[PATCH] D107141: [Inline-asm] Add diagnosts for unsupported inline assembly arguments

2022-03-22 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG32103608fc07: [Inline-asm] Add diagnosts for unsupported 
inline assembly arguments (authored by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107141/new/

https://reviews.llvm.org/D107141

Files:
  clang/include/clang/Basic/DiagnosticSemaKinds.td
  clang/lib/CodeGen/CGStmt.cpp
  clang/lib/Sema/SemaStmtAsm.cpp
  clang/test/Sema/asm.c

Index: clang/test/Sema/asm.c
===
--- clang/test/Sema/asm.c
+++ clang/test/Sema/asm.c
@@ -313,3 +313,54 @@
   asm ("jne %l0":::);
   asm goto ("jne %l0"lab);
 }
+
+typedef struct _st_size64 {
+  int a;
+  char b;
+} st_size64;
+
+typedef struct _st_size96 {
+  int a;
+  int b;
+  int c;
+} st_size96;
+
+typedef struct _st_size16 {
+  char a;
+  char b;
+} st_size16;
+
+typedef struct _st_size32 {
+  char a;
+  char b;
+  char c;
+  char d;
+} st_size32;
+
+typedef struct _st_size128 {
+  int a;
+  int b;
+  int c;
+  int d;
+} st_size128;
+
+void test19(long long x)
+{
+  st_size64 a;
+  st_size96 b;
+  st_size16 c;
+  st_size32 d;
+  st_size128 e;
+  asm ("" : "=rm" (a): "0" (1)); // no-error
+  asm ("" : "=rm" (d): "0" (1)); // no-error
+  asm ("" : "=rm" (c): "0" (x)); // no-error
+  // FIXME: This case is actually supported by codegen.
+  asm ("" : "=rm" (x): "0" (a)); // expected-error {{unsupported inline asm: input with type 'st_size64' (aka 'struct _st_size64') matching output with type 'long long'}}
+  // FIXME: This case is actually supported by codegen.
+  asm ("" : "=rm" (a): "0" (d)); // expected-error {{unsupported inline asm: input with type 'st_size32' (aka 'struct _st_size32') matching output with type 'st_size64' (aka 'struct _st_size64')}}
+  asm ("" : "=rm" (b): "0" (1)); // expected-error {{impossible constraint in asm: can't store value into a register}}
+  // FIXME: This case should be supported by codegen, but it fails now.
+  asm ("" : "=rm" (e): "0" (1)); // no-error
+  // FIXME: This case should be supported by codegen, but it fails now.
+  asm ("" : "=rm" (x): "0" (e)); // expected-error {{unsupported inline asm: input with type 'st_size128' (aka 'struct _st_size128') matching output with type 'long long'}}
+}
Index: clang/lib/Sema/SemaStmtAsm.cpp
===
--- clang/lib/Sema/SemaStmtAsm.cpp
+++ clang/lib/Sema/SemaStmtAsm.cpp
@@ -667,8 +667,17 @@
 // output was a register, just extend the shorter one to the size of the
 // larger one.
 if (!SmallerValueMentioned && InputDomain != AD_Other &&
-OutputConstraintInfos[TiedTo].allowsRegister())
+OutputConstraintInfos[TiedTo].allowsRegister()) {
+  // FIXME: GCC supports the OutSize to be 128 at maximum. Currently codegen
+  // crash when the size larger than the register size. So we limit it here.
+  if (OutTy->isStructureType() &&
+  Context.getIntTypeForBitwidth(OutSize, /*Signed*/ false).isNull()) {
+targetDiag(OutputExpr->getExprLoc(), diag::err_store_value_to_reg);
+return NS;
+  }
+
   continue;
+}
 
 // Either both of the operands were mentioned or the smaller one was
 // mentioned.  One more special case that we'll allow: if the tied input is
Index: clang/lib/CodeGen/CGStmt.cpp
===
--- clang/lib/CodeGen/CGStmt.cpp
+++ clang/lib/CodeGen/CGStmt.cpp
@@ -2744,9 +2744,8 @@
   QualType Ty = getContext().getIntTypeForBitwidth(Size, /*Signed*/ false);
   if (Ty.isNull()) {
 const Expr *OutExpr = S.getOutputExpr(i);
-CGM.Error(
-OutExpr->getExprLoc(),
-"impossible constraint in asm: can't store value into a register");
+CGM.getDiags().Report(OutExpr->getExprLoc(),
+  diag::err_store_value_to_reg);
 return;
   }
   Dest = MakeAddrLValue(A, Ty);
Index: clang/include/clang/Basic/DiagnosticSemaKinds.td
===
--- clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8783,6 +8783,8 @@
 " in asm %select{input|output}1 with a memory constraint '%2'">;
   def err_asm_input_duplicate_match : Error<
 "more than one input constraint matches the same output '%0'">;
+  def err_store_value_to_reg : Error<
+"impossible constraint in asm: can't store value into a register">;
 
   def warn_asm_label_on_auto_decl : Warning<
 "ignored asm label '%0' on automatic variable">;

[PATCH] D122104: [X86][regcall] Support passing / returning structures

2022-03-22 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 417502.
pengfei added a comment.

clang-formatted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122104/new/

https://reviews.llvm.org/D122104

Files:
  clang/include/clang/CodeGen/CGFunctionInfo.h
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/aarch64-neon-tbl.c
  clang/test/CodeGen/regcall2.c

Index: clang/test/CodeGen/regcall2.c
===
--- /dev/null
+++ clang/test/CodeGen/regcall2.c
@@ -0,0 +1,28 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=Win
+// RUN: %clang_cc1 -emit-llvm %s -o - -ffreestanding -target-feature +avx512vl -triple=x86_64-pc-linux-gnu | FileCheck %s --check-prefix=Lin
+
+#include 
+
+typedef struct {
+  __m512d r1[4];
+  __m512 r2[4];
+} __sVector;
+__sVector A;
+
+__sVector __regcall foo(int a) {
+  return A;
+}
+
+double __regcall bar(__sVector a) {
+  return a.r1[0][4];
+}
+
+// FIXME: Do we need to change for Windows?
+// Win: define dso_local x86_regcallcc void @__regcall3__foo(%struct.__sVector* noalias sret(%struct.__sVector) align 64 %agg.result, i32 noundef %a) #0
+// Win: define dso_local x86_regcallcc double @__regcall3__bar(%struct.__sVector* noundef %a) #0
+// Win: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
+
+// Lin: define dso_local x86_regcallcc %struct.__sVector @__regcall3__foo(i32 noundef %a) #0
+// Lin: define dso_local x86_regcallcc double @__regcall3__bar([4 x <8 x double>] %a.coerce0, [4 x <16 x float>] %a.coerce1) #0
+// Lin: attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="512" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+avx,+avx2,+avx512f,+avx512vl,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" }
Index: clang/test/CodeGen/aarch64-neon-tbl.c
===
--- clang/test/CodeGen/aarch64-neon-tbl.c
+++ clang/test/CodeGen/aarch64-neon-tbl.c
@@ -42,7 +42,7 @@
   return vtbl2_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl2_s8([2 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[A]], i32 0, i32 0
@@ -89,7 +89,7 @@
   return vtbl3_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl3_s8([3 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x3_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x3_t, %struct.int8x16x3_t* [[A]], i32 0, i32 0
@@ -142,7 +142,7 @@
   return vtbl4_s8(a, b);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbl4_s8([4 x <16 x i8>] %a.coerce, <8 x i8> noundef %b) #1 {
 // CHECK:   [[__P0_I:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[A:%.*]] = alloca %struct.int8x16x4_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x4_t, %struct.int8x16x4_t* [[A]], i32 0, i32 0
@@ -352,7 +352,7 @@
   return vqtbx1_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx2_s8(<8 x i8> noundef %a, [2 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:   [[__P1_I:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[B:%.*]] = alloca %struct.int8x16x2_t, align 16
 // CHECK:   [[COERCE_DIVE:%.*]] = getelementptr inbounds %struct.int8x16x2_t, %struct.int8x16x2_t* [[B]], i32 0, i32 0
@@ -373,7 +373,7 @@
   return vqtbx2_s8(a, b, c);
 }
 
-// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #0 {
+// CHECK-LABEL: define{{.*}} <8 x i8> @test_vqtbx3_s8(<8 x i8> noundef %a, [3 x <16 x i8>] %b.coerce, <8 x i8> noundef %c) #1 {
 // CHECK:

[PATCH] D118052: [X86] Fix CodeGen Module Flag for -mibt-seal

2022-03-23 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.

LGTM.




Comment at: clang/test/CodeGen/X86/x86-cf-protection.c:6
+// RUN: %clang -target i386-unknown-unknown -o - -emit-llvm -S 
-fcf-protection=branch -flto %s | FileCheck %s --check-prefix=NOIBTSEAL
+// RUN: %clang -target i386-unknown-unknown -o - -emit-llvm -S 
-fcf-protection=branch -mibt-seal %s | FileCheck %s --check-prefix=NOLTO
 

I think we can use `NOIBTSEAL` here too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118052/new/

https://reviews.llvm.org/D118052

[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-04-12 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a subscriber: hjl.tools.
pengfei added a comment.

In D122789#3446865, @MaskRay wrote:

> To kurly (original Gentoo reporter):
>
>   printf '#include \n#include \nuint32_t 
> computeHardwareCRC32(uint32_t Crc, uint32_t Data) { return _mm_crc32_u32(Crc, 
> Data); }' > a.c
>
>
>
>   % clang -m32 -march=i686 -msse4.2 -c a.c  # seems to compile fine
>   % gcc -m32 -march=i686 -msse4.2 -c a.c
>   % gcc -m32 -march=i686 -mcrc32 -c a.c
>   In file included from a.c:1:
>   a.c: In function ‘computeHardwareCRC32’:
>   /usr/lib/gcc/x86_64-linux-gnu/11/include/smmintrin.h:839:1: error: inlining 
> failed in call to ‘always_inline’ ‘_mm_crc32_u32’: target specific option 
> mismatch
> 839 | _mm_crc32_u32 (unsigned int __C, unsigned int __V)
> | ^
>   a.c:3:69: note: called from here
>   3 | uint32_t computeHardwareCRC32(uint32_t Crc, uint32_t Data) { return 
> _mm_crc32_u32(Crc, Data); }
> | 
> ^~~~
>
> I have some older GCC and latest GCC (2022-04, multilib), `gcc -m32 
> -march=i686 -msse4.2 -c a.c`  builds while `-mcrc32` doesn't.
>
> I suspect we should revert the `-mcrc32` change. The `__CRC32__` macro may be 
> fine.

Thanks for the information. I got the same result. That's weird. I believe that 
at some point GCC supported `-mcrc32`. @hjl.tools, did GCC intentionally remove 
support for `-mcrc32`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789

[PATCH] D105263: [X86] AVX512FP16 instructions enabling 1/6

2022-04-13 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

Thanks @vtjnash for the information! I have commented on 
https://github.com/JuliaLang/julia/issues/44829


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105263/new/

https://reviews.llvm.org/D105263

[PATCH] D123498: [clang] Adding Platform/Architecture Specific Resource Header Installation Targets

2022-04-13 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.

LGTM for X86.




Comment at: clang/lib/Headers/CMakeLists.txt:88
+
+set(x86_files
+# Intrinsics

I verified that the list is correct for X86. Nit: should we put them in 
alphabetical order?



Comment at: clang/lib/Headers/CMakeLists.txt:194
+set(utility_files
+  mm_malloc.h
+)

Is it only used by X86 for now?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123498/new/

https://reviews.llvm.org/D123498


[PATCH] D122789: [compiler-rt] [scudo] Use -mcrc32 on x86 when available

2022-04-14 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

GCC supports "-mcrc32", but it seems to do so only for the built-in functions: 
https://godbolt.org/z/veeGMoY11

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#x86-Options


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122789/new/

https://reviews.llvm.org/D122789


[PATCH] D123498: [clang] Adding Platform/Architecture Specific Resource Header Installation Targets

2022-04-19 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> Maybe I can ask cmake to check for architecture/targets during configuration 
> and select the headers automatically, but that is beyond the scope of this 
> patch.

I'm not familiar with CMake, but I guess it might be doable. I once verified the 
X86 headers with the command `echo '#include ' | clang -x c -E - | grep 
'#.*clang.*h' | grep -o '[^\/]*\.h' |sort|uniq`.
Note that targets like X86 don't have a single entry, but this is still 
achievable with a bit more work.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123498/new/

https://reviews.llvm.org/D123498


[PATCH] D110869: [X86] Implement -fzero-call-used-regs option

2022-02-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> So xorl %ecx, %edx doesn't zero out all 64-bits of %rcx and %rdx? That's two 
> 32-bit writes to two different registers, isn't it?

`xorl %ecx, %edx` only zeroes bits 63:32 of `rdx`.

1. There's only one register write in the instruction, i.e. to `%edx`;
2. As a source operand, no bit of `%ecx` is changed by the instruction;
3. `xorl` is not a zeroing instruction here; bits 31:0 of `rdx` end up zero only 
if `%ecx` == `%edx`.

So the values after the instruction are:

  RCX = RCX_old
  RDX[63:32] = 0
  RDX[31:0] = RCX_old[31:0] ^ RDX_old[31:0]
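
The register values listed above can be reproduced with a small C++ model. This 
is only a sketch: the `Regs` struct and the `xorl_ecx_edx` helper are 
hypothetical names introduced for illustration, and the model assumes the usual 
x86-64 rule that a 32-bit register write zero-extends into the full 64-bit 
register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of two 64-bit general-purpose registers.
struct Regs {
  std::uint64_t rcx;
  std::uint64_t rdx;
};

// Models `xorl %ecx, %edx` (AT&T syntax: source first, destination second).
inline Regs xorl_ecx_edx(Regs r) {
  std::uint32_t ecx = static_cast<std::uint32_t>(r.rcx); // low half of rcx
  std::uint32_t edx = static_cast<std::uint32_t>(r.rdx); // low half of rdx
  // The 32-bit result is zero-extended, so bits 63:32 of rdx become 0.
  r.rdx = static_cast<std::uint64_t>(ecx ^ edx);
  // rcx is only read, never written.
  return r;
}
```

In this model `rdx` ends up fully zero only when the low halves of the two 
registers were equal beforehand, while `rcx` always keeps its old value.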


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110869/new/

https://reviews.llvm.org/D110869


[PATCH] D84225: [CFE] Add nomerge function attribute to inline assembly.

2022-02-08 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D84225#3302142, @rnk wrote:

> I think LLVM already doesn't do some tail merging optimizations on inline 
> asm, but allowing the use of the attribute is more principled, and will block 
> more optimizations (CSE).

IIRC, the initial requirement was to avoid CSE-like optimizations. We usually 
use inline asm for special purposes, and sometimes we have to stop the merging.




Comment at: clang/lib/Sema/SemaStmtAttr.cpp:186
   void VisitCallExpr(const CallExpr *E) { FoundCallExpr = true; }
+  void VisitAsmStmt(const AsmStmt *S) { FoundCallExpr = true; }
 

xbolva00 wrote:
> This is totally wrong, just big hack to smuggle it here.
Could you explain more? Does this have any unexpected side effects?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84225/new/

https://reviews.llvm.org/D84225


[PATCH] D84225: [CFE] Add nomerge function attribute to inline assembly.

2022-02-08 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D84225#3303821, @lebedev.ri wrote:

> In D84225#3303771, @pengfei wrote:
>
>> In D84225#3302142, @rnk wrote:
>>
>>> I think LLVM already doesn't do some tail merging optimizations on inline 
>>> asm, but allowing the use of the attribute is more principled, and will 
>>> block more optimizations (CSE).
>>
>> IIRC, the initial requirment is to avoid the CSE like optimizations. We 
>> usually use inline asm for sepcial proposes. We have to stop the merge some 
>> time.
>
> Since the big hammer (`nomerge`) is already there i suppose this is fine,
> but given that there is little context in the original description,
> the wording makes it seem like it's being used to workaround
> something that may or may not be a bug in the first place.
>
> There isn't anything inherently wrong with merging inlineasm in general,
> if that does not break the constraints, especially since
> there's already a `sideeffect` keyword possible on the inlineasm.

It's not a workaround. We do need to avoid the merging sometimes. For example, 
suppose we have two branches that each begin with an inline asm `endbr`. We 
have to use `nomerge` to stop them from being merged out of the branches. 
`sideeffect` doesn't help with that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84225/new/

https://reviews.llvm.org/D84225


[PATCH] D84225: [CFE] Add nomerge function attribute to inline assembly.

2022-02-08 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D84225#3305140, @rnk wrote:

> In D84225#3304189, @pengfei wrote:
>
>> It's not a workaround. We do need to avoid the merging sometime. For 
>> example, given we have 2 branches begin with inline asm of `endbr`. We have 
>> to use `nomerge` to stop them been merged out of the branches. `sideeffect` 
>> doesn't help with that.
>
> That doesn't sound sufficient to ensure that `endbr` will be the first 
> instruction in that basic block, which I'm guessing is a requirement. PHI 
> nodes might cause register copies / spills to appear before `endbr`, and 
> instrumentation passes typically insert code at the top of basic blocks. It 
> sounds like we might need a more complete solution for tracking indirect 
> branch target blocks. Maybe `indirectbr` and basic block addresses already 
> feed into this, but I'm out of my depth here.
>
> Anyway, I don't want to make a value judgment here. I'm in favor of this 
> change. We should allow users to apply `nomerge` to inline asm, whether it is 
> a workaround or not.

Thanks @rnk. Yes, that's why we emit `endbr` in a backend pass rather than this 
way. I just wanted to demonstrate why we sometimes can't merge inline asm. I 
don't have a better example at short notice :)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84225/new/

https://reviews.llvm.org/D84225


[PATCH] D115441: [X86][MS] Add 80bit long double support for Windows

2022-02-13 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG3e19ba36fca9: [X86][MS] Add 80bit long double support for 
Windows (authored by pengfei).

Changed prior to commit:
  https://reviews.llvm.org/D115441?vs=395129&id=408328#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115441/new/

https://reviews.llvm.org/D115441

Files:
  clang/lib/Basic/TargetInfo.cpp
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGen/X86/long-double-config-size.c


Index: clang/test/CodeGen/X86/long-double-config-size.c
===
--- /dev/null
+++ clang/test/CodeGen/X86/long-double-config-size.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -triple x86_64-windows-msvc %s -emit-llvm -mlong-double-64 -o - | FileCheck %s --check-prefix=SIZE64
+// RUN: %clang_cc1 -triple i386-windows-msvc %s -emit-llvm -mlong-double-80 -o - | FileCheck %s --check-prefix=SIZE80
+// RUN: %clang_cc1 -triple x86_64-windows-msvc %s -emit-llvm -mlong-double-80 -o - | FileCheck %s --check-prefix=SIZE80
+// RUN: %clang_cc1 -triple x86_64-windows-msvc %s -emit-llvm -mlong-double-128 -o - | FileCheck %s --check-prefix=SIZE128
+// RUN: %clang_cc1 -triple x86_64-windows-msvc %s -emit-llvm -o - | FileCheck %s --check-prefix=SIZE64
+
+long double global;
+// SIZE64: @global = dso_local global double 0
+// SIZE80: @global = dso_local global x86_fp80 0xK{{0+}}, align 16
+// SIZE128: @global = dso_local global fp128 0
+
+long double func(long double param) {
+  // SIZE64: define dso_local double @func(double noundef %param)
+  // SIZE80: define dso_local x86_fp80 @func(x86_fp80 noundef %param)
+  // SIZE128: define dso_local fp128  @func(fp128 noundef %param)
+  long double local = param;
+  // SIZE64: alloca double
+  // SIZE80: alloca x86_fp80, align 16
+  // SIZE128: alloca fp128
+  local = param;
+  return local + param;
+}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -3456,6 +3456,8 @@
 GenerateArg(Args, OPT_mlong_double_128, SA);
   else if (Opts.LongDoubleSize == 64)
 GenerateArg(Args, OPT_mlong_double_64, SA);
+  else if (Opts.LongDoubleSize == 80)
+GenerateArg(Args, OPT_mlong_double_80, SA);
 
   // Not generating '-mrtd', it's just an alias for '-fdefault-calling-conv='.
 
@@ -3838,9 +3840,16 @@
   Opts.NoBuiltin = Args.hasArg(OPT_fno_builtin) || Opts.Freestanding;
   if (!Opts.NoBuiltin)
 getAllNoBuiltinFuncValues(Args, Opts.NoBuiltinFuncs);
-  Opts.LongDoubleSize = Args.hasArg(OPT_mlong_double_128)
-? 128
-: Args.hasArg(OPT_mlong_double_64) ? 64 : 0;
+  if (Arg *A = Args.getLastArg(options::OPT_LongDouble_Group)) {
+if (A->getOption().matches(options::OPT_mlong_double_64))
+  Opts.LongDoubleSize = 64;
+else if (A->getOption().matches(options::OPT_mlong_double_80))
+  Opts.LongDoubleSize = 80;
+else if (A->getOption().matches(options::OPT_mlong_double_128))
+  Opts.LongDoubleSize = 128;
+else
+  Opts.LongDoubleSize = 0;
+  }
   if (Opts.FastRelaxedMath)
 Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
   llvm::sort(Opts.ModuleFeatures);
Index: clang/lib/Basic/TargetInfo.cpp
===
--- clang/lib/Basic/TargetInfo.cpp
+++ clang/lib/Basic/TargetInfo.cpp
@@ -449,6 +449,20 @@
 } else if (Opts.LongDoubleSize == 128) {
   LongDoubleWidth = LongDoubleAlign = 128;
   LongDoubleFormat = &llvm::APFloat::IEEEquad();
+} else if (Opts.LongDoubleSize == 80) {
+  LongDoubleFormat = &llvm::APFloat::x87DoubleExtended();
+  if (getTriple().isWindowsMSVCEnvironment()) {
+LongDoubleWidth = 128;
+LongDoubleAlign = 128;
+  } else { // Linux
+if (getTriple().getArch() == llvm::Triple::x86) {
+  LongDoubleWidth = 96;
+  LongDoubleAlign = 32;
+} else {
+  LongDoubleWidth = 128;
+  LongDoubleAlign = 128;
+}
+  }
 }
   }
 



[PATCH] D84225: [CFE] Add nomerge function attribute to inline assembly.

2022-02-15 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/Sema/SemaStmtAttr.cpp:186
   void VisitCallExpr(const CallExpr *E) { FoundCallExpr = true; }
+  void VisitAsmStmt(const AsmStmt *S) { FoundCallExpr = true; }
 

aaron.ballman wrote:
> xbolva00 wrote:
> > pengfei wrote:
> > > xbolva00 wrote:
> > > > This is totally wrong, just big hack to smuggle it here.
> > > Could you explain more? Is there any unexpect sideeffect by this?
> > It looks unfortunate to have something like AsmStmt in "CallExprFinder" 
> > with CallExpr as reference to clang's CallExpr.
> > 
> > Kinda surprised that your list of reviewers missed ALL known clang 
> > developers/code owner, in this case especially @aaron.ballman .
> Yeah, I would have expected that something named `CallExprFinder` would only 
> find call expressions, not use of inline assembly. The class now seems to be 
> misnamed and that may be surprising to users. This is then being built on top 
> of by things like https://reviews.llvm.org/D119061.
> 
> I'm not certain what a reasonable name for the class is given that we now 
> want to use it for different purposes.
Thanks @xbolva00 and @aaron.ballman for the input!
I added it to suppress the diagnostic, which was OK since it was the only use of 
the class at that time. I'm fine with the change in D119061.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84225/new/

https://reviews.llvm.org/D84225


[PATCH] D124435: [X86] Always extend the integer parameters in callee

2022-04-26 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/docs/ClangCommandLineReference.rst:2988-2992
+.. option:: -mconservative-extend
+Always extend the integer parameter both in the callee and caller.
+
+.. option:: -mno-conservative-extend
+Keep the original integer parameter passing behavior.

Combine these like the other options?



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:202-205
+if (Ty->hasSignedIntegerRepresentation())
+  AI.setSignExt(true);
+else
+  AI.setSignExt(false);

`AI.setSignExt(Ty->hasSignedIntegerRepresentation())` for short?
Or we can remove the else block since `SignExt` is initialized to `false`?
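
To illustrate the first suggestion, here is a minimal sketch showing that 
passing the predicate straight through sets the same flag as the `if`/`else` 
form. `ArgInfoSketch`, `setWithBranch` and `setDirect` are hypothetical 
stand-ins written for this example, not the real `ABIArgInfo` interface.

```cpp
#include <cassert>

// Hypothetical stand-in for the argument-info object; only the flag matters.
struct ArgInfoSketch {
  bool SignExt = false; // assumed default, as the comment above states
  void setSignExt(bool SExt) { SignExt = SExt; }
};

// Form used in the patch under review: branch on the predicate.
inline void setWithBranch(ArgInfoSketch &AI, bool IsSigned) {
  if (IsSigned)
    AI.setSignExt(true);
  else
    AI.setSignExt(false);
}

// Suggested form: forward the predicate result directly.
inline void setDirect(ArgInfoSketch &AI, bool IsSigned) {
  AI.setSignExt(IsSigned);
}
```

Dropping the `else` branch instead would only be equivalent as long as the flag 
still holds its default value when this code runs.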



Comment at: clang/include/clang/CodeGen/CGFunctionInfo.h:333
   bool canHaveCoerceToType() const {
-return isDirect() || isExtend() || isCoerceAndExpand();
+return isDirect() || isExtend() || isCoerceAndExpand() ||
+   isConservativeExtend();

Can we move it to `isExtend`? e.g. `TheKind == Extend || TheKind == 
ConservativeExtend`



Comment at: clang/lib/CodeGen/CGCall.cpp:2451
+  // attribute to the callee.
+  if (AttrOnCallSite || AI.getKind() == ABIArgInfo::Extend) {
+if (AI.isSignExt())

Does the change affect Windows? It seems Win64 doesn't extend in the caller. 
https://godbolt.org/z/c95hvvsWf



Comment at: clang/test/CodeGen/X86/integer_argument_passing.c:2
+// RUN: %clang_cc1 -O2 -triple -x86_64-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-pc-win32 %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK

Maybe we can remove the tests for i386, given it's only for the 64-bit ABI?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124435/new/

https://reviews.llvm.org/D124435


[PATCH] D124435: [X86] Always extend the integer parameters in callee

2022-04-26 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:2451
+  // attribute to the callee.
+  if (AttrOnCallSite || AI.getKind() == ABIArgInfo::Extend) {
+if (AI.isSignExt())

LiuChen3 wrote:
> pengfei wrote:
> > Does the change affect Windows? Seems Win64 doesn't extend on caller. 
> > https://godbolt.org/z/c95hvvsWf
> No.  This patch didn't nothing for Win64 ABI.
But I found some Windows tests are affected?



Comment at: clang/test/CodeGen/X86/integer_argument_passing.c:2
+// RUN: %clang_cc1 -O2 -triple -x86_64-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-pc-win32 %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK

LiuChen3 wrote:
> pengfei wrote:
> > Maybe we can remove the tests for i386 given it's only for 64 bits ABI?
> According to the meaning of `ConservativeExtend`, I think the 32bit ABI needs 
> to be modified as well:
> https://godbolt.org/z/W1Ma1T3f3
> The dump of currently clang-cl:
> ```
> _square:
>         movb    4(%esp), %al
>         mulb    %al
>         mulb    8(%esp)
>         retl
>
>         .def    _baz;
>         .scl    2;
>         .type   32;
>         .endef
>         .section        .text,"xr",one_only,_baz
>         .globl  _baz
>         .p2align        4, 0x90
> _baz:
>         movswl  4(%esp), %eax
>         pushl   %eax
>         calll   _bar
>         addl    $4, %esp
>         retl
> ```
> Of course with this patch the behavior of clang-cl is still different from 
> cl.exe, but I think it fits the meaning of `ConservativeExtend`.
My point was that i386 passes arguments on the stack, so the extensions don't 
make sense in that case. That's how I understood the comments in the test 
above, 2007-06-18-SextAttrAggregate.c.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124435/new/

https://reviews.llvm.org/D124435


[PATCH] D124435: [X86] Always extend the integer parameters in callee

2022-04-26 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/test/CodeGen/X86/integer_argument_passing.c:2
+// RUN: %clang_cc1 -O2 -triple -x86_64-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-linux-gnu %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK
+// RUN: %clang_cc1 -O2 -triple -i386-pc-win32 %s -emit-llvm -o - | FileCheck %s --check-prefixes=EXTEND,CHECK

pengfei wrote:
> LiuChen3 wrote:
> > pengfei wrote:
> > > Maybe we can remove the tests for i386 given it's only for 64 bits ABI?
> > According to the meaning of `ConservativeExtend`, I think the 32bit ABI 
> > needs to be modified as well:
> > https://godbolt.org/z/W1Ma1T3f3
> > The dump of currently clang-cl:
> > ```
> > _square:
> >         movb    4(%esp), %al
> >         mulb    %al
> >         mulb    8(%esp)
> >         retl
> >
> >         .def    _baz;
> >         .scl    2;
> >         .type   32;
> >         .endef
> >         .section        .text,"xr",one_only,_baz
> >         .globl  _baz
> >         .p2align        4, 0x90
> > _baz:
> >         movswl  4(%esp), %eax
> >         pushl   %eax
> >         calll   _bar
> >         addl    $4, %esp
> >         retl
> > ```
> > Of course with this patch the behavior of clang-cl is still different from 
> > cl.exe, but I think it fits the meaning of `ConservativeExtend`.
> My point was, i386 is passing arguments by stack. The extensions don't make 
> sense under the circumstances. That's what I understood the comments in above 
> test 2007-06-18-SextAttrAggregate.c
Oh, it seems I misunderstood. Stack arguments still need extension since stack 
slots are aligned to 4 bytes. But judging from the output above, clang-cl is 
wrong, because it extends in the caller while MSVC extends in the callee. So, 
back to the other question: we should change this for Windows too, right?
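
A toy model may help show why it matters which side performs the extension. 
This is only a sketch under simplifying assumptions: a 16-bit argument occupies 
a 4-byte stack slot, and every name below (`Slot`, `pushExtended`, 
`pushUnextended`, `readTrustingCaller`, `readExtendingInCallee`) is 
hypothetical. It does not model any real calling convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 16-bit argument occupies a 4-byte stack slot.
using Slot = std::uint32_t;

// Caller that sign-extends the value to 32 bits before storing it.
inline Slot pushExtended(std::int16_t v) {
  return static_cast<Slot>(static_cast<std::int32_t>(v));
}

// Caller that writes only the low 16 bits and leaves stale upper bits.
inline Slot pushUnextended(std::int16_t v, Slot stale) {
  return (stale & 0xFFFF0000u) | static_cast<std::uint16_t>(v);
}

// Callee that trusts the caller and reads the whole 32-bit slot.
inline std::int32_t readTrustingCaller(Slot s) {
  return static_cast<std::int32_t>(s);
}

// Callee that ignores the upper bits and sign-extends the low 16 bits itself.
inline std::int32_t readExtendingInCallee(Slot s) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(s));
}
```

A callee that extends on its own gets the right value from either kind of 
caller, whereas a callee that trusts the caller breaks when paired with a caller 
that does not extend. Extending on both sides, as the option under review 
proposes, is the combination that works against either convention.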


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124435/new/

https://reviews.llvm.org/D124435


[PATCH] D118052: [X86] Fix CodeGen Module Flag for -mibt-seal

2022-04-29 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rGdb1cec371c00: [X86] Fix CodeGen Module Flag for -mibt-seal 
(authored by joaomoreira, committed by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118052/new/

https://reviews.llvm.org/D118052

Files:
  clang/lib/Frontend/CompilerInvocation.cpp
  clang/test/CodeGen/X86/x86-cf-protection.c


Index: clang/test/CodeGen/X86/x86-cf-protection.c
===
--- clang/test/CodeGen/X86/x86-cf-protection.c
+++ clang/test/CodeGen/X86/x86-cf-protection.c
@@ -1,8 +1,14 @@
 // RUN: %clang -target i386-unknown-unknown -x c -E -dM -o - -fcf-protection=return %s | FileCheck %s --check-prefix=RETURN
 // RUN: %clang -target i386-unknown-unknown -x c -E -dM -o - -fcf-protection=branch %s | FileCheck %s --check-prefix=BRANCH
 // RUN: %clang -target i386-unknown-unknown -x c -E -dM -o - -fcf-protection=full %s   | FileCheck %s --check-prefix=FULL
+// RUN: %clang -target i386-unknown-unknown -o - -emit-llvm -S -fcf-protection=branch -mibt-seal -flto %s | FileCheck %s --check-prefixes=CFPROT,IBTSEAL
+// RUN: %clang -target i386-unknown-unknown -o - -emit-llvm -S -fcf-protection=branch -flto %s | FileCheck %s --check-prefixes=CFPROT,NOIBTSEAL
+// RUN: %clang -target i386-unknown-unknown -o - -emit-llvm -S -fcf-protection=branch -mibt-seal %s | FileCheck %s --check-prefixes=CFPROT,NOIBTSEAL
 
 // RETURN: #define __CET__ 2
 // BRANCH: #define __CET__ 1
 // FULL: #define __CET__ 3
+// CFPROT: "cf-protection-branch", i32 1
+// IBTSEAL: "ibt-seal", i32 1
+// NOIBTSEAL-NOT: "ibt-seal", i32 1
 void foo() {}
Index: clang/lib/Frontend/CompilerInvocation.cpp
===
--- clang/lib/Frontend/CompilerInvocation.cpp
+++ clang/lib/Frontend/CompilerInvocation.cpp
@@ -1504,6 +1504,9 @@
   else if (Opts.CFProtectionBranch)
 GenerateArg(Args, OPT_fcf_protection_EQ, "branch", SA);
 
+  if (Opts.IBTSeal)
+GenerateArg(Args, OPT_mibt_seal, SA);
+
   for (const auto &F : Opts.LinkBitcodeFiles) {
 bool Builtint = F.LinkFlags == llvm::Linker::Flags::LinkOnlyNeeded &&
 F.PropagateAttrs && F.Internalize;



[PATCH] D118052: [X86] Fix CodeGen Module Flag for -mibt-seal

2022-04-29 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D118052#3480564, @joaomoreira wrote:

> I think there are no more untied knots... @pengfei, do you think this is 
> ready to merge? If yes, can you please merge it? tks!

Sure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118052/new/

https://reviews.llvm.org/D118052


[PATCH] D124757: [X86] Replace avx512f integer add reduction builtins with generic builtin

2022-05-02 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124757/new/

https://reviews.llvm.org/D124757


[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei created this revision.
pengfei added reviewers: FreddyYe, RKSimon, LuoYuanke, craig.topper.
Herald added a subscriber: StephenFan.
Herald added a project: All.
pengfei requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Fix uninitialized variables introduced by D116325.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124916

Files:
  clang/lib/Headers/cetintrin.h
  clang/test/CodeGen/X86/sse-builtins-constrained.c


Index: clang/test/CodeGen/X86/sse-builtins-constrained.c
===
--- clang/test/CodeGen/X86/sse-builtins-constrained.c
+++ clang/test/CodeGen/X86/sse-builtins-constrained.c
@@ -1,8 +1,8 @@
 // REQUIRES: x86-registered-target
 // RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefix=UNCONSTRAINED --check-prefix=COMMON --check-prefix=COMMONIR
 // RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -ffp-exception-behavior=maytrap -DSTRICT=1 -emit-llvm -o - -Wall -Werror | FileCheck %s --check-prefix=CONSTRAINED --check-prefix=COMMON --check-prefix=COMMONIR
-// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -S %s -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
-// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -ffp-exception-behavior=maytrap -DSTRICT=1 -S %s -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
+// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -S -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
+// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -ffp-exception-behavior=maytrap -DSTRICT=1 -S -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
 
 #ifdef STRICT
 // Test that the constrained intrinsics are picking up the exception
Index: clang/lib/Headers/cetintrin.h
===
--- clang/lib/Headers/cetintrin.h
+++ clang/lib/Headers/cetintrin.h
@@ -43,8 +43,11 @@
 }
 
 static __inline__ unsigned int __DEFAULT_FN_ATTRS _rdsspd_i32() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned int t;
   return __builtin_ia32_rdsspd(t);
+#pragma clang diagnostic pop
 }
 
 #ifdef __x86_64__
@@ -53,8 +56,11 @@
 }
 
 static __inline__ unsigned long long __DEFAULT_FN_ATTRS _rdsspq_i64() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned long long t;
   return __builtin_ia32_rdsspq(t);
+#pragma clang diagnostic pop
 }
 #endif /* __x86_64__ */
 



[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added inline comments.



Comment at: clang/test/CodeGen/X86/sse-builtins-constrained.c:5
+// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -S -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
+// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-unknown-linux-gnu -target-feature +sse -ffp-exception-behavior=maytrap -DSTRICT=1 -S -o - -Wall -Werror | FileCheck %s --check-prefix=CHECK-ASM --check-prefix=COMMON
 

RKSimon wrote:
> Any reason you can't commit this separately and immediately?
Good point! I'll do it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124916/new/

https://reviews.llvm.org/D124916


[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei updated this revision to Diff 426971.
pengfei added a comment.

Separated the unrelated change and rebased.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124916/new/

https://reviews.llvm.org/D124916

Files:
  clang/lib/Headers/cetintrin.h


Index: clang/lib/Headers/cetintrin.h
===
--- clang/lib/Headers/cetintrin.h
+++ clang/lib/Headers/cetintrin.h
@@ -43,8 +43,11 @@
 }
 
 static __inline__ unsigned int __DEFAULT_FN_ATTRS _rdsspd_i32() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned int t;
   return __builtin_ia32_rdsspd(t);
+#pragma clang diagnostic pop
 }
 
 #ifdef __x86_64__
@@ -53,8 +56,11 @@
 }
 
 static __inline__ unsigned long long __DEFAULT_FN_ATTRS _rdsspq_i64() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned long long t;
   return __builtin_ia32_rdsspq(t);
+#pragma clang diagnostic pop
 }
 #endif /* __x86_64__ */
 



[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D124916#3490868, @RKSimon wrote:

> LGTM - I'm intending to add -Wsystem-headers to the clang x86 builtins tests 
> once everything is clean.

Thanks @RKSimon! That sounds great! I had assumed the headers were already 
diagnosed under `-Wall`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124916/new/

https://reviews.llvm.org/D124916


[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

This revision was automatically updated to reflect the committed changes.
Closed by commit rG2d18a86d14a9: [X86] Fix uninitialized variable warnings in 
cetintrin.h reported by #55224 (authored by pengfei).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124916/new/

https://reviews.llvm.org/D124916

Files:
  clang/lib/Headers/cetintrin.h


Index: clang/lib/Headers/cetintrin.h
===
--- clang/lib/Headers/cetintrin.h
+++ clang/lib/Headers/cetintrin.h
@@ -43,8 +43,11 @@
 }
 
 static __inline__ unsigned int __DEFAULT_FN_ATTRS _rdsspd_i32() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned int t;
   return __builtin_ia32_rdsspd(t);
+#pragma clang diagnostic pop
 }
 
 #ifdef __x86_64__
@@ -53,8 +56,11 @@
 }
 
 static __inline__ unsigned long long __DEFAULT_FN_ATTRS _rdsspq_i64() {
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned long long t;
   return __builtin_ia32_rdsspq(t);
+#pragma clang diagnostic pop
 }
 #endif /* __x86_64__ */
 



[PATCH] D124916: [X86] Fix uninitialized variable warnings in cetintrin.h reported by #55224

2022-05-04 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei marked an inline comment as done.
pengfei added inline comments.



Comment at: clang/lib/Headers/cetintrin.h:45
 
 static __inline__ unsigned int __DEFAULT_FN_ATTRS _rdsspd_i32() {
+#pragma clang diagnostic push

craig.topper wrote:
> The argument should also be `(void)`.
Thanks! Done by rGaa25b55bde87.



Comment at: clang/lib/Headers/cetintrin.h:48
+#pragma clang diagnostic ignored "-Wuninitialized"
   unsigned int t;
   return __builtin_ia32_rdsspd(t);

craig.topper wrote:
> So if CET isn't enabled this intrinsic returns a random value instead of 0 
> like _get_ssp?
Exactly! These intrinsics reflect the exact behavior of the instruction, i.e., 
a nop in that case. They are used for performance.
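
A minimal sketch of the distinction discussed above, written as a plain C model rather than the real intrinsics. All names here (`model_rdssp`, `model_get_ssp`, `model_rdsspd_i32`, `shstk_enabled`, `ssp`, `stale`) are hypothetical, and the model assumes the instruction leaves its destination untouched when shadow stacks are disabled. The stale register content is passed in explicitly, because a portable example cannot read an uninitialized variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the instruction: the destination is written only
   when shadow stacks are enabled; otherwise it keeps its previous value. */
static uint32_t model_rdssp(bool shstk_enabled, uint32_t ssp, uint32_t dest) {
  return shstk_enabled ? ssp : dest;
}

/* Models _get_ssp: the destination is seeded with 0, so 0 comes back
   when shadow stacks are disabled. */
static uint32_t model_get_ssp(bool shstk_enabled, uint32_t ssp) {
  return model_rdssp(shstk_enabled, ssp, 0);
}

/* Models _rdsspd_i32: the destination holds whatever was there before,
   so that stale value comes back when shadow stacks are disabled. */
static uint32_t model_rdsspd_i32(bool shstk_enabled, uint32_t ssp,
                                 uint32_t stale) {
  return model_rdssp(shstk_enabled, ssp, stale);
}
```

In this model the two wrappers agree whenever shadow stacks are enabled and differ only in the disabled case, which is the trade-off described in the reply: the `_i32` form skips the zeroing.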


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124916/new/

https://reviews.llvm.org/D124916


[PATCH] D125164: [X86] Fix some signedness errors in x86 headers

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

One question: would it be better to change the definitions of the builtins 
rather than add explicit casts?




Comment at: clang/lib/Headers/cetintrin.h:26
 #ifdef __x86_64__
 static __inline__ void __DEFAULT_FN_ATTRS _incsspq(unsigned long long __a) {
   __builtin_ia32_incsspq(__a);

RKSimon wrote:
> @pengfei The Intel Intrinsics Guide has this taking an int?
I think both make sense in some way. The instruction `incsspq` only uses the 
low 8 bits of a 64-bit register, but the operand is still 64-bit anyway.
I'm not sure which one we should change. Maybe Clang, given that we specify 
`unsigned int` for `_inc_ssp`? Is there any special reason we use 
`unsigned long long` here, @craig.topper?
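
To illustrate why either parameter type can carry the count, here is a small sketch. The helper name `model_incssp_count` is hypothetical, and the masking simply encodes the statement above that only the low 8 bits of the operand are used.

```c
#include <stdint.h>

/* Hypothetical helper: the part of a 64-bit operand that the instruction
   would look at, per the comment above (low 8 bits only). */
static unsigned model_incssp_count(uint64_t operand) {
  return (unsigned)(operand & 0xFFu);
}
```

Under that assumption, any value that fits in `unsigned int` yields the same count whether it is passed as `unsigned int` or widened to `unsigned long long`.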


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125164/new/

https://reviews.llvm.org/D125164


[PATCH] D125164: [X86] Fix some signedness errors in x86 headers

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> What do you want to do about _mm512_maskz_srli_epi16 ? The Intel Intrinsic 
> guide has the same mismatch.

These intrinsics are interesting. The descriptions in the Intrinsics Guide are 
for the immediate variant, but all compilers implement the register variant. 
What's more, the codegen from Clang and GCC doesn't seem correct according to 
the description of `vpsrlw  zmm0, zmm0, xmm1`. They should do the same 
broadcast as ICC.

Back to the question: I think the types in Clang's intrinsics match the 
Intrinsics Guide; they just don't match each other. I guess it is for 
historical reasons, so let's keep them and use casts?




Comment at: clang/lib/Headers/cetintrin.h:37
 static __inline__ void __DEFAULT_FN_ATTRS _inc_ssp(unsigned int __a) {
-  __builtin_ia32_incsspd((int)__a);
+  __builtin_ia32_incsspd((unsigned int)__a);
 }

Unnecessary cast?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125164/new/

https://reviews.llvm.org/D125164


[PATCH] D125164: [X86] Fix some signedness errors in x86 headers

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D125164#3498752, @RKSimon wrote:

> Actually the ia32_tzcnt builtins should stay the way they are - other C/C++ 
> intrinsics return unsigned so we'd still end up with adding explicit casts

No problem, adding explicit casts looks good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125164/new/

https://reviews.llvm.org/D125164


[PATCH] D125164: [X86] Fix some signedness errors in x86 headers

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

> These intrinsics are interesting. The descriptions in the Intrinsics Guide 
> are for the immediate variant, but all compilers implement the register 
> variant. What's more, the codegen from Clang and GCC doesn't seem correct 
> according to the description of `vpsrlw  zmm0, zmm0, xmm1`. They should do 
> the same broadcast as ICC. https://godbolt.org/z/dcrqdEs8q

After a second read, I found that Clang's and GCC's codegen is also correct; I 
confused `vpsrlw  zmm0, zmm0, xmm1` with `vpsrlw  zmm0, zmm0, zmm1`. Please 
ignore the noise.
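
For reference, a scalar sketch of the form with an xmm count operand. The function name `model_srl_epi16` is hypothetical, and the model assumes a single shift count applied uniformly to every 16-bit lane, with counts above 15 clearing the lane.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a uniform logical right shift on 16-bit lanes:
   every lane is shifted by the same count; counts above 15 clear the lane. */
static void model_srl_epi16(uint16_t *lanes, size_t n, uint64_t count) {
  for (size_t i = 0; i < n; ++i)
    lanes[i] = (count > 15) ? 0 : (uint16_t)(lanes[i] >> count);
}
```

Because the count is one scalar rather than a per-lane vector, no broadcast of the count is needed in this model.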


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125164/new/

https://reviews.llvm.org/D125164


[PATCH] D125164: [X86] Fix some signedness errors in x86 headers

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM, thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125164/new/

https://reviews.llvm.org/D125164


[PATCH] D125170: [Headers][X86] Replace \operation with \verbatim

2022-05-07 Thread Phoebe Wang via Phabricator via cfe-commits

pengfei added a comment.

In D125170#3498913, @RKSimon wrote:

> If people prefer we can alternatively use \code{.unparsed} .. \endcode blocks 
> - I'm unsure if these operation blocks are being used in a particular way 
> downstream

We have used \code .. \endcode blocks in headers, so maybe use 
\code{.operation} .. \endcode to match precisely? We haven't used \verbatim 
before, and I'm a little concerned we may mix the two up in the future.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125170/new/

https://reviews.llvm.org/D125170

