[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc

2018-08-07 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Ping


https://reviews.llvm.org/D50168





[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc

2018-08-07 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 159627.
craig.topper added a comment.

Add the test case that I failed to pick up in the original diff.


https://reviews.llvm.org/D50168

Files:
  include/clang/Basic/Builtins.def
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/builtin_clrsb.c


Index: test/CodeGen/builtin_clrsb.c
===
--- /dev/null
+++ test/CodeGen/builtin_clrsb.c
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s
+
+int test__builtin_clrsb(int x) {
+// CHECK-LABEL: test__builtin_clrsb
+// CHECK: [[C:%.*]] = icmp slt i32 [[X:%.*]], 0
+// CHECK: [[INV:%.*]] = xor i32 [[X]], -1
+// CHECK: [[SEL:%.*]] = select i1 [[C]], i32 [[INV]], i32 [[X]]
+// CHECK: [[SHL:%.*]] = shl i32 [[SEL]], 1
+// CHECK: [[OR:%.*]] = or i32 [[SHL]], 1
+// CHECK: call i32 @llvm.ctlz.i32(i32 [[OR]], i1 true)
+  return __builtin_clrsb(x);
+}
+
+int test__builtin_clrsbll(long long x) {
+// CHECK-LABEL: test__builtin_clrsbll
+// CHECK: [[C:%.*]] = icmp slt i64 [[X:%.*]], 0
+// CHECK-NEXT: [[INV:%.*]] = xor i64 [[X]], -1
+// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i64 [[INV]], i64 [[X]]
+// CHECK-NEXT: [[SHL:%.*]] = shl i64 [[SEL]], 1
+// CHECK-NEXT: [[OR:%.*]] = or i64 [[SHL]], 1
+// CHECK-NEXT: [[CTLZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[OR]], i1 true)
+// CHECK-NEXT: trunc i64 [[CTLZ]] to i32
+  return __builtin_clrsbll(x);
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -1537,6 +1537,33 @@
 return RValue::get(ComplexVal.second);
   }
 
+  case Builtin::BI__builtin_clrsb:
+  case Builtin::BI__builtin_clrsbl:
+  case Builtin::BI__builtin_clrsbll: {
+// clrsb(x) -> clz(x < 0 ? ~x : x) - 1 or
+//  -> clz(((x < 0 ? ~x : x) << 1) | 1)
+Value *ArgValue = EmitScalarExpr(E->getArg(0));
+
+llvm::Type *ArgType = ArgValue->getType();
+Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType);
+
+llvm::Type *ResultType = ConvertType(E->getType());
+Value *Zero = llvm::Constant::getNullValue(ArgType);
+Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg");
+Value *Inverse = Builder.CreateNot(ArgValue, "not");
+Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue);
+// Now we need to calculate ctlz(Tmp)-1, but Tmp might be zero. We know
+// the sign bit is zero, so we can shift it out. Then put a 1 in the LSB.
+// This removes one leading zero like the subtract does, and replaces it
+// with a guaranteed one to prevent the value being 0.
+Value *One = llvm::ConstantInt::get(ArgType, 1);
+Tmp = Builder.CreateShl(Tmp, One);
+Tmp = Builder.CreateOr(Tmp, One);
+Value *Result = Builder.CreateCall(F, {Tmp, Builder.getTrue()});
+Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true,
+   "cast");
+return RValue::get(Result);
+  }
   case Builtin::BI__builtin_ctzs:
   case Builtin::BI__builtin_ctz:
   case Builtin::BI__builtin_ctzl:
Index: include/clang/Basic/Builtins.def
===
--- include/clang/Basic/Builtins.def
+++ include/clang/Basic/Builtins.def
@@ -413,6 +413,9 @@
 BUILTIN(__builtin_popcount  , "iUi"  , "nc")
 BUILTIN(__builtin_popcountl , "iULi" , "nc")
 BUILTIN(__builtin_popcountll, "iULLi", "nc")
+BUILTIN(__builtin_clrsb  , "ii"  , "nc")
+BUILTIN(__builtin_clrsbl , "iLi" , "nc")
+BUILTIN(__builtin_clrsbll, "iLLi", "nc")
 
 // FIXME: These type signatures are not correct for targets with int != 32-bits
 // or with ULL != 64-bits.
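As a sanity check on the reasoning in the CGBuiltin.cpp comment above (this sketch is not
part of the patch; it assumes a 32-bit int and a host compiler that already provides
__builtin_clrsb, e.g. gcc), the shifted form never feeds a zero into ctlz and still matches
clrsb at the edge cases:

  // Sketch only: clrsb(x) == clz(((x < 0 ? ~x : x) << 1) | 1) for a 32-bit int.
  // The "| 1" keeps the ctlz argument nonzero, so zero_undef=true is safe.
  #include <cassert>
  #include <cstdint>

  static int clrsb_via_ctlz(int32_t x) {
    uint32_t t = static_cast<uint32_t>(x < 0 ? ~x : x); // sign bit is now known zero
    t = (t << 1) | 1u;                                  // shift it out, plant a one in the LSB
    return __builtin_clz(t);                            // the shift already accounts for the "- 1"
  }

  int main() {
    for (int32_t x : {0, 1, -1, 2, -2, 0x0f, INT32_MIN, INT32_MAX})
      assert(clrsb_via_ctlz(x) == __builtin_clrsb(x));
    return 0;
  }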



[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 159753.
craig.topper added a comment.

Use ctlz(zero_undef=false) and sub
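For context (my note, not the author's): with zero_undef=false, ctlz is defined to return
the bit width on a zero input, which is exactly what the x == 0 and x == -1 cases need,
since the select operand collapses to 0 there. A tiny check of the two edge cases, assuming
a 32-bit int and a host compiler that already has __builtin_clrsb:

  // clrsb of 0 and -1 is bitwidth - 1, so ctlz(0) must be defined (= 32) for the sub form.
  #include <cassert>

  int main() {
    assert(__builtin_clrsb(0)  == 31); // select yields 0, ctlz(0, false) = 32, 32 - 1 = 31
    assert(__builtin_clrsb(-1) == 31); // ~(-1) == 0, same path
    return 0;
  }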


https://reviews.llvm.org/D50168

Files:
  include/clang/Basic/Builtins.def
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/builtin_clrsb.c


Index: test/CodeGen/builtin_clrsb.c
===
--- /dev/null
+++ test/CodeGen/builtin_clrsb.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s
+
+int test__builtin_clrsb(int x) {
+// CHECK-LABEL: test__builtin_clrsb
+// CHECK: [[C:%.*]] = icmp slt i32 [[X:%.*]], 0
+// CHECK-NEXT: [[INV:%.*]] = xor i32 [[X]], -1
+// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i32 [[INV]], i32 [[X]]
+// CHECK-NEXT: [[CTLZ:%.*]] = call i32 @llvm.ctlz.i32(i32 [[SEL]], i1 false)
+// CHECK-NEXT: [[SUB:%.*]] = sub i32 [[CTLZ]], 1
+  return __builtin_clrsb(x);
+}
+
+int test__builtin_clrsbll(long long x) {
+// CHECK-LABEL: test__builtin_clrsbll
+// CHECK: [[C:%.*]] = icmp slt i64 [[X:%.*]], 0
+// CHECK-NEXT: [[INV:%.*]] = xor i64 [[X]], -1
+// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i64 [[INV]], i64 [[X]]
+// CHECK-NEXT: [[CTLZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[SEL]], i1 false)
+// CHECK-NEXT: [[SUB:%.*]] = sub i64 [[CTLZ]], 1
+// CHECK-NEXT: trunc i64 [[SUB]] to i32
+  return __builtin_clrsbll(x);
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -1537,6 +1537,26 @@
 return RValue::get(ComplexVal.second);
   }
 
+  case Builtin::BI__builtin_clrsb:
+  case Builtin::BI__builtin_clrsbl:
+  case Builtin::BI__builtin_clrsbll: {
+// clrsb(x) -> clz(x < 0 ? ~x : x) - 1
+Value *ArgValue = EmitScalarExpr(E->getArg(0));
+
+llvm::Type *ArgType = ArgValue->getType();
+Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType);
+
+llvm::Type *ResultType = ConvertType(E->getType());
+Value *Zero = llvm::Constant::getNullValue(ArgType);
+Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg");
+Value *Inverse = Builder.CreateNot(ArgValue, "not");
+Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue);
+Value *Ctlz = Builder.CreateCall(F, {Tmp, Builder.getFalse()});
+Value *Result = Builder.CreateSub(Ctlz, llvm::ConstantInt::get(ArgType, 1));
+Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true,
+   "cast");
+return RValue::get(Result);
+  }
   case Builtin::BI__builtin_ctzs:
   case Builtin::BI__builtin_ctz:
   case Builtin::BI__builtin_ctzl:
Index: include/clang/Basic/Builtins.def
===
--- include/clang/Basic/Builtins.def
+++ include/clang/Basic/Builtins.def
@@ -413,6 +413,9 @@
 BUILTIN(__builtin_popcount  , "iUi"  , "nc")
 BUILTIN(__builtin_popcountl , "iULi" , "nc")
 BUILTIN(__builtin_popcountll, "iULLi", "nc")
+BUILTIN(__builtin_clrsb  , "ii"  , "nc")
+BUILTIN(__builtin_clrsbl , "iLi" , "nc")
+BUILTIN(__builtin_clrsbll, "iLLi", "nc")
 
 // FIXME: These type signatures are not correct for targets with int != 32-bits
 // or with ULL != 64-bits.



[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Correct me if I'm wrong, but after this change llvm no longer enables the 
timing of individual passes when -ftime-report is passed? Was that intentional?


Repository:
  rL LLVM

https://reviews.llvm.org/D45619





[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Ok I'll add that back.

I'm unclear why we would want to assign clang's FrontendTimesIsEnabled from
inside CodeGenAction. If I'm understanding the intentions here, the goal was to 
add more timing infrastructure to clang. But if the enabling is tied to 
CodeGenAction, then doesn't that mean any new clang timers wouldn't work under 
-fsyntax-only?


Repository:
  rL LLVM

https://reviews.llvm.org/D45619





[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Assignment restored in r339281. I'll file a bug to merge it to 7.0.


Repository:
  rL LLVM

https://reviews.llvm.org/D45619





[PATCH] D50471: [Builtins] Add __builtin_clrsb support to IntExprEvaluator::VisitBuiltinCallExpr

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: bkramer, erichkeane.

Now that __builtin_clrsb is supported by clang, this patch adds constant 
evaluation of it to address the FIXME.
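The new evaluation relies on the identity clrsb(x) == bit width - minimum signed bits of x.
A small host-side sketch of that identity (not part of the patch; assumes a 32-bit int and a
compiler that already provides __builtin_clrsb):

  // min_signed_bits mirrors APSInt::getMinSignedBits: the smallest
  // two's-complement width that still represents x.
  #include <cassert>
  #include <cstdint>

  static int min_signed_bits(int32_t x) {
    uint32_t m = static_cast<uint32_t>(x < 0 ? ~x : x); // fold to the non-negative bit pattern
    int bits = 1;                                       // the sign bit always counts
    while (m) { ++bits; m >>= 1; }
    return bits;
  }

  int main() {
    for (int32_t x : {0, 1, -1, 15, -32, INT32_MIN, INT32_MAX})
      assert(__builtin_clrsb(x) == 32 - min_signed_bits(x));
    return 0;
  }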


https://reviews.llvm.org/D50471

Files:
  lib/AST/ExprConstant.cpp
  test/Sema/constant-builtins-2.c


Index: test/Sema/constant-builtins-2.c
===
--- test/Sema/constant-builtins-2.c
+++ test/Sema/constant-builtins-2.c
@@ -176,6 +176,19 @@
 char ffs5[__builtin_ffs(1U << (BITSIZE(int) - 1)) == BITSIZE(int) ? 1 : -1];
 char ffs6[__builtin_ffsl(0x10L) == 5 ? 1 : -1];
 char ffs7[__builtin_ffsll(0x100LL) == 9 ? 1 : -1];
+
+char clrsb1[__builtin_clrsb(0) == BITSIZE(int) - 1 ? 1 : -1];
+char clrsb2[__builtin_clrsbl(0L) == BITSIZE(long) - 1 ? 1 : -1];
+char clrsb3[__builtin_clrsbll(0LL) == BITSIZE(long long) - 1 ? 1 : -1];
+char clrsb4[__builtin_clrsb(~0) == BITSIZE(int) - 1 ? 1 : -1];
+char clrsb5[__builtin_clrsbl(~0L) == BITSIZE(long) - 1 ? 1 : -1];
+char clrsb6[__builtin_clrsbll(~0LL) == BITSIZE(long long) - 1 ? 1 : -1];
+char clrsb7[__builtin_clrsb(1) == BITSIZE(int) - 2 ? 1 : -1];
+char clrsb8[__builtin_clrsb(~1) == BITSIZE(int) - 2 ? 1 : -1];
+char clrsb9[__builtin_clrsb(1 << (BITSIZE(int) - 1)) == 0 ? 1 : -1];
+char clrsb10[__builtin_clrsb(~(1 << (BITSIZE(int) - 1))) == 0 ? 1 : -1];
+char clrsb11[__builtin_clrsb(0xf) == BITSIZE(int) - 5 ? 1 : -1];
+char clrsb11[__builtin_clrsb(~0x1f) == BITSIZE(int) - 6 ? 1 : -1];
 #undef BITSIZE
 
 // GCC misc stuff
Index: lib/AST/ExprConstant.cpp
===
--- lib/AST/ExprConstant.cpp
+++ lib/AST/ExprConstant.cpp
@@ -8117,9 +8117,15 @@
   case Builtin::BI__builtin_classify_type:
 return Success((int)EvaluateBuiltinClassifyType(E, Info.getLangOpts()), E);
 
-  // FIXME: BI__builtin_clrsb
-  // FIXME: BI__builtin_clrsbl
-  // FIXME: BI__builtin_clrsbll
+  case Builtin::BI__builtin_clrsb:
+  case Builtin::BI__builtin_clrsbl:
+  case Builtin::BI__builtin_clrsbll: {
+APSInt Val;
+if (!EvaluateInteger(E->getArg(0), Val, Info))
+  return false;
+
+return Success(Val.getBitWidth() - Val.getMinSignedBits(), E);
+  }
 
   case Builtin::BI__builtin_clz:
   case Builtin::BI__builtin_clzl:




[PATCH] D50471: [Builtins] Add __builtin_clrsb support to IntExprEvaluator::VisitBuiltinCallExpr

2018-08-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 159777.
craig.topper added a comment.

Fix the duplicate variable name in the test. Not sure why that didn't trigger a complaint.


https://reviews.llvm.org/D50471

Files:
  lib/AST/ExprConstant.cpp
  test/Sema/constant-builtins-2.c


Index: test/Sema/constant-builtins-2.c
===
--- test/Sema/constant-builtins-2.c
+++ test/Sema/constant-builtins-2.c
@@ -176,6 +176,19 @@
 char ffs5[__builtin_ffs(1U << (BITSIZE(int) - 1)) == BITSIZE(int) ? 1 : -1];
 char ffs6[__builtin_ffsl(0x10L) == 5 ? 1 : -1];
 char ffs7[__builtin_ffsll(0x100LL) == 9 ? 1 : -1];
+
+char clrsb1[__builtin_clrsb(0) == BITSIZE(int) - 1 ? 1 : -1];
+char clrsb2[__builtin_clrsbl(0L) == BITSIZE(long) - 1 ? 1 : -1];
+char clrsb3[__builtin_clrsbll(0LL) == BITSIZE(long long) - 1 ? 1 : -1];
+char clrsb4[__builtin_clrsb(~0) == BITSIZE(int) - 1 ? 1 : -1];
+char clrsb5[__builtin_clrsbl(~0L) == BITSIZE(long) - 1 ? 1 : -1];
+char clrsb6[__builtin_clrsbll(~0LL) == BITSIZE(long long) - 1 ? 1 : -1];
+char clrsb7[__builtin_clrsb(1) == BITSIZE(int) - 2 ? 1 : -1];
+char clrsb8[__builtin_clrsb(~1) == BITSIZE(int) - 2 ? 1 : -1];
+char clrsb9[__builtin_clrsb(1 << (BITSIZE(int) - 1)) == 0 ? 1 : -1];
+char clrsb10[__builtin_clrsb(~(1 << (BITSIZE(int) - 1))) == 0 ? 1 : -1];
+char clrsb11[__builtin_clrsb(0xf) == BITSIZE(int) - 5 ? 1 : -1];
+char clrsb12[__builtin_clrsb(~0x1f) == BITSIZE(int) - 6 ? 1 : -1];
 #undef BITSIZE
 
 // GCC misc stuff
Index: lib/AST/ExprConstant.cpp
===
--- lib/AST/ExprConstant.cpp
+++ lib/AST/ExprConstant.cpp
@@ -8117,9 +8117,15 @@
   case Builtin::BI__builtin_classify_type:
 return Success((int)EvaluateBuiltinClassifyType(E, Info.getLangOpts()), E);
 
-  // FIXME: BI__builtin_clrsb
-  // FIXME: BI__builtin_clrsbl
-  // FIXME: BI__builtin_clrsbll
+  case Builtin::BI__builtin_clrsb:
+  case Builtin::BI__builtin_clrsbl:
+  case Builtin::BI__builtin_clrsbll: {
+APSInt Val;
+if (!EvaluateInteger(E->getArg(0), Val, Info))
+  return false;
+
+return Success(Val.getBitWidth() - Val.getMinSignedBits(), E);
+  }
 
   case Builtin::BI__builtin_clz:
   case Builtin::BI__builtin_clzl:




[PATCH] D50678: [InlineAsm] Update the min-legal-vector-width function attribute based on inputs and outputs to inline assembly

2018-08-13 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: chandlerc, rsmith, rnk.
Herald added a subscriber: eraman.

Another piece of my ongoing work for prefer-vector-width.

min-legal-vector-width will eventually be used by the X86 backend to know
whether it needs to make 512-bit types legal when prefer-vector-width=256. If
the user writes inline assembly that passes a 512-bit register in or out, we
need to make sure 512 bits are considered legal. Otherwise we'll get an assert
failure when we try to wire up the inline assembly to the rest of the code.

This patch just checks the LLVM IR types to see if they are vectors and then 
updates the attribute based on their total width. I'm not sure if this is the 
best way to do this or if there's any subtlety I might have missed. So if 
anyone has other opinions on how to do this I'm open to suggestions.
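To make the scenario concrete, here is a rough user-side example (my sketch, not from the
patch): the only 512-bit values in this function are inline-assembly operands, so the new
scan is what would bump the function's "min-legal-vector-width" to 512.

  // Compile for an AVX-512 target (e.g. -target-feature +avx512f).
  typedef long long v8i64 __attribute__((vector_size(64)));

  void copy512(v8i64 *dst, v8i64 *src) {
    v8i64 tmp;
    __asm__("vmovdqu64 %1, %0" : "=v"(tmp) : "m"(*src));
    __asm__("vmovdqu64 %1, %0" : "=m"(*dst) : "v"(tmp));
    // Expected: the attributes on copy512 include "min-legal-vector-width"="512".
  }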


https://reviews.llvm.org/D50678

Files:
  lib/CodeGen/CGStmt.cpp
  test/CodeGen/x86-inline-asm-min-vector-width.c


Index: test/CodeGen/x86-inline-asm-min-vector-width.c
===
--- /dev/null
+++ test/CodeGen/x86-inline-asm-min-vector-width.c
@@ -0,0 +1,45 @@
+// RUN: %clang_cc1 %s -triple x86_64-unknown-linux-gnu -emit-llvm -target-feature +avx512f -o - | FileCheck %s
+
+typedef long long __m128i __attribute__ ((vector_size (16)));
+typedef long long __m256i __attribute__ ((vector_size (32)));
+typedef long long __m512i __attribute__ ((vector_size (64)));
+
+// CHECK: define <2 x i64> @testXMMout(<2 x i64>* %p) #0
+__m128i testXMMout(__m128i *p) {
+  __m128i xmm0;
+  __asm__("vmovdqu %1, %0" :"=v"(xmm0) : "m"(*(__m128i*)p));
+  return xmm0;
+}
+
+// CHECK: define <4 x i64> @testYMMout(<4 x i64>* %p) #1
+__m256i testYMMout(__m256i *p) {
+  __m256i ymm0;
+  __asm__("vmovdqu %1, %0" :"=v"(ymm0) : "m"(*(__m256i*)p));
+  return ymm0;
+}
+
+// CHECK: define <8 x i64> @testZMMout(<8 x i64>* %p) #2
+__m512i testZMMout(__m512i *p) {
+  __m512i zmm0;
+  __asm__("vmovdqu64 %1, %0" :"=v"(zmm0) : "m"(*(__m512i*)p));
+  return zmm0;
+}
+
+// CHECK: define void @testXMMin(<2 x i64> %xmm0, <2 x i64>* %p) #0
+void testXMMin(__m128i xmm0, __m128i *p) {
+  __asm__("vmovdqu %0, %1" : : "v"(xmm0), "m"(*(__m128i*)p));
+}
+
+// CHECK: define void @testYMMin(<4 x i64> %ymm0, <4 x i64>* %p) #1
+void testYMMin(__m256i ymm0, __m256i *p) {
+  __asm__("vmovdqu %0, %1" : : "v"(ymm0), "m"(*(__m256i*)p));
+}
+
+// CHECK: define void @testZMMin(<8 x i64> %zmm0, <8 x i64>* %p) #2
+void testZMMin(__m512i zmm0, __m512i *p) {
+  __asm__("vmovdqu64 %0, %1" : : "v"(zmm0), "m"(*(__m512i*)p));
+}
+
+// CHECK: attributes #0 = {{.*}}"min-legal-vector-width"="128"
+// CHECK: attributes #1 = {{.*}}"min-legal-vector-width"="256"
+// CHECK: attributes #2 = {{.*}}"min-legal-vector-width"="512"
Index: lib/CodeGen/CGStmt.cpp
===
--- lib/CodeGen/CGStmt.cpp
+++ lib/CodeGen/CGStmt.cpp
@@ -1979,6 +1979,11 @@
   diag::err_asm_invalid_type_in_input)
 << OutExpr->getType() << OutputConstraint;
   }
+
+  // Update largest vector width for any vector types.
+  if (auto *VT = dyn_cast<llvm::VectorType>(ResultRegTypes.back()))
+LargestVectorWidth = std::max(LargestVectorWidth,
+  VT->getPrimitiveSizeInBits());
 } else {
   ArgTypes.push_back(Dest.getAddress().getType());
   Args.push_back(Dest.getPointer());
@@ -2000,6 +2005,10 @@
Arg->getType()))
 Arg = Builder.CreateBitCast(Arg, AdjTy);
 
+  // Update largest vector width for any vector types.
+  if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType()))
+LargestVectorWidth = std::max(LargestVectorWidth,
+  VT->getPrimitiveSizeInBits());
   if (Info.allowsRegister())
 InOutConstraints += llvm::utostr(i);
   else
@@ -2080,6 +2089,11 @@
   CGM.getDiags().Report(S.getAsmLoc(), diag::err_asm_invalid_type_in_input)
   << InputExpr->getType() << InputConstraint;
 
+// Update largest vector width for any vector types.
+if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType()))
+  LargestVectorWidth = std::max(LargestVectorWidth,
+VT->getPrimitiveSizeInBits());
+
 ArgTypes.push_back(Arg->getType());
 Args.push_back(Arg);
 Constraints += InputConstraint;



[PATCH] D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part)

2018-08-13 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D46892





[PATCH] D50815: Establish the header

2018-08-15 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: include/bit:145
+  static_assert(sizeof(unsigned) == 4, "");
+  return __popcnt(__x);
+#endif

How does this work on pre-Haswell X86 CPUs? Doesn't MSVC just blindly emit the 
popcnt instruction when it sees this?


https://reviews.llvm.org/D50815





[PATCH] D50815: Establish the header

2018-08-15 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: include/bit:163
+  static_assert(sizeof(unsigned long long) == 8, "");
+  return __popcnt64(__x);
+#endif

I don't think __popcnt64 exists in MSVC when targeting 32-bit mode.


https://reviews.llvm.org/D50815





[PATCH] D50907: Make __shiftleft128 / __shiftright128 real compiler built-ins.

2018-08-17 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D50907





[PATCH] D50876: Clean up newly created header

2018-08-17 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: include/bit:140
   static_assert(sizeof(unsigned) == 4, "");
   return __popcnt(__x);
 }

MSVC blindly uses the popcnt instruction whenever it sees this intrinsic, so
this only works on Nehalem and newer Intel CPUs, and on Barcelona and newer AMD
CPUs. This is why LLVM uses a bit-math version of popcount for MSVC in
include/llvm/Support/MathExtras.h.
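For reference, a portable bit-math popcount in the spirit of what MathExtras.h falls back
to under MSVC (a sketch, not the exact LLVM code):

  #include <cstdint>

  // SWAR popcount: no POPCNT instruction required, works on any x86.
  static inline int popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                  // pairwise bit sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // nibble sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // byte sums
    return static_cast<int>((v * 0x01010101u) >> 24);  // horizontal add via multiply
  }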



Comment at: include/bit:150
   static_assert(sizeof(unsigned long long) == 8, "");
   return __popcnt64(__x);
 }

This doesn't exist in 32-bit MSVC.


https://reviews.llvm.org/D50876





[PATCH] D46540: [X86] ptwrite intrinsic

2018-05-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D46540





[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes

2018-05-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/CodeGen/CodeGenFunction.cpp:2346
+  return Feat.substr(1) == F.getKey();
+   }))
+ReqFeatures.insert(ReqFeatures.begin(), F.getKey());

This and the next line are indented funny.


https://reviews.llvm.org/D46541





[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes

2018-05-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/CodeGen/CodeGenFunction.cpp:2342
   // Only positive features are "required".
-  if (F.getValue())
+  if (F.getValue()) {
+if (std::any_of(ParsedAttr.Features.begin(), ParsedAttr.Features.end(),

Rather than walking ParsedAttr.Features for each feature in the map, and
sometimes having to shift the ReqFeatures vector, how about doing this:

- Walk through all features in ParsedAttr; for each feature with a '+', query
  the callee feature map. If it's enabled there, push it to ReqFeatures.
- Walk through all features in the callee feature map and push the enabled
  ones.

This will lead to duplicates in the list, but all the explicitly mentioned
features will be listed first.
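A rough sketch of that suggestion (function and variable names here are illustrative, not
the actual clang code):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringMap.h"
  #include "llvm/ADT/StringRef.h"
  #include <string>
  #include <vector>

  // Assumed context: AttrFeatures holds "+feat"/"-feat" strings from the parsed
  // target attribute; CalleeFeatureMap maps feature names to enabled/disabled.
  static void collectReqFeatures(const std::vector<std::string> &AttrFeatures,
                                 const llvm::StringMap<bool> &CalleeFeatureMap,
                                 llvm::SmallVectorImpl<llvm::StringRef> &ReqFeatures) {
    // 1. Explicitly requested '+' features that are enabled for the callee.
    for (const std::string &Feat : AttrFeatures) {
      llvm::StringRef F(Feat);
      if (F.startswith("+") && CalleeFeatureMap.lookup(F.substr(1)))
        ReqFeatures.push_back(F.substr(1));
    }
    // 2. Then everything enabled in the callee's map (duplicates are acceptable).
    for (const auto &Entry : CalleeFeatureMap)
      if (Entry.getValue())
        ReqFeatures.push_back(Entry.getKey());
  }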


https://reviews.llvm.org/D46541





[PATCH] D46349: [X86] Mark builtins 'const' where possible

2018-05-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Ping


https://reviews.llvm.org/D46349





[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes

2018-05-09 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

This looks pretty good to me. @echristo what do you think?


https://reviews.llvm.org/D46541





[PATCH] D46656: [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand

2018-05-09 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added a reviewer: spatel.

Currently we emit something like

rotl(x, n) {
  n &= bitwidth - 1;
  return n != 0 ? ((x << n) | (x >> (bitwidth - n))) : x;
}

We use a select to avoid the undefined behavior on the (bitwidth - n) shift.

The middle end and backend don't really recognize this as a rotate and end up
emitting a cmov or control flow because of the select.

A better pattern is (x << (n & mask)) | (x >> (-n & mask)), where mask is
bitwidth - 1 (see the sketch at the end of this description).

Fixes the main complaint in PR37387. There's still some work to be done if the 
user writes that sequence directly on a short or char where type promotion 
rules can prevent it from being recognized. The builtin is emitting direct IR 
with unpromoted types so that isn't a problem for it.
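In source form, the new pattern looks like the following (a sketch of the shape of the
emitted IR, not literal clang output); both shift amounts are masked to bitwidth - 1, so
n == 0 and multiples of the bit width fall out naturally without a select:

  #include <cstdint>

  uint32_t rotl32(uint32_t x, uint32_t n) {
    const uint32_t mask = 31;                       // bitwidth - 1
    return (x << (n & mask)) | (x >> (-n & mask));  // -n & mask == (32 - n) & mask
  }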


https://reviews.llvm.org/D46656

Files:
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/ms-intrinsics-rotations.c

Index: test/CodeGen/ms-intrinsics-rotations.c
===
--- test/CodeGen/ms-intrinsics-rotations.c
+++ test/CodeGen/ms-intrinsics-rotations.c
@@ -30,69 +30,64 @@
   return _rotl8(value, shift);
 }
 // CHECK: i8 @test_rotl8
-// CHECK:   [[SHIFT:%[0-9]+]] = and i8 %{{[0-9]+}}, 7
-// CHECK:   [[NEGSHIFT:%[0-9]+]] = sub i8 8, [[SHIFT]]
-// CHECK:   [[HIGH:%[0-9]+]] = shl i8 [[VALUE:%[0-9]+]], [[SHIFT]]
-// CHECK:   [[LOW:%[0-9]+]] = lshr i8 [[VALUE]], [[NEGSHIFT]]
-// CHECK:   [[ROTATED:%[0-9]+]] = or i8 [[HIGH]], [[LOW]]
-// CHECK:   [[ISZERO:%[0-9]+]] = icmp eq i8 [[SHIFT]], 0
-// CHECK:   [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i8 [[VALUE]], i8 [[ROTATED]]
+// CHECK:   [[LSHIFT:%[0-9]+]] = and i8 [[SHIFT:%[0-9]+]], 7
+// CHECK:   [[HIGH:%[0-9]+]] = shl i8 [[VALUE:%[0-9]+]], [[LSHIFT]]
+// CHECK:   [[NEGATE:%[0-9]+]] = sub i8 0, [[SHIFT]]
+// CHECK:   [[RSHIFT:%[0-9]+]] = and i8 [[NEGATE]], 7
+// CHECK:   [[LOW:%[0-9]+]] = lshr i8 [[VALUE]], [[RSHIFT]]
+// CHECK:   [[RESULT:%[0-9]+]] = or i8 [[HIGH]], [[LOW]]
 // CHECK:   ret i8 [[RESULT]]
 // CHECK  }
 
 unsigned short test_rotl16(unsigned short value, unsigned char shift) {
   return _rotl16(value, shift);
 }
 // CHECK: i16 @test_rotl16
-// CHECK:   [[SHIFT:%[0-9]+]] = and i16 %{{[0-9]+}}, 15
-// CHECK:   [[NEGSHIFT:%[0-9]+]] = sub i16 16, [[SHIFT]]
-// CHECK:   [[HIGH:%[0-9]+]] = shl i16 [[VALUE:%[0-9]+]], [[SHIFT]]
-// CHECK:   [[LOW:%[0-9]+]] = lshr i16 [[VALUE]], [[NEGSHIFT]]
-// CHECK:   [[ROTATED:%[0-9]+]] = or i16 [[HIGH]], [[LOW]]
-// CHECK:   [[ISZERO:%[0-9]+]] = icmp eq i16 [[SHIFT]], 0
-// CHECK:   [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i16 [[VALUE]], i16 [[ROTATED]]
+// CHECK:   [[LSHIFT:%[0-9]+]] = and i16 [[SHIFT:%[0-9]+]], 15
+// CHECK:   [[HIGH:%[0-9]+]] = shl i16 [[VALUE:%[0-9]+]], [[LSHIFT]]
+// CHECK:   [[NEGATE:%[0-9]+]] = sub i16 0, [[SHIFT]]
+// CHECK:   [[RSHIFT:%[0-9]+]] = and i16 [[NEGATE]], 15
+// CHECK:   [[LOW:%[0-9]+]] = lshr i16 [[VALUE]], [[RSHIFT]]
+// CHECK:   [[RESULT:%[0-9]+]] = or i16 [[HIGH]], [[LOW]]
 // CHECK:   ret i16 [[RESULT]]
 // CHECK  }
 
 unsigned int test_rotl(unsigned int value, int shift) {
   return _rotl(value, shift);
 }
 // CHECK: i32 @test_rotl
-// CHECK:   [[SHIFT:%[0-9]+]] = and i32 %{{[0-9]+}}, 31
-// CHECK:   [[NEGSHIFT:%[0-9]+]] = sub i32 32, [[SHIFT]]
-// CHECK:   [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[SHIFT]]
-// CHECK:   [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[NEGSHIFT]]
-// CHECK:   [[ROTATED:%[0-9]+]] = or i32 [[HIGH]], [[LOW]]
-// CHECK:   [[ISZERO:%[0-9]+]] = icmp eq i32 [[SHIFT]], 0
-// CHECK:   [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i32 [[VALUE]], i32 [[ROTATED]]
+// CHECK:   [[LSHIFT:%[0-9]+]] = and i32 [[SHIFT:%[0-9]+]], 31
+// CHECK:   [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[LSHIFT]]
+// CHECK:   [[NEGATE:%[0-9]+]] = sub i32 0, [[SHIFT]]
+// CHECK:   [[RSHIFT:%[0-9]+]] = and i32 [[NEGATE]], 31
+// CHECK:   [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[RSHIFT]]
+// CHECK:   [[RESULT:%[0-9]+]] = or i32 [[HIGH]], [[LOW]]
 // CHECK:   ret i32 [[RESULT]]
 // CHECK  }
 
 unsigned LONG test_lrotl(unsigned LONG value, int shift) {
   return _lrotl(value, shift);
 }
 // CHECK-32BIT-LONG: i32 @test_lrotl
-// CHECK-32BIT-LONG:   [[SHIFT:%[0-9]+]] = and i32 %{{[0-9]+}}, 31
-// CHECK-32BIT-LONG:   [[NEGSHIFT:%[0-9]+]] = sub i32 32, [[SHIFT]]
-// CHECK-32BIT-LONG:   [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[SHIFT]]
-// CHECK-32BIT-LONG:   [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[NEGSHIFT]]
-// CHECK-32BIT-LONG:   [[ROTATED:%[0-9]+]] = or i32 [[HIGH]], [[LOW]]
-// CHECK-32BIT-LONG:   [[ISZERO:%[0-9]+]] = icmp eq i32 [[SHIFT]], 0
-// CHECK-32BIT-LONG:   [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i32 [[VALUE]], i32 [[ROTATED]]
+// CHECK-32BIT-LONG:   [[LSHIFT:%[0-9]+]] = and i32 [[SHIFT:%[0-9]+]], 31
+// CHECK-32BIT-LONG:   [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[LSHIFT]]
+// CHECK-32BIT-LONG:   [[NEGATE:%[0-9]+]] = sub i32 0, [[SHIFT]]
+// CHECK-32BIT-LONG:   [[RSHIFT:%[0-9]+]] = and i32 [[NEGATE]], 31
+// CHECK-32BIT-LONG:   [[

[PATCH] D46683: [X86] Assume alignment of movdir64b dst argument

2018-05-10 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

What effect does this have?


Repository:
  rC Clang

https://reviews.llvm.org/D46683





[PATCH] D46683: [X86] Assume alignment of movdir64b dst argument

2018-05-10 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D46683





[PATCH] D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.

2018-05-10 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, GBuella, tkrupa.

As long as the destination type is a 256-bit or 128-bit vector, we can use
__builtin_convertvector to directly generate a trunc IR instruction, which will
be handled natively by the backend.
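A minimal illustration of the mechanism (my example, not from the patch):

  // Same lane count on both sides, narrower integer elements on the result:
  // clang emits a single lane-wise 'trunc' for this.
  typedef short       v16i16 __attribute__((vector_size(32)));
  typedef signed char v16i8  __attribute__((vector_size(16)));

  static inline v16i8 trunc_epi16_epi8(v16i16 v) {
    return __builtin_convertvector(v, v16i8);  // -> trunc <16 x i16> ... to <16 x i8>
  }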


Repository:
  rC Clang

https://reviews.llvm.org/D46742

Files:
  include/clang/Basic/BuiltinsX86.def
  lib/Headers/avx512bwintrin.h
  lib/Headers/avx512fintrin.h
  lib/Headers/avx512vlbwintrin.h
  lib/Headers/avx512vlintrin.h
  test/CodeGen/avx512bw-builtins.c
  test/CodeGen/avx512f-builtins.c
  test/CodeGen/avx512vl-builtins.c
  test/CodeGen/avx512vlbw-builtins.c

Index: test/CodeGen/avx512vlbw-builtins.c
===
--- test/CodeGen/avx512vlbw-builtins.c
+++ test/CodeGen/avx512vlbw-builtins.c
@@ -1804,19 +1804,21 @@
 
 __m128i test_mm256_cvtepi16_epi8(__m256i __A) {
   // CHECK-LABEL: @test_mm256_cvtepi16_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.wb.256
+  // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8>
   return _mm256_cvtepi16_epi8(__A); 
 }
 
 __m128i test_mm256_mask_cvtepi16_epi8(__m128i __O, __mmask16 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_cvtepi16_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.wb.256
+  // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8>
+  // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}}
   return _mm256_mask_cvtepi16_epi8(__O, __M, __A); 
 }
 
 __m128i test_mm256_maskz_cvtepi16_epi8(__mmask16 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_maskz_cvtepi16_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.wb.256
+  // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8>
+  // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}}
   return _mm256_maskz_cvtepi16_epi8(__M, __A); 
 }
 
Index: test/CodeGen/avx512vl-builtins.c
===
--- test/CodeGen/avx512vl-builtins.c
+++ test/CodeGen/avx512vl-builtins.c
@@ -6577,19 +6577,21 @@
 
 __m128i test_mm256_cvtepi32_epi16(__m256i __A) {
   // CHECK-LABEL: @test_mm256_cvtepi32_epi16
-  // CHECK: @llvm.x86.avx512.mask.pmov.dw.256
+  // CHECK: trunc <8 x i32> %{{.*}} to <8 x i16>
   return _mm256_cvtepi32_epi16(__A); 
 }
 
 __m128i test_mm256_mask_cvtepi32_epi16(__m128i __O, __mmask8 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_cvtepi32_epi16
-  // CHECK: @llvm.x86.avx512.mask.pmov.dw.256
+  // CHECK: trunc <8 x i32> %{{.*}} to <8 x i16>
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i16> %{{.*}}, <8 x i16> %{{.*}}
   return _mm256_mask_cvtepi32_epi16(__O, __M, __A); 
 }
 
 __m128i test_mm256_maskz_cvtepi32_epi16(__mmask8 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_maskz_cvtepi32_epi16
-  // CHECK: @llvm.x86.avx512.mask.pmov.dw.256
+  // CHECK: trunc <8 x i32> %{{.*}} to <8 x i16>
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i16> %{{.*}}, <8 x i16> %{{.*}}
   return _mm256_maskz_cvtepi32_epi16(__M, __A); 
 }
 
@@ -6673,19 +6675,21 @@
 
 __m128i test_mm256_cvtepi64_epi32(__m256i __A) {
   // CHECK-LABEL: @test_mm256_cvtepi64_epi32
-  // CHECK: @llvm.x86.avx512.mask.pmov.qd.256
+  // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32>
   return _mm256_cvtepi64_epi32(__A); 
 }
 
 __m128i test_mm256_mask_cvtepi64_epi32(__m128i __O, __mmask8 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_cvtepi64_epi32
-  // CHECK: @llvm.x86.avx512.mask.pmov.qd.256
+  // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32>
+  // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}}
   return _mm256_mask_cvtepi64_epi32(__O, __M, __A); 
 }
 
 __m128i test_mm256_maskz_cvtepi64_epi32(__mmask8 __M, __m256i __A) {
   // CHECK-LABEL: @test_mm256_maskz_cvtepi64_epi32
-  // CHECK: @llvm.x86.avx512.mask.pmov.qd.256
+  // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32>
+  // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}}
   return _mm256_maskz_cvtepi64_epi32(__M, __A); 
 }
 
Index: test/CodeGen/avx512f-builtins.c
===
--- test/CodeGen/avx512f-builtins.c
+++ test/CodeGen/avx512f-builtins.c
@@ -5102,19 +5102,21 @@
 
 __m128i test_mm512_cvtepi32_epi8(__m512i __A) {
   // CHECK-LABEL: @test_mm512_cvtepi32_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.db.512
+  // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8>
   return _mm512_cvtepi32_epi8(__A); 
 }
 
 __m128i test_mm512_mask_cvtepi32_epi8(__m128i __O, __mmask16 __M, __m512i __A) {
   // CHECK-LABEL: @test_mm512_mask_cvtepi32_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.db.512
+  // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8>
+  // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}}
   return _mm512_mask_cvtepi32_epi8(__O, __M, __A); 
 }
 
 __m128i test_mm512_maskz_cvtepi32_epi8(__mmask16 __M, __m512i __A) {
   // CHECK-LABEL: @test_mm512_maskz_cvtepi32_epi8
-  // CHECK: @llvm.x86.avx512.mask.pmov.db.512
+  // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8>
+  // CHECK: select <16 x i1> %

[PATCH] D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.

2018-05-13 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Yeah the others will need codegen work. So I'm starting with the easy cases.


Repository:
  rC Clang

https://reviews.llvm.org/D46742





[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.

2018-05-14 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.

I believe this is safe assuming the default rounding mode. The conversion
might be inexact, but it can never overflow the FP type, so this shouldn't be
undefined behavior for the uitofp/sitofp instructions.

We already do something similar for scalar conversions.
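As a small illustration (the types here are my own, not the intrinsic signatures), the
signedness of the source element type is what selects sitofp versus uitofp:

  typedef int          v4i32 __attribute__((vector_size(16)));
  typedef unsigned int v4u32 __attribute__((vector_size(16)));
  typedef float        v4f32 __attribute__((vector_size(16)));

  static inline v4f32 cvt_signed(v4i32 v)   { return __builtin_convertvector(v, v4f32); } // sitofp
  static inline v4f32 cvt_unsigned(v4u32 v) { return __builtin_convertvector(v, v4f32); } // uitofp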


Repository:
  rC Clang

https://reviews.llvm.org/D46863

Files:
  include/clang/Basic/BuiltinsX86.def
  lib/Headers/avx512dqintrin.h
  lib/Headers/avx512fintrin.h
  lib/Headers/avx512vldqintrin.h
  lib/Headers/avx512vlintrin.h
  lib/Headers/avxintrin.h
  lib/Headers/emmintrin.h
  test/CodeGen/avx-builtins.c
  test/CodeGen/avx512dq-builtins.c
  test/CodeGen/avx512f-builtins.c
  test/CodeGen/avx512vl-builtins.c
  test/CodeGen/avx512vldq-builtins.c
  test/CodeGen/builtins-x86.c
  test/CodeGen/sse2-builtins.c

Index: test/CodeGen/sse2-builtins.c
===
--- test/CodeGen/sse2-builtins.c
+++ test/CodeGen/sse2-builtins.c
@@ -468,7 +468,7 @@
 
 __m128 test_mm_cvtepi32_ps(__m128i A) {
   // CHECK-LABEL: test_mm_cvtepi32_ps
-  // CHECK: call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> %{{.*}})
+  // CHECK: sitofp <4 x i32> %{{.*}} to <4 x float>
   return _mm_cvtepi32_ps(A);
 }
 
Index: test/CodeGen/builtins-x86.c
===
--- test/CodeGen/builtins-x86.c
+++ test/CodeGen/builtins-x86.c
@@ -338,7 +338,6 @@
   tmp_V2LLi = __builtin_ia32_psadbw128(tmp_V16c, tmp_V16c);
   tmp_V2d = __builtin_ia32_sqrtpd(tmp_V2d);
   tmp_V2d = __builtin_ia32_sqrtsd(tmp_V2d);
-  tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i);
   tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d);
   tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d);
   tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d);
Index: test/CodeGen/avx512vldq-builtins.c
===
--- test/CodeGen/avx512vldq-builtins.c
+++ test/CodeGen/avx512vldq-builtins.c
@@ -421,37 +421,41 @@
 
 __m128d test_mm_cvtepi64_pd(__m128i __A) {
   // CHECK-LABEL: @test_mm_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128
+  // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double>
   return _mm_cvtepi64_pd(__A); 
 }
 
 __m128d test_mm_mask_cvtepi64_pd(__m128d __W, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128
+  // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double>
+  // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}
   return _mm_mask_cvtepi64_pd(__W, __U, __A); 
 }
 
 __m128d test_mm_maskz_cvtepi64_pd(__mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_maskz_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128
+  // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double>
+  // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}
   return _mm_maskz_cvtepi64_pd(__U, __A); 
 }
 
 __m256d test_mm256_cvtepi64_pd(__m256i __A) {
   // CHECK-LABEL: @test_mm256_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256
+  // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double>
   return _mm256_cvtepi64_pd(__A); 
 }
 
 __m256d test_mm256_mask_cvtepi64_pd(__m256d __W, __mmask8 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256
+  // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double>
+  // CHECK: select <4 x i1> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}
   return _mm256_mask_cvtepi64_pd(__W, __U, __A); 
 }
 
 __m256d test_mm256_maskz_cvtepi64_pd(__mmask8 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_maskz_cvtepi64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256
+  // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double>
+  // CHECK: select <4 x i1> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}}
   return _mm256_maskz_cvtepi64_pd(__U, __A); 
 }
 
@@ -637,37 +641,41 @@
 
 __m128d test_mm_cvtepu64_pd(__m128i __A) {
   // CHECK-LABEL: @test_mm_cvtepu64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128
+  // CHECK: uitofp <2 x i64> %{{.*}} to <2 x double>
   return _mm_cvtepu64_pd(__A); 
 }
 
 __m128d test_mm_mask_cvtepu64_pd(__m128d __W, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_cvtepu64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128
+  // CHECK: uitofp <2 x i64> %{{.*}} to <2 x double>
+  // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}
   return _mm_mask_cvtepu64_pd(__W, __U, __A); 
 }
 
 __m128d test_mm_maskz_cvtepu64_pd(__mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_maskz_cvtepu64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128
+  // CHECK: uitofp <2 x i64> %{{.*}} to <2 x double>
+  // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}}
   return _mm_maskz_cvtepu64_pd(__U, __A); 
 }
 
 __m256d test_mm256_cvtepu64_pd(__m256i __A) {
   // CHECK-LABEL: @test_mm256_cvtepu64_pd
-  // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.256
+  //

[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes

2018-05-15 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Ping @echristo


https://reviews.llvm.org/D46541





[PATCH] D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part)

2018-05-15 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D46881





[PATCH] D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part)

2018-05-17 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

LGTM


https://reviews.llvm.org/D46881





[PATCH] D47029: [X86] Remove some preprocessor feature checks from intrinsic headers

2018-05-17 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: echristo, RKSimon, spatel.

These look to be a couple of things that weren't removed when we switched to
the target attribute.

The popcnt change makes including just smmintrin.h also include
popcntintrin.h. The popcnt header itself already contains target attributes.

The prefetch ones are just wrappers around __builtin_prefetch, which the
backend has graceful fallbacks for if the exact instruction isn't available,
so there's no reason to hide them. It also makes them available in functions
that have the right target attribute but no -march command line flag.
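A sketch of the use case this enables (assuming the usual x86 headers; _m_prefetchw is the
wrapper touched by this patch):

  #include <x86intrin.h>

  // No -march flag needed on the command line; the target attribute alone is
  // enough now that the preprocessor guard is gone, and the wrapper reduces to
  // __builtin_prefetch either way.
  __attribute__((target("prfchw")))
  void warm_for_write(void *p) {
    _m_prefetchw(p);  // __builtin_prefetch(p, 1, 3 /* _MM_HINT_T0 */)
  }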


https://reviews.llvm.org/D47029

Files:
  lib/Headers/prfchwintrin.h
  lib/Headers/smmintrin.h


Index: lib/Headers/smmintrin.h
===
--- lib/Headers/smmintrin.h
+++ lib/Headers/smmintrin.h
@@ -2458,8 +2458,6 @@
 
 #undef __DEFAULT_FN_ATTRS
 
-#ifdef __POPCNT__
 #include <popcntintrin.h>
-#endif
 
 #endif /* __SMMINTRIN_H */
Index: lib/Headers/prfchwintrin.h
===
--- lib/Headers/prfchwintrin.h
+++ lib/Headers/prfchwintrin.h
@@ -28,7 +28,6 @@
 #ifndef __PRFCHWINTRIN_H
 #define __PRFCHWINTRIN_H
 
-#if defined(__PRFCHW__) || defined(__3dNOW__)
 /// Loads a memory sequence containing the specified memory address into
 ///all data cache levels. The cache-coherency state is set to exclusive.
 ///Data can be read from and written to the cache line without additional
@@ -66,6 +65,5 @@
 {
   __builtin_prefetch (__P, 1, 3 /* _MM_HINT_T0 */);
 }
-#endif
 
 #endif /* __PRFCHWINTRIN_H */




[PATCH] D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.

2018-05-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel, GBuella.

Because the intrinsics in the headers are implemented as macros, we can't just
use a select builtin and a pternlog builtin: that would require one of the
macro arguments to be used twice. Depending on what was passed to the macro, we
could end up expanding an expression twice, leading to weird behavior. We could
maybe declare a local variable in the macro, but then we would need to worry
about name collisions.

To avoid all that, just generate the IR directly in CGBuiltin.cpp.
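To illustrate the double-expansion hazard being avoided (the helpers below are made up for
the example; they are not the real builtins):

  #include <cstdio>

  static int g_calls = 0;
  static int load_a()                              { ++g_calls; return 0x0f; }
  static int ternlog(int a, int b, int c)          { return a ^ b ^ c; }
  static int select_i(bool m, int x, int passthru) { return m ? x : passthru; }

  // A header-macro version would have to paste __A into both the operation and
  // the select's pass-through operand:
  #define MASKED_TERNLOG(__M, __A, __B, __C) \
    select_i((__M), ternlog((__A), (__B), (__C)), (__A))

  int main() {
    int r = MASKED_TERNLOG(true, load_a(), 1, 2);  // load_a() expands twice
    std::printf("r=%d calls=%d\n", r, g_calls);    // prints calls=2, not 1
    return 0;
  }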


Repository:
  rC Clang

https://reviews.llvm.org/D47125

Files:
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/avx512f-builtins.c
  test/CodeGen/avx512vl-builtins.c

Index: test/CodeGen/avx512vl-builtins.c
===
--- test/CodeGen/avx512vl-builtins.c
+++ test/CodeGen/avx512vl-builtins.c
@@ -5604,73 +5604,81 @@
 
 __m128i test_mm_ternarylogic_epi32(__m128i __A, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.mask.pternlog.d.128
+  // CHECK: @llvm.x86.avx512.pternlog.d.128
   return _mm_ternarylogic_epi32(__A, __B, __C, 4); 
 }
 
 __m128i test_mm_mask_ternarylogic_epi32(__m128i __A, __mmask8 __U, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_mask_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.mask.pternlog.d.128
+  // CHECK: @llvm.x86.avx512.pternlog.d.128
+  // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}}
   return _mm_mask_ternarylogic_epi32(__A, __U, __B, __C, 4); 
 }
 
 __m128i test_mm_maskz_ternarylogic_epi32(__mmask8 __U, __m128i __A, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_maskz_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.maskz.pternlog.d.128
+  // CHECK: @llvm.x86.avx512.pternlog.d.128
+  // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> zeroinitializer
   return _mm_maskz_ternarylogic_epi32(__U, __A, __B, __C, 4); 
 }
 
 __m256i test_mm256_ternarylogic_epi32(__m256i __A, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.mask.pternlog.d.256
+  // CHECK: @llvm.x86.avx512.pternlog.d.256
   return _mm256_ternarylogic_epi32(__A, __B, __C, 4); 
 }
 
 __m256i test_mm256_mask_ternarylogic_epi32(__m256i __A, __mmask8 __U, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_mask_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.mask.pternlog.d.256
+  // CHECK: @llvm.x86.avx512.pternlog.d.256
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}}
   return _mm256_mask_ternarylogic_epi32(__A, __U, __B, __C, 4); 
 }
 
 __m256i test_mm256_maskz_ternarylogic_epi32(__mmask8 __U, __m256i __A, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_maskz_ternarylogic_epi32
-  // CHECK: @llvm.x86.avx512.maskz.pternlog.d.256
+  // CHECK: @llvm.x86.avx512.pternlog.d.256
+  // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> zeroinitializer
   return _mm256_maskz_ternarylogic_epi32(__U, __A, __B, __C, 4); 
 }
 
 __m128i test_mm_ternarylogic_epi64(__m128i __A, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_ternarylogic_epi64
-  // CHECK: @llvm.x86.avx512.mask.pternlog.q.128
+  // CHECK: @llvm.x86.avx512.pternlog.q.128
   return _mm_ternarylogic_epi64(__A, __B, __C, 4); 
 }
 
 __m128i test_mm_mask_ternarylogic_epi64(__m128i __A, __mmask8 __U, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_mask_ternarylogic_epi64
-  // CHECK: @llvm.x86.avx512.mask.pternlog.q.128
+  // CHECK: @llvm.x86.avx512.pternlog.q.128
+  // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}
   return _mm_mask_ternarylogic_epi64(__A, __U, __B, __C, 4); 
 }
 
 __m128i test_mm_maskz_ternarylogic_epi64(__mmask8 __U, __m128i __A, __m128i __B, __m128i __C) {
   // CHECK-LABEL: @test_mm_maskz_ternarylogic_epi64
-  // CHECK: @llvm.x86.avx512.maskz.pternlog.q.128
+  // CHECK: @llvm.x86.avx512.pternlog.q.128
+  // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> zeroinitializer
   return _mm_maskz_ternarylogic_epi64(__U, __A, __B, __C, 4); 
 }
 
 __m256i test_mm256_ternarylogic_epi64(__m256i __A, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_ternarylogic_epi64
-  // CHECK: @llvm.x86.avx512.mask.pternlog.q.256
+  // CHECK: @llvm.x86.avx512.pternlog.q.256
   return _mm256_ternarylogic_epi64(__A, __B, __C, 4); 
 }
 
 __m256i test_mm256_mask_ternarylogic_epi64(__m256i __A, __mmask8 __U, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_mask_ternarylogic_epi64
-  // CHECK: @llvm.x86.avx512.mask.pternlog.q.256
+  // CHECK: @llvm.x86.avx512.pternlog.q.256
+  // CHECK: select <4 x i1> %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}
   return _mm256_mask_ternarylogic_epi64(__A, __U, __B, __C, 4); 
 }
 
 __m256i test_mm256_maskz_ternarylogic_epi64(__mmask8 __U, __m256i __A, __m256i __B, __m256i __C) {
   // CHECK-LABEL: @test_mm256_maskz_ternarylogic_epi64
-  // CHECK: @llvm.x86.a

[PATCH] D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Because the builtins take one of the arguments as an immediate, they must be
implemented as macros. This way the frontend can verify that it's an immediate
or a constant expression.


Repository:
  rC Clang

https://reviews.llvm.org/D47125





[PATCH] D47142: [x86] invpcid intrinsic

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/cpuid.h:158
 #define bit_BMI20x0100
+#define bit_INVCPID 0x0400
 #define bit_ENH_MOVSB   0x0200

this should be below ENH_MOVSB to keep the bits in order



Comment at: lib/Headers/intrin.h:196
+ */
 void __cdecl _invpcid(unsigned int, void *);
+#endif

@rnk, what's the right thing to do here?


Repository:
  rC Clang

https://reviews.llvm.org/D47142





[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

I think you can pass StringRef(F).substr(1). That won't create a temporary 
string. It will just create a StringRef pointing into the middle of an existing 
std::string stored in the parsed attributes.
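A quick sketch of that (assuming LLVM's StringRef; the variable names are illustrative):

  #include "llvm/ADT/StringRef.h"
  #include <string>

  void example(const std::string &Feature) {                    // e.g. "+avx512f"
    llvm::StringRef Name = llvm::StringRef(Feature).substr(1);  // "avx512f", no copy made
    (void)Name;
  }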


https://reviews.llvm.org/D46541





[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

So I think we've covered the questions about whether this is OK to do. If 
someone can double-check that the signed/unsigned handling and vector element 
sizes are all correct and approve this, that would be great.


Repository:
  rC Clang

https://reviews.llvm.org/D46863



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel, echristo, DavidKreitzer.

Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit 
version as being in immintrin.h.

This patch adds a new __emmintrin_f16c.h to hold the 128-bit versions, which is 
included from emmintrin.h, and makes the existing f16cintrin.h contain the 
256-bit versions, included from immintrin.h, with an error if it is included 
directly.


Repository:
  rC Clang

https://reviews.llvm.org/D47174

Files:
  lib/Headers/__emmintrin_f16c.h
  lib/Headers/emmintrin.h
  lib/Headers/f16cintrin.h
  lib/Headers/immintrin.h

Index: lib/Headers/immintrin.h
===
--- lib/Headers/immintrin.h
+++ lib/Headers/immintrin.h
@@ -69,54 +69,8 @@
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX2__)
 #include 
 
-/* The 256-bit versions of functions in f16cintrin.h.
-   Intel documents these as being in immintrin.h, and
-   they depend on typedefs from avxintrin.h. */
-
-/// Converts a 256-bit vector of [8 x float] into a 128-bit vector
-///containing 16-bit half-precision float values.
-///
-/// \headerfile 
-///
-/// \code
-/// __m128i _mm256_cvtps_ph(__m256 a, const int imm);
-/// \endcode
-///
-/// This intrinsic corresponds to the  VCVTPS2PH  instruction.
-///
-/// \param a
-///A 256-bit vector containing 32-bit single-precision float values to be
-///converted to 16-bit half-precision float values.
-/// \param imm
-///An immediate value controlling rounding using bits [2:0]: \n
-///000: Nearest \n
-///001: Down \n
-///010: Up \n
-///011: Truncate \n
-///1XX: Use MXCSR.RC for rounding
-/// \returns A 128-bit vector containing the converted 16-bit half-precision
-///float values.
-#define _mm256_cvtps_ph(a, imm) __extension__ ({ \
- (__m128i)__builtin_ia32_vcvtps2ph256((__v8sf)(__m256)(a), (imm)); })
-
-/// Converts a 128-bit vector containing 16-bit half-precision float
-///values into a 256-bit vector of [8 x float].
-///
-/// \headerfile 
-///
-/// This intrinsic corresponds to the  VCVTPH2PS  instruction.
-///
-/// \param __a
-///A 128-bit vector containing 16-bit half-precision float values to be
-///converted to 32-bit single-precision float values.
-/// \returns A vector of [8 x float] containing the converted 32-bit
-///single-precision float values.
-static __inline __m256 __attribute__((__always_inline__, __nodebug__, __target__("f16c")))
-_mm256_cvtph_ps(__m128i __a)
-{
-  return (__m256)__builtin_ia32_vcvtph2ps256((__v8hi)__a);
-}
-#endif /* __AVX2__ */
+#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__)
+#include 
 
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__VPCLMULQDQ__)
 #include 
Index: lib/Headers/emmintrin.h
===
--- lib/Headers/emmintrin.h
+++ lib/Headers/emmintrin.h
@@ -44,7 +44,7 @@
  * appear in the interface though. */
 typedef signed char __v16qs __attribute__((__vector_size__(16)));
 
-#include 
+#include <__emmintrin_f16c.h>
 
 /* Define the default attributes for the functions in this file. */
 #define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("sse2")))
Index: lib/Headers/f16cintrin.h
===
--- lib/Headers/f16cintrin.h
+++ lib/Headers/f16cintrin.h
@@ -21,8 +21,8 @@
  *===---===
  */
 
-#if !defined __X86INTRIN_H && !defined __EMMINTRIN_H && !defined __IMMINTRIN_H
-#error "Never use  directly; include  instead."
+#if !defined __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __F16CINTRIN_H
@@ -32,91 +32,52 @@
 #define __DEFAULT_FN_ATTRS \
   __attribute__((__always_inline__, __nodebug__, __target__("f16c")))
 
-/// Converts a 16-bit half-precision float value into a 32-bit float
-///value.
-///
-/// \headerfile 
-///
-/// This intrinsic corresponds to the  VCVTPH2PS  instruction.
-///
-/// \param __a
-///A 16-bit half-precision float value.
-/// \returns The converted 32-bit float value.
-static __inline float __DEFAULT_FN_ATTRS
-_cvtsh_ss(unsigned short __a)
-{
-  __v8hi v = {(short)__a, 0, 0, 0, 0, 0, 0, 0};
-  __v4sf r = __builtin_ia32_vcvtph2ps(v);
-  return r[0];
-}
-
-/// Converts a 32-bit single-precision float value to a 16-bit
-///half-precision float value.
-///
-/// \headerfile 
-///
-/// \code
-/// unsigned short _cvtss_sh(float a, const int imm);
-/// \endcode
-///
-/// This intrinsic corresponds to the  VCVTPS2PH  instruction.
-///
-/// \param a
-///A 32-bit single-precision float value to be converted to a 16-bit
-///half-precision float value.
-/// \param imm
-///An immediate value controlling rounding using bits [2:0]: \n
-///000: Nearest \n
-///001: Down \n
-///010: Up \n
-///011: Truncate \n
-///

[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/immintrin.h:72
 
-/* The 256-bit versions of functions in f16cintrin.h.
-   Intel documents these as being in immintrin.h, and

Interesting thing to note here: the 256-bit f16c intrinsics were being guarded 
by __AVX2__ when _MSC_VER was defined and modules weren't supported. This was 
definitely incorrect.


Repository:
  rC Clang

https://reviews.llvm.org/D47174



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h

2018-05-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: DavidKreitzer, echristo, RKSimon, rnk.

This matches the Intel documentation which shows them available by importing 
immintrin.h. x86intrin.h also includes immintrin.h so anyone including 
x86intrin.h will still get them.

This is different from gcc, but I don't think we were a perfect match there to 
begin with. I'm unclear on what gcc's policy is for choosing which header to 
add things to.


Repository:
  rC Clang

https://reviews.llvm.org/D47182

Files:
  lib/Headers/bmi2intrin.h
  lib/Headers/bmiintrin.h
  lib/Headers/cldemoteintrin.h
  lib/Headers/clzerointrin.h
  lib/Headers/immintrin.h
  lib/Headers/lzcntintrin.h
  lib/Headers/movdirintrin.h
  lib/Headers/pconfigintrin.h
  lib/Headers/ptwriteintrin.h
  lib/Headers/rdseedintrin.h
  lib/Headers/sgxintrin.h
  lib/Headers/waitpkgintrin.h
  lib/Headers/wbnoinvdintrin.h
  lib/Headers/x86intrin.h

Index: lib/Headers/x86intrin.h
===
--- lib/Headers/x86intrin.h
+++ lib/Headers/x86intrin.h
@@ -32,26 +32,6 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__)
 #include 
 #endif
@@ -76,45 +56,8 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__)
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLDEMOTE__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || \
-  defined(__MOVDIRI__) || defined(__MOVDIR64B__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__)
-#include 
-#endif
-
 #endif /* __X86INTRIN_H */
Index: lib/Headers/wbnoinvdintrin.h
===
--- lib/Headers/wbnoinvdintrin.h
+++ lib/Headers/wbnoinvdintrin.h
@@ -21,8 +21,8 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
-#error "Never use  directly; include  instead."
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __WBNOINVDINTRIN_H
Index: lib/Headers/waitpkgintrin.h
===
--- lib/Headers/waitpkgintrin.h
+++ lib/Headers/waitpkgintrin.h
@@ -20,8 +20,8 @@
  *
  *===---===
  */
-#ifndef __X86INTRIN_H
-#error "Never use  directly; include  instead."
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __WAITPKGINTRIN_H
Index: lib/Headers/sgxintrin.h
===
--- lib/Headers/sgxintrin.h
+++ lib/Headers/sgxintrin.h
@@ -21,8 +21,8 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
-#error "Never use  directly; include  instead."
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __SGXINTRIN_H
Index: lib/Headers/rdseedintrin.h
===
--- lib/Headers/rdseedintrin.h
+++ lib/Headers/rdseedintrin.h
@@ -21,8 +21,8 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
-#error "Never use  directly; include  instead."
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __RDSEEDINTRIN_H
Index: lib/Headers/ptwriteintrin.h
===
--- lib/Headers/ptwriteintrin.h
+++ lib/Headers/ptwriteintrin.h
@@ -21,8 +21,8 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
-#error "Never use  directly; include  instead."
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
 #endif
 
 #ifndef __PTWRITEINTRIN_H
Index: lib/Headers/pconfigint

[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h

2018-05-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

First there was mmintrin.h, which covered MMX instructions. Then xmmintrin.h 
came along to support SSE1 and implicitly included mmintrin.h. Then emmintrin.h 
came along to support SSE2 and implicitly included xmmintrin.h. This repeated 
for each new version of SSE, with each header file including the previous one. 
I think most of the later SSE headers start with the first letter of the code 
name of the CPU that added that SSE level, including cancelled projects: 
nmmintrin.h refers to Nehalem, tmmintrin.h came from the cancelled Tejas CPU, 
wmmintrin.h refers to Westmere, and pmmintrin.h refers to Penryn. Eventually it 
was decided that this wasn't very scalable: you had to remember which header 
file contained which intrinsics and change it with each generation to get the 
latest. So immintrin.h was created to just include everything. I assume the 'i' 
here just stands for Intel; the 'mm' is just historic legacy due to the earlier 
file names. x86intrin.h was created by gcc to hold the intrinsics not specified 
by Intel. I think icc has an x86intrin.h that just includes immintrin.h.


Repository:
  rC Clang

https://reviews.llvm.org/D47182



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47202: [CodeGen] use nsw negation for abs

2018-05-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

This seems right to me. GCC believes that __builtin_abs always returns 
a positive number.
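
For context, a sketch of the kind of emission being approved (IRBuilder calls
only; the surrounding variables are illustrative, not the actual CGBuiltin.cpp
change):

  // abs(x): the negation can carry the nsw flag because __builtin_abs(INT_MIN)
  // is undefined behavior, so the negated value is assumed not to overflow.
  llvm::Value *Arg = EmitScalarExpr(E->getArg(0));     // hypothetical context
  llvm::Value *Neg = Builder.CreateNSWNeg(Arg, "neg");
  llvm::Value *IsPos = Builder.CreateICmpSGE(
      Arg, llvm::Constant::getNullValue(Arg->getType()), "ispos");
  llvm::Value *Abs = Builder.CreateSelect(IsPos, Arg, Neg, "abs");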


https://reviews.llvm.org/D47202



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h

2018-05-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

It is odd, but they really are split in the icc include files. So they got 
split a while back in clang to match the Intel Intrinsic Guide documentation.


Repository:
  rC Clang

https://reviews.llvm.org/D47174



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h

2018-05-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Implemented @DavidKreitzer's suggestion in r333033


Repository:
  rL LLVM

https://reviews.llvm.org/D47174



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h

2018-05-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 148115.
craig.topper added a comment.

Leave the message still saying x86intrin.h. Change the error checks to look for 
either x86intrin.h or immintrin.h to have been included. Really only the 
immintrin.h check is necessary since that's the header that does the include, 
but I put both so the error message saying x86intrin.h would be less confusing.


https://reviews.llvm.org/D47182

Files:
  lib/Headers/cldemoteintrin.h
  lib/Headers/clzerointrin.h
  lib/Headers/immintrin.h
  lib/Headers/movdirintrin.h
  lib/Headers/pconfigintrin.h
  lib/Headers/ptwriteintrin.h
  lib/Headers/rdseedintrin.h
  lib/Headers/sgxintrin.h
  lib/Headers/waitpkgintrin.h
  lib/Headers/wbnoinvdintrin.h
  lib/Headers/x86intrin.h

Index: lib/Headers/x86intrin.h
===
--- lib/Headers/x86intrin.h
+++ lib/Headers/x86intrin.h
@@ -32,26 +32,6 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__)
 #include 
 #endif
@@ -76,45 +56,8 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__)
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLDEMOTE__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || \
-  defined(__MOVDIRI__) || defined(__MOVDIR64B__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__)
-#include 
-#endif
-
 #endif /* __X86INTRIN_H */
Index: lib/Headers/wbnoinvdintrin.h
===
--- lib/Headers/wbnoinvdintrin.h
+++ lib/Headers/wbnoinvdintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/waitpkgintrin.h
===
--- lib/Headers/waitpkgintrin.h
+++ lib/Headers/waitpkgintrin.h
@@ -20,7 +20,7 @@
  *
  *===---===
  */
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/sgxintrin.h
===
--- lib/Headers/sgxintrin.h
+++ lib/Headers/sgxintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/rdseedintrin.h
===
--- lib/Headers/rdseedintrin.h
+++ lib/Headers/rdseedintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/ptwriteintrin.h
===
--- lib/Headers/ptwriteintrin.h
+++ lib/Headers/ptwriteintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/pconfigintrin.h
===
--- lib/Headers/pconfigintrin.h
+++ lib/Headers/pconfigintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index:

[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h

2018-05-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 148252.
craig.topper added a comment.

Add back popcntintrin.h


https://reviews.llvm.org/D47182

Files:
  lib/Headers/cldemoteintrin.h
  lib/Headers/clzerointrin.h
  lib/Headers/immintrin.h
  lib/Headers/movdirintrin.h
  lib/Headers/pconfigintrin.h
  lib/Headers/ptwriteintrin.h
  lib/Headers/rdseedintrin.h
  lib/Headers/sgxintrin.h
  lib/Headers/waitpkgintrin.h
  lib/Headers/wbnoinvdintrin.h
  lib/Headers/x86intrin.h

Index: lib/Headers/x86intrin.h
===
--- lib/Headers/x86intrin.h
+++ lib/Headers/x86intrin.h
@@ -32,26 +32,6 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__)
 #include 
 #endif
@@ -76,45 +56,8 @@
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__)
-#include 
-#endif
-
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__)
 #include 
 #endif
 
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLDEMOTE__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || \
-  defined(__MOVDIRI__) || defined(__MOVDIR64B__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__)
-#include 
-#endif
-
-#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__)
-#include 
-#endif
-
 #endif /* __X86INTRIN_H */
Index: lib/Headers/wbnoinvdintrin.h
===
--- lib/Headers/wbnoinvdintrin.h
+++ lib/Headers/wbnoinvdintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/waitpkgintrin.h
===
--- lib/Headers/waitpkgintrin.h
+++ lib/Headers/waitpkgintrin.h
@@ -20,7 +20,7 @@
  *
  *===---===
  */
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/sgxintrin.h
===
--- lib/Headers/sgxintrin.h
+++ lib/Headers/sgxintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/rdseedintrin.h
===
--- lib/Headers/rdseedintrin.h
+++ lib/Headers/rdseedintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/ptwriteintrin.h
===
--- lib/Headers/ptwriteintrin.h
+++ lib/Headers/ptwriteintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/pconfigintrin.h
===
--- lib/Headers/pconfigintrin.h
+++ lib/Headers/pconfigintrin.h
@@ -21,7 +21,7 @@
  *===---===
  */
 
-#ifndef __X86INTRIN_H
+#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H
 #error "Never use  directly; include  instead."
 #endif
 
Index: lib/Headers/movdirintrin.h
===
--- lib/Headers/movdirintrin.h
+++ lib/Headers/movdirintrin.h
@@ -20,7 +20,7 @@
  *
  *===---===
  */
-#ifndef __X86INTRIN_H
+

[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.

2018-05-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Hi @aemerson, I'm not opposed to adding it back. But the clang policy for 
vector builtins has always been that we don't support all the builtins that gcc 
does, and instead encourage the use of the _mm_* wrappers, which are guaranteed 
to work in both compilers. Is it possible to change your source code to use the 
portable intrinsic name?


Repository:
  rC Clang

https://reviews.llvm.org/D46863



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47277: [modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual

2018-05-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D47277



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47142: [x86] invpcid intrinsic

2018-05-24 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

LGTM, if you fix the ordering in cpuid.h.


https://reviews.llvm.org/D47142



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible.

2018-05-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: cfe/trunk/lib/Headers/avx512fintrin.h:9855
+  __v8di __t6 = (__v8di)_mm512_##op(__t4, __t5); \
+  return __t6[0];
 

RKSimon wrote:
> Would it be dumb to allow VLX capable CPUs to use 128/256 variants of the 
> VPMAXUQ etc ? Or is it better to focus on improving SimplifyDemandedElts to 
> handle this (and many other reduction cases that all tend to keep to the 
> original vector width)?
I'm not sure how to do that from clang. Should we be using a reduction 
intrinsic and do custom lowering in the backend?


Repository:
  rL LLVM

https://reviews.llvm.org/D47401



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47444: [X86] Lowering FMA intrinsics to native IR (Clang part)

2018-05-29 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM




Comment at: lib/CodeGen/CGBuiltin.cpp:8416
+static Value *EmitX86FMAExpr(CodeGenFunction &CGF, ArrayRef Ops,
+  unsigned BuiltinID) {
+

Please indent this to line up with the first argument on the previous line.


Repository:
  rC Clang

https://reviews.llvm.org/D47444



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47474: Implement cpu_dispatch/cpu_specific Multiversioning

2018-05-29 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: include/clang/Basic/X86Target.def:295
+CPU_SPECIFIC("pentium_iii", 'H',
+ (1ULL << FEATURE_CMOV | 1ULL << FEATURE_MMX | 1ULL << 
FEATURE_SSE))
+CPU_SPECIFIC("pentium_iii_no_xmm_regs", 'H',

Could we just make the features a comma separated string? Then we wouldn't need 
a third version of EmitX86CpuSupports? Yeah it would incur string processing 
costs, but is that a big deal?
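
A sketch of how a comma-separated feature list could be consumed (the helper
below is hypothetical; StringRef::split is the real API):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringRef.h"

  // Split something like "cmov,mmx,sse" once and handle each feature name.
  static void handleCpuSpecificFeatures(llvm::StringRef FeatureList) {
    llvm::SmallVector<llvm::StringRef, 8> Feats;
    FeatureList.split(Feats, ',');
    for (llvm::StringRef F : Feats) {
      // ... emit the corresponding feature check for F here ...
      (void)F;
    }
  }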



Comment at: lib/Sema/SemaDecl.cpp:9214
+return MultiVersioning::Target;
+  else if (FD->hasAttr())
+return MultiVersioning::CpuDispatch;

No need for else after return.



Comment at: lib/Sema/SemaDeclAttr.cpp:1901
+const TargetInfo &Target = S.Context.getTargetInfo();
+if (llvm::find_if(Cpus, [CpuName, &Target](const IdentifierInfo *Cur) {
+  return Target.cpuSpecificManglingCharacter(CpuName) ==

Maybe use llvm::any_of since you don't actually care about the iterator 
returned.
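
Roughly the shape being suggested (a sketch; the predicate body is a
hypothetical stand-in for the real mangling-character comparison):

  // llvm::any_of returns a bool directly, so there is no iterator to compare
  // against Cpus.end().
  bool AlreadyPresent = llvm::any_of(Cpus, [&](const IdentifierInfo *Cur) {
    return matchesNewCpu(Cur);  // hypothetical predicate
  });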


https://reviews.llvm.org/D47474



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc

2017-10-11 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
Herald added a subscriber: krytarowski.

We were using corei7 for a large swath of Intel CPUs. gcc has a different set 
of defines that more closely match the -march flags, and this updates clang to 
match. It also fixes skylake-avx512 and adds silvermont in addition to slm.


https://reviews.llvm.org/D38824

Files:
  lib/Basic/Targets/X86.cpp
  test/Preprocessor/predefined-arch-macros.c

Index: test/Preprocessor/predefined-arch-macros.c
===
--- test/Preprocessor/predefined-arch-macros.c
+++ test/Preprocessor/predefined-arch-macros.c
@@ -427,11 +427,14 @@
 // CHECK_COREI7_AVX_M32: #define __SSSE3__ 1
 // CHECK_COREI7_AVX_M32: #define __XSAVEOPT__ 1
 // CHECK_COREI7_AVX_M32: #define __XSAVE__ 1
-// CHECK_COREI7_AVX_M32: #define __corei7 1
-// CHECK_COREI7_AVX_M32: #define __corei7__ 1
+// CHECK_COREI7_AVX_M32: #define __corei7_avx 1
+// CHECK_COREI7_AVX_M32: #define __corei7_avx__ 1
 // CHECK_COREI7_AVX_M32: #define __i386 1
 // CHECK_COREI7_AVX_M32: #define __i386__ 1
-// CHECK_COREI7_AVX_M32: #define __tune_corei7__ 1
+// CHECK_COREI7_AVX_M32: #define __sandybridge 1
+// CHECK_COREI7_AVX_M32: #define __sandybridge__ 1
+// CHECK_COREI7_AVX_M32: #define __tune_corei7_avx__ 1
+// CHECK_COREI7_AVX_M32: #define __tune_sandybridge__ 1
 // CHECK_COREI7_AVX_M32: #define i386 1
 // RUN: %clang -march=corei7-avx -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -454,9 +457,12 @@
 // CHECK_COREI7_AVX_M64: #define __XSAVE__ 1
 // CHECK_COREI7_AVX_M64: #define __amd64 1
 // CHECK_COREI7_AVX_M64: #define __amd64__ 1
-// CHECK_COREI7_AVX_M64: #define __corei7 1
-// CHECK_COREI7_AVX_M64: #define __corei7__ 1
-// CHECK_COREI7_AVX_M64: #define __tune_corei7__ 1
+// CHECK_COREI7_AVX_M64: #define __corei7_avx 1
+// CHECK_COREI7_AVX_M64: #define __corei7_avx__ 1
+// CHECK_COREI7_AVX_M64: #define __sandybridge 1
+// CHECK_COREI7_AVX_M64: #define __sandybridge__ 1
+// CHECK_COREI7_AVX_M64: #define __tune_corei7_avx__ 1
+// CHECK_COREI7_AVX_M64: #define __tune_sandybridge__ 1
 // CHECK_COREI7_AVX_M64: #define __x86_64 1
 // CHECK_COREI7_AVX_M64: #define __x86_64__ 1
 //
@@ -477,11 +483,14 @@
 // CHECK_CORE_AVX_I_M32: #define __SSSE3__ 1
 // CHECK_CORE_AVX_I_M32: #define __XSAVEOPT__ 1
 // CHECK_CORE_AVX_I_M32: #define __XSAVE__ 1
-// CHECK_CORE_AVX_I_M32: #define __corei7 1
-// CHECK_CORE_AVX_I_M32: #define __corei7__ 1
+// CHECK_CORE_AVX_I_M32: #define __corei7_avx 1
+// CHECK_CORE_AVX_I_M32: #define __corei7_avx__ 1
 // CHECK_CORE_AVX_I_M32: #define __i386 1
 // CHECK_CORE_AVX_I_M32: #define __i386__ 1
-// CHECK_CORE_AVX_I_M32: #define __tune_corei7__ 1
+// CHECK_CORE_AVX_I_M32: #define __sandybridge 1
+// CHECK_CORE_AVX_I_M32: #define __sandybridge__ 1
+// CHECK_CORE_AVX_I_M32: #define __tune_corei7_avx__ 1
+// CHECK_CORE_AVX_I_M32: #define __tune_sandybridge__ 1
 // CHECK_CORE_AVX_I_M32: #define i386 1
 // RUN: %clang -march=core-avx-i -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -504,9 +513,12 @@
 // CHECK_CORE_AVX_I_M64: #define __XSAVE__ 1
 // CHECK_CORE_AVX_I_M64: #define __amd64 1
 // CHECK_CORE_AVX_I_M64: #define __amd64__ 1
-// CHECK_CORE_AVX_I_M64: #define __corei7 1
-// CHECK_CORE_AVX_I_M64: #define __corei7__ 1
-// CHECK_CORE_AVX_I_M64: #define __tune_corei7__ 1
+// CHECK_CORE_AVX_I_M64: #define __corei7_avx 1
+// CHECK_CORE_AVX_I_M64: #define __corei7_avx__ 1
+// CHECK_CORE_AVX_I_M64: #define __sandybridge 1
+// CHECK_CORE_AVX_I_M64: #define __sandybridge__ 1
+// CHECK_CORE_AVX_I_M64: #define __tune_corei7_avx__ 1
+// CHECK_CORE_AVX_I_M64: #define __tune_sandybridge__ 1
 // CHECK_CORE_AVX_I_M64: #define __x86_64 1
 // CHECK_CORE_AVX_I_M64: #define __x86_64__ 1
 //
@@ -533,11 +545,14 @@
 // CHECK_CORE_AVX2_M32: #define __SSSE3__ 1
 // CHECK_CORE_AVX2_M32: #define __XSAVEOPT__ 1
 // CHECK_CORE_AVX2_M32: #define __XSAVE__ 1
-// CHECK_CORE_AVX2_M32: #define __corei7 1
-// CHECK_CORE_AVX2_M32: #define __corei7__ 1
+// CHECK_CORE_AVX2_M32: #define __core_avx2 1
+// CHECK_CORE_AVX2_M32: #define __core_avx2__ 1
+// CHECK_CORE_AVX2_M32: #define __haswell 1
+// CHECK_CORE_AVX2_M32: #define __haswell__ 1
 // CHECK_CORE_AVX2_M32: #define __i386 1
 // CHECK_CORE_AVX2_M32: #define __i386__ 1
-// CHECK_CORE_AVX2_M32: #define __tune_corei7__ 1
+// CHECK_CORE_AVX2_M32: #define __tune_core_avx2__ 1
+// CHECK_CORE_AVX2_M32: #define __tune_haswell__ 1
 // CHECK_CORE_AVX2_M32: #define i386 1
 // RUN: %clang -march=core-avx2 -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -566,9 +581,12 @@
 // CHECK_CORE_AVX2_M64: #define __XSAVE__ 1
 // CHECK_CORE_AVX2_M64: #define __amd64 1
 // CHECK_CORE_AVX2_M64: #define __amd64__ 1
-// CHECK_CORE_AVX2_M64: #define __corei7 1
-// CHECK_CORE_AVX2_M64: #define __corei7__ 1
-// CHECK_CORE_AVX2_M64: #define __tune_corei7__ 1
+// CHECK_CORE_AVX2_M64: #define __core_avx2 1
+// CHECK_CORE_AVX2_M64: #define __core_avx2__ 1
+// CHECK_CORE_AVX2_M64: #def

[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc

2017-10-11 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:844-845
-// FIXME: Historically, we defined this legacy name, it would be nice to
-// remove it at some point. We've never exposed fine-grained names for
-// recent primary x86 CPUs, and we should keep it that way.
-defineCPUMacros(Builder, "corei7");

chandlerc wrote:
> This seems to undo the idea that we should keep avoiding exposing 
> fine-grained CPU names? What's new that changes this?
CPUs newer than the ones with that comment seem to have ignored said comment.

Probably because we don't have a definition of what to do for new CPUs if we 
aren't going to expose fine-grained names. Do we just call everything corei7 
forever?



Comment at: lib/Basic/Targets/X86.cpp:852
+defineCPUMacros(Builder, "core_avx2");
+defineCPUMacros(Builder, "haswell");
 break;

chandlerc wrote:
> I find calling a Westmere CPU `nehalem` a little odd. Calling IvyBridge a 
> `sandybridge' CPU seems quite confusing. But calling Skylake (client) and 
> Cannonlake (all? client?) `haswell` seems  deeply weird.
This implementation matches what gcc does. I agree it's weird.

gcc doesn't implement cannonlake yet, so I don't know what they'll do.


https://reviews.llvm.org/D38824



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38781: [X86] Add CLWB intrinsic. clang part

2017-10-12 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 118816.
craig.topper added a comment.

Address review feedback


https://reviews.llvm.org/D38781

Files:
  include/clang/Basic/BuiltinsX86.def
  lib/Headers/CMakeLists.txt
  lib/Headers/clwbintrin.h
  lib/Headers/immintrin.h
  test/CodeGen/builtin-clwb.c

Index: test/CodeGen/builtin-clwb.c
===
--- /dev/null
+++ test/CodeGen/builtin-clwb.c
@@ -0,0 +1,9 @@
+// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-apple-darwin -target-feature +clwb  -emit-llvm -o - -Wall -Werror | FileCheck %s
+
+#include 
+
+void test_mm_clwb(const void *__m) {
+  //CHECK-LABEL: @test_mm_clwb
+  //CHECK: @llvm.x86.clwb
+  _mm_clwb(__m);
+}
Index: lib/Headers/immintrin.h
===
--- lib/Headers/immintrin.h
+++ lib/Headers/immintrin.h
@@ -58,6 +58,10 @@
 #include 
 #endif
 
+#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLWB__)
+#include 
+#endif
+
 #if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX__)
 #include 
 #endif
Index: lib/Headers/clwbintrin.h
===
--- /dev/null
+++ lib/Headers/clwbintrin.h
@@ -0,0 +1,52 @@
+/*=== clwbintrin.h - CLWB intrinsic ===
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ *===---===
+ */
+
+#ifndef __IMMINTRIN_H
+#error "Never use  directly; include  instead."
+#endif
+
+#ifndef __CLWBINTRIN_H
+#define __CLWBINTRIN_H
+
+/* Define the default attributes for the functions in this file. */
+#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__,  __target__("clwb")))
+
+/// \brief Writes back to memory the cache line (if modified) that contains the
+/// linear address specified in \a __m from any level of the cache hierarchy in
+/// the cache coherence domain
+///
+/// \headerfile 
+///
+/// This intrinsic corresponds to the  CLWB  instruction.
+///
+/// \param __p
+///A pointer to the memory location used to identify the cache line to be
+///written back.
+static __inline__ void __DEFAULT_FN_ATTRS
+_mm_clwb(void const *__m) {
+  __builtin_ia32_clwb(__m);
+}
+
+#undef __DEFAULT_FN_ATTRS
+
+#endif
Index: lib/Headers/CMakeLists.txt
===
--- lib/Headers/CMakeLists.txt
+++ lib/Headers/CMakeLists.txt
@@ -33,6 +33,7 @@
   clzerointrin.h
   cpuid.h
   clflushoptintrin.h
+  clwbintrin.h
   emmintrin.h
   f16cintrin.h
   float.h
Index: include/clang/Basic/BuiltinsX86.def
===
--- include/clang/Basic/BuiltinsX86.def
+++ include/clang/Basic/BuiltinsX86.def
@@ -641,6 +641,9 @@
 //CLFLUSHOPT
 TARGET_BUILTIN(__builtin_ia32_clflushopt, "vvC*", "", "clflushopt")
 
+//CLWB
+TARGET_BUILTIN(__builtin_ia32_clwb, "vvC*", "", "clwb")
+
 // ADX
 TARGET_BUILTIN(__builtin_ia32_addcarryx_u32, "UcUcUiUiUi*", "", "adx")
 TARGET_BUILTIN(__builtin_ia32_addcarry_u32, "UcUcUiUiUi*", "", "")
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc

2017-10-13 Thread Craig Topper via Phabricator via cfe-commits
craig.topper updated this revision to Diff 118973.
craig.topper added a comment.

Only define "corei7" on nehalem/westmere to match gcc. Don't define anything 
for the CPUs newer than that. Add comments to the CPUs where gcc has two sets 
of defines and we have only one.


https://reviews.llvm.org/D38824

Files:
  lib/Basic/Targets/X86.cpp
  test/Preprocessor/predefined-arch-macros.c

Index: test/Preprocessor/predefined-arch-macros.c
===
--- test/Preprocessor/predefined-arch-macros.c
+++ test/Preprocessor/predefined-arch-macros.c
@@ -427,11 +427,8 @@
 // CHECK_COREI7_AVX_M32: #define __SSSE3__ 1
 // CHECK_COREI7_AVX_M32: #define __XSAVEOPT__ 1
 // CHECK_COREI7_AVX_M32: #define __XSAVE__ 1
-// CHECK_COREI7_AVX_M32: #define __corei7 1
-// CHECK_COREI7_AVX_M32: #define __corei7__ 1
 // CHECK_COREI7_AVX_M32: #define __i386 1
 // CHECK_COREI7_AVX_M32: #define __i386__ 1
-// CHECK_COREI7_AVX_M32: #define __tune_corei7__ 1
 // CHECK_COREI7_AVX_M32: #define i386 1
 // RUN: %clang -march=corei7-avx -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -454,9 +451,6 @@
 // CHECK_COREI7_AVX_M64: #define __XSAVE__ 1
 // CHECK_COREI7_AVX_M64: #define __amd64 1
 // CHECK_COREI7_AVX_M64: #define __amd64__ 1
-// CHECK_COREI7_AVX_M64: #define __corei7 1
-// CHECK_COREI7_AVX_M64: #define __corei7__ 1
-// CHECK_COREI7_AVX_M64: #define __tune_corei7__ 1
 // CHECK_COREI7_AVX_M64: #define __x86_64 1
 // CHECK_COREI7_AVX_M64: #define __x86_64__ 1
 //
@@ -477,11 +471,8 @@
 // CHECK_CORE_AVX_I_M32: #define __SSSE3__ 1
 // CHECK_CORE_AVX_I_M32: #define __XSAVEOPT__ 1
 // CHECK_CORE_AVX_I_M32: #define __XSAVE__ 1
-// CHECK_CORE_AVX_I_M32: #define __corei7 1
-// CHECK_CORE_AVX_I_M32: #define __corei7__ 1
 // CHECK_CORE_AVX_I_M32: #define __i386 1
 // CHECK_CORE_AVX_I_M32: #define __i386__ 1
-// CHECK_CORE_AVX_I_M32: #define __tune_corei7__ 1
 // CHECK_CORE_AVX_I_M32: #define i386 1
 // RUN: %clang -march=core-avx-i -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -504,9 +495,6 @@
 // CHECK_CORE_AVX_I_M64: #define __XSAVE__ 1
 // CHECK_CORE_AVX_I_M64: #define __amd64 1
 // CHECK_CORE_AVX_I_M64: #define __amd64__ 1
-// CHECK_CORE_AVX_I_M64: #define __corei7 1
-// CHECK_CORE_AVX_I_M64: #define __corei7__ 1
-// CHECK_CORE_AVX_I_M64: #define __tune_corei7__ 1
 // CHECK_CORE_AVX_I_M64: #define __x86_64 1
 // CHECK_CORE_AVX_I_M64: #define __x86_64__ 1
 //
@@ -533,11 +521,8 @@
 // CHECK_CORE_AVX2_M32: #define __SSSE3__ 1
 // CHECK_CORE_AVX2_M32: #define __XSAVEOPT__ 1
 // CHECK_CORE_AVX2_M32: #define __XSAVE__ 1
-// CHECK_CORE_AVX2_M32: #define __corei7 1
-// CHECK_CORE_AVX2_M32: #define __corei7__ 1
 // CHECK_CORE_AVX2_M32: #define __i386 1
 // CHECK_CORE_AVX2_M32: #define __i386__ 1
-// CHECK_CORE_AVX2_M32: #define __tune_corei7__ 1
 // CHECK_CORE_AVX2_M32: #define i386 1
 // RUN: %clang -march=core-avx2 -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -566,9 +551,6 @@
 // CHECK_CORE_AVX2_M64: #define __XSAVE__ 1
 // CHECK_CORE_AVX2_M64: #define __amd64 1
 // CHECK_CORE_AVX2_M64: #define __amd64__ 1
-// CHECK_CORE_AVX2_M64: #define __corei7 1
-// CHECK_CORE_AVX2_M64: #define __corei7__ 1
-// CHECK_CORE_AVX2_M64: #define __tune_corei7__ 1
 // CHECK_CORE_AVX2_M64: #define __x86_64 1
 // CHECK_CORE_AVX2_M64: #define __x86_64__ 1
 //
@@ -597,11 +579,8 @@
 // CHECK_BROADWELL_M32: #define __SSSE3__ 1
 // CHECK_BROADWELL_M32: #define __XSAVEOPT__ 1
 // CHECK_BROADWELL_M32: #define __XSAVE__ 1
-// CHECK_BROADWELL_M32: #define __corei7 1
-// CHECK_BROADWELL_M32: #define __corei7__ 1
 // CHECK_BROADWELL_M32: #define __i386 1
 // CHECK_BROADWELL_M32: #define __i386__ 1
-// CHECK_BROADWELL_M32: #define __tune_corei7__ 1
 // CHECK_BROADWELL_M32: #define i386 1
 // RUN: %clang -march=broadwell -m64 -E -dM %s -o - 2>&1 \
 // RUN: -target i386-unknown-linux \
@@ -632,9 +611,6 @@
 // CHECK_BROADWELL_M64: #define __XSAVE__ 1
 // CHECK_BROADWELL_M64: #define __amd64 1
 // CHECK_BROADWELL_M64: #define __amd64__ 1
-// CHECK_BROADWELL_M64: #define __corei7 1
-// CHECK_BROADWELL_M64: #define __corei7__ 1
-// CHECK_BROADWELL_M64: #define __tune_corei7__ 1
 // CHECK_BROADWELL_M64: #define __x86_64 1
 // CHECK_BROADWELL_M64: #define __x86_64__ 1
 //
@@ -890,9 +866,6 @@
 // CHECK_SKX_M32: #define __XSAVE__ 1
 // CHECK_SKX_M32: #define __i386 1
 // CHECK_SKX_M32: #define __i386__ 1
-// CHECK_SKX_M32: #define __skx 1
-// CHECK_SKX_M32: #define __skx__ 1
-// CHECK_SKX_M32: #define __tune_skx__ 1
 // CHECK_SKX_M32: #define i386 1
 
 // RUN: %clang -march=skylake-avx512 -m64 -E -dM %s -o - 2>&1 \
@@ -934,9 +907,6 @@
 // CHECK_SKX_M64: #define __XSAVE__ 1
 // CHECK_SKX_M64: #define __amd64 1
 // CHECK_SKX_M64: #define __amd64__ 1
-// CHECK_SKX_M64: #define __skx 1
-// CHECK_SKX_M64: #define __skx__ 1
-// CHECK_SKX_M64: #define __tune_skx__ 1
 // CHECK_SKX_M64: #define __x86_64 1
 // CHECK_SKX_M64: #define __x86_64__ 1
 //
Index: lib/Basic/Tar

[PATCH] D38737: [X86] test/testn intrinsics lowering to IR. clang side

2017-10-16 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/avx512bwintrin.h:2109
+  return _mm512_cmp_epi8_mask(_mm512_and_epi32(__A, __B),
+_mm512_setzero_qi(), 4);
 }

Can you align this with the opening paren on the line above? Same with all the 
other functions below.


https://reviews.llvm.org/D38737



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51510: Move AESNI generation to Skylake and Goldmont

2018-08-30 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM. Can you update lib/Target/X86/X86.td in LLVM repo as well?


Repository:
  rC Clang

https://reviews.llvm.org/D51510



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51510: Move AESNI generation to Skylake and Goldmont

2018-09-06 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Do you have commit access, or do you need someone to commit this for you?


Repository:
  rC Clang

https://reviews.llvm.org/D51510



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51771: [X86] Modify addcarry/subborrow builtins to emit a 2-result intrinsic and a store instruction.

2018-09-06 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.
Herald added subscribers: cfe-commits, kristina.

This is the clang side of https://reviews.llvm.org/D51769. The llvm intrinsics 
now return two results instead of using an out parameter.


Repository:
  rC Clang

https://reviews.llvm.org/D51771

Files:
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/adc-builtins.c
  test/CodeGen/adx-builtins.c

Index: test/CodeGen/adx-builtins.c
===
--- test/CodeGen/adx-builtins.c
+++ test/CodeGen/adx-builtins.c
@@ -5,14 +5,20 @@
 unsigned char test_addcarryx_u32(unsigned char __cf, unsigned int __x,
  unsigned int __y, unsigned int *__p) {
 // CHECK-LABEL: test_addcarryx_u32
-// CHECK: call i8 @llvm.x86.addcarryx.u32
+// CHECK: [[ADC:%.*]] = call { i8, i32 } @llvm.x86.addcarryx.u32
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[ADC]], 1
+// CHECK: store i32 [[DATA]], i32* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i32 } [[ADC]], 0
   return _addcarryx_u32(__cf, __x, __y, __p);
 }
 
 unsigned char test_addcarryx_u64(unsigned char __cf, unsigned long long __x,
  unsigned long long __y,
  unsigned long long *__p) {
 // CHECK-LABEL: test_addcarryx_u64
-// CHECK: call i8 @llvm.x86.addcarryx.u64
+// CHECK: [[ADC:%.*]] = call { i8, i64 } @llvm.x86.addcarryx.u64
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[ADC]], 1
+// CHECK: store i64 [[DATA]], i64* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[ADC]], 0
   return _addcarryx_u64(__cf, __x, __y, __p);
 }
Index: test/CodeGen/adc-builtins.c
===
--- test/CodeGen/adc-builtins.c
+++ test/CodeGen/adc-builtins.c
@@ -5,29 +5,41 @@
 unsigned char test_addcarry_u32(unsigned char __cf, unsigned int __x,
 unsigned int __y, unsigned int *__p) {
 // CHECK-LABEL: test_addcarry_u32
-// CHECK: call i8 @llvm.x86.addcarry.u32
+// CHECK: [[ADC:%.*]] = call { i8, i32 } @llvm.x86.addcarry.u32
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[ADC]], 1
+// CHECK: store i32 [[DATA]], i32* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i32 } [[ADC]], 0
   return _addcarry_u32(__cf, __x, __y, __p);
 }
 
 unsigned char test_addcarry_u64(unsigned char __cf, unsigned long long __x,
 unsigned long long __y,
 unsigned long long *__p) {
 // CHECK-LABEL: test_addcarry_u64
-// CHECK: call i8 @llvm.x86.addcarry.u64
+// CHECK: [[ADC:%.*]] = call { i8, i64 } @llvm.x86.addcarry.u64
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[ADC]], 1
+// CHECK: store i64 [[DATA]], i64* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[ADC]], 0
   return _addcarry_u64(__cf, __x, __y, __p);
 }
 
 unsigned char test_subborrow_u32(unsigned char __cf, unsigned int __x,
  unsigned int __y, unsigned int *__p) {
 // CHECK-LABEL: test_subborrow_u32
-// CHECK: call i8 @llvm.x86.subborrow.u32
+// CHECK: [[SBB:%.*]] = call { i8, i32 } @llvm.x86.subborrow.u32
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[SBB]], 1
+// CHECK: store i32 [[DATA]], i32* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i32 } [[SBB]], 0
   return _subborrow_u32(__cf, __x, __y, __p);
 }
 
 unsigned char test_subborrow_u64(unsigned char __cf, unsigned long long __x,
  unsigned long long __y,
  unsigned long long *__p) {
 // CHECK-LABEL: test_subborrow_u64
-// CHECK: call i8 @llvm.x86.subborrow.u64
+// CHECK: [[SBB:%.*]] = call { i8, i64 } @llvm.x86.subborrow.u64
+// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[SBB]], 1
+// CHECK: store i64 [[DATA]], i64* %{{.*}}
+// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[SBB]], 0
   return _subborrow_u64(__cf, __x, __y, __p);
 }
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -10405,6 +10405,41 @@
   Ops[0]);
 return Builder.CreateExtractValue(Call, 1);
   }
+  case X86::BI__builtin_ia32_addcarryx_u32:
+  case X86::BI__builtin_ia32_addcarryx_u64:
+  case X86::BI__builtin_ia32_addcarry_u32:
+  case X86::BI__builtin_ia32_addcarry_u64:
+  case X86::BI__builtin_ia32_subborrow_u32:
+  case X86::BI__builtin_ia32_subborrow_u64: {
+Intrinsic::ID IID;
+switch (BuiltinID) {
+default: llvm_unreachable("Unsupported intrinsic!");
+case X86::BI__builtin_ia32_addcarryx_u32:
+  IID = Intrinsic::x86_addcarryx_u32;
+  break;
+case X86::BI__builtin_ia32_addcarryx_u64:
+  IID = Intrinsic::x86_addcarryx_u64;
+  break;
+case X86::BI__builtin_ia32_addcarry_u32:
+  IID = Intrinsic::x86_addcarry_u32;
+  break;
+case X86::BI__builtin_ia32_addcarry

[PATCH] D51805: [X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter

2018-09-07 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.

This is the clang side of https://reviews.llvm.org/D51803. The llvm intrinsic 
now returns two results. So we need to emit an explicit store in IR for the out 
parameter. This is similar to addcarry/subborrow/rdrand/rdseed.


https://reviews.llvm.org/D51805

Files:
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/rd-builtins.c


Index: test/CodeGen/rd-builtins.c
===
--- test/CodeGen/rd-builtins.c
+++ test/CodeGen/rd-builtins.c
@@ -14,3 +14,12 @@
 // CHECK: @test_rdtsc
 // CHECK: call i64 @llvm.x86.rdtsc
 }
+
+unsigned long long test_rdtscp(unsigned int *a) {
+// CHECK: @test_rdtscp
+// CHECK: [[RDTSCP:%.*]] = call { i64, i32 } @llvm.x86.rdtscp
+// CHECK: [[TSC_AUX:%.*]] = extractvalue { i64, i32 } [[RDTSCP]], 1
+// CHECK: store i32 [[TSC_AUX]], i32* %{{.*}}
+// CHECK: [[TSC:%.*]] = extractvalue { i64, i32 } [[RDTSCP]], 0
+  return __rdtscp(a);
+}
Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -9158,6 +9158,12 @@
   case X86::BI__rdtsc: {
 return Builder.CreateCall(CGM.getIntrinsic(Intrinsic::x86_rdtsc));
   }
+  case X86::BI__builtin_ia32_rdtscp: {
+Value *Call = Builder.CreateCall(CGM.getIntrinsic(Intrinsic::x86_rdtscp));
+Builder.CreateDefaultAlignedStore(Builder.CreateExtractValue(Call, 1),
+  Ops[0]);
+return Builder.CreateExtractValue(Call, 0);
+  }
   case X86::BI__builtin_ia32_undef128:
   case X86::BI__builtin_ia32_undef256:
   case X86::BI__builtin_ia32_undef512:


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

@spatel, should this ultimately use funnel shift?


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Here are the IR patterns for this that work. I'm not sure if we can do this 
directly in C, since we'd need a 128-bit type, but maybe we can emit it from 
CGBuiltin.cpp?

  define i64 @__shiftleft128(i64 %x, i64 %y, i8 %amt) {
%a = zext i64 %x to i128
%b = zext i64 %y to i128
%c = shl i128 %b, 64
%d = or i128 %a, %c
%amtmask = and i8 %amt, 63
%e = zext i8 %amtmask to i128
%f = shl i128 %d, %e
%g = lshr i128 %f, 64
%h = trunc i128 %g to i64
ret i64 %h
  }
  
  define i64 @__shiftright128(i64 %x, i64 %y, i8 %amt) {
%a = zext i64 %x to i128
%b = zext i64 %y to i128
%c = shl i128 %b, 64
%d = or i128 %a, %c
%amtmask = and i8 %amt, 63
%e = zext i8 %amtmask to i128
%f = lshr i128 %d, %e
%g = trunc i128 %f to i64
ret i64 %g
  }
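
A rough C-level equivalent of the same patterns, using the 128-bit integer
extension (just a sketch; the builtins themselves would be emitted from
CGBuiltin.cpp rather than written like this):

  // hi:lo is concatenated into one 128-bit value, shifted by amt & 63, and the
  // relevant 64-bit half is returned.
  static unsigned long long shiftleft128_sketch(unsigned long long lo,
                                                unsigned long long hi,
                                                unsigned char amt) {
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
    return (unsigned long long)((v << (amt & 63)) >> 64);
  }

  static unsigned long long shiftright128_sketch(unsigned long long lo,
                                                 unsigned long long hi,
                                                 unsigned char amt) {
    unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
    return (unsigned long long)(v >> (amt & 63));
  }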


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

I'd prefer the pattern over inline assembly. It'll give us more flexibility in 
the backend if we should be using some other instruction on different targets.


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

@spatel, yes, it's exactly funnel shift. I wasn't sure whether we were ready for 
clang to create it yet. Can we let this go as is and change it to funnel shift 
once we have the variable case fixed in the backend?


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

The only weird thing that I can really think of with the C version is that the 
'and' on the shift amount might get hoisted out of a loop and not get dropped 
during isel.


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics

2018-07-20 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM. I'm inclined to let this go in now since we have a requested use for it. 
We can change it to funnel shift once we're confident in the backend.


https://reviews.llvm.org/D49606



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc

2018-08-01 Thread Craig Topper via Phabricator via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: bkramer, efriedma, spatel.
Herald added a reviewer: javed.absar.
Herald added a subscriber: kristof.beyls.

gcc defines an intrinsic called __builtin_clrsb which counts the number of 
redundant sign bits in a number. This is equivalent to counting the number of 
leading zeros of a positive number, or the number of leading ones of a negative 
number, and subtracting one from the result. Since we can't count leading ones, 
we need to invert negative numbers and count zeros.

The emitted sequence contains a bit of trickery stolen from an LLVM AArch64 
test arm64-clrsb.ll to prevent passing a value of 0 to ctlz. I used an icmp slt 
and a select to conditionally negate, but InstCombine will turn that into an 
ashr+xor. I can emit that directly if that's preferred. I know @spatel has been 
trying to remove some of the bit tricks from InstCombine so I'm not sure if the 
ashr+xor form will be canonical going forward.

This patch will cause the builtin to be expanded inline while gcc uses a call 
to a function like __clrsbdi2 that is implemented in libgcc. But this is 
similar to what we already do for popcnt. And I don't think compiler-rt 
supports __clrsbdi2.
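
For illustration, a portable reference of the intended semantics for the 
64-bit case (a sketch built on __builtin_clzll, not the sequence this patch 
emits):

  // clrsbll(x) = number of redundant sign bits = clz(x < 0 ? ~x : x) - 1.
  static int clrsbll_reference(long long x) {
    unsigned long long v = (unsigned long long)(x < 0 ? ~x : x);
    // v == 0 means every bit equals the sign bit: all 63 lower bits are
    // redundant, and __builtin_clzll(0) would be undefined anyway.
    return v == 0 ? 63 : __builtin_clzll(v) - 1;
  }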


https://reviews.llvm.org/D50168

Files:
  include/clang/Basic/Builtins.def
  lib/CodeGen/CGBuiltin.cpp


Index: lib/CodeGen/CGBuiltin.cpp
===
--- lib/CodeGen/CGBuiltin.cpp
+++ lib/CodeGen/CGBuiltin.cpp
@@ -1537,6 +1537,34 @@
 return RValue::get(ComplexVal.second);
   }
 
+  case Builtin::BI__builtin_clrsb:
+  case Builtin::BI__builtin_clrsbl:
+  case Builtin::BI__builtin_clrsbll: {
+// clrsb(x) -> clz(x < 0 ? ~x : x) - 1 or
+//  -> clz(((x < 0 ? ~x : x) << 1) | 1)
+Value *ArgValue = EmitScalarExpr(E->getArg(0));
+
+llvm::Type *ArgType = ArgValue->getType();
+Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType);
+
+llvm::Type *ResultType = ConvertType(E->getType());
+Value *Zero = llvm::Constant::getNullValue(ArgType);
+Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg");
+Value *Inverse = Builder.CreateNot(ArgValue, "not");
+Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue);
+// Now we need to calculate ctlz(Tmp)-1, but Tmp might be zero. We know
+// the sign bit is zero, so we can shift it out. Then put a 1 in the LSB.
+// This removes one leading zero like the subtract does, and replaces it
+// with a guaranteed one to prevent the value being 0.
+Value *One = llvm::ConstantInt::get(ArgType, 1);
+Tmp = Builder.CreateShl(Tmp, One);
+Tmp = Builder.CreateOr(Tmp, One);
+Value *Result = Builder.CreateCall(F, {Tmp, Builder.getTrue()});
+if (Result->getType() != ResultType)
+  Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true,
+ "cast");
+return RValue::get(Result);
+  }
   case Builtin::BI__builtin_ctzs:
   case Builtin::BI__builtin_ctz:
   case Builtin::BI__builtin_ctzl:
Index: include/clang/Basic/Builtins.def
===
--- include/clang/Basic/Builtins.def
+++ include/clang/Basic/Builtins.def
@@ -413,6 +413,9 @@
 BUILTIN(__builtin_popcount  , "iUi"  , "nc")
 BUILTIN(__builtin_popcountl , "iULi" , "nc")
 BUILTIN(__builtin_popcountll, "iULLi", "nc")
+BUILTIN(__builtin_clrsb  , "ii"  , "nc")
+BUILTIN(__builtin_clrsbl , "iLi" , "nc")
+BUILTIN(__builtin_clrsbll, "iLLi", "nc")
 
 // FIXME: These type signatures are not correct for targets with int != 32-bits
 // or with ULL != 64-bits.


[PATCH] D43041: Add X86 Support to ValidCPUList (enabling march notes)

2018-02-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D43041



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41880: Adding nocf_check attribute for cf-protection fine tuning

2018-02-13 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Sema/SemaDeclAttr.cpp:2007
+
+bool Sema::CheckAttrNoArgs(const AttributeList &Attr) {
+  if (!checkAttributeNumArgs(*this, Attr, 0)) {

Why did this get renamed?



Comment at: lib/Sema/SemaDeclAttr.cpp:2016
 
-bool Sema::CheckNoCallerSavedRegsAttr(const AttributeList &Attr) {
+bool Sema::CheckAttrTarget(const AttributeList &Attr) {
   // Check whether the attribute is valid on the current target.

Why did this get renamed?


Repository:
  rL LLVM

https://reviews.llvm.org/D41880



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43394: [X86] Add 'sahf' CPU feature, and emit __LAHFSAHF__ for it

2018-02-16 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:295
 setFeatureEnabledImpl(Features, "xsave", true);
 setFeatureEnabledImpl(Features, "movbe", true);
 break;

KNM and KNL should both have sahf



Comment at: lib/Basic/Targets/X86.cpp:308
 setFeatureEnabledImpl(Features, "lzcnt", true);
 setFeatureEnabledImpl(Features, "popcnt", true);
 LLVM_FALLTHROUGH;

sahf should be available on amdfam10



Comment at: lib/Basic/Targets/X86.cpp:1049
 
+  if (HasLAHFSAHF)
+Builder.defineMacro("__LAHFSAHF__");

Does gcc define this? It's such a low level instruction I have a hard time 
believing this define would be useful.


Repository:
  rC Clang

https://reviews.llvm.org/D43394



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43394: [X86] Add 'sahf' CPU feature, and emit __LAHFSAHF__ for it

2018-02-16 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:1049
 
+  if (HasLAHFSAHF)
+Builder.defineMacro("__LAHFSAHF__");

dim wrote:
> craig.topper wrote:
> > Does gcc define this? It's such a low level instruction I have a hard time 
> > believing this define would be useful.
> I tried gcc 6, 7 and 8, and while they do expose stuff like `__POPCNT__`, I 
> see no `__LAHFSAHF__`.  I am supposing Jonathan's original intent with this 
> was to make it easily testable in source, so you can insert the right 
> assembly for the target CPU.  The same could really be said for things like 
> `__RDSEED__`, and some other defines...
Most of the defines indicate the availability of intrinsics. At least that was 
their original intent. They used to control which intrinsic headers were included 
in x86intrin.h or immintrin.h. Now, though, we include everything except in 
MSVC-compatible mode and allow the target attribute to provide per-function control.
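
For example (illustrative only), per-function control looks like:

  __attribute__((target("avx512vbmi2")))
  void uses_vbmi2(void);  // compiled as if -mavx512vbmi2 were in effect here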

I'd prefer not to add this one if gcc doesn't have it.


Repository:
  rC Clang

https://reviews.llvm.org/D43394



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43394: [X86] Add 'sahf' CPU feature to frontend

2018-02-17 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D43394



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43459: [X86] Disable CLWB in Cannon Lake

2018-02-19 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D43459



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33356: [Nios2] Changes in frontend to support Nios2 LLVM target

2017-05-24 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Is there enough functionality here that there should be tests for it? E.g., make 
sure the march/mcpu switches are recognized, that the target is recognized, etc.


https://reviews.llvm.org/D33356



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33356: [Nios2] Changes in frontend to support Nios2 LLVM target

2017-06-06 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM




Comment at: lib/Basic/Targets.cpp:7678
+for (const char *feature : allFeatures) {
+Features[feature] = isFeatureSupportedByCPU(feature, CPU);
+}

This is indented too far. Can you fix it when you commit?


https://reviews.llvm.org/D33356



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40478: Added control flow architecture protection Flag

2017-12-18 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/CodeGen/CGCall.cpp:1737
llvm::toStringRef(CodeGenOpts.NoSignedZeros));
+FuncAttrs.addAttribute("shstk-compatible",
+   llvm::toStringRef(CodeGenOpts.ShstkCompatible));

If the command line option is intended to be target independent, shouldn't 
these be named generically?



Comment at: lib/CodeGen/CodeGenFunction.cpp:876
   // Apply xray attributes to the function (as a string, for now)
-  if (D && ShouldXRayInstrumentFunction()) {
+  bool InstrumentXray = ShouldXRayInstrumentFunction();
+  if (D && InstrumentXray) {

Why this change?


Repository:
  rL LLVM

https://reviews.llvm.org/D40478



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40478: Added control flow architecture protection Flag

2017-12-21 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Are we sure we want a different command line option name from gcc? From our 
internal conversations with the gcc folks I thought they were suggesting that 
-fcf-protection could imply a software mechanism if a hardware mechanism was 
not available through -mibt or -march?

Should we emit an error to the user if -mibt isn't available?  We should be 
able to add virtual methods on TargetInfo that X86 can customize to check for 
ibt and shstk.
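
Something along these lines (a sketch; the exact names and signatures are
illustrative):

  // In TargetInfo, overridable by X86TargetInfo:
  virtual bool checkCFProtectionBranchSupported(DiagnosticsEngine &Diags) const;
  virtual bool checkCFProtectionReturnSupported(DiagnosticsEngine &Diags) const;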

Can you provide more information about the miscompile on MSVC? I think we 
should do more to understand that; it sounds like it could be a time bomb 
waiting to fail somewhere else.


Repository:
  rL LLVM

https://reviews.llvm.org/D40478



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41517: mmintrin.h documentation fixes and updates

2017-12-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/mmintrin.h:88
 ///
-/// This intrinsic corresponds to the  VMOVQ / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

Shouldn't this be MOVQ?



Comment at: lib/Headers/mmintrin.h:104
 ///
-/// This intrinsic corresponds to the  VMOVQ / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

Shouldn't this be MOVQ?



Comment at: lib/Headers/mmintrin.h:1292
 ///
-/// This intrinsic corresponds to the  VXORPS / XORPS  instruction.
+/// This intrinsic corresponds to the  XOR  instruction.
 ///

PXOR?



Comment at: lib/Headers/mmintrin.h:1384
 ///
-/// This intrinsic corresponds to the  VPSHUFD / PSHUFD  instruction.
+/// This intrinsic corresponds to the  PSHUFD  instruction.
 ///

This is overly specific; there is no guarantee we'd use those instructions. If 
it were a constant we'd probably just use a load.



Comment at: lib/Headers/mmintrin.h:1402
 ///
-/// This intrinsic corresponds to the  VPSHUFLW / PSHUFLW  instruction.
+/// This intrinsic corresponds to the  PSHUFLW  instruction.
 ///

This is overly specific



Comment at: lib/Headers/mmintrin.h:1419
 ///
-/// This intrinsic corresponds to the  VPUNPCKLBW + VPSHUFLW / PUNPCKLBW +
-///PSHUFLW  instruction.
+/// This intrinsic corresponds to the  PUNPCKLBW + PSHUFLW  instruction.
 ///

This is overly specific


https://reviews.llvm.org/D41517



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41523: xmmintrin.h documentation fixes and updates

2017-12-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/xmmintrin.h:1706
 ///
-/// This intrinsic corresponds to the  VMOVSS / MOVSS + shuffling 
+/// This intrinsic corresponds to the  VBROADCASTSS / BROADCASTSS 
 ///instruction.

There is no BROADCASTSS instruction. That's an AVX instruction that only exists 
as VBROADCASTSS. The original comment was correct for pre-AVX.



Comment at: lib/Headers/xmmintrin.h:2199
 ///
-/// This intrinsic corresponds to the  VPINSRW / PINSRW  instruction.
+/// This intrinsic corresponds to the  PINSRW  instruction.
 ///

Why is VPINSRW removed?



Comment at: lib/Headers/xmmintrin.h:2659
 ///
-/// This intrinsic corresponds to the  VMOVSS / MOVSS  instruction.
+/// This intrinsic corresponds to the  VBLENDPS / BLENDPS  instruction.
 ///

MOVSS is correct for pre SSE4.1 targets.


https://reviews.llvm.org/D41523



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41516: emmintrin.h documentation fixes and updates

2017-12-22 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


https://reviews.llvm.org/D41516



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41557: [x86][icelake][vbmi2]

2017-12-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Update the ICL macros in test/Preprocessor/predefined-arch-macros.c




Comment at: include/clang/Basic/BuiltinsX86.def:1254
+TARGET_BUILTIN(__builtin_ia32_vpshldd512_mask, "V16iV16iV16iiV16iUs", "", 
"avx512vbmi2")
+TARGET_BUILTIN(__builtin_ia32_vpshldq128_mask, "V2LLiV2LLiV2LLiiV2LLiUc", "", 
"avx512vl,avx512vbmi2")
+TARGET_BUILTIN(__builtin_ia32_vpshldq256_mask, "V4LLiV4LLiV4LLiiV4LLiUc", "", 
"avx512vl,avx512vbmi2")

Arguments corresponding to immediates need a capital 'I' in front of them so 
clang will error if they aren't a compile-time constant.
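
I.e., something like this for the first quoted line (sketch):

  TARGET_BUILTIN(__builtin_ia32_vpshldd512_mask, "V16iV16iV16iIiV16iUs", "", "avx512vbmi2")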



Comment at: lib/Basic/Targets/X86.cpp:135
   case CK_Icelake:
-// TODO: Add icelake features here.
 LLVM_FALLTHROUGH;

Don't remove the TODO until all features are added.



Comment at: lib/Basic/Targets/X86.cpp:589
+// Enable BWI instruction if VBMI/VBMI2 is being enabled.
+if (Name.startswith("avx512vbmi") && Enabled)
   Features["avx512bw"] = true;

Do two equality checks ORed together. I think bad target attributes on 
functions only issue a warning and are discarded in codegen. So strings like 
avx512vbmifoo can get here and we should ignore them.
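
Roughly:

  if ((Name == "avx512vbmi" || Name == "avx512vbmi2") && Enabled)
    Features["avx512bw"] = true;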


Repository:
  rC Clang

https://reviews.llvm.org/D41557



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41557: [x86][icelake][vbmi2]

2017-12-23 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

Add tests for -mavx512vbmi2 and -mno-avx512vbmi2 to 
test/Driver/x86-target-features.c

Add a test for -mno-avx512bw also disabling avx512vbmi2 to 
test/Preprocessor/x86_target_features.c. Look for AVX512VBMINOAVX512BW to find 
the existing test for avx512vbmi. Also add a test for -mavx512vbmi2.


Repository:
  rC Clang

https://reviews.llvm.org/D41557



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41558: [x86][icelake][vnni]

2017-12-24 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41558



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41564: [x86][icelake][bitalg]

2017-12-24 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41564



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41573: [x86][icelake][vpclmulqdq]

2017-12-25 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41573



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41557: [x86][icelake][vbmi2]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41557



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41582: [x86][icelake][gfni]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: test/CodeGen/gfni-builtins.c:45
+
+#ifdef AVX512
+__m512i test_mm512_gf2p8affineinv_epi64_epi8(__m512i A, __m512i B) {

Doesn't the define have underscores around it?


Repository:
  rC Clang

https://reviews.llvm.org/D41582



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41582: [x86][icelake][gfni]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: test/CodeGen/gfni-builtins.c:45
+
+#ifdef AVX512
+__m512i test_mm512_gf2p8affineinv_epi64_epi8(__m512i A, __m512i B) {

craig.topper wrote:
> Doesn't the define have underscores around it?
Oh, it's your own define. You can probably just use the builtin one.


Repository:
  rC Clang

https://reviews.llvm.org/D41582



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41583: [x86][icelake][vaes]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Basic/Targets/X86.cpp:573
 setMMXLevel(Features, AMD3DNowAthlon, Enabled);
   } else if (Name == "aes") {
 if (Enabled)

Shouldn't -aes imply -vaes?
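
I.e., something like (sketch of the quoted branch):

  } else if (Name == "aes") {
    if (Enabled)
      setSSELevel(Features, SSE2, Enabled);
    else
      Features["vaes"] = false;
  }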


Repository:
  rC Clang

https://reviews.llvm.org/D41583



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41573: [x86][icelake][vpclmulqdq]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper requested changes to this revision.
craig.topper added inline comments.
This revision now requires changes to proceed.



Comment at: lib/Basic/Targets/X86.cpp:577
   } else if (Name == "pclmul") {
 if (Enabled)
   setSSELevel(Features, SSE2, Enabled);

I think we should disable vpclmulqdq when pclmul is being disabled.
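
I.e., something like (sketch):

  } else if (Name == "pclmul") {
    if (Enabled)
      setSSELevel(Features, SSE2, Enabled);
    else
      Features["vpclmulqdq"] = false;
  }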


Repository:
  rC Clang

https://reviews.llvm.org/D41573



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41573: [x86][icelake][vpclmulqdq]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41573



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41583: [x86][icelake][vaes]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41583



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41582: [x86][icelake][gfni]

2017-12-26 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rC Clang

https://reviews.llvm.org/D41582



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40478: Added control flow architecture protection Flag

2018-01-03 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/CodeGen/CodeGenModule.cpp:501
+  if (CodeGenOpts.CFProtectionReturn) {
+Target.checkCFProtectionReturnSupported(getDiags());
+// Indicate that we want to instrument return control flow protection.

Should we still be adding the module flag if the target says it's not supported?


Repository:
  rL LLVM

https://reviews.llvm.org/D40478



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41517: mmintrin.h documentation fixes and updates

2018-01-04 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/mmintrin.h:1292
 ///
-/// This intrinsic corresponds to the  VXORPS / XORPS  instruction.
+/// This intrinsic corresponds to the  XOR  instruction.
 ///

kromanova wrote:
> craig.topper wrote:
> > PXOR?
> For which platform/compiler? 
>  
> I checked, for x86_64 Linux XORPS(no avx)/VXORPS (with -mavx) is generated.
> For PS4 we generate XORL.
> 
> I guess, we need to write something more generic, implying that an 
> appropriate platform-specific XOR instruction is generated. 
Ideally, to interoperate with other MMX intrinsics, it should have been a PXOR 
into an MMX register. But apparently our MMX support is so limited that we 
aren't capable of that and instead create the value in another domain and move it over.

I guess just indicate it as a utility function with no specific instruction.



Comment at: lib/Headers/mmintrin.h:1384
 ///
-/// This intrinsic corresponds to the  VPSHUFD / PSHUFD  instruction.
+/// This intrinsic corresponds to the  PSHUFD  instruction.
 ///

kromanova wrote:
> craig.topper wrote:
> > This is overly specific there is no guarantee we'd use those instructions. 
> > If it was a constant we'd probably just use a load.
> That's right. I think we should use the following wording to match other 
> _mm_set* intrinsics documentation in this file.
> 
> /// This intrinsic is a utility function and does not correspond to a specific
> ///instruction.
> 
Agreed.


https://reviews.llvm.org/D41517



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40478: Added control flow architecture protection Flag

2018-01-04 Thread Craig Topper via Phabricator via cfe-commits
craig.topper accepted this revision.
craig.topper added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rL LLVM

https://reviews.llvm.org/D40478



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41357: WIP: Fix Diagnostic layering, moving diagnostics out of Basic

2018-01-04 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Basic/DiagnosticIDs.cpp:58
 
 /// GetDiagInfo - Return the StaticDiagInfoRec entry for the specified DiagID,
 /// or null if the ID is invalid.

This comment is out of date with the struct being renamed.



Comment at: lib/Basic/DiagnosticIDs.cpp:73
-#define VALIDATE_DIAG_SIZE(NAME)                                             \
-  static_assert(                                                             \
-      static_cast<unsigned>(diag::NUM_BUILTIN_##NAME##_DIAGNOSTICS) <        \

Did this static_assert stuff get lost?



Comment at: tools/driver/cc1_main.cpp:210
+
+/// GetDiagInfo - Return the StaticDiagInfoRec entry for the specified DiagID,
+/// or null if the ID is invalid.

StaticDiagInfoRec was renamed.


Repository:
  rC Clang

https://reviews.llvm.org/D41357



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41517: mmintrin.h documentation fixes and updates

2018-01-04 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/mmintrin.h:55
 ///
-/// This intrinsic corresponds to the  VMOVD / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

kromanova wrote:
> I tried clang on Linux, x86_64, and if -mavx option is passed, we generate 
> VMOVD, if this option is omitted, we generate MOVD.
> I think I understand the rationale behind this change (namely, to keep MOVD, 
> but remove VMOVD),
> since this intrinsic should use MMX registers and shouldn't have 
> corresponding AVX instruction(s).
> 
> However, that's what we generate at the moment when -mavx is passed (I 
> suspect because our MMX support is limited)
> vmovd   %edi, %xmm0
> 
> Since we are writing the documentation for the clang compiler, we should document 
> what the clang compiler is doing, not what it should be doing.
> Craig, what do you think? Should we revert back to VMOVD/MOVD?
> 
We can change it back to VMOVD/MOVD


https://reviews.llvm.org/D41517



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41523: xmmintrin.h documentation fixes and updates

2018-01-04 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/xmmintrin.h:2199
 ///
-/// This intrinsic corresponds to the  VPINSRW / PINSRW  instruction.
+/// This intrinsic corresponds to the  PINSRW  instruction.
 ///

kromanova wrote:
> craig.topper wrote:
> > Why is VPINSRW removed?
> I suspect the rationale is the same one I talked about in the mmintrin.h review. 
> This intrinsic should use MMX registers and shouldn't have corresponding AVX 
> instruction(s).
> 
> I've tried this and with or without -mavx for Linux/x86_64 we generate PINSRW 
> in both cases (i.e. I wasn't able to trigger generation of VEX prefixed 
> instruction).
> 
> __m64 foo (__m64 a, int b)
> {
>   __m64 x;
>   x = _mm_insert_pi16 (a, b, 0);
>   return x;
> }
> 
I didn't realize this was an MMX-type intrinsic in the xmmintrin file. So 
VPINSRW being removed makes sense. Sorry for the noise.


https://reviews.llvm.org/D41523



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41523: xmmintrin.h documentation fixes and updates

2018-01-05 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added a comment.

The builtins are tested in tests like test/CodeGen/sse-builtins.c


https://reviews.llvm.org/D41523



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41517: mmintrin.h documentation fixes and updates

2018-01-08 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: lib/Headers/mmintrin.h:1402
 ///
-/// This intrinsic corresponds to the  VPSHUFLW / PSHUFLW  instruction.
+/// This intrinsic corresponds to the  PSHUFLW  instruction.
 ///

dyung wrote:
> craig.topper wrote:
> > This is overly specific
> Just to be clear, when you say this is overly specific, you are saying that 
> it should be replaced with the text "This intrinsic is a utility function and 
> does not correspond to a specific instruction."  Is that correct?
Yeah. I just shortened my comment from _mm_set1_pi32.


https://reviews.llvm.org/D41517



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43817: [x86] wbnoinvd intrinsic

2018-02-27 Thread Craig Topper via Phabricator via cfe-commits
craig.topper added inline comments.



Comment at: docs/ClangCommandLineReference.rst:2359
 
+.. option:: -mwbnoinvd, -mno-wbnoinvd
+

Did you manually add these? This file is normally generated by a tool and 
should be in alphabetical order.



Comment at: lib/Basic/Targets/X86.cpp:136
 // TODO: Add icelake features here.
+setFeatureEnabledImpl(Features, "wbnoinvd", true);
 LLVM_FALLTHROUGH;

Is this based on an old repo? Icelake features have been coded here since late 
December. But as I said in the llvm patch we probably can't enable this on all 
icelakes.



Comment at: lib/Headers/ia32intrin.h:79
 
+#define _wbinvd() __builtin_ia32_wbinvd()
+

Can you separate wbinvd out of this patch? This has some Microsoft 
compatibility issues that need to be carefully checked. We seem to already have 
__wbinvd() defined in intrin.h but I'm not sure it does anything.



Comment at: lib/Headers/wbnoinvdintrin.h:31
+
+#define _wbnoinvd __builtin_ia32_wbnoinvd
+

Use an inline function, not a macro.
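
E.g. a sketch of the inline-function form (the attribute set shown is the usual
intrinsic-header boilerplate and may differ in the final header):

  static __inline__ void
  __attribute__((__always_inline__, __nodebug__, __target__("wbnoinvd")))
  _wbnoinvd(void) {
    __builtin_ia32_wbnoinvd();
  }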


Repository:
  rC Clang

https://reviews.llvm.org/D43817



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >