[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc
craig.topper added a comment. Ping https://reviews.llvm.org/D50168 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc
craig.topper updated this revision to Diff 159627. craig.topper added a comment. Add the test case that I failed to pick up in the original diff. https://reviews.llvm.org/D50168 Files: include/clang/Basic/Builtins.def lib/CodeGen/CGBuiltin.cpp test/CodeGen/builtin_clrsb.c Index: test/CodeGen/builtin_clrsb.c === --- /dev/null +++ test/CodeGen/builtin_clrsb.c @@ -0,0 +1,24 @@ +// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s + +int test__builtin_clrsb(int x) { +// CHECK-LABEL: test__builtin_clrsb +// CHECK: [[C:%.*]] = icmp slt i32 [[X:%.*]], 0 +// CHECK: [[INV:%.*]] = xor i32 [[X]], -1 +// CHECK: [[SEL:%.*]] = select i1 [[C]], i32 [[INV]], i32 [[X]] +// CHECK: [[SHL:%.*]] = shl i32 [[SEL]], 1 +// CHECK: [[OR:%.*]] = or i32 [[SHL]], 1 +// CHECK: call i32 @llvm.ctlz.i32(i32 [[OR]], i1 true) + return __builtin_clrsb(x); +} + +int test__builtin_clrsbll(long long x) { +// CHECK-LABEL: test__builtin_clrsbll +// CHECK: [[C:%.*]] = icmp slt i64 [[X:%.*]], 0 +// CHECK-NEXT: [[INV:%.*]] = xor i64 [[X]], -1 +// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i64 [[INV]], i64 [[X]] +// CHECK-NEXT: [[SHL:%.*]] = shl i64 [[SEL]], 1 +// CHECK-NEXT: [[OR:%.*]] = or i64 [[SHL]], 1 +// CHECK-NEXT: [[CTLZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[OR]], i1 true) +// CHECK-NEXT: trunc i64 [[CTLZ]] to i32 + return __builtin_clrsbll(x); +} Index: lib/CodeGen/CGBuiltin.cpp === --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -1537,6 +1537,33 @@ return RValue::get(ComplexVal.second); } + case Builtin::BI__builtin_clrsb: + case Builtin::BI__builtin_clrsbl: + case Builtin::BI__builtin_clrsbll: { +// clrsb(x) -> clz(x < 0 ? ~x : x) - 1 or +// -> clz(((x < 0 ? 
~x : x) << 1) | 1) +Value *ArgValue = EmitScalarExpr(E->getArg(0)); + +llvm::Type *ArgType = ArgValue->getType(); +Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType); + +llvm::Type *ResultType = ConvertType(E->getType()); +Value *Zero = llvm::Constant::getNullValue(ArgType); +Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg"); +Value *Inverse = Builder.CreateNot(ArgValue, "not"); +Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue); +// Now we need to calculate ctlz(Tmp)-1, but Tmp might be zero. We know +// the sign bit is zero, so we can shift it out. Then put a 1 in the LSB. +// This removes one leading zero like the subtract does, and replaces it +// with a guaranteed one to prevent the value being 0. +Value *One = llvm::ConstantInt::get(ArgType, 1); +Tmp = Builder.CreateShl(Tmp, One); +Tmp = Builder.CreateOr(Tmp, One); +Value *Result = Builder.CreateCall(F, {Tmp, Builder.getTrue()}); +Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true, + "cast"); +return RValue::get(Result); + } case Builtin::BI__builtin_ctzs: case Builtin::BI__builtin_ctz: case Builtin::BI__builtin_ctzl: Index: include/clang/Basic/Builtins.def === --- include/clang/Basic/Builtins.def +++ include/clang/Basic/Builtins.def @@ -413,6 +413,9 @@ BUILTIN(__builtin_popcount , "iUi" , "nc") BUILTIN(__builtin_popcountl , "iULi" , "nc") BUILTIN(__builtin_popcountll, "iULLi", "nc") +BUILTIN(__builtin_clrsb , "ii" , "nc") +BUILTIN(__builtin_clrsbl , "iLi" , "nc") +BUILTIN(__builtin_clrsbll, "iLLi", "nc") // FIXME: These type signatures are not correct for targets with int != 32-bits // or with ULL != 64-bits. 
[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc
craig.topper updated this revision to Diff 159753. craig.topper added a comment. Use ctlz(zero_undef=false) and sub https://reviews.llvm.org/D50168 Files: include/clang/Basic/Builtins.def lib/CodeGen/CGBuiltin.cpp test/CodeGen/builtin_clrsb.c Index: test/CodeGen/builtin_clrsb.c === --- /dev/null +++ test/CodeGen/builtin_clrsb.c @@ -0,0 +1,22 @@ +// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s + +int test__builtin_clrsb(int x) { +// CHECK-LABEL: test__builtin_clrsb +// CHECK: [[C:%.*]] = icmp slt i32 [[X:%.*]], 0 +// CHECK-NEXT: [[INV:%.*]] = xor i32 [[X]], -1 +// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i32 [[INV]], i32 [[X]] +// CHECK-NEXT: [[CTLZ:%.*]] = call i32 @llvm.ctlz.i32(i32 [[SEL]], i1 false) +// CHECK-NEXT: [[SUB:%.*]] = sub i32 [[CTLZ]], 1 + return __builtin_clrsb(x); +} + +int test__builtin_clrsbll(long long x) { +// CHECK-LABEL: test__builtin_clrsbll +// CHECK: [[C:%.*]] = icmp slt i64 [[X:%.*]], 0 +// CHECK-NEXT: [[INV:%.*]] = xor i64 [[X]], -1 +// CHECK-NEXT: [[SEL:%.*]] = select i1 [[C]], i64 [[INV]], i64 [[X]] +// CHECK-NEXT: [[CTLZ:%.*]] = call i64 @llvm.ctlz.i64(i64 [[SEL]], i1 false) +// CHECK-NEXT: [[SUB:%.*]] = sub i64 [[CTLZ]], 1 +// CHECK-NEXT: trunc i64 [[SUB]] to i32 + return __builtin_clrsbll(x); +} Index: lib/CodeGen/CGBuiltin.cpp === --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -1537,6 +1537,26 @@ return RValue::get(ComplexVal.second); } + case Builtin::BI__builtin_clrsb: + case Builtin::BI__builtin_clrsbl: + case Builtin::BI__builtin_clrsbll: { +// clrsb(x) -> clz(x < 0 ? 
~x : x) - 1 or +Value *ArgValue = EmitScalarExpr(E->getArg(0)); + +llvm::Type *ArgType = ArgValue->getType(); +Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType); + +llvm::Type *ResultType = ConvertType(E->getType()); +Value *Zero = llvm::Constant::getNullValue(ArgType); +Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg"); +Value *Inverse = Builder.CreateNot(ArgValue, "not"); +Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue); +Value *Ctlz = Builder.CreateCall(F, {Tmp, Builder.getFalse()}); +Value *Result = Builder.CreateSub(Ctlz, llvm::ConstantInt::get(ArgType, 1)); +Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true, + "cast"); +return RValue::get(Result); + } case Builtin::BI__builtin_ctzs: case Builtin::BI__builtin_ctz: case Builtin::BI__builtin_ctzl: Index: include/clang/Basic/Builtins.def === --- include/clang/Basic/Builtins.def +++ include/clang/Basic/Builtins.def @@ -413,6 +413,9 @@ BUILTIN(__builtin_popcount , "iUi" , "nc") BUILTIN(__builtin_popcountl , "iULi" , "nc") BUILTIN(__builtin_popcountll, "iULLi", "nc") +BUILTIN(__builtin_clrsb , "ii" , "nc") +BUILTIN(__builtin_clrsbl , "iLi" , "nc") +BUILTIN(__builtin_clrsbll, "iLLi", "nc") // FIXME: These type signatures are not correct for targets with int != 32-bits // or with ULL != 64-bits. 
[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature
craig.topper added a comment. Correct me if I'm wrong, but after this change llvm no longer enables the timing of individual passes when -ftime-report is passed? Was that intentional? Repository: rL LLVM https://reviews.llvm.org/D45619 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature
craig.topper added a comment. Ok, I'll add that back. I'm unclear why we would want to assign clang's FrontendTimesIsEnabled from inside CodeGenAction. If I'm understanding the intentions here, the goal was to add more timing infrastructure to clang. But if the enabling is tied to CodeGenAction, then doesn't that mean any new clang timers wouldn't work under -fsyntax-only? Repository: rL LLVM https://reviews.llvm.org/D45619 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D45619: [Time-report] (1) Use special new Clang flag 'FrontendTimesIsEnabled' instead of 'llvm::TimePassesIsEnabled' inside -ftime-report feature
craig.topper added a comment. Assignment restored in r339281. I'll file a bug to merge to 7.0 Repository: rL LLVM https://reviews.llvm.org/D45619 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50471: [Builtins] Add __bulitin_clrsb support to IntExprEvaluator::VisitBuiltinCallExpr
craig.topper created this revision. craig.topper added reviewers: bkramer, erichkeane. Now that __builtin_clrsb is supported by clang, this patch adds constant evaluation of it to address the FIXME. https://reviews.llvm.org/D50471 Files: lib/AST/ExprConstant.cpp test/Sema/constant-builtins-2.c Index: test/Sema/constant-builtins-2.c === --- test/Sema/constant-builtins-2.c +++ test/Sema/constant-builtins-2.c @@ -176,6 +176,19 @@ char ffs5[__builtin_ffs(1U << (BITSIZE(int) - 1)) == BITSIZE(int) ? 1 : -1]; char ffs6[__builtin_ffsl(0x10L) == 5 ? 1 : -1]; char ffs7[__builtin_ffsll(0x100LL) == 9 ? 1 : -1]; + +char clrsb1[__builtin_clrsb(0) == BITSIZE(int) - 1 ? 1 : -1]; +char clrsb2[__builtin_clrsbl(0L) == BITSIZE(long) - 1 ? 1 : -1]; +char clrsb3[__builtin_clrsbll(0LL) == BITSIZE(long long) - 1 ? 1 : -1]; +char clrsb4[__builtin_clrsb(~0) == BITSIZE(int) - 1 ? 1 : -1]; +char clrsb5[__builtin_clrsbl(~0L) == BITSIZE(long) - 1 ? 1 : -1]; +char clrsb6[__builtin_clrsbll(~0LL) == BITSIZE(long long) - 1 ? 1 : -1]; +char clrsb7[__builtin_clrsb(1) == BITSIZE(int) - 2 ? 1 : -1]; +char clrsb8[__builtin_clrsb(~1) == BITSIZE(int) - 2 ? 1 : -1]; +char clrsb9[__builtin_clrsb(1 << (BITSIZE(int) - 1)) == 0 ? 1 : -1]; +char clrsb10[__builtin_clrsb(~(1 << (BITSIZE(int) - 1))) == 0 ? 1 : -1]; +char clrsb11[__builtin_clrsb(0xf) == BITSIZE(int) - 5 ? 1 : -1]; +char clrsb11[__builtin_clrsb(~0x1f) == BITSIZE(int) - 6 ? 
1 : -1]; #undef BITSIZE // GCC misc stuff Index: lib/AST/ExprConstant.cpp === --- lib/AST/ExprConstant.cpp +++ lib/AST/ExprConstant.cpp @@ -8117,9 +8117,15 @@ case Builtin::BI__builtin_classify_type: return Success((int)EvaluateBuiltinClassifyType(E, Info.getLangOpts()), E); - // FIXME: BI__builtin_clrsb - // FIXME: BI__builtin_clrsbl - // FIXME: BI__builtin_clrsbll + case Builtin::BI__builtin_clrsb: + case Builtin::BI__builtin_clrsbl: + case Builtin::BI__builtin_clrsbll: { +APSInt Val; +if (!EvaluateInteger(E->getArg(0), Val, Info)) + return false; + +return Success(Val.getBitWidth() - Val.getMinSignedBits(), E); + } case Builtin::BI__builtin_clz: case Builtin::BI__builtin_clzl: ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50471: [Builtins] Add __bulitin_clrsb support to IntExprEvaluator::VisitBuiltinCallExpr
craig.topper updated this revision to Diff 159777. craig.topper added a comment. Fix duplicate variable name in the test. Not sure why that didn't complain https://reviews.llvm.org/D50471 Files: lib/AST/ExprConstant.cpp test/Sema/constant-builtins-2.c Index: test/Sema/constant-builtins-2.c === --- test/Sema/constant-builtins-2.c +++ test/Sema/constant-builtins-2.c @@ -176,6 +176,19 @@ char ffs5[__builtin_ffs(1U << (BITSIZE(int) - 1)) == BITSIZE(int) ? 1 : -1]; char ffs6[__builtin_ffsl(0x10L) == 5 ? 1 : -1]; char ffs7[__builtin_ffsll(0x100LL) == 9 ? 1 : -1]; + +char clrsb1[__builtin_clrsb(0) == BITSIZE(int) - 1 ? 1 : -1]; +char clrsb2[__builtin_clrsbl(0L) == BITSIZE(long) - 1 ? 1 : -1]; +char clrsb3[__builtin_clrsbll(0LL) == BITSIZE(long long) - 1 ? 1 : -1]; +char clrsb4[__builtin_clrsb(~0) == BITSIZE(int) - 1 ? 1 : -1]; +char clrsb5[__builtin_clrsbl(~0L) == BITSIZE(long) - 1 ? 1 : -1]; +char clrsb6[__builtin_clrsbll(~0LL) == BITSIZE(long long) - 1 ? 1 : -1]; +char clrsb7[__builtin_clrsb(1) == BITSIZE(int) - 2 ? 1 : -1]; +char clrsb8[__builtin_clrsb(~1) == BITSIZE(int) - 2 ? 1 : -1]; +char clrsb9[__builtin_clrsb(1 << (BITSIZE(int) - 1)) == 0 ? 1 : -1]; +char clrsb10[__builtin_clrsb(~(1 << (BITSIZE(int) - 1))) == 0 ? 1 : -1]; +char clrsb11[__builtin_clrsb(0xf) == BITSIZE(int) - 5 ? 1 : -1]; +char clrsb12[__builtin_clrsb(~0x1f) == BITSIZE(int) - 6 ? 
1 : -1]; #undef BITSIZE // GCC misc stuff Index: lib/AST/ExprConstant.cpp === --- lib/AST/ExprConstant.cpp +++ lib/AST/ExprConstant.cpp @@ -8117,9 +8117,15 @@ case Builtin::BI__builtin_classify_type: return Success((int)EvaluateBuiltinClassifyType(E, Info.getLangOpts()), E); - // FIXME: BI__builtin_clrsb - // FIXME: BI__builtin_clrsbl - // FIXME: BI__builtin_clrsbll + case Builtin::BI__builtin_clrsb: + case Builtin::BI__builtin_clrsbl: + case Builtin::BI__builtin_clrsbll: { +APSInt Val; +if (!EvaluateInteger(E->getArg(0), Val, Info)) + return false; + +return Success(Val.getBitWidth() - Val.getMinSignedBits(), E); + } case Builtin::BI__builtin_clz: case Builtin::BI__builtin_clzl: ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50678: [InlineAsm] Update the min-legal-vector-width function attribute based on inputs and outputs to inline assembly
craig.topper created this revision. craig.topper added reviewers: chandlerc, rsmith, rnk. Herald added a subscriber: eraman. Another piece of my ongoing to work for prefer-vector-width. min-legal-vector-width will eventually be used by the X86 backend to know whether it needs to make 512 bits type legal when prefer-vector-width=256. If the user used inline assembly that passed in/out a 512-bit register, we need to make sure 512 bits are considered legal. Otherwise we'll get an assert failure when we try to wire up the inline assembly to the rest of the code. This patch just checks the LLVM IR types to see if they are vectors and then updates the attribute based on their total width. I'm not sure if this is the best way to do this or if there's any subtlety I might have missed. So if anyone has other opinions on how to do this I'm open to suggestions. https://reviews.llvm.org/D50678 Files: lib/CodeGen/CGStmt.cpp test/CodeGen/x86-inline-asm-min-vector-width.c Index: test/CodeGen/x86-inline-asm-min-vector-width.c === --- /dev/null +++ test/CodeGen/x86-inline-asm-min-vector-width.c @@ -0,0 +1,45 @@ +// RUN: %clang_cc1 %s -triple x86_64-unknown-linux-gnu -emit-llvm -target-feature +avx512f -o - | FileCheck %s + +typedef long long __m128i __attribute__ ((vector_size (16))); +typedef long long __m256i __attribute__ ((vector_size (32))); +typedef long long __m512i __attribute__ ((vector_size (64))); + +// CHECK: define <2 x i64> @testXMMout(<2 x i64>* %p) #0 +__m128i testXMMout(__m128i *p) { + __m128i xmm0; + __asm__("vmovdqu %1, %0" :"=v"(xmm0) : "m"(*(__m128i*)p)); + return xmm0; +} + +// CHECK: define <4 x i64> @testYMMout(<4 x i64>* %p) #1 +__m256i testYMMout(__m256i *p) { + __m256i ymm0; + __asm__("vmovdqu %1, %0" :"=v"(ymm0) : "m"(*(__m256i*)p)); + return ymm0; +} + +// CHECK: define <8 x i64> @testZMMout(<8 x i64>* %p) #2 +__m512i testZMMout(__m512i *p) { + __m512i zmm0; + __asm__("vmovdqu64 %1, %0" :"=v"(zmm0) : "m"(*(__m512i*)p)); + return zmm0; +} + +// CHECK: 
define void @testXMMin(<2 x i64> %xmm0, <2 x i64>* %p) #0 +void testXMMin(__m128i xmm0, __m128i *p) { + __asm__("vmovdqu %0, %1" : : "v"(xmm0), "m"(*(__m128i*)p)); +} + +// CHECK: define void @testYMMin(<4 x i64> %ymm0, <4 x i64>* %p) #1 +void testYMMin(__m256i ymm0, __m256i *p) { + __asm__("vmovdqu %0, %1" : : "v"(ymm0), "m"(*(__m256i*)p)); +} + +// CHECK: define void @testZMMin(<8 x i64> %zmm0, <8 x i64>* %p) #2 +void testZMMin(__m512i zmm0, __m512i *p) { + __asm__("vmovdqu64 %0, %1" : : "v"(zmm0), "m"(*(__m512i*)p)); +} + +// CHECK: attributes #0 = {{.*}}"min-legal-vector-width"="128" +// CHECK: attributes #1 = {{.*}}"min-legal-vector-width"="256" +// CHECK: attributes #2 = {{.*}}"min-legal-vector-width"="512" Index: lib/CodeGen/CGStmt.cpp === --- lib/CodeGen/CGStmt.cpp +++ lib/CodeGen/CGStmt.cpp @@ -1979,6 +1979,11 @@ diag::err_asm_invalid_type_in_input) << OutExpr->getType() << OutputConstraint; } + + // Update largest vector width for any vector types. + if (auto *VT = dyn_cast<llvm::VectorType>(ResultRegTypes.back())) +LargestVectorWidth = std::max(LargestVectorWidth, + VT->getPrimitiveSizeInBits()); } else { ArgTypes.push_back(Dest.getAddress().getType()); Args.push_back(Dest.getPointer()); @@ -2000,6 +2005,10 @@ Arg->getType())) Arg = Builder.CreateBitCast(Arg, AdjTy); + // Update largest vector width for any vector types. + if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType())) +LargestVectorWidth = std::max(LargestVectorWidth, + VT->getPrimitiveSizeInBits()); if (Info.allowsRegister()) InOutConstraints += llvm::utostr(i); else @@ -2080,6 +2089,11 @@ CGM.getDiags().Report(S.getAsmLoc(), diag::err_asm_invalid_type_in_input) << InputExpr->getType() << InputConstraint; +// Update largest vector width for any vector types. 
+if (auto *VT = dyn_cast<llvm::VectorType>(Arg->getType())) + LargestVectorWidth = std::max(LargestVectorWidth, +VT->getPrimitiveSizeInBits()); + ArgTypes.push_back(Arg->getType()); Args.push_back(Arg); Constraints += InputConstraint;
[PATCH] D46892: [X86] Lowering addus/subus intrinsics to native IR (Clang part)
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D46892 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50815: Establish the header <bit>
craig.topper added inline comments. Comment at: include/bit:145 + static_assert(sizeof(unsigned) == 4, ""); + return __popcnt(__x); +#endif How does this work on pre-Haswell X86 CPUs? Doesn't MSVC just blindly emit the popcnt instruction when it sees this? https://reviews.llvm.org/D50815 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50815: Establish the header <bit>
craig.topper added inline comments. Comment at: include/bit:163 + static_assert(sizeof(unsigned long long) == 8, ""); + return __popcnt64(__x); +#endif I don't think __popcnt64 exists in MSVC when targeting 32-bit mode. https://reviews.llvm.org/D50815 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50907: Make __shiftleft128 / __shiftright128 real compiler built-ins.
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D50907 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D50876: Clean up newly created header <bit>
craig.topper added inline comments. Comment at: include/bit:140 static_assert(sizeof(unsigned) == 4, ""); return __popcnt(__x); } MSVC blindly uses the popcnt instruction whenever it sees this intrinsic. So this only works on Nehalem and newer Intel CPUs. And Barcelona and newer AMD CPUs. This is why llvm uses a bit math version of popcnt for MSVC in include/llvm/Support/MathExtras.h Comment at: include/bit:150 static_assert(sizeof(unsigned long long) == 8, ""); return __popcnt64(__x); } This doesn't exist in 32-bit MSVC. https://reviews.llvm.org/D50876 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46540: [X86] ptwrite intrinsic
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D46540 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes
craig.topper added inline comments. Comment at: lib/CodeGen/CodeGenFunction.cpp:2346 + return Feat.substr(1) == F.getKey(); + })) +ReqFeatures.insert(ReqFeatures.begin(), F.getKey()); This and the next line are indented funny. https://reviews.llvm.org/D46541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes
craig.topper added inline comments. Comment at: lib/CodeGen/CodeGenFunction.cpp:2342 // Only positive features are "required". - if (F.getValue()) + if (F.getValue()) { +if (std::any_of(ParsedAttr.Features.begin(), ParsedAttr.Features.end(), Rather than walking ParsedAttr.Features for each feature in the map, and sometimes having to shift the ReqFeatures vector, how about doing this: - Walk through all features in ParsedAttr; for each feature with a '+', query the callee feature map, and if it's enabled there, push it to ReqFeatures. - Walk through all features in the callee feature map and push those that are enabled. This will lead to duplicates in the list, but all the explicitly mentioned features will be listed first. https://reviews.llvm.org/D46541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46349: [X86] Mark builtins 'const' where possible
craig.topper added a comment. Ping https://reviews.llvm.org/D46349 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes
craig.topper added a comment. This looks pretty good to me. @echristo what do you think? https://reviews.llvm.org/D46541 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46656: [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand
craig.topper created this revision. craig.topper added a reviewer: spatel. Currently we emit something like rotl(x, n) { n &= bitwidth - 1; return n != 0 ? ((x << n) | (x >> (bitwidth - n))) : x; } We use a select to avoid the undefined behavior on the (bitwidth - n) shift. The middle-end and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select. A better pattern is (x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1. Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it. https://reviews.llvm.org/D46656 Files: lib/CodeGen/CGBuiltin.cpp test/CodeGen/ms-intrinsics-rotations.c Index: test/CodeGen/ms-intrinsics-rotations.c === --- test/CodeGen/ms-intrinsics-rotations.c +++ test/CodeGen/ms-intrinsics-rotations.c @@ -30,69 +30,64 @@ return _rotl8(value, shift); } // CHECK: i8 @test_rotl8 -// CHECK: [[SHIFT:%[0-9]+]] = and i8 %{{[0-9]+}}, 7 -// CHECK: [[NEGSHIFT:%[0-9]+]] = sub i8 8, [[SHIFT]] -// CHECK: [[HIGH:%[0-9]+]] = shl i8 [[VALUE:%[0-9]+]], [[SHIFT]] -// CHECK: [[LOW:%[0-9]+]] = lshr i8 [[VALUE]], [[NEGSHIFT]] -// CHECK: [[ROTATED:%[0-9]+]] = or i8 [[HIGH]], [[LOW]] -// CHECK: [[ISZERO:%[0-9]+]] = icmp eq i8 [[SHIFT]], 0 -// CHECK: [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i8 [[VALUE]], i8 [[ROTATED]] +// CHECK: [[LSHIFT:%[0-9]+]] = and i8 [[SHIFT:%[0-9]+]], 7 +// CHECK: [[HIGH:%[0-9]+]] = shl i8 [[VALUE:%[0-9]+]], [[LSHIFT]] +// CHECK: [[NEGATE:%[0-9]+]] = sub i8 0, [[SHIFT]] +// CHECK: [[RSHIFT:%[0-9]+]] = and i8 [[NEGATE]], 7 +// CHECK: [[LOW:%[0-9]+]] = lshr i8 [[VALUE]], [[RSHIFT]] +// CHECK: [[RESULT:%[0-9]+]] = or i8 [[HIGH]], [[LOW]] // CHECK: ret i8 [[RESULT]] // CHECK } unsigned short test_rotl16(unsigned short value, unsigned char shift) { return 
_rotl16(value, shift); } // CHECK: i16 @test_rotl16 -// CHECK: [[SHIFT:%[0-9]+]] = and i16 %{{[0-9]+}}, 15 -// CHECK: [[NEGSHIFT:%[0-9]+]] = sub i16 16, [[SHIFT]] -// CHECK: [[HIGH:%[0-9]+]] = shl i16 [[VALUE:%[0-9]+]], [[SHIFT]] -// CHECK: [[LOW:%[0-9]+]] = lshr i16 [[VALUE]], [[NEGSHIFT]] -// CHECK: [[ROTATED:%[0-9]+]] = or i16 [[HIGH]], [[LOW]] -// CHECK: [[ISZERO:%[0-9]+]] = icmp eq i16 [[SHIFT]], 0 -// CHECK: [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i16 [[VALUE]], i16 [[ROTATED]] +// CHECK: [[LSHIFT:%[0-9]+]] = and i16 [[SHIFT:%[0-9]+]], 15 +// CHECK: [[HIGH:%[0-9]+]] = shl i16 [[VALUE:%[0-9]+]], [[LSHIFT]] +// CHECK: [[NEGATE:%[0-9]+]] = sub i16 0, [[SHIFT]] +// CHECK: [[RSHIFT:%[0-9]+]] = and i16 [[NEGATE]], 15 +// CHECK: [[LOW:%[0-9]+]] = lshr i16 [[VALUE]], [[RSHIFT]] +// CHECK: [[RESULT:%[0-9]+]] = or i16 [[HIGH]], [[LOW]] // CHECK: ret i16 [[RESULT]] // CHECK } unsigned int test_rotl(unsigned int value, int shift) { return _rotl(value, shift); } // CHECK: i32 @test_rotl -// CHECK: [[SHIFT:%[0-9]+]] = and i32 %{{[0-9]+}}, 31 -// CHECK: [[NEGSHIFT:%[0-9]+]] = sub i32 32, [[SHIFT]] -// CHECK: [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[SHIFT]] -// CHECK: [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[NEGSHIFT]] -// CHECK: [[ROTATED:%[0-9]+]] = or i32 [[HIGH]], [[LOW]] -// CHECK: [[ISZERO:%[0-9]+]] = icmp eq i32 [[SHIFT]], 0 -// CHECK: [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i32 [[VALUE]], i32 [[ROTATED]] +// CHECK: [[LSHIFT:%[0-9]+]] = and i32 [[SHIFT:%[0-9]+]], 31 +// CHECK: [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[LSHIFT]] +// CHECK: [[NEGATE:%[0-9]+]] = sub i32 0, [[SHIFT]] +// CHECK: [[RSHIFT:%[0-9]+]] = and i32 [[NEGATE]], 31 +// CHECK: [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[RSHIFT]] +// CHECK: [[RESULT:%[0-9]+]] = or i32 [[HIGH]], [[LOW]] // CHECK: ret i32 [[RESULT]] // CHECK } unsigned LONG test_lrotl(unsigned LONG value, int shift) { return _lrotl(value, shift); } // CHECK-32BIT-LONG: i32 @test_lrotl -// CHECK-32BIT-LONG: [[SHIFT:%[0-9]+]] 
= and i32 %{{[0-9]+}}, 31 -// CHECK-32BIT-LONG: [[NEGSHIFT:%[0-9]+]] = sub i32 32, [[SHIFT]] -// CHECK-32BIT-LONG: [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[SHIFT]] -// CHECK-32BIT-LONG: [[LOW:%[0-9]+]] = lshr i32 [[VALUE]], [[NEGSHIFT]] -// CHECK-32BIT-LONG: [[ROTATED:%[0-9]+]] = or i32 [[HIGH]], [[LOW]] -// CHECK-32BIT-LONG: [[ISZERO:%[0-9]+]] = icmp eq i32 [[SHIFT]], 0 -// CHECK-32BIT-LONG: [[RESULT:%[0-9]+]] = select i1 [[ISZERO]], i32 [[VALUE]], i32 [[ROTATED]] +// CHECK-32BIT-LONG: [[LSHIFT:%[0-9]+]] = and i32 [[SHIFT:%[0-9]+]], 31 +// CHECK-32BIT-LONG: [[HIGH:%[0-9]+]] = shl i32 [[VALUE:%[0-9]+]], [[LSHIFT]] +// CHECK-32BIT-LONG: [[NEGATE:%[0-9]+]] = sub i32 0, [[SHIFT]] +// CHECK-32BIT-LONG: [[RSHIFT:%[0-9]+]] = and i32 [[NEGATE]], 31 +// CHECK-32BIT-LONG: [[
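The updated CHECK lines encode the branch-free rotate lowering: shift left by the masked shift amount and right by the masked negated amount, so a shift of 0 no longer needs an icmp/select guard. A minimal C++ sketch of the same pattern (illustrative only, not the compiler's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Rotate left for 16 bits, mirroring the lowered IR:
// (value << (shift & 15)) | (value >> (-shift & 15)).
// Both shift amounts are masked into [0, 15], so shift == 0 (or any
// multiple of 16) is well defined and returns the value unchanged.
uint16_t rotl16(uint16_t value, int shift) {
  unsigned s = static_cast<unsigned>(shift);
  unsigned lshift = s & 15u;
  unsigned rshift = (0u - s) & 15u;
  return static_cast<uint16_t>((value << lshift) | (value >> rshift));
}
```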
[PATCH] D46683: [X86] Assume alignment of movdir64b dst argument
craig.topper added a comment. What effect does this have? Repository: rC Clang https://reviews.llvm.org/D46683 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D46683: [X86] Assume alignment of movdir64b dst argument
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D46683
[PATCH] D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
craig.topper created this revision. craig.topper added reviewers: RKSimon, GBuella, tkrupa. As long as the destination type is a 256 or 128 bit vector we can use __builtin_convertvector to directly generate trunc IR instruction which will be handled natively by the backend. Repository: rC Clang https://reviews.llvm.org/D46742 Files: include/clang/Basic/BuiltinsX86.def lib/Headers/avx512bwintrin.h lib/Headers/avx512fintrin.h lib/Headers/avx512vlbwintrin.h lib/Headers/avx512vlintrin.h test/CodeGen/avx512bw-builtins.c test/CodeGen/avx512f-builtins.c test/CodeGen/avx512vl-builtins.c test/CodeGen/avx512vlbw-builtins.c Index: test/CodeGen/avx512vlbw-builtins.c === --- test/CodeGen/avx512vlbw-builtins.c +++ test/CodeGen/avx512vlbw-builtins.c @@ -1804,19 +1804,21 @@ __m128i test_mm256_cvtepi16_epi8(__m256i __A) { // CHECK-LABEL: @test_mm256_cvtepi16_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.wb.256 + // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8> return _mm256_cvtepi16_epi8(__A); } __m128i test_mm256_mask_cvtepi16_epi8(__m128i __O, __mmask16 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_mask_cvtepi16_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.wb.256 + // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8> + // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}} return _mm256_mask_cvtepi16_epi8(__O, __M, __A); } __m128i test_mm256_maskz_cvtepi16_epi8(__mmask16 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_maskz_cvtepi16_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.wb.256 + // CHECK: trunc <16 x i16> %{{.*}} to <16 x i8> + // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}} return _mm256_maskz_cvtepi16_epi8(__M, __A); } Index: test/CodeGen/avx512vl-builtins.c === --- test/CodeGen/avx512vl-builtins.c +++ test/CodeGen/avx512vl-builtins.c @@ -6577,19 +6577,21 @@ __m128i test_mm256_cvtepi32_epi16(__m256i __A) { // CHECK-LABEL: @test_mm256_cvtepi32_epi16 - // CHECK: @llvm.x86.avx512.mask.pmov.dw.256 + // CHECK: trunc <8 x i32> %{{.*}} to <8 x 
i16> return _mm256_cvtepi32_epi16(__A); } __m128i test_mm256_mask_cvtepi32_epi16(__m128i __O, __mmask8 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_mask_cvtepi32_epi16 - // CHECK: @llvm.x86.avx512.mask.pmov.dw.256 + // CHECK: trunc <8 x i32> %{{.*}} to <8 x i16> + // CHECK: select <8 x i1> %{{.*}}, <8 x i16> %{{.*}}, <8 x i16> %{{.*}} return _mm256_mask_cvtepi32_epi16(__O, __M, __A); } __m128i test_mm256_maskz_cvtepi32_epi16(__mmask8 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_maskz_cvtepi32_epi16 - // CHECK: @llvm.x86.avx512.mask.pmov.dw.256 + // CHECK: trunc <8 x i32> %{{.*}} to <8 x i16> + // CHECK: select <8 x i1> %{{.*}}, <8 x i16> %{{.*}}, <8 x i16> %{{.*}} return _mm256_maskz_cvtepi32_epi16(__M, __A); } @@ -6673,19 +6675,21 @@ __m128i test_mm256_cvtepi64_epi32(__m256i __A) { // CHECK-LABEL: @test_mm256_cvtepi64_epi32 - // CHECK: @llvm.x86.avx512.mask.pmov.qd.256 + // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32> return _mm256_cvtepi64_epi32(__A); } __m128i test_mm256_mask_cvtepi64_epi32(__m128i __O, __mmask8 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_mask_cvtepi64_epi32 - // CHECK: @llvm.x86.avx512.mask.pmov.qd.256 + // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32> + // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}} return _mm256_mask_cvtepi64_epi32(__O, __M, __A); } __m128i test_mm256_maskz_cvtepi64_epi32(__mmask8 __M, __m256i __A) { // CHECK-LABEL: @test_mm256_maskz_cvtepi64_epi32 - // CHECK: @llvm.x86.avx512.mask.pmov.qd.256 + // CHECK: trunc <4 x i64> %{{.*}} to <4 x i32> + // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}} return _mm256_maskz_cvtepi64_epi32(__M, __A); } Index: test/CodeGen/avx512f-builtins.c === --- test/CodeGen/avx512f-builtins.c +++ test/CodeGen/avx512f-builtins.c @@ -5102,19 +5102,21 @@ __m128i test_mm512_cvtepi32_epi8(__m512i __A) { // CHECK-LABEL: @test_mm512_cvtepi32_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.db.512 + // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8> return 
_mm512_cvtepi32_epi8(__A); } __m128i test_mm512_mask_cvtepi32_epi8(__m128i __O, __mmask16 __M, __m512i __A) { // CHECK-LABEL: @test_mm512_mask_cvtepi32_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.db.512 + // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8> + // CHECK: select <16 x i1> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}} return _mm512_mask_cvtepi32_epi8(__O, __M, __A); } __m128i test_mm512_maskz_cvtepi32_epi8(__mmask16 __M, __m512i __A) { // CHECK-LABEL: @test_mm512_maskz_cvtepi32_epi8 - // CHECK: @llvm.x86.avx512.mask.pmov.db.512 + // CHECK: trunc <16 x i32> %{{.*}} to <16 x i8> + // CHECK: select <16 x i1> %
[PATCH] D46742: [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins.
craig.topper added a comment. Yeah the others will need codegen work. So I'm starting with the easy cases. Repository: rC Clang https://reviews.llvm.org/D46742
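For reference, `__builtin_convertvector` between integer vector types with the same element count compiles to exactly the `trunc` instruction the new CHECK lines look for. A small sketch using the clang/gcc vector extensions (the typedefs here are illustrative stand-ins, not the real `__m256i`/`__m128i`):

```cpp
#include <cassert>

// Vector-extension types with 8 elements each.
typedef short v8i16 __attribute__((__vector_size__(16)));
typedef signed char v8i8 __attribute__((__vector_size__(8)));

// Lowers to a single `trunc <8 x i16> %x to <8 x i8>` in IR; the backend
// then selects the native narrowing instruction.
v8i8 trunc_v8i16(v8i16 x) {
  return __builtin_convertvector(x, v8i8);
}
```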
[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel. I believe this is safe assuming default rounding mode. The conversion might be inexact, but it can never overflow the FP type so this shouldn't be undefined behavior for the uitofp/sitofp instructions. We already do something similar for scalar conversions. Repository: rC Clang https://reviews.llvm.org/D46863 Files: include/clang/Basic/BuiltinsX86.def lib/Headers/avx512dqintrin.h lib/Headers/avx512fintrin.h lib/Headers/avx512vldqintrin.h lib/Headers/avx512vlintrin.h lib/Headers/avxintrin.h lib/Headers/emmintrin.h test/CodeGen/avx-builtins.c test/CodeGen/avx512dq-builtins.c test/CodeGen/avx512f-builtins.c test/CodeGen/avx512vl-builtins.c test/CodeGen/avx512vldq-builtins.c test/CodeGen/builtins-x86.c test/CodeGen/sse2-builtins.c Index: test/CodeGen/sse2-builtins.c === --- test/CodeGen/sse2-builtins.c +++ test/CodeGen/sse2-builtins.c @@ -468,7 +468,7 @@ __m128 test_mm_cvtepi32_ps(__m128i A) { // CHECK-LABEL: test_mm_cvtepi32_ps - // CHECK: call <4 x float> @llvm.x86.sse2.cvtdq2ps(<4 x i32> %{{.*}}) + // CHECK: sitofp <4 x i32> %{{.*}} to <4 x float> return _mm_cvtepi32_ps(A); } Index: test/CodeGen/builtins-x86.c === --- test/CodeGen/builtins-x86.c +++ test/CodeGen/builtins-x86.c @@ -338,7 +338,6 @@ tmp_V2LLi = __builtin_ia32_psadbw128(tmp_V16c, tmp_V16c); tmp_V2d = __builtin_ia32_sqrtpd(tmp_V2d); tmp_V2d = __builtin_ia32_sqrtsd(tmp_V2d); - tmp_V4f = __builtin_ia32_cvtdq2ps(tmp_V4i); tmp_V2LLi = __builtin_ia32_cvtpd2dq(tmp_V2d); tmp_V2i = __builtin_ia32_cvtpd2pi(tmp_V2d); tmp_V4f = __builtin_ia32_cvtpd2ps(tmp_V2d); Index: test/CodeGen/avx512vldq-builtins.c === --- test/CodeGen/avx512vldq-builtins.c +++ test/CodeGen/avx512vldq-builtins.c @@ -421,37 +421,41 @@ __m128d test_mm_cvtepi64_pd(__m128i __A) { // CHECK-LABEL: @test_mm_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128 + // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double> return _mm_cvtepi64_pd(__A); } __m128d 
test_mm_mask_cvtepi64_pd(__m128d __W, __mmask8 __U, __m128i __A) { // CHECK-LABEL: @test_mm_mask_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128 + // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double> + // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}} return _mm_mask_cvtepi64_pd(__W, __U, __A); } __m128d test_mm_maskz_cvtepi64_pd(__mmask8 __U, __m128i __A) { // CHECK-LABEL: @test_mm_maskz_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.128 + // CHECK: sitofp <2 x i64> %{{.*}} to <2 x double> + // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}} return _mm_maskz_cvtepi64_pd(__U, __A); } __m256d test_mm256_cvtepi64_pd(__m256i __A) { // CHECK-LABEL: @test_mm256_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256 + // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double> return _mm256_cvtepi64_pd(__A); } __m256d test_mm256_mask_cvtepi64_pd(__m256d __W, __mmask8 __U, __m256i __A) { // CHECK-LABEL: @test_mm256_mask_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256 + // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double> + // CHECK: select <4 x i1> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}} return _mm256_mask_cvtepi64_pd(__W, __U, __A); } __m256d test_mm256_maskz_cvtepi64_pd(__mmask8 __U, __m256i __A) { // CHECK-LABEL: @test_mm256_maskz_cvtepi64_pd - // CHECK: @llvm.x86.avx512.mask.cvtqq2pd.256 + // CHECK: sitofp <4 x i64> %{{.*}} to <4 x double> + // CHECK: select <4 x i1> %{{.*}}, <4 x double> %{{.*}}, <4 x double> %{{.*}} return _mm256_maskz_cvtepi64_pd(__U, __A); } @@ -637,37 +641,41 @@ __m128d test_mm_cvtepu64_pd(__m128i __A) { // CHECK-LABEL: @test_mm_cvtepu64_pd - // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128 + // CHECK: uitofp <2 x i64> %{{.*}} to <2 x double> return _mm_cvtepu64_pd(__A); } __m128d test_mm_mask_cvtepu64_pd(__m128d __W, __mmask8 __U, __m128i __A) { // CHECK-LABEL: @test_mm_mask_cvtepu64_pd - // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128 + // CHECK: uitofp <2 x i64> 
%{{.*}} to <2 x double> + // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}} return _mm_mask_cvtepu64_pd(__W, __U, __A); } __m128d test_mm_maskz_cvtepu64_pd(__mmask8 __U, __m128i __A) { // CHECK-LABEL: @test_mm_maskz_cvtepu64_pd - // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.128 + // CHECK: uitofp <2 x i64> %{{.*}} to <2 x double> + // CHECK: select <2 x i1> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}} return _mm_maskz_cvtepu64_pd(__U, __A); } __m256d test_mm256_cvtepu64_pd(__m256i __A) { // CHECK-LABEL: @test_mm256_cvtepu64_pd - // CHECK: @llvm.x86.avx512.mask.cvtuqq2pd.256 + //
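The "can never overflow" claim is easy to check: even the most extreme 64-bit integers are tiny compared to the double range, so `sitofp`/`uitofp` can only lose precision, never produce infinity. A hedged scalar sketch of the argument:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// int64 -> double may round (double has only 53 mantissa bits), but the
// result is always finite: |x| <= 2^64, while DBL_MAX is about 1.8e308.
double cvt_i64(int64_t x) { return static_cast<double>(x); }
double cvt_u64(uint64_t x) { return static_cast<double>(x); }
```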
[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes
craig.topper added a comment. Ping @echristo https://reviews.llvm.org/D46541
[PATCH] D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part)
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D46881
[PATCH] D46881: [X86][CET] Changing -fcf-protection behavior to comply with gcc (clang part)
craig.topper added a comment. LGTM https://reviews.llvm.org/D46881
[PATCH] D47029: [X86] Remove some preprocessor feature checks from intrinsic headers
craig.topper created this revision. craig.topper added reviewers: echristo, RKSimon, spatel. These look to be a couple things that weren't removed when we switched to target attribute. The popcnt makes including just smmintrin.h also include popcntintrin.h. The popcnt file itself already contains target attributes. The prefetch ones are just wrappers around __builtin_prefetch which we have graceful fallbacks for in the backend if the exact instruction isn't available. So there's no reason to hide them. And it makes them available in functions that have the right target attribute but not a -march command line flag. https://reviews.llvm.org/D47029 Files: lib/Headers/prfchwintrin.h lib/Headers/smmintrin.h Index: lib/Headers/smmintrin.h === --- lib/Headers/smmintrin.h +++ lib/Headers/smmintrin.h @@ -2458,8 +2458,6 @@ #undef __DEFAULT_FN_ATTRS -#ifdef __POPCNT__ #include -#endif #endif /* __SMMINTRIN_H */ Index: lib/Headers/prfchwintrin.h === --- lib/Headers/prfchwintrin.h +++ lib/Headers/prfchwintrin.h @@ -28,7 +28,6 @@ #ifndef __PRFCHWINTRIN_H #define __PRFCHWINTRIN_H -#if defined(__PRFCHW__) || defined(__3dNOW__) /// Loads a memory sequence containing the specified memory address into ///all data cache levels. The cache-coherency state is set to exclusive. ///Data can be read from and written to the cache line without additional @@ -66,6 +65,5 @@ { __builtin_prefetch (__P, 1, 3 /* _MM_HINT_T0 */); } -#endif #endif /* __PRFCHWINTRIN_H */ Index: lib/Headers/smmintrin.h === --- lib/Headers/smmintrin.h +++ lib/Headers/smmintrin.h @@ -2458,8 +2458,6 @@ #undef __DEFAULT_FN_ATTRS -#ifdef __POPCNT__ #include -#endif #endif /* __SMMINTRIN_H */ Index: lib/Headers/prfchwintrin.h === --- lib/Headers/prfchwintrin.h +++ lib/Headers/prfchwintrin.h @@ -28,7 +28,6 @@ #ifndef __PRFCHWINTRIN_H #define __PRFCHWINTRIN_H -#if defined(__PRFCHW__) || defined(__3dNOW__) /// Loads a memory sequence containing the specified memory address into ///all data cache levels.
The cache-coherency state is set to exclusive. ///Data can be read from and written to the cache line without additional @@ -66,6 +65,5 @@ { __builtin_prefetch (__P, 1, 3 /* _MM_HINT_T0 */); } -#endif #endif /* __PRFCHWINTRIN_H */
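For context, `__builtin_prefetch(addr, rw, locality)` is purely a hint: `rw` is 0 for read and 1 for write, and `locality` 0-3 maps to the `_MM_HINT_*` levels. If the target has no suitable prefetch instruction the backend simply drops it, which is the graceful fallback mentioned above. Sketch:

```cpp
#include <cassert>

// Same shape as _m_prefetchw in prfchwintrin.h: prefetch for write into
// all cache levels. Semantically a no-op; it never faults and never
// changes the data it touches.
void prefetch_for_write(const void *p) {
  __builtin_prefetch(p, 1, 3 /* _MM_HINT_T0 */);
}
```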
[PATCH] D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel, GBuella. Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions. To avoid that just generate IR directly in CGBuiltin.cpp. Repository: rC Clang https://reviews.llvm.org/D47125 Files: lib/CodeGen/CGBuiltin.cpp test/CodeGen/avx512f-builtins.c test/CodeGen/avx512vl-builtins.c Index: test/CodeGen/avx512vl-builtins.c === --- test/CodeGen/avx512vl-builtins.c +++ test/CodeGen/avx512vl-builtins.c @@ -5604,73 +5604,81 @@ __m128i test_mm_ternarylogic_epi32(__m128i __A, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_ternarylogic_epi32 - // CHECK: @llvm.x86.avx512.mask.pternlog.d.128 + // CHECK: @llvm.x86.avx512.pternlog.d.128 return _mm_ternarylogic_epi32(__A, __B, __C, 4); } __m128i test_mm_mask_ternarylogic_epi32(__m128i __A, __mmask8 __U, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_mask_ternarylogic_epi32 - // CHECK: @llvm.x86.avx512.mask.pternlog.d.128 + // CHECK: @llvm.x86.avx512.pternlog.d.128 + // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}} return _mm_mask_ternarylogic_epi32(__A, __U, __B, __C, 4); } __m128i test_mm_maskz_ternarylogic_epi32(__mmask8 __U, __m128i __A, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_maskz_ternarylogic_epi32 - // CHECK: @llvm.x86.avx512.maskz.pternlog.d.128 + // CHECK: @llvm.x86.avx512.pternlog.d.128 + // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> zeroinitializer return _mm_maskz_ternarylogic_epi32(__U, __A, __B, __C, 4); } __m256i test_mm256_ternarylogic_epi32(__m256i __A, __m256i __B, __m256i __C) { // CHECK-LABEL: @test_mm256_ternarylogic_epi32 - // CHECK: 
@llvm.x86.avx512.mask.pternlog.d.256 + // CHECK: @llvm.x86.avx512.pternlog.d.256 return _mm256_ternarylogic_epi32(__A, __B, __C, 4); } __m256i test_mm256_mask_ternarylogic_epi32(__m256i __A, __mmask8 __U, __m256i __B, __m256i __C) { // CHECK-LABEL: @test_mm256_mask_ternarylogic_epi32 - // CHECK: @llvm.x86.avx512.mask.pternlog.d.256 + // CHECK: @llvm.x86.avx512.pternlog.d.256 + // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}} return _mm256_mask_ternarylogic_epi32(__A, __U, __B, __C, 4); } __m256i test_mm256_maskz_ternarylogic_epi32(__mmask8 __U, __m256i __A, __m256i __B, __m256i __C) { // CHECK-LABEL: @test_mm256_maskz_ternarylogic_epi32 - // CHECK: @llvm.x86.avx512.maskz.pternlog.d.256 + // CHECK: @llvm.x86.avx512.pternlog.d.256 + // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> zeroinitializer return _mm256_maskz_ternarylogic_epi32(__U, __A, __B, __C, 4); } __m128i test_mm_ternarylogic_epi64(__m128i __A, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_ternarylogic_epi64 - // CHECK: @llvm.x86.avx512.mask.pternlog.q.128 + // CHECK: @llvm.x86.avx512.pternlog.q.128 return _mm_ternarylogic_epi64(__A, __B, __C, 4); } __m128i test_mm_mask_ternarylogic_epi64(__m128i __A, __mmask8 __U, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_mask_ternarylogic_epi64 - // CHECK: @llvm.x86.avx512.mask.pternlog.q.128 + // CHECK: @llvm.x86.avx512.pternlog.q.128 + // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}} return _mm_mask_ternarylogic_epi64(__A, __U, __B, __C, 4); } __m128i test_mm_maskz_ternarylogic_epi64(__mmask8 __U, __m128i __A, __m128i __B, __m128i __C) { // CHECK-LABEL: @test_mm_maskz_ternarylogic_epi64 - // CHECK: @llvm.x86.avx512.maskz.pternlog.q.128 + // CHECK: @llvm.x86.avx512.pternlog.q.128 + // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> zeroinitializer return _mm_maskz_ternarylogic_epi64(__U, __A, __B, __C, 4); } __m256i test_mm256_ternarylogic_epi64(__m256i __A, __m256i __B, 
__m256i __C) { // CHECK-LABEL: @test_mm256_ternarylogic_epi64 - // CHECK: @llvm.x86.avx512.mask.pternlog.q.256 + // CHECK: @llvm.x86.avx512.pternlog.q.256 return _mm256_ternarylogic_epi64(__A, __B, __C, 4); } __m256i test_mm256_mask_ternarylogic_epi64(__m256i __A, __mmask8 __U, __m256i __B, __m256i __C) { // CHECK-LABEL: @test_mm256_mask_ternarylogic_epi64 - // CHECK: @llvm.x86.avx512.mask.pternlog.q.256 + // CHECK: @llvm.x86.avx512.pternlog.q.256 + // CHECK: select <4 x i1> %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}} return _mm256_mask_ternarylogic_epi64(__A, __U, __B, __C, 4); } __m256i test_mm256_maskz_ternarylogic_epi64(__mmask8 __U, __m256i __A, __m256i __B, __m256i __C) { // CHECK-LABEL: @test_mm256_maskz_ternarylogic_epi64 - // CHECK: @llvm.x86.a
[PATCH] D47125: [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
craig.topper added a comment. Because the builtins take one of the arguments as an immediate, they must be implemented as macros. This way the frontend can verify that it's an immediate or a constant expression. Repository: rC Clang https://reviews.llvm.org/D47125
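The double-expansion hazard described in this review is the classic function-like-macro problem: any argument named twice in the replacement list is evaluated twice. A minimal illustration with a hypothetical macro (not the actual intrinsic headers):

```cpp
#include <cassert>

// Hypothetical macro for illustration: `a` appears twice in the
// replacement list, so an argument with side effects is evaluated
// twice after expansion.
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

int max_and_increment(int *counter) {
  // (*counter)++ is evaluated once for the comparison and once more
  // when it is selected as the result.
  return NAIVE_MAX((*counter)++, 0);
}
```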
[PATCH] D47142: [x86] invpcid intrinsic
craig.topper added inline comments. Comment at: lib/Headers/cpuid.h:158 #define bit_BMI20x0100 +#define bit_INVCPID 0x0400 #define bit_ENH_MOVSB 0x0200 this should be below ENH_MOVSB to keep the bits in order Comment at: lib/Headers/intrin.h:196 + */ void __cdecl _invpcid(unsigned int, void *); +#endif @rnk, what's the right thing to do here? Repository: rC Clang https://reviews.llvm.org/D47142
[PATCH] D46541: [CodeGen] Improve diagnostics related to target attributes
craig.topper added a comment. I think you can pass StringRef(F).substr(1). That won't create a temporary string. It will just create a StringRef pointing into the middle of an existing std::string stored in the parsed attributes. https://reviews.llvm.org/D46541
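llvm::StringRef is a (pointer, length) view type, much like std::string_view, so `substr` just re-points into the original buffer instead of allocating. A sketch of the same property using the standard type as a stand-in for StringRef (the "+avx2" feature string is a hypothetical example):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Dropping the leading character of a feature string such as "+avx2":
// no temporary std::string is created, only a narrower view into the
// original buffer.
std::string_view drop_leading_char(const std::string &f) {
  return std::string_view(f).substr(1);
}
```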
[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
craig.topper added a comment. So I think we've covered the questions about whether this is OK to do. If someone can double check that the signed/unsigned and vector element sizes are all correct and approve this, that would be great. Repository: rC Clang https://reviews.llvm.org/D46863
[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel, echristo, DavidKreitzer. Intel documents the 128-bit versions as being in emmintrin.h and the 256-bit version as being in immintrin.h. This patch makes a new __emmintrin_f16c.h to hold the 128-bit versions, to be included from emmintrin.h. It also makes the existing f16cintrin.h contain the 256-bit versions, includes it from immintrin.h, and emits an error if it's included directly. Repository: rC Clang https://reviews.llvm.org/D47174 Files: lib/Headers/__emmintrin_f16c.h lib/Headers/emmintrin.h lib/Headers/f16cintrin.h lib/Headers/immintrin.h Index: lib/Headers/immintrin.h === --- lib/Headers/immintrin.h +++ lib/Headers/immintrin.h @@ -69,54 +69,8 @@ #if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX2__) #include -/* The 256-bit versions of functions in f16cintrin.h. - Intel documents these as being in immintrin.h, and - they depend on typedefs from avxintrin.h. */ - -/// Converts a 256-bit vector of [8 x float] into a 128-bit vector -///containing 16-bit half-precision float values. -/// -/// \headerfile -/// -/// \code -/// __m128i _mm256_cvtps_ph(__m256 a, const int imm); -/// \endcode -/// -/// This intrinsic corresponds to the VCVTPS2PH instruction. -/// -/// \param a -///A 256-bit vector containing 32-bit single-precision float values to be -///converted to 16-bit half-precision float values. -/// \param imm -///An immediate value controlling rounding using bits [2:0]: \n -///000: Nearest \n -///001: Down \n -///010: Up \n -///011: Truncate \n -///1XX: Use MXCSR.RC for rounding -/// \returns A 128-bit vector containing the converted 16-bit half-precision -///float values. -#define _mm256_cvtps_ph(a, imm) __extension__ ({ \ - (__m128i)__builtin_ia32_vcvtps2ph256((__v8sf)(__m256)(a), (imm)); }) - -/// Converts a 128-bit vector containing 16-bit half-precision float -///values into a 256-bit vector of [8 x float].
-/// -/// \headerfile -/// -/// This intrinsic corresponds to the VCVTPH2PS instruction. -/// -/// \param __a -///A 128-bit vector containing 16-bit half-precision float values to be -///converted to 32-bit single-precision float values. -/// \returns A vector of [8 x float] containing the converted 32-bit -///single-precision float values. -static __inline __m256 __attribute__((__always_inline__, __nodebug__, __target__("f16c"))) -_mm256_cvtph_ps(__m128i __a) -{ - return (__m256)__builtin_ia32_vcvtph2ps256((__v8hi)__a); -} -#endif /* __AVX2__ */ +#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__) +#include #if !defined(_MSC_VER) || __has_feature(modules) || defined(__VPCLMULQDQ__) #include Index: lib/Headers/emmintrin.h === --- lib/Headers/emmintrin.h +++ lib/Headers/emmintrin.h @@ -44,7 +44,7 @@ * appear in the interface though. */ typedef signed char __v16qs __attribute__((__vector_size__(16))); -#include +#include <__emmintrin_f16c.h> /* Define the default attributes for the functions in this file. */ #define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("sse2"))) Index: lib/Headers/f16cintrin.h === --- lib/Headers/f16cintrin.h +++ lib/Headers/f16cintrin.h @@ -21,8 +21,8 @@ *===---=== */ -#if !defined __X86INTRIN_H && !defined __EMMINTRIN_H && !defined __IMMINTRIN_H -#error "Never use directly; include instead." +#if !defined __IMMINTRIN_H +#error "Never use directly; include instead." #endif #ifndef __F16CINTRIN_H @@ -32,91 +32,52 @@ #define __DEFAULT_FN_ATTRS \ __attribute__((__always_inline__, __nodebug__, __target__("f16c"))) -/// Converts a 16-bit half-precision float value into a 32-bit float -///value. -/// -/// \headerfile -/// -/// This intrinsic corresponds to the VCVTPH2PS instruction. -/// -/// \param __a -///A 16-bit half-precision float value. -/// \returns The converted 32-bit float value. 
-static __inline float __DEFAULT_FN_ATTRS -_cvtsh_ss(unsigned short __a) -{ - __v8hi v = {(short)__a, 0, 0, 0, 0, 0, 0, 0}; - __v4sf r = __builtin_ia32_vcvtph2ps(v); - return r[0]; -} - -/// Converts a 32-bit single-precision float value to a 16-bit -///half-precision float value. -/// -/// \headerfile -/// -/// \code -/// unsigned short _cvtss_sh(float a, const int imm); -/// \endcode -/// -/// This intrinsic corresponds to the VCVTPS2PH instruction. -/// -/// \param a -///A 32-bit single-precision float value to be converted to a 16-bit -///half-precision float value. -/// \param imm -///An immediate value controlling rounding using bits [2:0]: \n -///000: Nearest \n -///001: Down \n -///010: Up \n -///011: Truncate \n -///
[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
craig.topper added inline comments. Comment at: lib/Headers/immintrin.h:72 -/* The 256-bit versions of functions in f16cintrin.h. - Intel documents these as being in immintrin.h, Interesting thing to note here: the 256-bit f16c intrinsics were being guarded by __AVX2__ when _MSC_VER was defined and modules weren't supported. This was definitely incorrect. Repository: rC Clang https://reviews.llvm.org/D47174
[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h
craig.topper created this revision. craig.topper added reviewers: DavidKreitzer, echristo, RKSimon, rnk. This matches the Intel documentation which shows them available by importing immintrin.h. x86intrin.h also includes immintrin.h so anyone including x86intrin.h will still get them. This is different than gcc, but I don't think we were a perfect match there already. I'm unclear what gcc's policy is about how they choose which to add things to. Repository: rC Clang https://reviews.llvm.org/D47182 Files: lib/Headers/bmi2intrin.h lib/Headers/bmiintrin.h lib/Headers/cldemoteintrin.h lib/Headers/clzerointrin.h lib/Headers/immintrin.h lib/Headers/lzcntintrin.h lib/Headers/movdirintrin.h lib/Headers/pconfigintrin.h lib/Headers/ptwriteintrin.h lib/Headers/rdseedintrin.h lib/Headers/sgxintrin.h lib/Headers/waitpkgintrin.h lib/Headers/wbnoinvdintrin.h lib/Headers/x86intrin.h Index: lib/Headers/x86intrin.h === --- lib/Headers/x86intrin.h +++ lib/Headers/x86intrin.h @@ -32,26 +32,6 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__) #include #endif @@ -76,45 +56,8 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__) #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__) -#include -#endif - -#if !defined(_MSC_VER) || 
__has_feature(modules) || defined(__CLDEMOTE__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || \ - defined(__MOVDIRI__) || defined(__MOVDIR64B__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__) -#include -#endif - #endif /* __X86INTRIN_H */ Index: lib/Headers/wbnoinvdintrin.h === --- lib/Headers/wbnoinvdintrin.h +++ lib/Headers/wbnoinvdintrin.h @@ -21,8 +21,8 @@ *===---=== */ -#ifndef __X86INTRIN_H -#error "Never use directly; include instead." +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." #endif #ifndef __WBNOINVDINTRIN_H Index: lib/Headers/waitpkgintrin.h === --- lib/Headers/waitpkgintrin.h +++ lib/Headers/waitpkgintrin.h @@ -20,8 +20,8 @@ * *===---=== */ -#ifndef __X86INTRIN_H -#error "Never use directly; include instead." +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." #endif #ifndef __WAITPKGINTRIN_H Index: lib/Headers/sgxintrin.h === --- lib/Headers/sgxintrin.h +++ lib/Headers/sgxintrin.h @@ -21,8 +21,8 @@ *===---=== */ -#ifndef __X86INTRIN_H -#error "Never use directly; include instead." +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." #endif #ifndef __SGXINTRIN_H Index: lib/Headers/rdseedintrin.h === --- lib/Headers/rdseedintrin.h +++ lib/Headers/rdseedintrin.h @@ -21,8 +21,8 @@ *===---=== */ -#ifndef __X86INTRIN_H -#error "Never use directly; include instead." +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." 
#endif #ifndef __RDSEEDINTRIN_H Index: lib/Headers/ptwriteintrin.h === --- lib/Headers/ptwriteintrin.h +++ lib/Headers/ptwriteintrin.h @@ -21,8 +21,8 @@ *===---=== */ -#ifndef __X86INTRIN_H -#error "Never use directly; include instead." +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." #endif #ifndef __PTWRITEINTRIN_H Index: lib/Headers/pconfigint
[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h
craig.topper added a comment. First there was mmintrin.h, which covered MMX instructions. Then xmmintrin.h came along to support SSE1 and implicitly included mmintrin.h. Then emmintrin.h came along to support SSE2 and implicitly included xmmintrin.h. This repeated for each new version of SSE, with each header file including the previous header file. I think most of the later SSE headers start with the first letter of the code name of the CPU that added that SSE level, including cancelled projects. So nmmintrin.h refers to Nehalem, tmmintrin.h came from the cancelled Tejas CPU, wmmintrin.h refers to Westmere, and pmmintrin.h refers to Penryn. Eventually this was determined to not be very scalable: you had to remember which header file contained which intrinsics and change it with each generation to get the latest. So immintrin.h was created to just include everything. I assume the 'i' here just stands for Intel. The 'mm' is just historic legacy due to the earlier file names. x86intrin.h was created by gcc to hold the intrinsics not specified by Intel. I think icc has an x86intrin.h that just includes immintrin.h. Repository: rC Clang https://reviews.llvm.org/D47182
[PATCH] D47202: [CodeGen] use nsw negation for abs
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. This seems right to me. GCC believes that __builtin_abs always returns a non-negative number. https://reviews.llvm.org/D47202 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
craig.topper added a comment. It is odd, but they really are split in the icc include files. So they got split a while back in clang to match the Intel Intrinsic Guide documentation. Repository: rC Clang https://reviews.llvm.org/D47174 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47174: [X86] Move 128-bit f16c intrinsics to __emmintrin_f16c.h include from emmintrin.h. Move 256-bit f16c intrinsics back to f16cintrin.h
craig.topper added a comment. Implemented @DavidKreitzer's suggestion in r333033 Repository: rL LLVM https://reviews.llvm.org/D47174 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h
craig.topper updated this revision to Diff 148115. craig.topper added a comment. Leave the message still saying x86intrin.h. Change the error checks to look for either x86intrin.h or immintrin.h to have been included. Really only the immintrin.h check is necessary since that's the header that does the include, but I put both so the error message saying x86intrin.h would be less confusing. https://reviews.llvm.org/D47182 Files: lib/Headers/cldemoteintrin.h lib/Headers/clzerointrin.h lib/Headers/immintrin.h lib/Headers/movdirintrin.h lib/Headers/pconfigintrin.h lib/Headers/ptwriteintrin.h lib/Headers/rdseedintrin.h lib/Headers/sgxintrin.h lib/Headers/waitpkgintrin.h lib/Headers/wbnoinvdintrin.h lib/Headers/x86intrin.h Index: lib/Headers/x86intrin.h === --- lib/Headers/x86intrin.h +++ lib/Headers/x86intrin.h @@ -32,26 +32,6 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__) #include #endif @@ -76,45 +56,8 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__) #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLDEMOTE__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__) -#include -#endif - -#if 
!defined(_MSC_VER) || __has_feature(modules) || \ - defined(__MOVDIRI__) || defined(__MOVDIR64B__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__) -#include -#endif - #endif /* __X86INTRIN_H */ Index: lib/Headers/wbnoinvdintrin.h === --- lib/Headers/wbnoinvdintrin.h +++ lib/Headers/wbnoinvdintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/waitpkgintrin.h === --- lib/Headers/waitpkgintrin.h +++ lib/Headers/waitpkgintrin.h @@ -20,7 +20,7 @@ * *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/sgxintrin.h === --- lib/Headers/sgxintrin.h +++ lib/Headers/sgxintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/rdseedintrin.h === --- lib/Headers/rdseedintrin.h +++ lib/Headers/rdseedintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/ptwriteintrin.h === --- lib/Headers/ptwriteintrin.h +++ lib/Headers/ptwriteintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/pconfigintrin.h === --- lib/Headers/pconfigintrin.h +++ lib/Headers/pconfigintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." 
#endif Index:
[PATCH] D47182: [X86] Move all Intel defined intrinsic includes into immintrin.h
craig.topper updated this revision to Diff 148252. craig.topper added a comment. Add back popcntintrin.h https://reviews.llvm.org/D47182 Files: lib/Headers/cldemoteintrin.h lib/Headers/clzerointrin.h lib/Headers/immintrin.h lib/Headers/movdirintrin.h lib/Headers/pconfigintrin.h lib/Headers/ptwriteintrin.h lib/Headers/rdseedintrin.h lib/Headers/sgxintrin.h lib/Headers/waitpkgintrin.h lib/Headers/wbnoinvdintrin.h lib/Headers/x86intrin.h Index: lib/Headers/x86intrin.h === --- lib/Headers/x86intrin.h +++ lib/Headers/x86intrin.h @@ -32,26 +32,6 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__BMI2__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__LZCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__POPCNT__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__RDSEED__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__PRFCHW__) #include #endif @@ -76,45 +56,8 @@ #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__F16C__) -#include -#endif - #if !defined(_MSC_VER) || __has_feature(modules) || defined(__MWAITX__) #include #endif -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLZERO__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WBNOINVD__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLDEMOTE__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__WAITPKG__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || \ - defined(__MOVDIRI__) || defined(__MOVDIR64B__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PCONFIG__) -#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__SGX__) 
-#include -#endif - -#if !defined(_MSC_VER) || __has_feature(modules) || defined(__PTWRITE__) -#include -#endif - #endif /* __X86INTRIN_H */ Index: lib/Headers/wbnoinvdintrin.h === --- lib/Headers/wbnoinvdintrin.h +++ lib/Headers/wbnoinvdintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/waitpkgintrin.h === --- lib/Headers/waitpkgintrin.h +++ lib/Headers/waitpkgintrin.h @@ -20,7 +20,7 @@ * *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/sgxintrin.h === --- lib/Headers/sgxintrin.h +++ lib/Headers/sgxintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/rdseedintrin.h === --- lib/Headers/rdseedintrin.h +++ lib/Headers/rdseedintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/ptwriteintrin.h === --- lib/Headers/ptwriteintrin.h +++ lib/Headers/ptwriteintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/pconfigintrin.h === --- lib/Headers/pconfigintrin.h +++ lib/Headers/pconfigintrin.h @@ -21,7 +21,7 @@ *===---=== */ -#ifndef __X86INTRIN_H +#if !defined __X86INTRIN_H && !defined __IMMINTRIN_H #error "Never use directly; include instead." #endif Index: lib/Headers/movdirintrin.h === --- lib/Headers/movdirintrin.h +++ lib/Headers/movdirintrin.h @@ -20,7 +20,7 @@ * *===---=== */ -#ifndef __X86INTRIN_H +
[PATCH] D46863: [X86] Use __builtin_convertvector to implement some of the packed integer to packed float conversion intrinsics.
craig.topper added a comment. Hi @aemerson, I'm not opposed to adding it back. But the clang policy for vector builtins has always been that we won't support all the builtins that gcc does, and to encourage the use of the _mm_* wrappers, which are guaranteed to work in both compilers. Is it possible to change your source code to use the portable intrinsic name? Repository: rC Clang https://reviews.llvm.org/D46863 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47277: [modules] Mark __wmmintrin_pclmul.h/__wmmintrin_aes.h as textual
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D47277 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47142: [x86] invpcid intrinsic
craig.topper added a comment. LGTM, if you fix the ordering in cpuid.h. https://reviews.llvm.org/D47142 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47401: [X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits where possible.
craig.topper added inline comments. Comment at: cfe/trunk/lib/Headers/avx512fintrin.h:9855 + __v8di __t6 = (__v8di)_mm512_##op(__t4, __t5); \ + return __t6[0]; RKSimon wrote: > Would it be dumb to allow VLX capable CPUs to use 128/256 variants of the > VPMAXUQ etc ? Or is it better to focus on improving SimplifyDemandedElts to > handle this (and many other reduction cases that all tend to keep to the > original vector width)? I'm not sure how to do that from clang. Should we be using a reduction intrinsic and do custom lowering in the backend? Repository: rL LLVM https://reviews.llvm.org/D47401 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47444: [X86] Lowering FMA intrinsics to native IR (Clang part)
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Comment at: lib/CodeGen/CGBuiltin.cpp:8416 +static Value *EmitX86FMAExpr(CodeGenFunction &CGF, ArrayRef Ops, + unsigned BuiltinID) { + Please indent this to line up with the first argument on the previous line. Repository: rC Clang https://reviews.llvm.org/D47444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D47474: Implement cpu_dispatch/cpu_specific Multiversioning
craig.topper added inline comments. Comment at: include/clang/Basic/X86Target.def:295 +CPU_SPECIFIC("pentium_iii", 'H', + (1ULL << FEATURE_CMOV | 1ULL << FEATURE_MMX | 1ULL << FEATURE_SSE)) +CPU_SPECIFIC("pentium_iii_no_xmm_regs", 'H', Could we just make the features a comma separated string? Then we wouldn't need a third version of EmitX86CpuSupports? Yeah it would incur string processing costs, but is that a big deal? Comment at: lib/Sema/SemaDecl.cpp:9214 +return MultiVersioning::Target; + else if (FD->hasAttr()) +return MultiVersioning::CpuDispatch; No need for else after return. Comment at: lib/Sema/SemaDeclAttr.cpp:1901 +const TargetInfo &Target = S.Context.getTargetInfo(); +if (llvm::find_if(Cpus, [CpuName, &Target](const IdentifierInfo *Cur) { + return Target.cpuSpecificManglingCharacter(CpuName) == Maybe use llvm::any_of since you don't actually care about the iterator returned. https://reviews.llvm.org/D47474 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc
craig.topper created this revision. Herald added a subscriber: krytarowski. We were using corei7 for a large swath of Intel CPUs. gcc has a different set of defines that more closely matches the -march flags. This updates clang to match. It also fixes skylake-avx512 and adds silvermont in addition to slm. https://reviews.llvm.org/D38824 Files: lib/Basic/Targets/X86.cpp test/Preprocessor/predefined-arch-macros.c Index: test/Preprocessor/predefined-arch-macros.c === --- test/Preprocessor/predefined-arch-macros.c +++ test/Preprocessor/predefined-arch-macros.c @@ -427,11 +427,14 @@ // CHECK_COREI7_AVX_M32: #define __SSSE3__ 1 // CHECK_COREI7_AVX_M32: #define __XSAVEOPT__ 1 // CHECK_COREI7_AVX_M32: #define __XSAVE__ 1 -// CHECK_COREI7_AVX_M32: #define __corei7 1 -// CHECK_COREI7_AVX_M32: #define __corei7__ 1 +// CHECK_COREI7_AVX_M32: #define __corei7_avx 1 +// CHECK_COREI7_AVX_M32: #define __corei7_avx__ 1 // CHECK_COREI7_AVX_M32: #define __i386 1 // CHECK_COREI7_AVX_M32: #define __i386__ 1 -// CHECK_COREI7_AVX_M32: #define __tune_corei7__ 1 +// CHECK_COREI7_AVX_M32: #define __sandybridge 1 +// CHECK_COREI7_AVX_M32: #define __sandybridge__ 1 +// CHECK_COREI7_AVX_M32: #define __tune_corei7_avx__ 1 +// CHECK_COREI7_AVX_M32: #define __tune_sandybridge__ 1 // CHECK_COREI7_AVX_M32: #define i386 1 // RUN: %clang -march=corei7-avx -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -454,9 +457,12 @@ // CHECK_COREI7_AVX_M64: #define __XSAVE__ 1 // CHECK_COREI7_AVX_M64: #define __amd64 1 // CHECK_COREI7_AVX_M64: #define __amd64__ 1 -// CHECK_COREI7_AVX_M64: #define __corei7 1 -// CHECK_COREI7_AVX_M64: #define __corei7__ 1 -// CHECK_COREI7_AVX_M64: #define __tune_corei7__ 1 +// CHECK_COREI7_AVX_M64: #define __corei7_avx 1 +// CHECK_COREI7_AVX_M64: #define __corei7_avx__ 1 +// CHECK_COREI7_AVX_M64: #define __sandybridge 1 +// CHECK_COREI7_AVX_M64: #define __sandybridge__ 1 +// CHECK_COREI7_AVX_M64: #define __tune_corei7_avx__ 1 +// CHECK_COREI7_AVX_M64: #define
__tune_sandybridge__ 1 // CHECK_COREI7_AVX_M64: #define __x86_64 1 // CHECK_COREI7_AVX_M64: #define __x86_64__ 1 // @@ -477,11 +483,14 @@ // CHECK_CORE_AVX_I_M32: #define __SSSE3__ 1 // CHECK_CORE_AVX_I_M32: #define __XSAVEOPT__ 1 // CHECK_CORE_AVX_I_M32: #define __XSAVE__ 1 -// CHECK_CORE_AVX_I_M32: #define __corei7 1 -// CHECK_CORE_AVX_I_M32: #define __corei7__ 1 +// CHECK_CORE_AVX_I_M32: #define __corei7_avx 1 +// CHECK_CORE_AVX_I_M32: #define __corei7_avx__ 1 // CHECK_CORE_AVX_I_M32: #define __i386 1 // CHECK_CORE_AVX_I_M32: #define __i386__ 1 -// CHECK_CORE_AVX_I_M32: #define __tune_corei7__ 1 +// CHECK_CORE_AVX_I_M32: #define __sandybridge 1 +// CHECK_CORE_AVX_I_M32: #define __sandybridge__ 1 +// CHECK_CORE_AVX_I_M32: #define __tune_corei7_avx__ 1 +// CHECK_CORE_AVX_I_M32: #define __tune_sandybridge__ 1 // CHECK_CORE_AVX_I_M32: #define i386 1 // RUN: %clang -march=core-avx-i -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -504,9 +513,12 @@ // CHECK_CORE_AVX_I_M64: #define __XSAVE__ 1 // CHECK_CORE_AVX_I_M64: #define __amd64 1 // CHECK_CORE_AVX_I_M64: #define __amd64__ 1 -// CHECK_CORE_AVX_I_M64: #define __corei7 1 -// CHECK_CORE_AVX_I_M64: #define __corei7__ 1 -// CHECK_CORE_AVX_I_M64: #define __tune_corei7__ 1 +// CHECK_CORE_AVX_I_M64: #define __corei7_avx 1 +// CHECK_CORE_AVX_I_M64: #define __corei7_avx__ 1 +// CHECK_CORE_AVX_I_M64: #define __sandybridge 1 +// CHECK_CORE_AVX_I_M64: #define __sandybridge__ 1 +// CHECK_CORE_AVX_I_M64: #define __tune_corei7_avx__ 1 +// CHECK_CORE_AVX_I_M64: #define __tune_sandybridge__ 1 // CHECK_CORE_AVX_I_M64: #define __x86_64 1 // CHECK_CORE_AVX_I_M64: #define __x86_64__ 1 // @@ -533,11 +545,14 @@ // CHECK_CORE_AVX2_M32: #define __SSSE3__ 1 // CHECK_CORE_AVX2_M32: #define __XSAVEOPT__ 1 // CHECK_CORE_AVX2_M32: #define __XSAVE__ 1 -// CHECK_CORE_AVX2_M32: #define __corei7 1 -// CHECK_CORE_AVX2_M32: #define __corei7__ 1 +// CHECK_CORE_AVX2_M32: #define __core_avx2 1 +// CHECK_CORE_AVX2_M32: #define 
__core_avx2__ 1 +// CHECK_CORE_AVX2_M32: #define __haswell 1 +// CHECK_CORE_AVX2_M32: #define __haswell__ 1 // CHECK_CORE_AVX2_M32: #define __i386 1 // CHECK_CORE_AVX2_M32: #define __i386__ 1 -// CHECK_CORE_AVX2_M32: #define __tune_corei7__ 1 +// CHECK_CORE_AVX2_M32: #define __tune_core_avx2__ 1 +// CHECK_CORE_AVX2_M32: #define __tune_haswell__ 1 // CHECK_CORE_AVX2_M32: #define i386 1 // RUN: %clang -march=core-avx2 -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -566,9 +581,12 @@ // CHECK_CORE_AVX2_M64: #define __XSAVE__ 1 // CHECK_CORE_AVX2_M64: #define __amd64 1 // CHECK_CORE_AVX2_M64: #define __amd64__ 1 -// CHECK_CORE_AVX2_M64: #define __corei7 1 -// CHECK_CORE_AVX2_M64: #define __corei7__ 1 -// CHECK_CORE_AVX2_M64: #define __tune_corei7__ 1 +// CHECK_CORE_AVX2_M64: #define __core_avx2 1 +// CHECK_CORE_AVX2_M64: #define __core_avx2__ 1 +// CHECK_CORE_AVX2_M64: #def
[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc
craig.topper added inline comments. Comment at: lib/Basic/Targets/X86.cpp:844-845 -// FIXME: Historically, we defined this legacy name, it would be nice to -// remove it at some point. We've never exposed fine-grained names for -// recent primary x86 CPUs, and we should keep it that way. -defineCPUMacros(Builder, "corei7"); chandlerc wrote: > This seems to undo the idea that we should keep avoiding exposing > fine-grained CPU names? What's new that changes this? CPUs newer than the ones with that comment seem to have ignored said comment. Probably because we don't have a definition for what to do for new CPUs if we aren't going to expose fine-grained names. Do we just call everything corei7 forever? Comment at: lib/Basic/Targets/X86.cpp:852 +defineCPUMacros(Builder, "core_avx2"); +defineCPUMacros(Builder, "haswell"); break; chandlerc wrote: > I find calling a Westmere CPU `nehalem` a little odd. Calling IvyBridge a > `sandybridge' CPU seems quite confusing. But calling Skylake (client) and > Cannonlake (all? client?) `haswell` seems deeply weird. This implementation matches what gcc does. I agree it's weird. gcc doesn't implement cannonlake yet, so I don't know what they'll do. https://reviews.llvm.org/D38824 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D38781: [X86] Add CLWB intrinsic. clang part
craig.topper updated this revision to Diff 118816. craig.topper added a comment. Address review feedback https://reviews.llvm.org/D38781 Files: include/clang/Basic/BuiltinsX86.def lib/Headers/CMakeLists.txt lib/Headers/clwbintrin.h lib/Headers/immintrin.h test/CodeGen/builtin-clwb.c Index: test/CodeGen/builtin-clwb.c === --- /dev/null +++ test/CodeGen/builtin-clwb.c @@ -0,0 +1,9 @@ +// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-apple-darwin -target-feature +clwb -emit-llvm -o - -Wall -Werror | FileCheck %s + +#include + +void test_mm_clwb(const void *__m) { + //CHECK-LABEL: @test_mm_clwb + //CHECK: @llvm.x86.clwb + _mm_clwb(__m); +} Index: lib/Headers/immintrin.h === --- lib/Headers/immintrin.h +++ lib/Headers/immintrin.h @@ -58,6 +58,10 @@ #include #endif +#if !defined(_MSC_VER) || __has_feature(modules) || defined(__CLWB__) +#include +#endif + #if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX__) #include #endif Index: lib/Headers/clwbintrin.h === --- /dev/null +++ lib/Headers/clwbintrin.h @@ -0,0 +1,52 @@ +/*=== clwbintrin.h - CLWB intrinsic === + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + * + *===---=== + */ + +#ifndef __IMMINTRIN_H +#error "Never use directly; include instead." +#endif + +#ifndef __CLWBINTRIN_H +#define __CLWBINTRIN_H + +/* Define the default attributes for the functions in this file. */ +#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("clwb"))) + +/// \brief Writes back to memory the cache line (if modified) that contains the +/// linear address specified in \a __m from any level of the cache hierarchy in +/// the cache coherence domain +/// +/// \headerfile +/// +/// This intrinsic corresponds to the CLWB instruction. +/// +/// \param __p +///A pointer to the memory location used to identify the cache line to be +///written back. +static __inline__ void __DEFAULT_FN_ATTRS +_mm_clwb(void const *__m) { + __builtin_ia32_clwb(__m); +} + +#undef __DEFAULT_FN_ATTRS + +#endif Index: lib/Headers/CMakeLists.txt === --- lib/Headers/CMakeLists.txt +++ lib/Headers/CMakeLists.txt @@ -33,6 +33,7 @@ clzerointrin.h cpuid.h clflushoptintrin.h + clwbintrin.h emmintrin.h f16cintrin.h float.h Index: include/clang/Basic/BuiltinsX86.def === --- include/clang/Basic/BuiltinsX86.def +++ include/clang/Basic/BuiltinsX86.def @@ -641,6 +641,9 @@ //CLFLUSHOPT TARGET_BUILTIN(__builtin_ia32_clflushopt, "vvC*", "", "clflushopt") +//CLWB +TARGET_BUILTIN(__builtin_ia32_clwb, "vvC*", "", "clwb") + // ADX TARGET_BUILTIN(__builtin_ia32_addcarryx_u32, "UcUcUiUiUi*", "", "adx") TARGET_BUILTIN(__builtin_ia32_addcarry_u32, "UcUcUiUiUi*", "", "") ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D38824: [X86] Synchronize the CPU predefined macros with gcc
craig.topper updated this revision to Diff 118973. craig.topper added a comment. Only define "corei7" on nehalem/westmere to match gcc. Don't define anything for the CPUs newer than that. Add comments to the CPUs where gcc has two sets of defines and we have only one. https://reviews.llvm.org/D38824 Files: lib/Basic/Targets/X86.cpp test/Preprocessor/predefined-arch-macros.c Index: test/Preprocessor/predefined-arch-macros.c === --- test/Preprocessor/predefined-arch-macros.c +++ test/Preprocessor/predefined-arch-macros.c @@ -427,11 +427,8 @@ // CHECK_COREI7_AVX_M32: #define __SSSE3__ 1 // CHECK_COREI7_AVX_M32: #define __XSAVEOPT__ 1 // CHECK_COREI7_AVX_M32: #define __XSAVE__ 1 -// CHECK_COREI7_AVX_M32: #define __corei7 1 -// CHECK_COREI7_AVX_M32: #define __corei7__ 1 // CHECK_COREI7_AVX_M32: #define __i386 1 // CHECK_COREI7_AVX_M32: #define __i386__ 1 -// CHECK_COREI7_AVX_M32: #define __tune_corei7__ 1 // CHECK_COREI7_AVX_M32: #define i386 1 // RUN: %clang -march=corei7-avx -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -454,9 +451,6 @@ // CHECK_COREI7_AVX_M64: #define __XSAVE__ 1 // CHECK_COREI7_AVX_M64: #define __amd64 1 // CHECK_COREI7_AVX_M64: #define __amd64__ 1 -// CHECK_COREI7_AVX_M64: #define __corei7 1 -// CHECK_COREI7_AVX_M64: #define __corei7__ 1 -// CHECK_COREI7_AVX_M64: #define __tune_corei7__ 1 // CHECK_COREI7_AVX_M64: #define __x86_64 1 // CHECK_COREI7_AVX_M64: #define __x86_64__ 1 // @@ -477,11 +471,8 @@ // CHECK_CORE_AVX_I_M32: #define __SSSE3__ 1 // CHECK_CORE_AVX_I_M32: #define __XSAVEOPT__ 1 // CHECK_CORE_AVX_I_M32: #define __XSAVE__ 1 -// CHECK_CORE_AVX_I_M32: #define __corei7 1 -// CHECK_CORE_AVX_I_M32: #define __corei7__ 1 // CHECK_CORE_AVX_I_M32: #define __i386 1 // CHECK_CORE_AVX_I_M32: #define __i386__ 1 -// CHECK_CORE_AVX_I_M32: #define __tune_corei7__ 1 // CHECK_CORE_AVX_I_M32: #define i386 1 // RUN: %clang -march=core-avx-i -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -504,9 +495,6 @@ // 
CHECK_CORE_AVX_I_M64: #define __XSAVE__ 1 // CHECK_CORE_AVX_I_M64: #define __amd64 1 // CHECK_CORE_AVX_I_M64: #define __amd64__ 1 -// CHECK_CORE_AVX_I_M64: #define __corei7 1 -// CHECK_CORE_AVX_I_M64: #define __corei7__ 1 -// CHECK_CORE_AVX_I_M64: #define __tune_corei7__ 1 // CHECK_CORE_AVX_I_M64: #define __x86_64 1 // CHECK_CORE_AVX_I_M64: #define __x86_64__ 1 // @@ -533,11 +521,8 @@ // CHECK_CORE_AVX2_M32: #define __SSSE3__ 1 // CHECK_CORE_AVX2_M32: #define __XSAVEOPT__ 1 // CHECK_CORE_AVX2_M32: #define __XSAVE__ 1 -// CHECK_CORE_AVX2_M32: #define __corei7 1 -// CHECK_CORE_AVX2_M32: #define __corei7__ 1 // CHECK_CORE_AVX2_M32: #define __i386 1 // CHECK_CORE_AVX2_M32: #define __i386__ 1 -// CHECK_CORE_AVX2_M32: #define __tune_corei7__ 1 // CHECK_CORE_AVX2_M32: #define i386 1 // RUN: %clang -march=core-avx2 -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -566,9 +551,6 @@ // CHECK_CORE_AVX2_M64: #define __XSAVE__ 1 // CHECK_CORE_AVX2_M64: #define __amd64 1 // CHECK_CORE_AVX2_M64: #define __amd64__ 1 -// CHECK_CORE_AVX2_M64: #define __corei7 1 -// CHECK_CORE_AVX2_M64: #define __corei7__ 1 -// CHECK_CORE_AVX2_M64: #define __tune_corei7__ 1 // CHECK_CORE_AVX2_M64: #define __x86_64 1 // CHECK_CORE_AVX2_M64: #define __x86_64__ 1 // @@ -597,11 +579,8 @@ // CHECK_BROADWELL_M32: #define __SSSE3__ 1 // CHECK_BROADWELL_M32: #define __XSAVEOPT__ 1 // CHECK_BROADWELL_M32: #define __XSAVE__ 1 -// CHECK_BROADWELL_M32: #define __corei7 1 -// CHECK_BROADWELL_M32: #define __corei7__ 1 // CHECK_BROADWELL_M32: #define __i386 1 // CHECK_BROADWELL_M32: #define __i386__ 1 -// CHECK_BROADWELL_M32: #define __tune_corei7__ 1 // CHECK_BROADWELL_M32: #define i386 1 // RUN: %clang -march=broadwell -m64 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ @@ -632,9 +611,6 @@ // CHECK_BROADWELL_M64: #define __XSAVE__ 1 // CHECK_BROADWELL_M64: #define __amd64 1 // CHECK_BROADWELL_M64: #define __amd64__ 1 -// CHECK_BROADWELL_M64: #define __corei7 1 -// CHECK_BROADWELL_M64: 
#define __corei7__ 1 -// CHECK_BROADWELL_M64: #define __tune_corei7__ 1 // CHECK_BROADWELL_M64: #define __x86_64 1 // CHECK_BROADWELL_M64: #define __x86_64__ 1 // @@ -890,9 +866,6 @@ // CHECK_SKX_M32: #define __XSAVE__ 1 // CHECK_SKX_M32: #define __i386 1 // CHECK_SKX_M32: #define __i386__ 1 -// CHECK_SKX_M32: #define __skx 1 -// CHECK_SKX_M32: #define __skx__ 1 -// CHECK_SKX_M32: #define __tune_skx__ 1 // CHECK_SKX_M32: #define i386 1 // RUN: %clang -march=skylake-avx512 -m64 -E -dM %s -o - 2>&1 \ @@ -934,9 +907,6 @@ // CHECK_SKX_M64: #define __XSAVE__ 1 // CHECK_SKX_M64: #define __amd64 1 // CHECK_SKX_M64: #define __amd64__ 1 -// CHECK_SKX_M64: #define __skx 1 -// CHECK_SKX_M64: #define __skx__ 1 -// CHECK_SKX_M64: #define __tune_skx__ 1 // CHECK_SKX_M64: #define __x86_64 1 // CHECK_SKX_M64: #define __x86_64__ 1 // Index: lib/Basic/Tar
[PATCH] D38737: [X86] test/testn intrinsics lowering to IR. clang side
craig.topper added inline comments. Comment at: lib/Headers/avx512bwintrin.h:2109 + return _mm512_cmp_epi8_mask(_mm512_and_epi32(__A, __B), +_mm512_setzero_qi(), 4); } Can you align this with the opening paren on the line above? Same with all the other functions below. https://reviews.llvm.org/D38737 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D51510: Move AESNI generation to Skylake and Goldmont
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM. Can you update lib/Target/X86/X86.td in LLVM repo as well? Repository: rC Clang https://reviews.llvm.org/D51510 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D51510: Move AESNI generation to Skylake and Goldmont
craig.topper added a comment. Do you have commit access, or do you need someone to commit this for you? Repository: rC Clang https://reviews.llvm.org/D51510 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D51771: [X86] Modify addcarry/subborrow builtins to emit a 2-result intrinsic and a store instruction.
craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel. Herald added subscribers: cfe-commits, kristina. This is the clang side of https://reviews.llvm.org/D51769. The llvm intrinsics now return two results instead of using an out parameter. Repository: rC Clang https://reviews.llvm.org/D51771 Files: lib/CodeGen/CGBuiltin.cpp test/CodeGen/adc-builtins.c test/CodeGen/adx-builtins.c Index: test/CodeGen/adx-builtins.c === --- test/CodeGen/adx-builtins.c +++ test/CodeGen/adx-builtins.c @@ -5,14 +5,20 @@ unsigned char test_addcarryx_u32(unsigned char __cf, unsigned int __x, unsigned int __y, unsigned int *__p) { // CHECK-LABEL: test_addcarryx_u32 -// CHECK: call i8 @llvm.x86.addcarryx.u32 +// CHECK: [[ADC:%.*]] = call { i8, i32 } @llvm.x86.addcarryx.u32 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[ADC]], 1 +// CHECK: store i32 [[DATA]], i32* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { i8, i32 } [[ADC]], 0 return _addcarryx_u32(__cf, __x, __y, __p); } unsigned char test_addcarryx_u64(unsigned char __cf, unsigned long long __x, unsigned long long __y, unsigned long long *__p) { // CHECK-LABEL: test_addcarryx_u64 -// CHECK: call i8 @llvm.x86.addcarryx.u64 +// CHECK: [[ADC:%.*]] = call { i8, i64 } @llvm.x86.addcarryx.u64 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[ADC]], 1 +// CHECK: store i64 [[DATA]], i64* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[ADC]], 0 return _addcarryx_u64(__cf, __x, __y, __p); } Index: test/CodeGen/adc-builtins.c === --- test/CodeGen/adc-builtins.c +++ test/CodeGen/adc-builtins.c @@ -5,29 +5,41 @@ unsigned char test_addcarry_u32(unsigned char __cf, unsigned int __x, unsigned int __y, unsigned int *__p) { // CHECK-LABEL: test_addcarry_u32 -// CHECK: call i8 @llvm.x86.addcarry.u32 +// CHECK: [[ADC:%.*]] = call { i8, i32 } @llvm.x86.addcarry.u32 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[ADC]], 1 +// CHECK: store i32 [[DATA]], i32* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { 
i8, i32 } [[ADC]], 0 return _addcarry_u32(__cf, __x, __y, __p); } unsigned char test_addcarry_u64(unsigned char __cf, unsigned long long __x, unsigned long long __y, unsigned long long *__p) { // CHECK-LABEL: test_addcarry_u64 -// CHECK: call i8 @llvm.x86.addcarry.u64 +// CHECK: [[ADC:%.*]] = call { i8, i64 } @llvm.x86.addcarry.u64 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[ADC]], 1 +// CHECK: store i64 [[DATA]], i64* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[ADC]], 0 return _addcarry_u64(__cf, __x, __y, __p); } unsigned char test_subborrow_u32(unsigned char __cf, unsigned int __x, unsigned int __y, unsigned int *__p) { // CHECK-LABEL: test_subborrow_u32 -// CHECK: call i8 @llvm.x86.subborrow.u32 +// CHECK: [[SBB:%.*]] = call { i8, i32 } @llvm.x86.subborrow.u32 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i32 } [[SBB]], 1 +// CHECK: store i32 [[DATA]], i32* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { i8, i32 } [[SBB]], 0 return _subborrow_u32(__cf, __x, __y, __p); } unsigned char test_subborrow_u64(unsigned char __cf, unsigned long long __x, unsigned long long __y, unsigned long long *__p) { // CHECK-LABEL: test_subborrow_u64 -// CHECK: call i8 @llvm.x86.subborrow.u64 +// CHECK: [[SBB:%.*]] = call { i8, i64 } @llvm.x86.subborrow.u64 +// CHECK: [[DATA:%.*]] = extractvalue { i8, i64 } [[SBB]], 1 +// CHECK: store i64 [[DATA]], i64* %{{.*}} +// CHECK: [[CF:%.*]] = extractvalue { i8, i64 } [[SBB]], 0 return _subborrow_u64(__cf, __x, __y, __p); } Index: lib/CodeGen/CGBuiltin.cpp === --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -10405,6 +10405,41 @@ Ops[0]); return Builder.CreateExtractValue(Call, 1); } + case X86::BI__builtin_ia32_addcarryx_u32: + case X86::BI__builtin_ia32_addcarryx_u64: + case X86::BI__builtin_ia32_addcarry_u32: + case X86::BI__builtin_ia32_addcarry_u64: + case X86::BI__builtin_ia32_subborrow_u32: + case X86::BI__builtin_ia32_subborrow_u64: { +Intrinsic::ID IID; +switch (BuiltinID) { +default: 
llvm_unreachable("Unsupported intrinsic!"); +case X86::BI__builtin_ia32_addcarryx_u32: + IID = Intrinsic::x86_addcarryx_u32; + break; +case X86::BI__builtin_ia32_addcarryx_u64: + IID = Intrinsic::x86_addcarryx_u64; + break; +case X86::BI__builtin_ia32_addcarry_u32: + IID = Intrinsic::x86_addcarry_u32; + break; +case X86::BI__builtin_ia32_addcarry
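The new struct-returning carry intrinsics compute an ordinary add-with-carry (or subtract-with-borrow): sum the two operands plus the incoming flag, store the low word through the out pointer, and return the carry/borrow out. A minimal C model of those semantics is sketched below; the `model_*` names are illustrative stand-ins, not the real `_addcarry_u32`/`_subborrow_u32` intrinsics.

```c
#include <stdint.h>
#include <assert.h>

/* Reference model of _addcarry_u32 semantics: 32-bit add plus incoming
 * carry, widened to 64 bits so the carry-out falls out of bit 32. */
static unsigned char model_addcarry_u32(unsigned char cf, uint32_t x,
                                        uint32_t y, uint32_t *out) {
  uint64_t sum = (uint64_t)x + (uint64_t)y + (cf != 0);
  *out = (uint32_t)sum;              /* low 32 bits of the result */
  return (unsigned char)(sum >> 32); /* carry-out */
}

/* Reference model of _subborrow_u32 semantics: 32-bit subtract with an
 * incoming borrow; bit 32 of the widened difference is the borrow-out. */
static unsigned char model_subborrow_u32(unsigned char cf, uint32_t x,
                                         uint32_t y, uint32_t *out) {
  uint64_t diff = (uint64_t)x - (uint64_t)y - (cf != 0);
  *out = (uint32_t)diff;
  return (unsigned char)((diff >> 32) & 1); /* borrow-out */
}
```

In the new IR, the carry-out corresponds to element 0 of the returned `{ i8, i32 }` struct and the stored sum to element 1, matching the `extractvalue` indices in the tests above.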
[PATCH] D51805: [X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter
craig.topper created this revision. craig.topper added reviewers: RKSimon, spatel. This is the clang side of https://reviews.llvm.org/D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed. https://reviews.llvm.org/D51805 Files: lib/CodeGen/CGBuiltin.cpp test/CodeGen/rd-builtins.c Index: test/CodeGen/rd-builtins.c === --- test/CodeGen/rd-builtins.c +++ test/CodeGen/rd-builtins.c @@ -14,3 +14,12 @@ // CHECK: @test_rdtsc // CHECK: call i64 @llvm.x86.rdtsc } + +unsigned long long test_rdtscp(unsigned int *a) { +// CHECK: @test_rdtscp +// CHECK: [[RDTSCP:%.*]] = call { i64, i32 } @llvm.x86.rdtscp +// CHECK: [[TSC_AUX:%.*]] = extractvalue { i64, i32 } [[RDTSCP]], 1 +// CHECK: store i32 [[TSC_AUX]], i32* %{{.*}} +// CHECK: [[TSC:%.*]] = extractvalue { i64, i32 } [[RDTSCP]], 0 + return __rdtscp(a); +} Index: lib/CodeGen/CGBuiltin.cpp === --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -9158,6 +9158,12 @@ case X86::BI__rdtsc: { return Builder.CreateCall(CGM.getIntrinsic(Intrinsic::x86_rdtsc)); } + case X86::BI__builtin_ia32_rdtscp: { +Value *Call = Builder.CreateCall(CGM.getIntrinsic(Intrinsic::x86_rdtscp)); +Builder.CreateDefaultAlignedStore(Builder.CreateExtractValue(Call, 1), + Ops[0]); +return Builder.CreateExtractValue(Call, 0); + } case X86::BI__builtin_ia32_undef128: case X86::BI__builtin_ia32_undef256: case X86::BI__builtin_ia32_undef512: ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper added a comment. @spatel, should this ultimately use funnel shift? https://reviews.llvm.org/D49606
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper added a comment. Here are the IR patterns for this that work. I'm not sure we can do this directly in C, since we'd need a 128-bit type, but maybe we can emit it from CGBuiltin.cpp?

define i64 @__shiftleft128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = shl i128 %d, %e
  %g = lshr i128 %f, 64
  %h = trunc i128 %g to i64
  ret i64 %h
}

define i64 @__shiftright128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = lshr i128 %d, %e
  %g = trunc i128 %f to i64
  ret i64 %g
}

https://reviews.llvm.org/D49606
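The IR patterns above can be expressed in C using the compiler's 128-bit integer type (available on 64-bit targets). This is a hedged sketch mirroring the IR, not the actual CGBuiltin.cpp lowering; the function names are illustrative, and the `& 63` mask mirrors the `%amtmask` step in the IR:

```c
#include <stdint.h>
#include <assert.h>

/* High half of ((hi:lo) << (amt & 63)): matches the shl/lshr 64 IR pattern. */
static uint64_t shiftleft128(uint64_t lo, uint64_t hi, unsigned char amt) {
  unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
  return (uint64_t)((v << (amt & 63)) >> 64);
}

/* Low half of ((hi:lo) >> (amt & 63)). */
static uint64_t shiftright128(uint64_t lo, uint64_t hi, unsigned char amt) {
  unsigned __int128 v = ((unsigned __int128)hi << 64) | lo;
  return (uint64_t)(v >> (amt & 63));
}
```

With the mask, a shift amount of 64 behaves like 0, which is the MSVC-documented behavior the intrinsics need to preserve.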
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper added a comment. I'd prefer the pattern over inline assembly. It'll give us more flexibility in the backend if we should be using some other instruction on different targets. https://reviews.llvm.org/D49606
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper added a comment. @spatel, yes, it's exactly funnel shift. I wasn't sure if we were ready for clang to create it yet or not. Can we let this go as is and change it to funnel shift once we have the variable case fixed in the backend? https://reviews.llvm.org/D49606
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper added a comment. The only weird thing that I can really think of with the C version is that the 'and' on the shift amount might get hoisted out of a loop and not get dropped during isel. https://reviews.llvm.org/D49606
[PATCH] D49606: [ms] Add __shiftleft128 / __shiftright128 intrinsics
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM. I'm inclined to let this go in now since we have a requested use for it. We can change it to funnel shift once we're confident in the backend. https://reviews.llvm.org/D49606
[PATCH] D50168: [Builtins] Implement __builtin_clrsb to be compatible with gcc
craig.topper created this revision. craig.topper added reviewers: bkramer, efriedma, spatel. Herald added a reviewer: javed.absar. Herald added a subscriber: kristof.beyls. gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones, we need to invert negative numbers to count zeros. The emitted sequence contains a bit of trickery stolen from an LLVM AArch64 test arm64-clrsb.ll to prevent passing a value of 0 to ctlz. I used an icmp slt and a select to conditionally negate, but InstCombine will turn that into an ashr+xor. I can emit that directly if that's preferred. I know @spatel has been trying to remove some of the bit tricks from InstCombine so I'm not sure if the ashr+xor form will be canonical going forward. This patch will cause the builtin to be expanded inline while gcc uses a call to a function like __clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports __clrsbdi2. https://reviews.llvm.org/D50168 Files: include/clang/Basic/Builtins.def lib/CodeGen/CGBuiltin.cpp Index: lib/CodeGen/CGBuiltin.cpp === --- lib/CodeGen/CGBuiltin.cpp +++ lib/CodeGen/CGBuiltin.cpp @@ -1537,6 +1537,34 @@ return RValue::get(ComplexVal.second); } + case Builtin::BI__builtin_clrsb: + case Builtin::BI__builtin_clrsbl: + case Builtin::BI__builtin_clrsbll: { +// clrsb(x) -> clz(x < 0 ? ~x : x) - 1 or +// -> clz(((x < 0 ? 
~x : x) << 1) | 1) +Value *ArgValue = EmitScalarExpr(E->getArg(0)); + +llvm::Type *ArgType = ArgValue->getType(); +Value *F = CGM.getIntrinsic(Intrinsic::ctlz, ArgType); + +llvm::Type *ResultType = ConvertType(E->getType()); +Value *Zero = llvm::Constant::getNullValue(ArgType); +Value *IsNeg = Builder.CreateICmpSLT(ArgValue, Zero, "isneg"); +Value *Inverse = Builder.CreateNot(ArgValue, "not"); +Value *Tmp = Builder.CreateSelect(IsNeg, Inverse, ArgValue); +// Now we need to calculate ctlz(Tmp)-1, but Tmp might be zero. We know +// the sign bit is zero, so we can shift it out. Then put a 1 in the LSB. +// This removes one leading zero like the subtract does, and replaces it +// with a guaranteed one to prevent the value being 0. +Value *One = llvm::ConstantInt::get(ArgType, 1); +Tmp = Builder.CreateShl(Tmp, One); +Tmp = Builder.CreateOr(Tmp, One); +Value *Result = Builder.CreateCall(F, {Tmp, Builder.getTrue()}); +if (Result->getType() != ResultType) + Result = Builder.CreateIntCast(Result, ResultType, /*isSigned*/true, + "cast"); +return RValue::get(Result); + } case Builtin::BI__builtin_ctzs: case Builtin::BI__builtin_ctz: case Builtin::BI__builtin_ctzl: Index: include/clang/Basic/Builtins.def === --- include/clang/Basic/Builtins.def +++ include/clang/Basic/Builtins.def @@ -413,6 +413,9 @@ BUILTIN(__builtin_popcount , "iUi" , "nc") BUILTIN(__builtin_popcountl , "iULi" , "nc") BUILTIN(__builtin_popcountll, "iULLi", "nc") +BUILTIN(__builtin_clrsb , "ii" , "nc") +BUILTIN(__builtin_clrsbl , "iLi" , "nc") +BUILTIN(__builtin_clrsbll, "iLLi", "nc") // FIXME: These type signatures are not correct for targets with int != 32-bits // or with ULL != 64-bits.
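The identity behind the emitted sequence, clrsb(x) == clz((((x < 0) ? ~x : x) << 1) | 1), can be checked in plain C against a direct bit-by-bit count of redundant sign bits. This is a portable sketch for 32-bit int only; the helper names are illustrative, and a simple loop stands in for ctlz (valid here because its argument is never 0):

```c
#include <stdint.h>
#include <assert.h>

/* Simple count-leading-zeros; the callers below never pass 0. */
static int portable_clz32(uint32_t v) {
  int n = 0;
  for (uint32_t m = 0x80000000u; m && !(v & m); m >>= 1)
    ++n;
  return n;
}

/* The shl+or trick from the patch: shifting out the known-zero sign bit
 * drops one leading zero (the "- 1"), and the OR'd 1 keeps the ctlz
 * argument nonzero. */
static int clrsb_via_clz(int32_t x) {
  uint32_t t = (uint32_t)(x < 0 ? ~x : x);
  return portable_clz32((t << 1) | 1u);
}

/* Direct definition: how many bits below the sign bit equal the sign bit. */
static int clrsb_direct(int32_t x) {
  int sign = (x >> 31) & 1, n = 0;
  for (int i = 30; i >= 0 && (((uint32_t)x >> i) & 1) == (uint32_t)sign; --i)
    ++n;
  return n;
}
```

For example, gcc documents __builtin_clrsb(0) as 31 for 32-bit int, and both helpers agree on that.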
[PATCH] D43041: Add X86 Support to ValidCPUList (enabling march notes)
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D43041
[PATCH] D41880: Adding nocf_check attribute for cf-protection fine tuning
craig.topper added inline comments. Comment at: lib/Sema/SemaDeclAttr.cpp:2007 + +bool Sema::CheckAttrNoArgs(const AttributeList &Attr) { + if (!checkAttributeNumArgs(*this, Attr, 0)) { Why did this get renamed? Comment at: lib/Sema/SemaDeclAttr.cpp:2016 -bool Sema::CheckNoCallerSavedRegsAttr(const AttributeList &Attr) { +bool Sema::CheckAttrTarget(const AttributeList &Attr) { // Check whether the attribute is valid on the current target. Why did this get renamed? Repository: rL LLVM https://reviews.llvm.org/D41880
[PATCH] D43394: [X86] Add 'sahf' CPU feature, and emit __LAHFSAHF__ for it
craig.topper added inline comments. Comment at: lib/Basic/Targets/X86.cpp:295 setFeatureEnabledImpl(Features, "xsave", true); setFeatureEnabledImpl(Features, "movbe", true); break; KNM and KNL should both have sahf Comment at: lib/Basic/Targets/X86.cpp:308 setFeatureEnabledImpl(Features, "lzcnt", true); setFeatureEnabledImpl(Features, "popcnt", true); LLVM_FALLTHROUGH; sahf should be available on amdfam10 Comment at: lib/Basic/Targets/X86.cpp:1049 + if (HasLAHFSAHF) +Builder.defineMacro("__LAHFSAHF__"); Does gcc define this? It's such a low level instruction I have a hard time believing this define would be useful. Repository: rC Clang https://reviews.llvm.org/D43394
[PATCH] D43394: [X86] Add 'sahf' CPU feature, and emit __LAHFSAHF__ for it
craig.topper added inline comments. Comment at: lib/Basic/Targets/X86.cpp:1049 + if (HasLAHFSAHF) +Builder.defineMacro("__LAHFSAHF__"); dim wrote: > craig.topper wrote: > > Does gcc define this? It's such a low level instruction I have a hard time > > believing this define would be useful. > I tried gcc 6, 7 and 8, and while they do expose stuff like `__POPCNT__`, I > see no `__LAHFSAHF__`. I am supposing Jonathan's original intent with this > was to make it easily testable in source, so you can insert the right > assembly for the target CPU. The same could really be said for things like > `__RDSEED__`, and some other defines... Most of the defines indicate the availability of intrinsics. At least that was their original intent. They used to control which intrinsic headers were included in x86intrin.h or immintrin.h. Though now we include everything except in MSVC compatible mode and allow the target attribute to provide per function control. I'd prefer not to add this one if gcc doesn't have it. Repository: rC Clang https://reviews.llvm.org/D43394
[PATCH] D43394: [X86] Add 'sahf' CPU feature to frontend
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D43394
[PATCH] D43459: [X86] Disable CLWB in Cannon Lake
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D43459
[PATCH] D33356: [Nios2] Changes in frontend to support Nios2 LLVM target
craig.topper added a comment. Is there enough functionality here that there should be tests? i.e. make sure march/mcpu switches are recognized, that the target is recognized, etc. https://reviews.llvm.org/D33356
[PATCH] D33356: [Nios2] Changes in frontend to support Nios2 LLVM target
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Comment at: lib/Basic/Targets.cpp:7678 +for (const char *feature : allFeatures) { +Features[feature] = isFeatureSupportedByCPU(feature, CPU); +} This is indented too far. Can you fix when you commit? https://reviews.llvm.org/D33356
[PATCH] D40478: Added control flow architecture protection Flag
craig.topper added inline comments. Comment at: lib/CodeGen/CGCall.cpp:1737 llvm::toStringRef(CodeGenOpts.NoSignedZeros)); +FuncAttrs.addAttribute("shstk-compatible", + llvm::toStringRef(CodeGenOpts.ShstkCompatible)); If the command line option is intended to be target independent, shouldn't these be generically named? Comment at: lib/CodeGen/CodeGenFunction.cpp:876 // Apply xray attributes to the function (as a string, for now) - if (D && ShouldXRayInstrumentFunction()) { + bool InstrumentXray = ShouldXRayInstrumentFunction(); + if (D && InstrumentXray) { Why this change? Repository: rL LLVM https://reviews.llvm.org/D40478
[PATCH] D40478: Added control flow architecture protection Flag
craig.topper added a comment. Are we sure we want a different command line option name from gcc? From our internal conversations with the gcc folks I thought they were suggesting that -fcf-protection could imply a software mechanism if a hardware mechanism was not available through -mibt or -march? Should we emit an error to the user if -mibt isn't available? We should be able to add virtual methods on TargetInfo that X86 can customize to check for ibt and shstk. Can you provide more information about the miscompile on MSVC? I think we should do more to understand that; this sounds like it could be a time bomb waiting to fail somewhere else. Repository: rL LLVM https://reviews.llvm.org/D40478
[PATCH] D41517: mmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/mmintrin.h:88 /// -/// This intrinsic corresponds to the VMOVQ / MOVD instruction. +/// This intrinsic corresponds to the MOVD instruction. /// Shouldn't this be MOVQ? Comment at: lib/Headers/mmintrin.h:104 /// -/// This intrinsic corresponds to the VMOVQ / MOVD instruction. +/// This intrinsic corresponds to the MOVD instruction. /// Shouldn't this be MOVQ? Comment at: lib/Headers/mmintrin.h:1292 /// -/// This intrinsic corresponds to the VXORPS / XORPS instruction. +/// This intrinsic corresponds to the XOR instruction. /// PXOR? Comment at: lib/Headers/mmintrin.h:1384 /// -/// This intrinsic corresponds to the VPSHUFD / PSHUFD instruction. +/// This intrinsic corresponds to the PSHUFD instruction. /// This is overly specific; there is no guarantee we'd use those instructions. If it was a constant we'd probably just use a load. Comment at: lib/Headers/mmintrin.h:1402 /// -/// This intrinsic corresponds to the VPSHUFLW / PSHUFLW instruction. +/// This intrinsic corresponds to the PSHUFLW instruction. /// This is overly specific Comment at: lib/Headers/mmintrin.h:1419 /// -/// This intrinsic corresponds to the VPUNPCKLBW + VPSHUFLW / PUNPCKLBW + -///PSHUFLW instruction. +/// This intrinsic corresponds to the PUNPCKLBW + PSHUFLW instruction. /// This is overly specific https://reviews.llvm.org/D41517
[PATCH] D41523: xmmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/xmmintrin.h:1706 /// -/// This intrinsic corresponds to the VMOVSS / MOVSS + shuffling +/// This intrinsic corresponds to the VBROADCASTSS / BROADCASTSS ///instruction. There is no BROADCASTSS instruction. That's an AVX instruction that only exists as VBROADCASTSS. The original comment was correct for pre-AVX. Comment at: lib/Headers/xmmintrin.h:2199 /// -/// This intrinsic corresponds to the VPINSRW / PINSRW instruction. +/// This intrinsic corresponds to the PINSRW instruction. /// Why is VPINSRW removed? Comment at: lib/Headers/xmmintrin.h:2659 /// -/// This intrinsic corresponds to the VMOVSS / MOVSS instruction. +/// This intrinsic corresponds to the VBLENDPS / BLENDPS instruction. /// MOVSS is correct for pre SSE4.1 targets. https://reviews.llvm.org/D41523
[PATCH] D41516: emmintrin.h documentation fixes and updates
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM https://reviews.llvm.org/D41516
[PATCH] D41557: [x86][icelake][vbmi2]
craig.topper added a comment. Update the ICL macros in test/Preprocessor/predefined-arch-macros.c Comment at: include/clang/Basic/BuiltinsX86.def:1254 +TARGET_BUILTIN(__builtin_ia32_vpshldd512_mask, "V16iV16iV16iiV16iUs", "", "avx512vbmi2") +TARGET_BUILTIN(__builtin_ia32_vpshldq128_mask, "V2LLiV2LLiV2LLiiV2LLiUc", "", "avx512vl,avx512vbmi2") +TARGET_BUILTIN(__builtin_ia32_vpshldq256_mask, "V4LLiV4LLiV4LLiiV4LLiUc", "", "avx512vl,avx512vbmi2") Arguments corresponding to immediates need a capital 'I' in front of them so clang will error if they are not a compile-time constant. Comment at: lib/Basic/Targets/X86.cpp:135 case CK_Icelake: -// TODO: Add icelake features here. LLVM_FALLTHROUGH; Don't remove the TODO until all features are added. Comment at: lib/Basic/Targets/X86.cpp:589 +// Enable BWI instruction if VBMI/VBMI2 is being enabled. +if (Name.startswith("avx512vbmi") && Enabled) Features["avx512bw"] = true; Do two equality checks ORed together. I think bad target attributes on functions only issue a warning and are discarded in codegen. So strings like avx512vbmifoo can get here and we should ignore them. Repository: rC Clang https://reviews.llvm.org/D41557
[PATCH] D41557: [x86][icelake][vbmi2]
craig.topper added a comment. Add tests for -mavx512vbmi2 and -mno-avx512vbmi2 to test/Driver/x86-target-features.c Add a test for -mno-avx512bw also disabling avx512vbmi2 to test/Preprocessor/x86_target_features.c. Look for AVX512VBMINOAVX512BW for the existing test for avx512vbmi. Also add the test -mavx512vbmi2. Repository: rC Clang https://reviews.llvm.org/D41557
[PATCH] D41558: [x86][icelake][vnni]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41558
[PATCH] D41564: [x86][icelake][bitalg]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41564
[PATCH] D41573: [x86][icelake][vpclmulqdq]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41573
[PATCH] D41557: [x86][icelake][vbmi2]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41557
[PATCH] D41582: [x86][icelake][gfni]
craig.topper added inline comments. Comment at: test/CodeGen/gfni-builtins.c:45 + +#ifdef AVX512 +__m512i test_mm512_gf2p8affineinv_epi64_epi8(__m512i A, __m512i B) { Doesn't the define have underscores around it? Repository: rC Clang https://reviews.llvm.org/D41582
[PATCH] D41582: [x86][icelake][gfni]
craig.topper added inline comments. Comment at: test/CodeGen/gfni-builtins.c:45 + +#ifdef AVX512 +__m512i test_mm512_gf2p8affineinv_epi64_epi8(__m512i A, __m512i B) { craig.topper wrote: > Doesn't the define have underscores around it? Oh, it's your own define. You can probably just use the builtin one. Repository: rC Clang https://reviews.llvm.org/D41582
[PATCH] D41583: [x86][icelake][vaes]
craig.topper added inline comments. Comment at: lib/Basic/Targets/X86.cpp:573 setMMXLevel(Features, AMD3DNowAthlon, Enabled); } else if (Name == "aes") { if (Enabled) Shouldn't -aes imply -vaes? Repository: rC Clang https://reviews.llvm.org/D41583
[PATCH] D41573: [x86][icelake][vpclmulqdq]
craig.topper requested changes to this revision. craig.topper added inline comments. This revision now requires changes to proceed. Comment at: lib/Basic/Targets/X86.cpp:577 } else if (Name == "pclmul") { if (Enabled) setSSELevel(Features, SSE2, Enabled); I think we should disable vpclmulqdq when pclmul is being disabled. Repository: rC Clang https://reviews.llvm.org/D41573
[PATCH] D41573: [x86][icelake][vpclmulqdq]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41573
[PATCH] D41583: [x86][icelake][vaes]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41583
[PATCH] D41582: [x86][icelake][gfni]
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rC Clang https://reviews.llvm.org/D41582
[PATCH] D40478: Added control flow architecture protection Flag
craig.topper added inline comments. Comment at: lib/CodeGen/CodeGenModule.cpp:501 + if (CodeGenOpts.CFProtectionReturn) { +Target.checkCFProtectionReturnSupported(getDiags()); +// Indicate that we want to instrument return control flow protection. Should we still be adding the module flag if the target says it's not supported? Repository: rL LLVM https://reviews.llvm.org/D40478
[PATCH] D41517: mmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/mmintrin.h:1292 /// -/// This intrinsic corresponds to the VXORPS / XORPS instruction. +/// This intrinsic corresponds to the XOR instruction. /// kromanova wrote: > craig.topper wrote: > > PXOR? > For which platform/compiler? > > I checked, for x86_64 Linux XORPS(no avx)/VXORPS (with -mavx) is generated. > For PS4 we generate XORL. > > I guess, we need to write something more generic, implying that an > appropriate platform-specific XOR instruction is generated. Ideally to interoperate with other mmx intrinsics it should have been a PXOR into an mmx register. But apparently our mmx support is so limited that we aren't capable of that and instead create it in another domain and move it over. I guess just indicate it as a utility function with no specific instruction. Comment at: lib/Headers/mmintrin.h:1384 /// -/// This intrinsic corresponds to the VPSHUFD / PSHUFD instruction. +/// This intrinsic corresponds to the PSHUFD instruction. /// kromanova wrote: > craig.topper wrote: > > This is overly specific there is no guarantee we'd use those instructions. > > If it was a constant we'd probably just use a load. > That's right. I think we should use the following wording to match other > _mm_set* intrinsics documentation in this file. > > /// This intrinsic is a utility function and does not correspond to a specific > ///instruction. > Agreed. https://reviews.llvm.org/D41517
[PATCH] D40478: Added control flow architecture protection Flag
craig.topper accepted this revision. craig.topper added a comment. This revision is now accepted and ready to land. LGTM Repository: rL LLVM https://reviews.llvm.org/D40478
[PATCH] D41357: WIP: Fix Diagnostic layering, moving diagnostics out of Basic
craig.topper added inline comments. Comment at: lib/Basic/DiagnosticIDs.cpp:58 /// GetDiagInfo - Return the StaticDiagInfoRec entry for the specified DiagID, /// or null if the ID is invalid. This comment is out of date with the struct being renamed. Comment at: lib/Basic/DiagnosticIDs.cpp:73 -#define VALIDATE_DIAG_SIZE(NAME) \ - static_assert( \ - static_cast(diag::NUM_BUILTIN_##NAME##_DIAGNOSTICS) < \ Did this static_assert stuff get lost? Comment at: tools/driver/cc1_main.cpp:210 + +/// GetDiagInfo - Return the StaticDiagInfoRec entry for the specified DiagID, +/// or null if the ID is invalid. StaticDiagInfoRec was renamed. Repository: rC Clang https://reviews.llvm.org/D41357
[PATCH] D41517: mmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/mmintrin.h:55 /// -/// This intrinsic corresponds to the VMOVD / MOVD instruction. +/// This intrinsic corresponds to the MOVD instruction. /// kromanova wrote: > I tried clang on Linux, x86_64, and if -mavx option is passed, we generate > VMOVD, if this option is omitted, we generate MOVD. > I think I understand the rationale behind this change (namely, to keep MOVD, > but remove VMOVD), > since this intrinsic should use MMX registers and shouldn't have > corresponding AVX instruction(s). > > However, that's what we generate at the moment when -mavx is passed (I > suspect because our MMX support is limited) > vmovd %edi, %xmm0 > > Since we are writing the documentation for clang compiler, we should document > what clang compiler is doing, not what it should be doing. > Craig, what do you think? Should we revert back to VMOVD/MOVD? > We can change it back to VMOVD/MOVD https://reviews.llvm.org/D41517
[PATCH] D41523: xmmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/xmmintrin.h:2199 /// -/// This intrinsic corresponds to the VPINSRW / PINSRW instruction. +/// This intrinsic corresponds to the PINSRW instruction. /// kromanova wrote: > craig.topper wrote: > > Why is VPINSRW removed? > I suspect the rationale is the same I talked about in the mmintrin.h review. > This intrinsic should use MMX registers and shouldn't have corresponding AVX > instruction(s). > > I've tried this and with or without -mavx for Linux/x86_64 we generate PINSRW > in both cases (i.e. I wasn't able to trigger generation of VEX prefixed > instruction). > > __m64 foo (__m64 a, int b) > { > __m64 x; > x = _mm_insert_pi16 (a, b, 0); > return x; > } > I didn't realize this was an MMX type intrinsic in the xmmintrin file. So VPINSRW being removed makes sense. Sorry for the noise. https://reviews.llvm.org/D41523
[PATCH] D41523: xmmintrin.h documentation fixes and updates
craig.topper added a comment. The builtins are tested in tests like test/CodeGen/sse-builtins.c https://reviews.llvm.org/D41523
[PATCH] D41517: mmintrin.h documentation fixes and updates
craig.topper added inline comments. Comment at: lib/Headers/mmintrin.h:1402 /// -/// This intrinsic corresponds to the VPSHUFLW / PSHUFLW instruction. +/// This intrinsic corresponds to the PSHUFLW instruction. /// dyung wrote: > craig.topper wrote: > > This is overly specific > Just to be clear, when you say this is overly specific, you are saying that > it should be replaced with the text "This intrinsic is a utility function and > does not correspond to a specific instruction." Is that correct? Yeah. I just shortened my comment from _mm_set1_pi32. https://reviews.llvm.org/D41517
[PATCH] D43817: [x86] wbnoinvd intrinsic
craig.topper added inline comments. Comment at: docs/ClangCommandLineReference.rst:2359 +.. option:: -mwbnoinvd, -mno-wbnoinvd + Did you manually add these? This file is normally generated by a tool and should be in alphabetical order. Comment at: lib/Basic/Targets/X86.cpp:136 // TODO: Add icelake features here. +setFeatureEnabledImpl(Features, "wbnoinvd", true); LLVM_FALLTHROUGH; Is this based on an old repo? Icelake features have been coded here since late December. But as I said in the llvm patch we probably can't enable this on all icelakes. Comment at: lib/Headers/ia32intrin.h:79 +#define _wbinvd() __builtin_ia32_wbinvd() + Can you separate wbinvd out of this patch? This has some Microsoft compatibility issues that need to be carefully checked. We seem to already have __wbinvd() defined in intrin.h but I'm not sure it does anything. Comment at: lib/Headers/wbnoinvdintrin.h:31 + +#define _wbnoinvd __builtin_ia32_wbnoinvd + Use an inline function, not a macro. Repository: rC Clang https://reviews.llvm.org/D43817