[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.
Allen added inline comments.

Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:186
+      !eq(name, "v2f16"): Float16x2Regs,
+      !eq(name, "bf16"): Float16Regs,
+      !eq(name, "v2bf16"): Float16x2Regs,

Sorry for a basic question: what's the difference between bf16 and f16?

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136311/new/
https://reviews.llvm.org/D136311

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.
Allen added inline comments.

Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:186
+      !eq(name, "v2f16"): Float16x2Regs,
+      !eq(name, "bf16"): Float16Regs,
+      !eq(name, "v2bf16"): Float16x2Regs,

tra wrote:
> tra wrote:
> > Allen wrote:
> > > Sorry for a basic question: what's the difference between bf16 and f16?
> > https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
> If your question is why both bf16 and f16 use Float16Regs, the answer is that both use 'untyped' 16-bit *integer* registers. The difference from Int16Regs is that those are signed. PTX has some awkward restrictions on matching instructions and register kinds, even though under the hood it all boils down to everything using 32-bit registers.

Thanks for your explanation.

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136311/new/
https://reviews.llvm.org/D136311
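The layout point behind tra's wiki link can be made concrete: bf16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits, so converting float32 to bf16 is (to a first approximation, ignoring rounding modes) just keeping the top 16 bits of the float32 encoding. A minimal sketch, not tied to the NVPTX patch itself — the helper names are illustrative:

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def from_bf16_bits(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# bf16 keeps float32's 8-bit exponent, so it represents huge magnitudes that
# IEEE half (f16, 5-bit exponent) cannot: f16 overflows to inf above ~65504.
big = from_bf16_bits(to_bf16_bits(3.0e38))
print(big)                     # roughly 3.0e38, with only ~7 bits of precision
print(from_bf16_bits(to_bf16_bits(3.141592653589793)))  # 3.140625
```

The trade-off is the opposite of f16: bf16 has f32's dynamic range but far less precision, which is why the two formats need distinct handling even though (as tra notes) they share the same untyped 16-bit registers in PTX.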
[PATCH] D150867: [AArch64][FMV] Prevent target attribute using for multiversioning.
Allen added inline comments.

Comment at: clang/lib/Sema/SemaDecl.cpp:11544
+  // Target attribute on AArch64 is not used for multiversioning
+  if (NewTA && S.getASTContext().getTargetInfo().getTriple().isAArch64())
+    return false;

I find that the **target_clones/target_version** attributes are also not used to generate multiversioned functions: https://godbolt.org/z/cYWsbrPn9. Are they not supported on AArch64 either?

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D150867/new/
https://reviews.llvm.org/D150867
[PATCH] D144704: [SVE] Add intrinsics for uniform dsp operations that explicitly undefine the result for inactive lanes.
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGec67d703cfb0: [SVE] Add intrinsics for uniform dsp operations that explicitly undefine the… (authored by dewen, committed by Allen).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144704/new/
https://reviews.llvm.org/D144704

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve2-intrinsics/acle_sve2_qsub.c
  clang/test/CodeGen/aarch64-sve2-intrinsics/acle_sve2_qsubr.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll
===================================================================
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll
@@ -0,0 +1,108 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; SQSUB
+;
+
+define <vscale x 16 x i8> @sqsub_i8_u(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: sqsub_i8_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT:    ret
+  %out = call <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.u.nxv16i8(<vscale x 16 x i1> %pg,
+                                                                   <vscale x 16 x i8> %a,
+                                                                   <vscale x 16 x i8> %b)
+  ret <vscale x 16 x i8> %out
+}
+
+define <vscale x 8 x i16> @sqsub_i16_u(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: sqsub_i16_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT:    ret
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.u.nxv8i16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x i16> %a,
+                                                                   <vscale x 8 x i16> %b)
+  ret <vscale x 8 x i16> %out
+}
+
+define <vscale x 4 x i32> @sqsub_i32_u(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: sqsub_i32_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %out = call <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.u.nxv4i32(<vscale x 4 x i1> %pg,
+                                                                   <vscale x 4 x i32> %a,
+                                                                   <vscale x 4 x i32> %b)
+  ret <vscale x 4 x i32> %out
+}
+
+define <vscale x 2 x i64> @sqsub_i64_u(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
+; CHECK-LABEL: sqsub_i64_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT:    ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.u.nxv2i64(<vscale x 2 x i1> %pg,
+                                                                   <vscale x 2 x i64> %a,
+                                                                   <vscale x 2 x i64> %b)
+  ret <vscale x 2 x i64> %out
+}
+
+;
+; UQSUB
+;
+
+define <vscale x 16 x i8> @uqsub_i8_u(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: uqsub_i8_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT:    ret
+  %out = call <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.u.nxv16i8(<vscale x 16 x i1> %pg,
+                                                                   <vscale x 16 x i8> %a,
+                                                                   <vscale x 16 x i8> %b)
+  ret <vscale x 16 x i8> %out
+}
+
+define <vscale x 8 x i16> @uqsub_i16_u(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: uqsub_i16_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT:    ret
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.u.nxv8i16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x i16> %a,
+                                                                   <vscale x 8 x i16> %b)
+  ret <vscale x 8 x i16> %out
+}
+
+define <vscale x 4 x i32> @uqsub_i32_u(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: uqsub_i32_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %out = call <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.u.nxv4i32(<vscale x 4 x i1> %pg,
+                                                                   <vscale x 4 x i32> %a,
+                                                                   <vscale x 4 x i32> %b)
+  ret <vscale x 4 x i32> %out
+}
+
+define <vscale x 2 x i64> @uqsub_i64_u(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
+; CHECK-LABEL: uqsub_i64_u:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uqsub z0.d, z0.d, z1.d
+; CHECK-NEXT:    ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.u.nxv2i64(<vscale x 2 x i1> %pg,
+                                                                   <vscale x 2 x i64> %a,
+                                                                   <vscale x 2 x i64> %b)
+  ret <vscale x 2 x i64> %out
+}
+
+declare <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.u.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
+declare <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.u.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
+declare <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.u.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.u.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
+
+declare <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.u.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
+declare <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.u.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
+declare <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.u.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.u.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
\ No newline at end of file

Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -18449,10 +18449,16 @@
     return convertMergedOpToPredOp(N, ISD::S
[PATCH] D72820: [FPEnv] Add pragma FP_CONTRACT support under strict FP.
Allen added inline comments. Herald added a project: All.

Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3386
+        FMulAdd = Builder.CreateConstrainedFPCall(
+            CGF.CGM.getIntrinsic(llvm::Intrinsic::experimental_constrained_fmuladd,
+                                 Addend->getType()),

Sorry, I'm not familiar with the optimizations in the Clang front end. I'd like to ask: does this optimization assume that all backends have an instruction like fmuladd?

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D72820/new/
https://reviews.llvm.org/D72820
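On the question above: `llvm.fmuladd` does not require a hardware FMA instruction — its documented semantics allow the backend to lower it either to a single fused multiply-add or to a separate multiply and add. The two lowerings can produce different numeric results, because fusion skips the rounding of the intermediate product. A small sketch of why, in plain Python, simulating the fused path with exact rational arithmetic (the values are chosen so the difference survives the final rounding):

```python
from fractions import Fraction

a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Unfused: a*b is 1 - 2**-54 exactly, which rounds (ties-to-even) to 1.0,
# so the subsequent add yields exactly 0.0.
unfused = a * b + c

# Fused: the product is kept exact and only the final result is rounded once,
# preserving the low-order term.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(unfused, fused)  # 0.0 vs -2**-54 (about -5.55e-17)
```

This is why FP_CONTRACT controls whether the front end may emit `fmuladd` at all: fusing is faster and often more accurate, but it is not bit-identical to the unfused evaluation the source nominally specifies.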
[PATCH] D70253: [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics
Allen added inline comments. Herald added a project: All.

Comment at: llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll:31
+; CHECK-NEXT: ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i64> %a,
+                                                                 <vscale x 2 x i1> %pg,

Hi, kmclaughlin: Sorry for the naive question: flogb is a unary instruction in the assembly, so why do we need %a as an **input** operand of the intrinsic? Could it instead be something like
```
%a = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %b)
```

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70253/new/
https://reviews.llvm.org/D70253
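If I read the SVE intrinsic conventions correctly, the extra operand is the merge value of the predicated (merging) form: lanes where the predicate is false take their result from that operand rather than from the operation, which is why it must be an input even though the underlying instruction is unary. A scalar sketch of those semantics, with illustrative names not taken from the ACLE:

```python
import math

def merging_unary(inactive, pg, op, xs):
    """Per-lane: apply op where the predicate is true,
    otherwise keep the lane from the 'inactive' operand."""
    return [op(x) if p else keep for keep, p, x in zip(inactive, pg, xs)]

a  = [10, 20, 30, 40]               # merge source (the %a-like operand)
pg = [True, False, True, False]     # governing predicate
b  = [1.0, 2.0, 4.0, 8.0]
print(merging_unary(a, pg, lambda v: int(math.log2(v)), b))  # [0, 20, 2, 40]
```

Dropping the merge operand, as the suggested two-argument form does, would leave the inactive lanes unspecified — which is essentially what the later "_u" (undef-for-inactive-lanes) intrinsic variants provide.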
[PATCH] D93793: [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder
Allen added a comment.

I have a basic question: why is poison preferred over undef in the ConstantVector::getSplat pattern?

Repository: rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93793/new/
https://reviews.llvm.org/D93793