[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Allen zhong via Phabricator via cfe-commits
Allen added inline comments.



Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:186
+ !eq(name, "v2f16"): Float16x2Regs,
+ !eq(name, "bf16"): Float16Regs,
+ !eq(name, "v2bf16"): Float16x2Regs,

Sorry for a basic question: what's the difference between bf16 and f16?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136311/new/

https://reviews.llvm.org/D136311

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D136311: [CUDA,NVPTX] Implement __bf16 support for NVPTX.

2022-10-25 Thread Allen zhong via Phabricator via cfe-commits
Allen added inline comments.



Comment at: llvm/lib/Target/NVPTX/NVPTXInstrInfo.td:186
+ !eq(name, "v2f16"): Float16x2Regs,
+ !eq(name, "bf16"): Float16Regs,
+ !eq(name, "v2bf16"): Float16x2Regs,

tra wrote:
> tra wrote:
> > Allen wrote:
> > > Sorry for a basic question: what's the difference between bf16 and f16?
> > https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
> > 
> If your question is why both bf16 and f16 use Float16Regs, then the answer is 
> that both use 'untyped' 16-bit *integer* registers.
> The difference from Int16Regs is that those are signed. PTX has some awkward 
> restrictions on matching instructions and register kinds, even though under 
> the hood it all boils down to everything using 32-bit registers.
Thanks for your explanation.
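For readers following along: bf16 is the upper half of an IEEE-754 float32 (1 sign, 8 exponent, 7 mantissa bits), whereas IEEE fp16 uses a 5-bit exponent, so the two 16-bit formats have different bit layouts even though NVPTX carries both in the same untyped 16-bit registers. A minimal C++ sketch of the truncating bf16 conversion (real conversions usually round to nearest even; truncation keeps the sketch short):

```cpp
#include <cstdint>
#include <cstring>

// bf16 keeps float32's sign and 8-bit exponent and truncates the
// mantissa to 7 bits, so converting float32 -> bf16 is just dropping
// the low 16 bits of the float32 bit pattern.
uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// Widening back is exact: pad the low 16 mantissa bits with zeros.
float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For example, 1.0f (0x3F800000) becomes bf16 0x3F80, while IEEE fp16 encodes 1.0 as 0x3C00 (5-bit exponent, bias 15): different layouts, hence distinct source types.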


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136311/new/

https://reviews.llvm.org/D136311



[PATCH] D150867: [AArch64][FMV] Prevent target attribute using for multiversioning.

2023-09-06 Thread Allen zhong via Phabricator via cfe-commits
Allen added inline comments.



Comment at: clang/lib/Sema/SemaDecl.cpp:11544
+  // Target attribute on AArch64 is not used for multiversioning
+  if (NewTA && S.getASTContext().getTargetInfo().getTriple().isAArch64())
+return false;

I find that the attributes **target_clones/target_version** are also not used to 
generate multiversioned functions: https://godbolt.org/z/cYWsbrPn9
So are they not supported on AArch64?
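For background: when function multiversioning is supported, the compiler lowers it to an ifunc-style resolver that runs once and picks the implementation matching the detected CPU features. A hand-rolled C++ sketch of that mechanism (the function names and the feature probe are ours; the stub always reports "no SVE2" so the sketch runs anywhere, unlike the real AArch64 HWCAP probe):

```cpp
// Sketch of what target_version/target_clones lowers to: a resolver,
// run once at load time, returns the best implementation for this CPU.
using foo_fn = int (*)();

int foo_default() { return 0; }
int foo_sve2()    { return 2; }

// Stub standing in for the real AArch64 hardware-capability check.
bool cpu_has_sve2() { return false; }

// The resolver the linker would wire up behind the single symbol "foo".
foo_fn resolve_foo() {
    return cpu_has_sve2() ? foo_sve2 : foo_default;
}
```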


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150867/new/

https://reviews.llvm.org/D150867



[PATCH] D144704: [SVE] Add intrinsics for uniform dsp operations that explicitly undefine the result for inactive lanes.

2023-02-27 Thread Allen zhong via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGec67d703cfb0: [SVE] Add intrinsics for uniform dsp 
operations that explicitly undefine the… (authored by dewen, committed by 
Allen).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144704/new/

https://reviews.llvm.org/D144704

Files:
  clang/include/clang/Basic/arm_sve.td
  clang/test/CodeGen/aarch64-sve2-intrinsics/acle_sve2_qsub.c
  clang/test/CodeGen/aarch64-sve2-intrinsics/acle_sve2_qsubr.c
  llvm/include/llvm/IR/IntrinsicsAArch64.td
  llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
  llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll

Index: llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AArch64/sve2-intrinsics-uniform-dsp-undef.ll
@@ -0,0 +1,108 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s
+
+;
+; SQSUB
+;
+
+define <vscale x 16 x i8> @sqsub_i8_u(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: sqsub_i8_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    sqsub z0.b, z0.b, z1.b
+; CHECK-NEXT:    ret
+  %out = call <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.u.nxv16i8(<vscale x 16 x i1> %pg,
+                                                                   <vscale x 16 x i8> %a,
+                                                                   <vscale x 16 x i8> %b)
+  ret <vscale x 16 x i8> %out
+}
+
+define <vscale x 8 x i16> @sqsub_i16_u(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: sqsub_i16_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    sqsub z0.h, z0.h, z1.h
+; CHECK-NEXT:    ret
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.u.nxv8i16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x i16> %a,
+                                                                   <vscale x 8 x i16> %b)
+  ret <vscale x 8 x i16> %out
+}
+
+define <vscale x 4 x i32> @sqsub_i32_u(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: sqsub_i32_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    sqsub z0.s, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %out = call <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.u.nxv4i32(<vscale x 4 x i1> %pg,
+                                                                   <vscale x 4 x i32> %a,
+                                                                   <vscale x 4 x i32> %b)
+  ret <vscale x 4 x i32> %out
+}
+
+define <vscale x 2 x i64> @sqsub_i64_u(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
+; CHECK-LABEL: sqsub_i64_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    sqsub z0.d, z0.d, z1.d
+; CHECK-NEXT:    ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.u.nxv2i64(<vscale x 2 x i1> %pg,
+                                                                   <vscale x 2 x i64> %a,
+                                                                   <vscale x 2 x i64> %b)
+  ret <vscale x 2 x i64> %out
+}
+
+;
+; UQSUB
+;
+
+define <vscale x 16 x i8> @uqsub_i8_u(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
+; CHECK-LABEL: uqsub_i8_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    uqsub z0.b, z0.b, z1.b
+; CHECK-NEXT:    ret
+  %out = call <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.u.nxv16i8(<vscale x 16 x i1> %pg,
+                                                                   <vscale x 16 x i8> %a,
+                                                                   <vscale x 16 x i8> %b)
+  ret <vscale x 16 x i8> %out
+}
+
+define <vscale x 8 x i16> @uqsub_i16_u(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
+; CHECK-LABEL: uqsub_i16_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    uqsub z0.h, z0.h, z1.h
+; CHECK-NEXT:    ret
+  %out = call <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.u.nxv8i16(<vscale x 8 x i1> %pg,
+                                                                   <vscale x 8 x i16> %a,
+                                                                   <vscale x 8 x i16> %b)
+  ret <vscale x 8 x i16> %out
+}
+
+define <vscale x 4 x i32> @uqsub_i32_u(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: uqsub_i32_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    uqsub z0.s, z0.s, z1.s
+; CHECK-NEXT:    ret
+  %out = call <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.u.nxv4i32(<vscale x 4 x i1> %pg,
+                                                                   <vscale x 4 x i32> %a,
+                                                                   <vscale x 4 x i32> %b)
+  ret <vscale x 4 x i32> %out
+}
+
+define <vscale x 2 x i64> @uqsub_i64_u(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
+; CHECK-LABEL: uqsub_i64_u:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:    uqsub z0.d, z0.d, z1.d
+; CHECK-NEXT:    ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.u.nxv2i64(<vscale x 2 x i1> %pg,
+                                                                   <vscale x 2 x i64> %a,
+                                                                   <vscale x 2 x i64> %b)
+  ret <vscale x 2 x i64> %out
+}
+
+declare <vscale x 16 x i8> @llvm.aarch64.sve.uqsub.u.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
+declare <vscale x 8 x i16> @llvm.aarch64.sve.uqsub.u.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
+declare <vscale x 4 x i32> @llvm.aarch64.sve.uqsub.u.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 2 x i64> @llvm.aarch64.sve.uqsub.u.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
+
+declare <vscale x 16 x i8> @llvm.aarch64.sve.sqsub.u.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
+declare <vscale x 8 x i16> @llvm.aarch64.sve.sqsub.u.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>, <vscale x 8 x i16>)
+declare <vscale x 4 x i32> @llvm.aarch64.sve.sqsub.u.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
+declare <vscale x 2 x i64> @llvm.aarch64.sve.sqsub.u.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)
\ No newline at end of file
Index: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
===
--- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -18449,10 +18449,16 @@
 return convertMergedOpToPredOp(N, ISD::S
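The sqsub/uqsub operations exercised by the new tests saturate per lane rather than wrapping on overflow. A scalar C++ model of what one byte lane computes (helper names are ours, not LLVM's):

```cpp
#include <cstdint>

// Signed saturating subtract: clamp the wide result to [-128, 127]
// instead of letting it wrap.
int8_t sqsub_i8(int8_t a, int8_t b) {
    int r = static_cast<int>(a) - static_cast<int>(b);
    if (r > INT8_MAX) r = INT8_MAX;
    if (r < INT8_MIN) r = INT8_MIN;
    return static_cast<int8_t>(r);
}

// Unsigned saturating subtract: clamp to 0 on underflow.
uint8_t uqsub_u8(uint8_t a, uint8_t b) {
    return a > b ? static_cast<uint8_t>(a - b) : 0;
}
```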

[PATCH] D72820: [FPEnv] Add pragma FP_CONTRACT support under strict FP.

2023-02-27 Thread Allen zhong via Phabricator via cfe-commits
Allen added inline comments.
Herald added a project: All.



Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3386
+FMulAdd = Builder.CreateConstrainedFPCall(
+CGF.CGM.getIntrinsic(llvm::Intrinsic::experimental_constrained_fmuladd,
+ Addend->getType()),

Sorry, I'm not familiar with the optimizations in the Clang frontend.
I'd like to ask: is this optimization assuming that all backends have an 
instruction like fmuladd?
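For context: the llvm.fmuladd (and its constrained variant) express a*b+c where the backend may either fuse into a single FMA instruction (one rounding) or, on targets without FMA, split into a multiply plus an add (two roundings), so no backend is required to have the instruction. A C++ sketch of the two legal evaluations (helper names are ours):

```cpp
#include <cmath>

// One rounding: what a hardware FMA instruction computes.
double mul_add_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Two roundings: the legal fallback on targets without FMA.
double mul_add_split(double a, double b, double c) {
    volatile double p = a * b;  // volatile stops the compiler re-fusing
    return p + c;
}
```

With a = b = 1 + 2^-27 and c = -1, the split form rounds the product down to 1 + 2^-26 and returns exactly 2^-26, while the fused form keeps the extra 2^-54 term: the two results legitimately differ.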


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72820/new/

https://reviews.llvm.org/D72820



[PATCH] D70253: [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics

2022-12-07 Thread Allen zhong via Phabricator via cfe-commits
Allen added inline comments.
Herald added a project: All.



Comment at: 
llvm/test/CodeGen/AArch64/sve2-intrinsics-fp-int-binary-logarithm.ll:31
+; CHECK-NEXT: ret
+  %out = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i64> %a,
+                                                                 <vscale x 2 x i1> %pg,

Hi kmclaughlin:
  Sorry for the naive question:
  flogb is a unary instruction in the assembly, so why do we need %a as an 
**input** operand in the intrinsic? Could it instead be something like
```
%a = call <vscale x 2 x i64> @llvm.aarch64.sve.flogb.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %b)
```
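For context (this is background, not a reply from the thread): SVE merging intrinsics take an extra vector operand precisely so the result is defined for inactive lanes, which keep that operand's value. A scalar C++ model on four lanes, using std::ilogb as a stand-in for flogb's exponent extraction (names are ours):

```cpp
#include <array>
#include <cmath>

// Merging predication modeled on 4 lanes: active lanes (pg[i] true)
// compute ilogb(b[i]); inactive lanes keep a[i] -- which is why %a
// must appear as an input operand of the intrinsic.
std::array<int, 4> flogb_merge(std::array<int, 4> a,
                               std::array<bool, 4> pg,
                               std::array<double, 4> b) {
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = pg[i] ? std::ilogb(b[i]) : a[i];
    return out;
}
```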


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70253/new/

https://reviews.llvm.org/D70253



[PATCH] D93793: [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder

2022-01-05 Thread Allen zhong via Phabricator via cfe-commits
Allen added a comment.

I have a naive question: why is poison preferred over undef in the pattern 
ConstantVector::getSplat?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93793/new/

https://reviews.llvm.org/D93793
