[libclc] [libclc] Refine __clc_fp*_subnormals_supported and __clc_flush_denormal_if_not_supported (PR #157633)

2025-09-11 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/157633 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[libclc] [libclc] Refine __clc_fp*_subnormals_supported and __clc_flush_denormal_if_not_supported (PR #157633)

2025-09-09 Thread Matt Arsenault via cfe-commits
@@ -66,13 +65,11 @@ bool __attribute__((noinline)) __clc_runtime_has_hw_fma32(void); #define LOG_MAGIC_NUM_SP32 (1 + NUMEXPBITS_SP32 - EXPBIAS_SP32) _CLC_OVERLOAD _CLC_INLINE float __clc_flush_denormal_if_not_supported(float x) { - int ix = __clc_as_int(x); - if (!__clc_fp
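For readers unfamiliar with denormal flushing, here is a minimal sketch in plain C of what a "flush if not supported" helper generally does. It is not the libclc code: the quoted diff is truncated, and the names `flush_denormal_to_zero`, `float_from_bits`, and `bits_from_float` are hypothetical. It assumes IEEE-754 binary32 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for this sketch; assumes float is IEEE-754 binary32. */
static uint32_t bits_from_float(float x) {
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

static float float_from_bits(uint32_t bits) {
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Replace a denormal (zero exponent, non-zero mantissa) with a zero of the
 * same sign; every other value is returned unchanged. */
static float flush_denormal_to_zero(float x) {
    const uint32_t exp_mask = 0x7F800000u;
    const uint32_t mant_mask = 0x007FFFFFu;
    const uint32_t sign_mask = 0x80000000u;
    uint32_t bits = bits_from_float(x);
    if ((bits & exp_mask) == 0 && (bits & mant_mask) != 0)
        return float_from_bits(bits & sign_mask);
    return x;
}
```

A real implementation would apply the flush only when the target reports that subnormals are unsupported; that query is left out here because its exact form is not visible in the snippet.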

[clang] [llvm] MC: Use Triple form of lookupTarget in more places (PR #157591)

2025-09-09 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/157591

[clang] [llvm] [Clang] Add `__builtin_stack_address` (PR #148281)

2025-09-09 Thread Matt Arsenault via cfe-commits
@@ -3680,6 +3681,7 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) { Results.push_back(Tmp1); break; } + case ISD::STACKADDRESS: case ISD::STACKSAVE: // Expand to CopyFromReg if the target set // StackPointerRegisterToSaveRestore.

[clang] [llvm] MC: Use Triple form of lookupTarget in more places (PR #157591)

2025-09-09 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/157591

[clang] [lldb] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/157321

[clang] [lldb] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/157321 From 5f2205d454e38e63ab6d9ed2a41ff8d8b674ec6b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 7 Sep 2025 09:03:22 +0900 Subject: [PATCH 1/2] MC: Add Triple overloads for more MC constructors Avoids mor

[clang] [lldb] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm auto_merge_enabled https://github.com/llvm/llvm-project/pull/157321

[clang] [lldb] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/157321 From 5f2205d454e38e63ab6d9ed2a41ff8d8b674ec6b Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Sun, 7 Sep 2025 09:03:22 +0900 Subject: [PATCH] MC: Add Triple overloads for more MC constructors Avoids more Tr

[clang] [flang] [llvm] [mlir] [Support][NFC] Move OptimizationLevel to the Support directory (PR #157057)

2025-09-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Perhaps the TargetParser directory is more suitable for them. Not sure what > they are used for. It's closer but not quite right. I think there probably should be some kind of ABI-information for this sort of stuff. TargetParser is more specific to frontend-backend interaction

[clang] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/157321 Avoids more Triple->string->Triple round trip. This is a continuation of f137c3d592e96330e450a8fd63ef7e8877fc1908 From 233037d81eee84cc6aafd0708758f898e6b96593 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date

[clang] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: * **#157321** https://app.graphite.dev/github/pr/llvm/llvm-project/157321?utm_source=stack-comment-icon 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/15732

[clang] [llvm] [mlir] MC: Add Triple overloads for more MC constructors (PR #157321)

2025-09-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/157321

[libclc] [libclc] Implement erf/erfc vector function with loop since scalar function is large (PR #157055)

2025-09-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/157055

[clang] [llvm] [Support][NFC] Move OptimizationLevel to the Support directory (PR #157057)

2025-09-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: At some point we should split Support into "ReallySupport" and "RandomStuffThatIRAndCodeGenUse" https://github.com/llvm/llvm-project/pull/157057

[clang] [Clang][HIP][CUDA] Add `__cluster_dims__` and `__no_cluster__` attribute (PR #156686)

2025-09-04 Thread Matt Arsenault via cfe-commits
@@ -1557,6 +1557,23 @@ def HIPManaged : InheritableAttr { let Documentation = [HIPManagedAttrDocs]; } +def CUDAClusterDims : InheritableAttr { + let Spellings = [GNU<"cluster_dims">, Declspec<"__cluster_dims__">]; + let Args = [ExprArgument<"X">, ExprArgument<"Y", 1>, Expr

[clang] [llvm] [Driver][AMDGPU][HIP][SPIRV] Disable optimizations for AMDGCN SPIR-V (PR #154765)

2025-09-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: https://godbolt.org/z/Y6vYeYvaW This shows my main concern. We should not be skipping the mandatory passes on this path. The SPIRV consumer should not be taking on the responsibility of handling the IR-lowered-in-frontend features (mainly always-inline and

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-09-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/153883

[clang] [llvm] [Driver][AMDGPU][HIP][SPIRV] Disable optimizations for AMDGCN SPIR-V (PR #154765)

2025-09-03 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Not sure what loop(loop-idiom-vectorize) or globaldce are doing there at -O0, > those seem like bugs The loop-idiom-recognize [appears to be an aarch64 specific bug](https://github.com/llvm/llvm-project/issues/156787). I would also consider the globaldce to be a bug, but I ass

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-09-03 Thread Matt Arsenault via cfe-commits
@@ -56,10 +56,19 @@ static bool printOp(const DWARFExpression::Operation *Op, raw_ostream &OS, assert(!Name.empty() && "DW_OP has no name!"); OS << Name; + std::optional SubOpcode = Op->getSubCode(); + if (SubOpcode) { +StringRef SubName = SubOperationEncodingString

[libclc] [libclc] Override generic symbol using llvm-link --override flag instead of using weak linkage (PR #156778)

2025-09-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/156778

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-09-03 Thread Matt Arsenault via cfe-commits
@@ -70,10 +79,8 @@ static bool printOp(const DWARFExpression::Operation *Op, raw_ostream &OS, unsigned Signed = Size & DWARFExpression::Operation::SignBit; if (Size == DWARFExpression::Operation::SizeSubOpLEB) { - StringRef SubName = - SubOperationEncodi

[clang] [llvm] [AMDGPU][gfx1250] Add 128B cooperative atomics (PR #156418)

2025-09-03 Thread Matt Arsenault via cfe-commits
@@ -6776,6 +6776,28 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { "invalid vector type for format", &Call, Src1, Call.getArgOperand(2)); break; } + case Intrinsic::amdgcn_cooperative_atomic_load_32x4B: + case Intrinsic::amdgcn_coop

[clang] [llvm] [AMDGPU][gfx1250] Add 128B cooperative atomics (PR #156418)

2025-09-02 Thread Matt Arsenault via cfe-commits
@@ -6776,6 +6776,28 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { "invalid vector type for format", &Call, Src1, Call.getArgOperand(2)); break; } + case Intrinsic::amdgcn_cooperative_atomic_load_32x4B: + case Intrinsic::amdgcn_coop

[libclc] [libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (PR #152275)

2025-08-31 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/152275

[clang] [flang] [llvm] [mlir] [IR][CodeGen] Remove "approx-func-fp-math" attribute (PR #155740)

2025-08-27 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/155740

[clang] [llvm] [IR][CodeGen] Remove "approx-func-fp-math" attribute (PR #155740)

2025-08-27 Thread Matt Arsenault via cfe-commits
arsenm wrote: flang is also adding it https://github.com/llvm/llvm-project/pull/155740

[clang] [Clang] Enable constexpr handling for builtin elementwise fshl/fshr (PR #153572)

2025-08-24 Thread Matt Arsenault via cfe-commits
@@ -961,3 +961,51 @@ static_assert(fmaDouble1[3] == 26.0); constexpr float fmaArray[] = {2.0f, 2.0f, 2.0f, 2.0f}; constexpr float fmaResult = __builtin_elementwise_fma(fmaArray[1], fmaArray[2], fmaArray[3]); static_assert(fmaResult == 6.0f, ""); + +static_assert(__builtin_elem

[clang] [llvm] Proofread DebuggingCoroutines.rst (PR #154681)

2025-08-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/154681

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,731 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1100 %s -emit-llvm -o - | FileCheck %s + +typedef int int8 __attribute__((ext_vector_type(8))); +typedef flo

[clang] [clang] Introduce elementwise ctlz/cttz builtins (PR #131995)

2025-08-20 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Dropping in after the fact, is there a reason we called this > `__builtin_elementwise_ctlz` instead of `__builtin_elementwise_clzg`? The > builtin is just `clzg` done on each element so the name is confusing me. It matches the llvm intrinsic name, and the second argument is a d

[clang] [llvm] [AMDGPU] Error out in clang if wavefront64 is used on gfx1250 (PR #153693)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -774,6 +774,18 @@ static bool isWave32Capable(StringRef GPU, const Triple &T) { return IsWave32Capable; } +static bool isWave64Capable(StringRef GPU, const Triple &T) { + if (T.isAMDGCN()) { arsenm wrote: Yes https://github.com/llvm/llvm-project/pull

[clang] [Clang][Headers] Fix for SYCL (PR #152314)

2025-08-20 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > I'd rather just fix clang. This is a workaround that shouldn't be > > necessary, and seems to spread the knowledge of the dodgy > > !__opencl_c_generic_address_space address space handling case to a new place > > By fixing clang you mean the cast `(void [[clang::address_space

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/140210

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,731 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1100 %s -emit-llvm -o - | FileCheck %s + +typedef int int8 __attribute__((ext_vector_type(8))); +typedef flo

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,731 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1100 %s -emit-llvm -o - | FileCheck %s + +typedef int int8 __attribute__((ext_vector_type(8))); +typedef flo

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: The main issue now is I think this should start out using an opaque type, similar to the buffer intrinsics, for the image descriptor instead of the integer vector https://github.com/llvm/llvm-project/pull/140210

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -160,6 +163,27 @@ static Value *EmitAMDGCNBallotForExec(CodeGenFunction &CGF, const CallExpr *E, return Call; } +template arsenm wrote: This doesn't need to be a template function. You can just check E->getNumArgs for the loop bounds, and use the know

[clang] [WIP][AMDGPU] Support for type inferring image load/store builtins for AMDGPU (PR #140210)

2025-08-20 Thread Matt Arsenault via cfe-commits
@@ -112,11 +112,12 @@ bool SemaAMDGPU::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_image_load_mip_3d_v4f16_i32: case AMDGPU::BI__builtin_amdgcn_image_load_mip_cube_v4f32_i32: case AMDGPU::BI__builtin_amdgcn_image_load_mip_cube_v4f16

[clang] [clang] Introduce elementwise ctlz/cttz builtins (PR #131995)

2025-08-20 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. I think the only issue is documentation phrasing, but the existing builtins already have internally inconsistent phrasing, so that can be addressed later https://github.com/llvm/llvm-project/pull/131995

[clang] [clang] Enable constexpr handling for __builtin_elementwise_fma (PR #152919)

2025-08-20 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/152919

[clang] [clang] Proofread StandardCPlusPlusModules.rst (PR #154474)

2025-08-19 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/154474

[clang] [clang] Enable constexpr handling for __builtin_elementwise_fma (PR #152919)

2025-08-19 Thread Matt Arsenault via cfe-commits
@@ -11658,6 +11658,29 @@ bool VectorExprEvaluator::VisitCallExpr(const CallExpr *E) { return Success(APValue(ResultElements.data(), ResultElements.size()), E); } + case Builtin::BI__builtin_elementwise_fma: { +APValue SourceX, SourceY, SourceZ; +if (!EvaluateAs

[clang] [llvm] [AMDGPU] Error out in clang if wavefront64 is used on gfx1250 (PR #153693)

2025-08-19 Thread Matt Arsenault via cfe-commits
@@ -774,6 +774,18 @@ static bool isWave32Capable(StringRef GPU, const Triple &T) { return IsWave32Capable; } +static bool isWave64Capable(StringRef GPU, const Triple &T) { + if (T.isAMDGCN()) { arsenm wrote: Everything should be feature driven, not random

[clang] [Clang][Headers] Fix for SYCL (PR #152314)

2025-08-19 Thread Matt Arsenault via cfe-commits
arsenm wrote: I'd rather just fix clang. This is a workaround that shouldn't be necessary, and seems to spread the knowledge of the dodgy !__opencl_c_generic_address_space address space handling case to a new place https://github.com/llvm/llvm-project/pull/152314

[libclc] [libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for amdgcn (PR #153785)

2025-08-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/153785

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-08-15 Thread Matt Arsenault via cfe-commits
@@ -56,10 +56,19 @@ static bool printOp(const DWARFExpression::Operation *Op, raw_ostream &OS, assert(!Name.empty() && "DW_OP has no name!"); OS << Name; + std::optional SubOpcode = Op->getSubCode(); + if (SubOpcode) { +StringRef SubName = SubOperationEncodingString

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-08-15 Thread Matt Arsenault via cfe-commits
@@ -1011,6 +1018,7 @@ LLVM_ABI StringRef IndexString(unsigned Idx); LLVM_ABI StringRef FormatString(DwarfFormat Format); LLVM_ABI StringRef FormatString(bool IsDWARF64); LLVM_ABI StringRef RLEString(unsigned RLE); +LLVM_ABI StringRef AddressSpaceString(unsigned AS, llvm::Triple

[clang] [llvm] [Dwarf] Support heterogeneous DW_{OP,AT}s needed for AMDGPU CFI (PR #153883)

2025-08-15 Thread Matt Arsenault via cfe-commits
@@ -120,6 +120,46 @@ inline bool isConstantAddressSpace(unsigned AS) { return false; } } + +namespace DWARFAS { +enum : unsigned { + GLOBAL = 0, + GENERIC = 1, + REGION = 2, + LOCAL = 3, + PRIVATE_LANE = 5, + PRIVATE_WAVE = 6, + DEFAULT = GLOBAL, +}; +} // namespac

[libclc] [libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for amdgcn (PR #153785)

2025-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/153785

[libclc] [libclc] Fix out-of-bound value for workitem functions according to OpenCL spec (PR #153784)

2025-08-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/153784

[clang] [llvm] [AMDGPU] Error out in clang if wavefront64 is used on gfx1250 (PR #153693)

2025-08-14 Thread Matt Arsenault via cfe-commits
@@ -774,6 +774,18 @@ static bool isWave32Capable(StringRef GPU, const Triple &T) { return IsWave32Capable; } +static bool isWave64Capable(StringRef GPU, const Triple &T) { + if (T.isAMDGCN()) { arsenm wrote: This should be added to the feature flags in th

[libclc] [libclc] Enable -ffp-contract=fast compile option for math native_* functions (PR #153137)

2025-08-11 Thread Matt Arsenault via cfe-commits
@@ -304,7 +304,7 @@ set_source_files_properties( ${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_sin.cl ${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_sqrt.cl ${CMAKE_CURRENT_SOURCE_DIR}/opencl/lib/generic/math/native_tan.cl - PROPERTIES COMPILE_OP

[libclc] [libclc] Enable -ffp-contract=fast compile option for math native_* functions (PR #153137)

2025-08-11 Thread Matt Arsenault via cfe-commits
arsenm wrote: I think fp contract should be globally enabled in the build, and selectively disabled in the handful of places that it is problematic (namely specific blocks in expF, sinbF, and trig reductions) https://github.com/llvm/llvm-project/pull/153137
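The "enable globally, disable selectively" approach suggested above can be sketched in C with Clang's `fp contract` pragma. This is only an illustration, assuming Clang; other compilers may ignore the pragma with a warning. The function names `mul_add_default` and `mul_add_strict` are hypothetical and do not come from libclc.

```c
#include <assert.h> /* used by the checks in the note below */

/* File scope: allow a*b+c to be fused into an FMA everywhere in this file,
 * standing in for a build-wide -ffp-contract=fast. */
#pragma clang fp contract(fast)

/* Inherits the file-level setting, so the compiler may emit a fused
 * multiply-add here. */
static float mul_add_default(float a, float b, float c) {
    return a * b + c;
}

/* A block whose intermediate rounding matters opts out locally. The pragma
 * must be the first thing in the compound statement. */
static float mul_add_strict(float a, float b, float c) {
#pragma clang fp contract(off)
    return a * b + c;
}
```

For inputs where the product is exactly representable, such as `2.0f * 3.0f + 1.0f`, both functions return `7.0f`; they can differ only when the unfused product needs rounding.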

[clang] [llvm] [CodeGen] Fix VNInfo mapping in LiveRange::assign (PR #148790)

2025-08-11 Thread Matt Arsenault via cfe-commits
@@ -257,11 +257,13 @@ namespace llvm { assert(Other.segmentSet == nullptr && "Copying of LiveRanges with active SegmentSets is not supported"); // Duplicate valnos. + auto FirstNewVNIIdx = valnos.size(); arsenm wrote: ```suggestio

[clang] [llvm] [CodeGen] Fix VNInfo mapping in LiveRange::assign (PR #148790)

2025-08-11 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Probably do need to drop to a unit test for this https://github.com/llvm/llvm-project/pull/148790

[clang] [llvm] [CodeGen] Fix VNInfo mapping in LiveRange::assign (PR #148790)

2025-08-11 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/148790

[clang] [Clang] Always pass detected CUDA path to 'clang-nvlink-wrapper' (PR #152789)

2025-08-11 Thread Matt Arsenault via cfe-commits
@@ -691,9 +696,12 @@ Error runNVLink(ArrayRef Files, const ArgList &Args) { if (Args.hasArg(OPT_lto_emit_asm) || Args.hasArg(OPT_lto_emit_llvm)) return Error::success(); - std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str(); - Expected NVLinkPath = -

[clang] [clang] Enable constexpr handling for __builtin_elementwise_fma (PR #152919)

2025-08-10 Thread Matt Arsenault via cfe-commits
@@ -141,6 +141,16 @@ static void diagnoseNonConstexprBuiltin(InterpState &S, CodePtr OpPC, S.CCEDiag(Loc, diag::note_invalid_subexpr_in_const_expr); } +// Same implementation as Compiler::getRoundingMode. +static llvm::RoundingMode getRoundingMode(const InterpState &S, co

[clang] [Clang] Always pass detected CUDA path to 'clang-nvlink-wrapper' (PR #152789)

2025-08-09 Thread Matt Arsenault via cfe-commits
@@ -691,9 +696,12 @@ Error runNVLink(ArrayRef Files, const ArgList &Args) { if (Args.hasArg(OPT_lto_emit_asm) || Args.hasArg(OPT_lto_emit_llvm)) return Error::success(); - std::string CudaPath = Args.getLastArgValue(OPT_cuda_path_EQ).str(); - Expected NVLinkPath = -

[clang] [flang] [llvm] [openmp] [OpenMP][Offload] Add support for dyn_groupprivate clause (PR #152651)

2025-08-08 Thread Matt Arsenault via cfe-commits
@@ -2725,6 +2727,22 @@ void OMPClausePrinter::VisitOMPXDynCGroupMemClause( OS << ")"; } +void OMPClausePrinter::VisitOMPDynGroupprivateClause(OMPDynGroupprivateClause *Node) { + OS << "dyn_groupprivate("; + if (Node->getFirstDynGroupprivateModifier() != OMPC_SCHEDULE_MOD

[libclc] [libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (PR #152275)

2025-08-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===--===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apac

[clang] [Sema] Remove an unnecessary cast (NFC) (PR #152440)

2025-08-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/152440

[libclc] [libclc] Implement __clc_rsqrt with __ocml_rsqrt_* functions (PR #152436)

2025-08-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: ocml rsqrt should be deleted, this is implementable by
```
float rsqrt(float x) {
#pragma clang fp contract(fast)
  return 1.0f / __builtin_elementwise_sqrt(x);
}
```
https://github.com/llvm/llvm-project/pull/152436

[libclc] [libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (PR #152275)

2025-08-06 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===--===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apac

[clang] [libclang] Remove unnecessary casts (NFC) (PR #152259)

2025-08-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/152259

[clang] [llvm] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/149216

[clang] [clang] Introduce elementwise ctlz/cttz builtins (PR #131995)

2025-08-05 Thread Matt Arsenault via cfe-commits
arsenm wrote: I'd view assertively stating it's undefined behavior as a move away from the status quo. For `__builtin_clz`, GCC states `the result is undefined`. Clang appears to not have separate documentation for that exact spelling. It does have documentation for `__builtin_clzg`, which has

[libclc] [libclc] Move mem_fence and barrier to clc library (PR #151446)

2025-08-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/151446

[libclc] [libclc] Move mem_fence and barrier to clc library (PR #151446)

2025-08-05 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,37 @@ +//===--===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apac

[clang] [clang] Use llvm::iterator_range::empty (NFC) (PR #152088)

2025-08-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/152088

[clang] [Clang] Hide `offload-arch` initialization errors behind verbose flag (PR #151964)

2025-08-04 Thread Matt Arsenault via cfe-commits
@@ -165,8 +165,9 @@ int printGPUsByHIP() { llvm::sys::DynamicLibrary::getPermanentLibrary(DynamicHIPPath.c_str(), &ErrMsg)); if (!DynlibHandle->isValid()) { -llvm::errs() << "Failed to load " << DynamicHIPPath <<

[clang] [Clang] Hide `offload-arch` initialization errors behind verbose flag (PR #151964)

2025-08-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/151964

[clang] [AMDGPU] Fix nested cases (PR #151918)

2025-08-04 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/151918

[clang] [Sema] Use llvm::iterator_range::empty (NFC) (PR #151852)

2025-08-03 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/151852

[clang] [llvm] [AMDGPU] v_cvt_scale_pk16 gfx1250 instructions (PR #151804)

2025-08-02 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/151804

[clang] [clang] Introduce elementwise ctlz/cttz builtins (PR #131995)

2025-08-01 Thread Matt Arsenault via cfe-commits
arsenm wrote: > My literal reading of this is already that it's always undefined behaviour > for all targets (as you're requesting), That's not what I'm requesting, it would be an undefined *value*. It is not instant undefined behavior, you would get UB on use of that value https://github.co
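To make the undefined-value point concrete: because the result of a count-leading-zeros builtin for a zero input cannot be relied on, portable callers usually guard that case themselves. The wrapper below is a hedged sketch, not code from the PR. `clz32_or` is a hypothetical name, and it assumes a GCC- or Clang-compatible compiler with a 32-bit `unsigned int`.

```c
#include <assert.h> /* used by the checks in the note below */
#include <stdint.h>

/* Hypothetical wrapper: return the number of leading zero bits in x, or the
 * caller-supplied fallback when x is zero. The zero case never reaches
 * __builtin_clz, whose result for 0 is documented as undefined. */
static int clz32_or(uint32_t x, int fallback) {
    return x != 0 ? __builtin_clz(x) : fallback;
}
```

With this guard, `clz32_or(1u, 32)` yields `31` and `clz32_or(0u, 32)` yields the fallback `32`, without the program ever consuming an undefined result.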

[clang] [llvm] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,7 @@ +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -S -verify=expected -o - %s +// REQUIRES: amdgpu-registered-target + +void test_raw_ptr_atomics(__amdgpu_buffer_rsrc_t rsrc, float f32, double f64, int offset, int soffset) { + f32 = __builtin_amdgcn_raw_ptr_buff

[clang] [llvm] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -459,6 +459,8 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple &T, Features["atomic-global-pk-add-bf16-inst"] = true; Features["atomic-ds-pk-add-16-insts"] = true; Features["setprio-inc-wg-inst"] = true; + Features["atomic-fmin-fmax-gl

[clang] [llvm] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -163,6 +163,13 @@ BUILTIN(__builtin_amdgcn_raw_buffer_load_b64, "V2UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b96, "V3UiQbiiIi", "n") BUILTIN(__builtin_amdgcn_raw_buffer_load_b128, "V4UiQbiiIi", "n") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_atomic_add_i32, "i

[clang] [llvm] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,9 @@ +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx90a -target-feature +atomic-fmin-fmax-global-f32 -target-feature +atomic-fmin-fmax-global-f64 -S -verify=expected -o - %s +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx942 -tar

[clang] [Clang][AMDGPU] Add builtins for some buffer resource atomics (PR #149216)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,24 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-feature +atomic-fmin-fmax-global-f32 -target-feature +atomic-fmin-fmax-global-f64 -emit-llvm -o - %s

[libclc] [libclc] Move mem_fence and barrier to clc library (PR #151446)

2025-08-01 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,36 @@ +//===--===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apac

[clang] [Clang] Only C link device libraries by default for OpenMP (PR #151239)

2025-07-30 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Missing test https://github.com/llvm/llvm-project/pull/151239

[clang] [Clang] Only C link device libraries by default for OpenMP (PR #151239)

2025-07-30 Thread Matt Arsenault via cfe-commits
arsenm wrote: > as it could potentially override functions intended to be provided by the > ROCm device libraries. These don't provide any of the standard entry point names https://github.com/llvm/llvm-project/pull/151239

[clang] [AST] Remove an unnecessary cast (NFC) (PR #151278)

2025-07-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/151278

[clang] [HIP] Link the LLVM libc libraries in no-RDC mode (PR #151046)

2025-07-29 Thread Matt Arsenault via cfe-commits
arsenm wrote: > HIP does not only not need the C device runtime, It does. The language bakes in the assumption that all of the libc and libm calls exist in the ambient environment. A header file wrapper is not sufficient (e.g. we need workarounds like 77de8a0c0abc9d245a7c6278670554b47ae183ea)

[clang] [HIP] Move HIP to the new driver by default (PR #123359)

2025-07-25 Thread Matt Arsenault via cfe-commits
@@ -74,30 +74,6 @@ // O0-CGO2-SAME: "-O0" // O0-CGO2-NOT: "--lto-CGO2" -// ALL-NOT: "{{.*}}opt" arsenm wrote: If the old driver isn't being removed, the tests should have run lines with the new and old version https://github.com/llvm/llvm-project/pull/1233

[clang] [Clang] Fix new driver device only compilation for `amdgcnspirv` target (PR #150110)

2025-07-22 Thread Matt Arsenault via cfe-commits
@@ -4886,6 +4886,10 @@ Action *Driver::BuildOffloadingActions(Compilation &C, // individually. for (Action *&A : DeviceActions) { if ((A->getType() != types::TY_Object && + !(A->getOffloadingToolChain() && + A->getOffloadingToolChain()->getTr

[clang] [llvm] AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (PR #149684)

2025-07-21 Thread Matt Arsenault via cfe-commits
@@ -6627,6 +6627,54 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) { "invalid vector type for format", &Call, Src1, Call.getArgOperand(5)); break; } + case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4: { +Value *Src0 = Call.getArgOp

[clang] [clang] Proofread UsersManual.rst (NFC) (PR #149763)

2025-07-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/149763

[clang] [llvm] [AMDGPU] Add support for `v_prng_b32` on gfx1250 (PR #149450)

2025-07-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/149450

[clang] [HIP][Clang][Driver] Move BC preference logic into ROCm detection (PR #149294)

2025-07-18 Thread Matt Arsenault via cfe-commits
@@ -77,6 +79,82 @@ class RocmInstallationDetector { SPACKReleaseStr(SPACKReleaseStr.str()) {} }; + struct CommonBitcodeLibsPreferences { +CommonBitcodeLibsPreferences(const Driver &D, + const llvm::opt::ArgList &DriverArgs, +

[clang] [llvm] [AMDGPU] Add support for `v_exp_bf16` on gfx1250 (PR #149229)

2025-07-17 Thread Matt Arsenault via cfe-commits
@@ -25,4 +25,27 @@ define amdgpu_ps void @llvm_log2_bf16_s(ptr addrspace(1) %out, bfloat inreg %src ret void } +define amdgpu_ps void @llvm_exp2_bf16_v(ptr addrspace(1) %out, bfloat %src) { +; GCN-LABEL: llvm_exp2_bf16_v: +; GCN: ; %bb.0: +; GCN-NEXT:v_exp_bf16_e3

[clang] [llvm] [Clang] Add `__builtin_stack_address` (PR #148281)

2025-07-17 Thread Matt Arsenault via cfe-commits
@@ -121,6 +121,11 @@ enum NodeType { /// function calling this intrinsic. SPONENTRY, + /// STACKADDR - Represents the llvm.stackaddr intrinsic. Takes no argument + /// and returns the starting address of the stack region that may be used + /// by called functions. + ST

[clang] [Sema] Remove unnecessary casts (NFC) (PR #149253)

2025-07-16 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/149253

[clang] [llvm] [AMDGPU] Add support for `v_exp_bf16` on gfx1250 (PR #149229)

2025-07-16 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,240 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 arsenm wrote: Can you give this a bf16 suffix instead of bfloat, and add the other targets where it's not legal https://github.com/llvm/llvm

[clang] [llvm] [CodeGen] Fix VNInfo mapping in LiveRange::assign (PR #148790)

2025-07-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Testcase? https://github.com/llvm/llvm-project/pull/148790

[clang] [Sema] Remove unnecessary casts (NFC) (PR #148762)

2025-07-14 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/148762

[clang] [llvm] [NFC][AMDGPU] Rename "amdgpu-as" to "amdgpu-synchronize-as" (PR #148627)

2025-07-14 Thread Matt Arsenault via cfe-commits
@@ -704,12 +704,12 @@ void diagnoseUnknownMMRAASName(const MachineInstr &MI, StringRef AS) { DiagnosticInfoUnsupported(Fn, Str.str(), MI.getDebugLoc(), DS_Warning)); } -/// Reads \p MI's MMRAs to parse the "amdgpu-as" MMRA. +/// Reads \p MI's MMRAs to parse the "amdgpu-
