date:20250218

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/125206

>From 7643bd660236cd72345c0f3cbbdc75e2726ff32b Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Fri, 31 Jan 2025 11:38:14 +
Subject: [PATCH] [TBAA] Don't emit pointer-tbaa for void pointers. (#122116)

While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
breaks some common idioms, and there is no good way to rewrite
the code without strict-aliasing violations. An example is counting the
entries in an array of pointers:

int count_elements(void * values) {
  void **seq = values;
  int count;
  for (count = 0; seq && seq[count]; count++);
  return count;
}

https://clang.godbolt.org/z/8dTv51v8W
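
A minimal, self-contained harness for the idiom above, written in C++ to match the other sketches in this digest. C converts `void *` to `void **` implicitly, but C++ needs an explicit `static_cast`. The caller `count_int_pointers` is hypothetical. It stores `int *` objects and lets `count_elements` read them back through `void *` l-values. That is the kind of access that distinct void-pointer TBAA tags could miscompile once the function is inlined. The harness only shows the pattern; it does not reproduce a miscompile.

```cpp
#include <cassert>
#include <cstddef>

// The idiom from the commit message. C accepts `void **seq = values;`
// directly; C++ requires the explicit cast.
int count_elements(void *values) {
  void **seq = static_cast<void **>(values);
  int count;
  for (count = 0; seq && seq[count]; count++)
    ;  // walk until the terminating null pointer
  return count;
}

// Hypothetical caller: the array holds `int *` objects, but count_elements
// loads them through `void *` l-values.
int count_int_pointers() {
  static int a = 1, b = 2, c = 3;
  int *ptrs[] = {&a, &b, &c, nullptr};
  return count_elements(ptrs);  // int ** converts implicitly to void *
}
```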

An example in the wild is from
https://github.com/llvm/llvm-project/issues/119099

For now, this patch avoids emitting distinct tags for void pointers, so
that those idioms are not miscompiled.

Fixes https://github.com/llvm/llvm-project/issues/119099.
Fixes https://github.com/llvm/llvm-project/issues/122537.

PR: https://github.com/llvm/llvm-project/pull/122116
(cherry picked from commit 77d3f8a92564b533a3c60a8c8e0657c38fd88ba1)
---
 clang/docs/UsersManual.rst| 88 +--
 clang/lib/CodeGen/CodeGenTBAA.cpp |  8 ++
 clang/test/CodeGen/tbaa-pointers.c| 13 +--
 .../CodeGenOpenCL/amdgpu-enqueue-kernel.cl| 25 +++---
 clang/unittests/CodeGen/TBAAMetadataTest.cpp  | 12 +--
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a56c9425ebb75..943a9218ccbc2 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2489,6 +2489,82 @@ are listed below.
 
 $ clang -fuse-ld=lld -Oz -Wl,--icf=safe -fcodegen-data-use code.cc
 
+.. _strict_aliasing:
+
+Strict Aliasing
+---------------
+
+The C and C++ standards require accesses to objects in memory to use l-values of
+an appropriate type for the object. This is called *strict aliasing* or
+*type-based alias analysis*. Strict aliasing enhances a variety of powerful
+memory optimizations, including reordering, combining, and eliminating memory
+accesses. These optimizations can lead to unexpected behavior in code that
+violates the strict aliasing rules. For example:
+
+.. code-block:: c++
+
+void advance(size_t *index, double *data) {
+  double value = data[*index];
+  /* Clang may assume that this store does not change the contents of `data`. */
+  *index += 1;
+  /* Clang may assume that this store does not change the contents of `index`. */
+  data[*index] = value;
+  /* Either of these facts may create significant optimization opportunities
+   if Clang is able to inline this function. */
+  }
+
+Strict aliasing can be explicitly enabled with ``-fstrict-aliasing`` and
+disabled with ``-fno-strict-aliasing``. ``clang-cl`` defaults to
+``-fno-strict-aliasing``; see . Otherwise, Clang defaults to ``-fstrict-aliasing``.
+
+C and C++ specify slightly different rules for strict aliasing. To improve
+language interoperability, Clang allows two types to alias if either language
+would permit it. This includes applying the C++ similar types rule to C,
+allowing ``int **`` to alias ``int const * const *``. Clang also relaxes the
+standard aliasing rules in the following ways:
+
+* All integer types of the same size are permitted to alias each other,
+  including signed and unsigned types.
+* ``void*`` is permitted to alias any pointer type, ``void**`` is permitted to
+  alias any pointer to pointer type, and so on.
+
+Code which violates strict aliasing has undefined behavior. A program that
+works in one version of Clang may not work in another because of changes to the
+optimizer. Clang provides a :doc:`TypeSanitizer` to help detect
+violations of the strict aliasing rules, but it is currently still experimental.
+Code that is known to violate strict aliasing should generally be built with
+``-fno-strict-aliasing`` if the violation cannot be fixed.
+
+Clang supports several ways to fix a violation of strict aliasing:
+
+* L-values of the character types ``char`` and ``unsigned char`` (as well as
+  other types, depending on the standard) are permitted to access objects of
+  any type.
+
+* Library functions such as ``memcpy`` and ``memset`` are specified as treating
+  memory as characters and therefore are not limited by strict aliasing. If a
+  value of one type must be reinterpreted as another (e.g. to read the bits of a
+  floating-point number), use ``memcpy`` to copy the representation to an object
+  of the destination type. This has no overhead over a direct l-value access
+  because Clang should reliably optimize calls to these functions to use simple
+  loads and stores when they are used with small constant sizes.
+
+* The attribute ``may_alias`` can be added to a ``typedef`` to

[llvm-branch-commits] [mlir] de09986 - [mlir][math] `powf(a, b)` drop support when a < 0 (#126338)

2025-02-18 Thread via llvm-branch-commits


Author: Hyunsung Lee
Date: 2025-02-13T08:01:47-08:00
New Revision: de09986596c9bbc89262456dda319715fb49353f

URL: 
https://github.com/llvm/llvm-project/commit/de09986596c9bbc89262456dda319715fb49353f
DIFF: 
https://github.com/llvm/llvm-project/commit/de09986596c9bbc89262456dda319715fb49353f.diff

LOG: [mlir][math] `powf(a, b)` drop support when a < 0  (#126338)

Related: #124402

- Replace the inefficient lowering of `powf(a, b)` that handled the
  `a < 0` case.
  - As a result, support for `a < 0` is dropped.

However, some special cases are still in use, such as:
  - `a < 0` with `b = 0`, `b = 0.5`, `b = 1`, or `b = 2`.
  - These special cases are converted into simpler ops.
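
A scalar C++ sketch of the resulting expansion, assuming IEEE `float` and the `<cmath>` functions. `expand_powf` is a hypothetical helper, not MLIR API. In the real pattern the special cases fire only when `b` matches a constant. Here they are ordinary runtime branches. The sketch shows why the listed constant exponents still give finite results for negative `a` with integer exponents, while the general `exp(b * ln(a))` path produces NaN.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of the rewrite (not MLIR code).
float expand_powf(float a, float b) {
  if (b == 0.0f) return 1.0f;                  // a^0    -> 1 (also matches -0.0)
  if (b == 1.0f) return a;                     // a^1    -> a
  if (b == -1.0f) return 1.0f / a;             // a^-1   -> 1 / a
  if (b == 0.5f) return std::sqrt(a);          // a^0.5  -> sqrt(a)
  if (b == -0.5f) return 1.0f / std::sqrt(a);  // a^-0.5 -> 1 / sqrt(a)
  if (b == 2.0f) return a * a;                 // a^2    -> a * a
  if (b == -2.0f) return 1.0f / (a * a);       // a^-2   -> 1 / (a * a)
  // General case: ln(a) is NaN for a < 0, so the result is NaN.
  return std::exp(b * std::log(a));
}
```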

Added: 


Modified: 
mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp
mlir/test/Dialect/Math/expand-math.mlir
mlir/test/mlir-runner/test-expand-math-approx.mlir

Removed: 




diff  --git a/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp b/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp
index 3dadf9474cf4f..d7953719d44b5 100644
--- a/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp
+++ b/mlir/lib/Dialect/Math/Transforms/ExpandPatterns.cpp
@@ -19,6 +19,7 @@
 #include "mlir/IR/ImplicitLocOpBuilder.h"
 #include "mlir/IR/TypeUtilities.h"
 #include "mlir/Transforms/DialectConversion.h"
+#include "llvm/ADT/APFloat.h"
 
 using namespace mlir;
 
@@ -311,40 +312,71 @@ static LogicalResult convertFPowIOp(math::FPowIOp op,
   return success();
 }
 
-// Converts  Powf(float a, float b) (meaning a^b) to exp^(b * ln(a))
+// Converts Powf(float a, float b) (meaning a^b) to exp^(b * ln(a))
+// Some special cases where b is constant are handled separately:
+// when b == 0, or |b| == 0.5, 1.0, or 2.0.
 static LogicalResult convertPowfOp(math::PowFOp op, PatternRewriter &rewriter) {
   ImplicitLocOpBuilder b(op->getLoc(), rewriter);
   Value operandA = op.getOperand(0);
   Value operandB = op.getOperand(1);
-  Type opType = operandA.getType();
-  Value zero = createFloatConst(op->getLoc(), opType, 0.00, rewriter);
-  Value one = createFloatConst(op->getLoc(), opType, 1.00, rewriter);
-  Value two = createFloatConst(op->getLoc(), opType, 2.00, rewriter);
-  Value negOne = createFloatConst(op->getLoc(), opType, -1.00, rewriter);
-  Value opASquared = b.create(opType, operandA, operandA);
-  Value opBHalf = b.create(opType, operandB, two);
-
-  Value logA = b.create(opType, opASquared);
-  Value mult = b.create(opType, opBHalf, logA);
-  Value expResult = b.create(opType, mult);
-  Value negExpResult = b.create(opType, expResult, negOne);
-  Value remainder = b.create(opType, operandB, two);
-  Value negCheck =
-  b.create(arith::CmpFPredicate::OLT, operandA, zero);
-  Value oddPower =
-  b.create(arith::CmpFPredicate::ONE, remainder, zero);
-  Value oddAndNeg = b.create(op->getLoc(), oddPower, negCheck);
-
-  // First, we select between the exp value and the adjusted value for odd
-  // powers of negatives. Then, we ensure that one is produced if `b` is zero.
-  // This corresponds to `libm` behavior, even for `0^0`. Without this check,
-  // `exp(0 * ln(0)) = exp(0 *-inf) = exp(-nan) = -nan`.
-  Value zeroCheck =
-  b.create(arith::CmpFPredicate::OEQ, operandB, zero);
-  Value res = b.create(op->getLoc(), oddAndNeg, negExpResult,
-expResult);
-  res = b.create(op->getLoc(), zeroCheck, one, res);
-  rewriter.replaceOp(op, res);
+  auto typeA = operandA.getType();
+  auto typeB = operandB.getType();
+
+  auto &sem =
+  cast(getElementTypeOrSelf(typeB)).getFloatSemantics();
+  APFloat valueB(sem);
+  if (matchPattern(operandB, m_ConstantFloat(&valueB))) {
+if (valueB.isZero()) {
+  // a^0 -> 1
+  Value one = createFloatConst(op->getLoc(), typeA, 1.0, rewriter);
+  rewriter.replaceOp(op, one);
+  return success();
+}
+if (valueB.isExactlyValue(1.0)) {
+  // a^1 -> a
+  rewriter.replaceOp(op, operandA);
+  return success();
+}
+if (valueB.isExactlyValue(-1.0)) {
+  // a^(-1) -> 1 / a
+  Value one = createFloatConst(op->getLoc(), typeA, 1.0, rewriter);
+  Value div = b.create(one, operandA);
+  rewriter.replaceOp(op, div);
+  return success();
+}
+if (valueB.isExactlyValue(0.5)) {
+  // a^(1/2) -> sqrt(a)
+  Value sqrt = b.create(operandA);
+  rewriter.replaceOp(op, sqrt);
+  return success();
+}
+if (valueB.isExactlyValue(-0.5)) {
+  // a^(-1/2) -> 1 / sqrt(a)
+  Value rsqrt = b.create(operandA);
+  rewriter.replaceOp(op, rsqrt);
+  return success();
+}
+if (valueB.isExactlyValue(2.0)) {
+  // a^2 -> a * a
+  Value mul = b.create(operandA, operandA);
+  rewriter.replaceOp(op, mul);
+  return success();
+}
+if (valueB.isExactlyValue(-2.0)) {
+  // a^(-2) -> 1 / (a * a)
+  Value mul = b.create(operandA, operandA);
+  Value one =
+  createFloatConst(op->getLoc(), op

[llvm-branch-commits] [llvm] b2165f2 - [CostModel] Account for power-2 urem in funnel shift costs (#127037)

2025-02-18 Thread via llvm-branch-commits


Author: David Green
Date: 2025-02-13T16:05:00Z
New Revision: b2165f214efab833a4b1a9e8268b1030fc5ebaeb

URL: 
https://github.com/llvm/llvm-project/commit/b2165f214efab833a4b1a9e8268b1030fc5ebaeb
DIFF: 
https://github.com/llvm/llvm-project/commit/b2165f214efab833a4b1a9e8268b1030fc5ebaeb.diff

LOG: [CostModel] Account for power-2 urem in funnel shift costs (#127037)

As can be seen in https://godbolt.org/z/qvMqY79cK, a urem by a power-2
constant will be code-generated as an And of a mask. The cost model for
funnel shifts tries to account for that by passing OP_PowerOf2 as the
operand info for the second operand. As far as I can tell, however,
returning a lower cost for urem with OP_PowerOf2 is only implemented
on X86.

This patch short-cuts that by calling getArithmeticInstrCost(And, ..)
directly when we know the type size will be a power of 2. This is an
alternative to the patch in #126912, which is a more general solution for
power-2 udiv/urem costs; this patch more narrowly fixes only funnel shifts.
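
The identity this cost-model change relies on can be checked with a small C++ sketch. It assumes a 32-bit type; `fshl32` and `kBW` are illustrative names. Because the bit width is a power of 2, the modulo in the funnel-shift expansion reduces to a mask. The shift-by-zero case is handled separately, following the `fshl` formula quoted in `BasicTTIImpl.h`.

```cpp
#include <cassert>
#include <cstdint>

// Bit width of the funnel-shifted type; a power of 2, so Z % BW == Z & (BW - 1).
constexpr uint32_t kBW = 32;
static_assert((kBW & (kBW - 1)) == 0, "BW must be a power of 2");

// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW))), with the shift-by-zero case
// split out because shifting a 32-bit value by 32 is undefined in C++.
uint32_t fshl32(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t amt = z & (kBW - 1);  // the "urem" becomes an "and" with a mask
  if (amt == 0)
    return x;
  return (x << amt) | (y >> (kBW - amt));
}
```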

Added: 


Modified: 
llvm/include/llvm/CodeGen/BasicTTIImpl.h
llvm/test/Analysis/CostModel/AArch64/fshl.ll
llvm/test/Analysis/CostModel/AArch64/fshr.ll
llvm/test/Analysis/CostModel/AArch64/sve-intrinsics.ll
llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 339b83637fa8f..c63d288ad1579 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1891,10 +1891,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase {
   const TTI::OperandValueInfo OpInfoX = TTI::getOperandInfo(X);
   const TTI::OperandValueInfo OpInfoY = TTI::getOperandInfo(Y);
   const TTI::OperandValueInfo OpInfoZ = TTI::getOperandInfo(Z);
-  const TTI::OperandValueInfo OpInfoBW =
-{TTI::OK_UniformConstantValue,
- isPowerOf2_32(RetTy->getScalarSizeInBits()) ? TTI::OP_PowerOf2
- : TTI::OP_None};
 
   // fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))
   // fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))
@@ -1909,10 +1905,15 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase {
   Cost += thisT()->getArithmeticInstrCost(
   BinaryOperator::LShr, RetTy, CostKind, OpInfoY,
   {OpInfoZ.Kind, TTI::OP_None});
-  // Non-constant shift amounts requires a modulo.
+  // Non-constant shift amounts requires a modulo. If the typesize is a
+  // power-2 then this will be converted to an and, otherwise it will use a
+  // urem.
   if (!OpInfoZ.isConstant())
-Cost += thisT()->getArithmeticInstrCost(BinaryOperator::URem, RetTy,
-CostKind, OpInfoZ, OpInfoBW);
+Cost += thisT()->getArithmeticInstrCost(
+isPowerOf2_32(RetTy->getScalarSizeInBits()) ? BinaryOperator::And
+: BinaryOperator::URem,
+RetTy, CostKind, OpInfoZ,
+{TTI::OK_UniformConstantValue, TTI::OP_None});
   // For non-rotates (X != Y) we must add shift-by-zero handling costs.
   if (X != Y) {
 Type *CondTy = RetTy->getWithNewBitWidth(1);
@@ -2611,8 +2612,14 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase {
   thisT()->getArithmeticInstrCost(BinaryOperator::Shl, RetTy, CostKind);
   Cost += thisT()->getArithmeticInstrCost(BinaryOperator::LShr, RetTy,
   CostKind);
-  Cost += thisT()->getArithmeticInstrCost(BinaryOperator::URem, RetTy,
-  CostKind);
+  // Non-constant shift amounts requires a modulo. If the typesize is a
+  // power-2 then this will be converted to an and, otherwise it will use a
+  // urem.
+  Cost += thisT()->getArithmeticInstrCost(
+  isPowerOf2_32(RetTy->getScalarSizeInBits()) ? BinaryOperator::And
+  : BinaryOperator::URem,
+  RetTy, CostKind, {TTI::OK_AnyValue, TTI::OP_None},
+  {TTI::OK_UniformConstantValue, TTI::OP_None});
   // Shift-by-zero handling.
   Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
   CmpInst::ICMP_EQ, CostKind);

diff  --git a/llvm/test/Analysis/CostModel/AArch64/fshl.ll b/llvm/test/Analysis/CostModel/AArch64/fshl.ll
index 632f26dfa5382..317adc96a74b6 100644
--- a/llvm/test/Analysis/CostModel/AArch64/fshl.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/fshl.ll
@@ -15,7 +15,7 @@ entry:
 
 define i8 @fshl_i8_3rd_arg_var(i8 %a, i8 %b, i8 %c) {
 ; CHECK-LABEL: 'fshl_i8_3rd_arg_var'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %fshl = tail call i8 @llvm.fshl.i8(i8 %a, i8 %b, i8 %c)
+; CHECK-NEXT:  Cost

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) (PR #127732)

2025-02-18 Thread Nikolas Klauser via llvm-branch-commits


https://github.com/philnik777 approved this pull request.


https://github.com/llvm/llvm-project/pull/127732
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] e503227 - AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132)

2025-02-18 Thread Tom Stellard via llvm-branch-commits


Author: Vigneshwar Jayakumar
Date: 2025-02-18T21:54:57-08:00
New Revision: e503227bc57625a0a22b450f5bd3e78df96ca4fe

URL: 
https://github.com/llvm/llvm-project/commit/e503227bc57625a0a22b450f5bd3e78df96ca4fe
DIFF: 
https://github.com/llvm/llvm-project/commit/e503227bc57625a0a22b450f5bd3e78df96ca4fe.diff

LOG: AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132)

gfx950 requires additional wait states for the XDL Write-VGPR-VALU-WAW
hazard compared to gfx940.
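
A C++ sketch of the updated wait-state rule, mirroring `GFX940_XDL_N_PassWriteVgprVALUWawWaitStates` from the diff below. The `nopImmediates` helper is hypothetical. It assumes that `S_NOP N` covers N + 1 wait states, with at most 8 per instruction. Under that assumption it reproduces the `S_NOP` sequences the updated MIR tests expect. For example, an 8-pass XDL instruction needs `S_NOP 7` then `S_NOP 2` on gfx940, and `S_NOP 7` then `S_NOP 3` on gfx950.

```cpp
#include <cassert>
#include <vector>

// Same formula as the updated helper: gfx950 needs one extra wait state
// except for 2-pass XDL instructions.
int xdlWriteVgprValuWawWaitStates(int NumPasses, bool IsGFX950) {
  return NumPasses + 3 + (NumPasses != 2 && IsGFX950);
}

// Hypothetical helper: split a wait-state count into S_NOP immediates,
// assuming S_NOP N provides N + 1 wait states and N is at most 7.
std::vector<int> nopImmediates(int WaitStates) {
  std::vector<int> Imms;
  while (WaitStates > 0) {
    int Chunk = WaitStates < 8 ? WaitStates : 8;
    Imms.push_back(Chunk - 1);
    WaitStates -= Chunk;
  }
  return Imms;
}
```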

(cherry picked from commit 1188b1ff7b956cb65d8ddda5f1e56c432f1a57c7)

Added: 


Modified: 
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 537181710ed32..646663a92e5e8 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2605,12 +2605,14 @@ static int GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(int NumPasses) {
   return NumPasses + 2;
 }
 
-static int GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(int NumPasses) {
-  // 2 pass -> 5
-  // 4 pass -> 7
-  // 8 pass -> 11
-  // 16 pass -> 19
-  return NumPasses + 3;
+static int GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(int NumPasses,
+   bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  5        5
+  // 4 pass         |  7        8
+  // 8 pass         |  11       12
+  // 16 pass        |  19       20
+  return NumPasses + 3 + (NumPasses != 2 && IsGFX950);
 }
 
 static int GFX940_XDL_N_PassWriteVgprVALUMemExpReadWaitStates(int NumPasses,
@@ -2858,7 +2860,8 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
   } else if (ST.hasGFX940Insts()) {
 NeedWaitStates =
 isXDL(ST, *MFMA)
-? GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(NumPasses)
+? GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(
+  NumPasses, ST.hasGFX950Insts())
 : GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(NumPasses);
   } else {
 switch (NumPasses) {

diff  --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index ef30c9a44b2b5..0af37ad8c896e 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -958,7 +958,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:xdl_smfma16x16_write_vgpr_valu_write
 body: |
@@ -970,7 +971,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:xdl_smfma32x32_write_vgpr_valu_write
 body: |
@@ -991,7 +993,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_f16_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_FMA_F16_e64
 name:xdl_smfma16x16_write_vgpr_valu_f16_write
 body: |
@@ -1003,7 +1006,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_FMA_F16_e64
 name:xdl_smfma32x32_write_vgpr_valu_f16_write
 body: |
@@ -1024,7 +1028,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_sdwa_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32_sdwa
 name:xdl_smfma16x16_write_vgpr_valu_sdwa_write
 body: |
@@ -1761,7 +1766,8 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16X16X16_mfma_write_vgpr_valu_write
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 6
+# GFX940-NEXT: S_NOP 6
+# GFX950-NEXT: S_NOP 7
 # GCN-NEXT: V_MOV_B32
 name:xdl_sgemm16X16X16_mfma_write_vgpr_valu_write
 body: |
@@ -2072,7 +2078,8 @@ body: |
 ...
 # GCN-LABEL: name: smfmac16x16_read_vgpr_srcc_valu_write
 # GCN:  V_SMFMAC
-# GCN-NEXT: S_NOP 6
+# GFX940-NEXT: S_NOP 6
+# GFX950-NEXT: S_NOP 7
 # GCN-NEXT: V_MOV_B32
 name:smfmac16x16_read_vgpr_srcc_valu_write
 body: |
@@ -2102,7 +2109,8 @@ body: |
 # GCN-LABEL: name: smfmac32x32_read_vgpr_srcc_valu_write
 # GCN:  V_SMFMAC
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:smfmac32x32_read_vgpr_srcc_valu_write
 body: |




[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/127677

>From 04295354b0f0f101773c4f9437680d45a70bab24 Mon Sep 17 00:00:00 2001
From: Nikolas Klauser 
Date: Fri, 7 Feb 2025 15:40:16 +0100
Subject: [PATCH] Revert "[libc++] Reduce std::conjunction overhead (#124259)"

It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4fb9567d23ce065a067e7f4909b310daf50.
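
For reference, here is a minimal C++17 sketch of the recursive, short-circuiting shape this revert restores for `std::conjunction`. It is roughly modeled on the `conditional_t`-based specializations in the diff. `my_conjunction` and `NoValue` are illustrative names, not libc++ internals. `NoValue` has no `value` member, so the third static assertion would fail to compile if evaluation did not stop at the first false argument. The sketch does not address the stack-usage difference that motivated the revert.

```cpp
#include <cassert>
#include <type_traits>

// Empty pack: the conjunction is true.
template <class... Args>
struct my_conjunction : std::true_type {};

// Single argument: inherit from it directly.
template <class Arg>
struct my_conjunction<Arg> : Arg {};

// Stop at the first false argument; otherwise recurse on the rest.
template <class Arg, class... Args>
struct my_conjunction<Arg, Args...>
    : std::conditional_t<!bool(Arg::value), Arg, my_conjunction<Args...>> {};

struct NoValue {};  // deliberately has no ::value member

static_assert(my_conjunction<>::value, "empty pack is true");
static_assert(my_conjunction<std::true_type, std::true_type>::value, "all true");
static_assert(!my_conjunction<std::false_type, NoValue>::value, "stops at first false");
static_assert(!std::conjunction_v<std::false_type, NoValue>, "std::conjunction agrees");
```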

(cherry picked from commit 0227396417d4625bc93affdd8957ff8d90c76299)
---
 libcxx/include/__type_traits/conjunction.h | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/libcxx/include/__type_traits/conjunction.h b/libcxx/include/__type_traits/conjunction.h
index 6b6717a50a468..ad9656acd47ec 100644
--- a/libcxx/include/__type_traits/conjunction.h
+++ b/libcxx/include/__type_traits/conjunction.h
@@ -10,6 +10,8 @@
 #define _LIBCPP___TYPE_TRAITS_CONJUNCTION_H
 
 #include <__config>
+#include <__type_traits/conditional.h>
+#include <__type_traits/enable_if.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/is_same.h>
 
@@ -19,29 +21,22 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template 
-struct _AndImpl;
+template 
+using __expand_to_true _LIBCPP_NODEBUG = true_type;
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG =
-  typename _AndImpl::template _Result<_First, _Rest...>;
-};
+template 
+__expand_to_true<__enable_if_t<_Pred::value>...> __and_helper(int);
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG = _Res;
-};
+template 
+false_type __and_helper(...);
 
 // _And always performs lazy evaluation of its arguments.
 //
 // However, `_And<_Pred...>` itself will evaluate its result immediately (without having to
 // be instantiated) since it is an alias, unlike `conjunction<_Pred...>`, which is a struct.
 // If you want to defer the evaluation of `_And<_Pred...>` itself, use `_Lazy<_And, _Pred...>`.
-template 
-using _And _LIBCPP_NODEBUG = typename _AndImpl::template _Result;
+template 
+using _And _LIBCPP_NODEBUG = decltype(std::__and_helper<_Pred...>(0));
 
 template 
 struct __all_dummy;
@@ -51,11 +46,22 @@ struct __all : _IsSame<__all_dummy<_Pred...>, __all_dummy<((void)_Pred, true)...
 
 #if _LIBCPP_STD_VER >= 17
 
-template 
-struct _LIBCPP_NO_SPECIALIZATIONS conjunction : _And<_Args...> {};
+template 
+struct _LIBCPP_NO_SPECIALIZATIONS conjunction : true_type {};
+
+_LIBCPP_DIAGNOSTIC_PUSH
+#  if __has_warning("-Winvalid-specialization")
+_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Winvalid-specialization")
+#  endif
+template 
+struct conjunction<_Arg> : _Arg {};
+
+template 
+struct conjunction<_Arg, _Args...> : conditional_t> {};
+_LIBCPP_DIAGNOSTIC_POP
 
 template 
-_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = _And<_Args...>::value;
+_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = conjunction<_Args...>::value;
 
 #endif // _LIBCPP_STD_VER >= 17
 


[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132) (PR #126847)

2025-02-18 Thread via llvm-branch-commits


github-actions[bot] wrote:

@arsenm (or anyone else): if you would like to add a note about this fix to the
release notes (completely optional), please reply to this comment with a one- or
two-sentence description of the fix. When you are done, please add the
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/126847

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132) (PR #126847)

2025-02-18 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/126847

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/127677

[llvm-branch-commits] [libcxx] 0429535 - Revert "[libc++] Reduce std::conjunction overhead (#124259)"

2025-02-18 Thread Tom Stellard via llvm-branch-commits


Author: Nikolas Klauser
Date: 2025-02-18T21:56:43-08:00
New Revision: 04295354b0f0f101773c4f9437680d45a70bab24

URL: 
https://github.com/llvm/llvm-project/commit/04295354b0f0f101773c4f9437680d45a70bab24
DIFF: 
https://github.com/llvm/llvm-project/commit/04295354b0f0f101773c4f9437680d45a70bab24.diff

LOG: Revert "[libc++] Reduce std::conjunction overhead (#124259)"

It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4fb9567d23ce065a067e7f4909b310daf50.

(cherry picked from commit 0227396417d4625bc93affdd8957ff8d90c76299)

Added: 


Modified: 
libcxx/include/__type_traits/conjunction.h

Removed: 




diff  --git a/libcxx/include/__type_traits/conjunction.h b/libcxx/include/__type_traits/conjunction.h
index 6b6717a50a468..ad9656acd47ec 100644
--- a/libcxx/include/__type_traits/conjunction.h
+++ b/libcxx/include/__type_traits/conjunction.h
@@ -10,6 +10,8 @@
 #define _LIBCPP___TYPE_TRAITS_CONJUNCTION_H
 
 #include <__config>
+#include <__type_traits/conditional.h>
+#include <__type_traits/enable_if.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/is_same.h>
 
@@ -19,29 +21,22 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template 
-struct _AndImpl;
+template 
+using __expand_to_true _LIBCPP_NODEBUG = true_type;
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG =
-  typename _AndImpl::template _Result<_First, _Rest...>;
-};
+template 
+__expand_to_true<__enable_if_t<_Pred::value>...> __and_helper(int);
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG = _Res;
-};
+template 
+false_type __and_helper(...);
 
 // _And always performs lazy evaluation of its arguments.
 //
 // However, `_And<_Pred...>` itself will evaluate its result immediately (without having to
 // be instantiated) since it is an alias, unlike `conjunction<_Pred...>`, which is a struct.
 // If you want to defer the evaluation of `_And<_Pred...>` itself, use `_Lazy<_And, _Pred...>`.
-template 
-using _And _LIBCPP_NODEBUG = typename _AndImpl::template _Result;
+template 
+using _And _LIBCPP_NODEBUG = decltype(std::__and_helper<_Pred...>(0));
 
 template 
 struct __all_dummy;
@@ -51,11 +46,22 @@ struct __all : _IsSame<__all_dummy<_Pred...>, __all_dummy<((void)_Pred, true)...
 
 #if _LIBCPP_STD_VER >= 17
 
-template 
-struct _LIBCPP_NO_SPECIALIZATIONS conjunction : _And<_Args...> {};
+template 
+struct _LIBCPP_NO_SPECIALIZATIONS conjunction : true_type {};
+
+_LIBCPP_DIAGNOSTIC_PUSH
+#  if __has_warning("-Winvalid-specialization")
+_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Winvalid-specialization")
+#  endif
+template 
+struct conjunction<_Arg> : _Arg {};
+
+template 
+struct conjunction<_Arg, _Args...> : conditional_t> {};
+_LIBCPP_DIAGNOSTIC_POP
 
 template 
-_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = _And<_Args...>::value;
+_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = conjunction<_Args...>::value;
 
 #endif // _LIBCPP_STD_VER >= 17
 




[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm milestoned 
https://github.com/llvm/llvm-project/pull/127751

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (#127487) (PR #127496)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

Opened manual version in #127751

https://github.com/llvm/llvm-project/pull/127496

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/127751

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread via llvm-branch-commits


github-actions[bot] wrote:

@philnik777 (or anyone else): if you would like to add a note about this fix to
the release notes (completely optional), please reply to this comment with a
one- or two-sentence description of the fix. When you are done, please add the
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/127677

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/127751

Backport 
https://github.com/llvm/llvm-project/commit/18ea6c928088cf9ad2a990bfcca546c608825a7f

Requested by: @arsenm

This required rerunning update_llc_test_checks on the test.

>From 6aed1a5fcab4655cda93b0b433987d12401f9925 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 17 Feb 2025 21:03:50 +0700
Subject: [PATCH 1/2] AMDGPU: Stop emitting an error on illegal addrspacecasts
 (#127487)

These cannot be static compile errors and should be treated as
poison: invalid casts may be introduced in code that is dynamically dead.

For example:

```
  void foo(volatile generic int* x) {
__builtin_assume(is_shared(x));
*x = 4;
  }

  void bar() {
private int y;
foo(&y); // violation, wrong address space
  }
```

This could produce a compile-time backend error or not, depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw that required inserting runtime address-space
checks. Because the invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.

This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d0370872f28ec9965448f33db1b105addaf64ae.

(cherry picked from commit 18ea6c928088cf9ad2a990bfcca546c608825a7f)
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   7 +-
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   7 +-
 llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll  | 646 ++
 .../CodeGen/AMDGPU/invalid-addrspacecast.ll   |  44 +-
 4 files changed, 687 insertions(+), 17 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index e9e47eaadd557..e84f0f5fa615a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
 return true;
   }
 
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-  MF.getFunction(), "invalid addrspacecast", B.getDebugLoc());
-
-  LLVMContext &Ctx = MF.getFunction().getContext();
-  Ctx.diagnose(InvalidAddrSpaceCast);
+  // Invalid casts are poison.
+  // TODO: Should return poison
   B.buildUndef(Dst);
   MI.eraseFromParent();
   return true;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e3..e09df53995d61 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7340,11 +7340,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
 
   // global <-> flat are no-ops and never emitted.
 
-  const MachineFunction &MF = DAG.getMachineFunction();
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-  MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc());
-  DAG.getContext()->diagnose(InvalidAddrSpaceCast);
-
+  // Invalid casts are poison.
+  // TODO: Should return poison
   return DAG.getUNDEF(Op->getValueType(0));
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index f5c9b1a79b476..5c62730fdfe8e 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -444,6 +444,652 @@ define float @no_unsafe(ptr %addr, float %val) {
   ret float %res
 }
 
+@global = hidden addrspace(1) global i64 0, align 8
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX908:   ; %bb.0:
+; GFX908-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:s_mov_b32 s6, 32
+; GFX908-NEXT:s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:s_getpc_b64 s[6:7]
+; GFX908-NEXT:s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:s_mov_b64 s[4:5], -1
+; GFX908-NEXT:s_mov_b32 s6, 1
+; GFX908-NEXT:v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX908-NEXT:s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:s_cbranch_vccnz .LBB4_3
+; GFX908-NEXT:  .LBB4_1: ; %Flow
+; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:s_mov_b32 s4, 1
+; GFX908-NEXT:v_cmp_ne_u32_e64 s[4:5], v2, s4
+; GFX908-NEXT:s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:s_cbranch_vccnz .LBB4_4
+; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX908-NEXT:s_waitcnt lgkmcnt(0)
+; GFX908-NEXT:buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:s_waitcnt vmcnt(0)
+; GFX908-NEXT:v_mo

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Stop emitting an error on illegal addrspacecasts (PR #127751)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Backport 
https://github.com/llvm/llvm-project/commit/18ea6c928088cf9ad2a990bfcca546c608825a7f

Requested by: @arsenm

Required rerunning update_llc_test_checks on the test 

---

Patch is 38.85 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/127751.diff


4 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp (+2-5) 
- (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+2-5) 
- (modified) llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll (+755) 
- (modified) llvm/test/CodeGen/AMDGPU/invalid-addrspacecast.ll (+37-7) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index e9e47eaadd557..e84f0f5fa615a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2426,11 +2426,8 @@ bool AMDGPULegalizerInfo::legalizeAddrSpaceCast(
 return true;
   }
 
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-  MF.getFunction(), "invalid addrspacecast", B.getDebugLoc());
-
-  LLVMContext &Ctx = MF.getFunction().getContext();
-  Ctx.diagnose(InvalidAddrSpaceCast);
+  // Invalid casts are poison.
+  // TODO: Should return poison
   B.buildUndef(Dst);
   MI.eraseFromParent();
   return true;
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e3..e09df53995d61 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7340,11 +7340,8 @@ SDValue SITargetLowering::lowerADDRSPACECAST(SDValue Op,
 
   // global <-> flat are no-ops and never emitted.
 
-  const MachineFunction &MF = DAG.getMachineFunction();
-  DiagnosticInfoUnsupported InvalidAddrSpaceCast(
-  MF.getFunction(), "invalid addrspacecast", SL.getDebugLoc());
-  DAG.getContext()->diagnose(InvalidAddrSpaceCast);
-
+  // Invalid casts are poison.
+  // TODO: Should return poison
   return DAG.getUNDEF(Op->getValueType(0));
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
index f5c9b1a79b476..9b446896db590 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
@@ -444,6 +444,761 @@ define float @no_unsafe(ptr %addr, float %val) {
   ret float %res
 }
 
+@global = hidden addrspace(1) global i64 0, align 8
+
+; Make sure there is no error on an invalid addrspacecast without optimizations
+define i64 @optnone_atomicrmw_add_i64_expand(i64 %val) #1 {
+; GFX908-LABEL: optnone_atomicrmw_add_i64_expand:
+; GFX908:   ; %bb.0:
+; GFX908-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX908-NEXT:s_mov_b64 s[4:5], src_private_base
+; GFX908-NEXT:s_mov_b32 s6, 32
+; GFX908-NEXT:s_lshr_b64 s[4:5], s[4:5], s6
+; GFX908-NEXT:s_getpc_b64 s[6:7]
+; GFX908-NEXT:s_add_u32 s6, s6, global@rel32@lo+4
+; GFX908-NEXT:s_addc_u32 s7, s7, global@rel32@hi+12
+; GFX908-NEXT:s_cmp_eq_u32 s7, s4
+; GFX908-NEXT:s_cselect_b64 s[4:5], -1, 0
+; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:s_mov_b64 s[4:5], -1
+; GFX908-NEXT:s_mov_b32 s6, 1
+; GFX908-NEXT:v_cmp_ne_u32_e64 s[6:7], v2, s6
+; GFX908-NEXT:s_and_b64 vcc, exec, s[6:7]
+; GFX908-NEXT:; implicit-def: $vgpr3_vgpr4
+; GFX908-NEXT:s_cbranch_vccnz .LBB4_3
+; GFX908-NEXT:  .LBB4_1: ; %Flow
+; GFX908-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[4:5]
+; GFX908-NEXT:s_mov_b32 s4, 1
+; GFX908-NEXT:v_cmp_ne_u32_e64 s[4:5], v2, s4
+; GFX908-NEXT:s_and_b64 vcc, exec, s[4:5]
+; GFX908-NEXT:s_cbranch_vccnz .LBB4_4
+; GFX908-NEXT:  ; %bb.2: ; %atomicrmw.private
+; GFX908-NEXT:s_waitcnt lgkmcnt(0)
+; GFX908-NEXT:buffer_load_dword v3, v0, s[0:3], 0 offen
+; GFX908-NEXT:s_waitcnt vmcnt(0)
+; GFX908-NEXT:v_mov_b32_e32 v4, v3
+; GFX908-NEXT:v_add_co_u32_e64 v0, s[4:5], v3, v0
+; GFX908-NEXT:v_addc_co_u32_e64 v1, s[4:5], v4, v1, s[4:5]
+; GFX908-NEXT:buffer_store_dword v1, v0, s[0:3], 0 offen
+; GFX908-NEXT:buffer_store_dword v0, v0, s[0:3], 0 offen
+; GFX908-NEXT:s_branch .LBB4_4
+; GFX908-NEXT:  .LBB4_3: ; %atomicrmw.global
+; GFX908-NEXT:s_getpc_b64 s[4:5]
+; GFX908-NEXT:s_add_u32 s4, s4, global@rel32@lo+4
+; GFX908-NEXT:s_addc_u32 s5, s5, global@rel32@hi+12
+; GFX908-NEXT:v_mov_b32_e32 v2, s4
+; GFX908-NEXT:v_mov_b32_e32 v3, s5
+; GFX908-NEXT:flat_atomic_add_x2 v[3:4], v[2:3], v[0:1] glc
+; GFX908-NEXT:s_mov_b64 s[4:5], 0
+; GFX908-NEXT:s_branch .LBB4_1
+; GFX908-NEXT:  .LBB4_4: ; %atomicrmw.phi
+; GFX908-NEXT:  ; %bb.5: ; %atomicrmw.end
+; GFX908-NEXT:s_mov_b32 s4, 32
+; GFX908-NEXT:s_waitcnt vmcnt(0) lgkmcnt(0)
+; GFX908-NEXT:v_lshrrev_b64 v[1:2], s4, v[3:4]
+; GFX908-NEXT:v_mov_b32_e32 v0, v3
+; GFX908-NEXT:s_setpc_b64 s[30:31]
+;
+; GFX90A-LABEL

[llvm-branch-commits] [llvm] release/20.x: AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state change (#126132) (PR #126847)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/126847

>From e503227bc57625a0a22b450f5bd3e78df96ca4fe Mon Sep 17 00:00:00 2001
From: Vigneshwar Jayakumar 
Date: Tue, 11 Feb 2025 12:32:23 -0600
Subject: [PATCH] AMDGPU: Handle gfx950 XDL Write-VGPR-VALU-WAW wait state
 change (#126132)

gfx950 requires more wait states than gfx940 for the XDL write-VGPR VALU
WAW hazard: one extra wait state for XDL instructions with 4, 8 or 16 passes.

(cherry picked from commit 1188b1ff7b956cb65d8ddda5f1e56c432f1a57c7)
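The new wait-state rule can be cross-checked against the updated test expectations with a small standalone C++ sketch. The first function restates the formula from the patched GFX940_XDL_N_PassWriteVgprVALUWawWaitStates. The second is an assumption about how a wait-state count maps onto S_NOP instructions (each S_NOP n covers n + 1 wait states, at most 8 per instruction). Both function names are hypothetical.

```cpp
#include <vector>

// Same arithmetic as the patched GFX940_XDL_N_PassWriteVgprVALUWawWaitStates:
// gfx950 needs one extra wait state for every pass count except 2.
int xdlWriteVgprValuWawWaitStates(int numPasses, bool isGFX950) {
  return numPasses + 3 + ((numPasses != 2 && isGFX950) ? 1 : 0);
}

// Assumption: S_NOP n covers n + 1 wait states and one S_NOP covers at most 8,
// so a requirement is split greedily into S_NOP 7 chunks plus a remainder.
std::vector<int> snopImmediates(int waitStates) {
  std::vector<int> imms;
  while (waitStates > 0) {
    int chunk = waitStates < 8 ? waitStates : 8;
    imms.push_back(chunk - 1);
    waitStates -= chunk;
  }
  return imms;
}
```

Under that assumption, an 8-pass XDL op needs 11 wait states on gfx940 (S_NOP 7, S_NOP 2) and 12 on gfx950 (S_NOP 7, S_NOP 3), which is consistent with the xdl_smfma16x16 checks in mai-hazards-gfx940.mir if those are 8-pass instructions.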
---
 .../lib/Target/AMDGPU/GCNHazardRecognizer.cpp | 17 +++--
 .../CodeGen/AMDGPU/mai-hazards-gfx940.mir | 24 ---
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index 537181710ed32..646663a92e5e8 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -2605,12 +2605,14 @@ static int GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(int NumPasses) {
   return NumPasses + 2;
 }
 
-static int GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(int NumPasses) {
-  // 2 pass -> 5
-  // 4 pass -> 7
-  // 8 pass -> 11
-  // 16 pass -> 19
-  return NumPasses + 3;
+static int GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(int NumPasses,
+   bool IsGFX950) {
+  // xdl def cycles | gfx940 | gfx950
+  // 2 pass         |  5        5
+  // 4 pass         |  7        8
+  // 8 pass         |  11       12
+  // 16 pass        |  19       20
+  return NumPasses + 3 + (NumPasses != 2 && IsGFX950);
 }
 
 static int GFX940_XDL_N_PassWriteVgprVALUMemExpReadWaitStates(int NumPasses,
@@ -2858,7 +2860,8 @@ int GCNHazardRecognizer::checkMAIVALUHazards(MachineInstr *MI) {
   } else if (ST.hasGFX940Insts()) {
 NeedWaitStates =
 isXDL(ST, *MFMA)
-? GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(NumPasses)
+? GFX940_XDL_N_PassWriteVgprVALUWawWaitStates(
+  NumPasses, ST.hasGFX950Insts())
 : GFX940_SMFMA_N_PassWriteVgprVALUWawWaitStates(NumPasses);
   } else {
 switch (NumPasses) {
diff --git a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
index ef30c9a44b2b5..0af37ad8c896e 100644
--- a/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
+++ b/llvm/test/CodeGen/AMDGPU/mai-hazards-gfx940.mir
@@ -958,7 +958,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:xdl_smfma16x16_write_vgpr_valu_write
 body: |
@@ -970,7 +971,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:xdl_smfma32x32_write_vgpr_valu_write
 body: |
@@ -991,7 +993,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_f16_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_FMA_F16_e64
 name:xdl_smfma16x16_write_vgpr_valu_f16_write
 body: |
@@ -1003,7 +1006,8 @@ body: |
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_FMA_F16_e64
 name:xdl_smfma32x32_write_vgpr_valu_f16_write
 body: |
@@ -1024,7 +1028,8 @@ body: |
 # GCN-LABEL: name: xdl_smfma16x16_write_vgpr_valu_sdwa_write
 # GCN:  V_MFMA
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32_sdwa
 name:xdl_smfma16x16_write_vgpr_valu_sdwa_write
 body: |
@@ -1761,7 +1766,8 @@ body: |
 ...
 # GCN-LABEL: name: xdl_sgemm16X16X16_mfma_write_vgpr_valu_write
 # GCN:  V_MFMA
-# GCN-NEXT: S_NOP 6
+# GFX940-NEXT: S_NOP 6
+# GFX950-NEXT: S_NOP 7
 # GCN-NEXT: V_MOV_B32
 name:xdl_sgemm16X16X16_mfma_write_vgpr_valu_write
 body: |
@@ -2072,7 +2078,8 @@ body: |
 ...
 # GCN-LABEL: name: smfmac16x16_read_vgpr_srcc_valu_write
 # GCN:  V_SMFMAC
-# GCN-NEXT: S_NOP 6
+# GFX940-NEXT: S_NOP 6
+# GFX950-NEXT: S_NOP 7
 # GCN-NEXT: V_MOV_B32
 name:smfmac16x16_read_vgpr_srcc_valu_write
 body: |
@@ -2102,7 +2109,8 @@ body: |
 # GCN-LABEL: name: smfmac32x32_read_vgpr_srcc_valu_write
 # GCN:  V_SMFMAC
 # GCN-NEXT: S_NOP 7
-# GCN-NEXT: S_NOP 2
+# GFX940-NEXT: S_NOP 2
+# GFX950-NEXT: S_NOP 3
 # GCN-NEXT: V_MOV_B32
 name:smfmac32x32_read_vgpr_srcc_valu_write
 body: |

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https:

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) (PR #127732)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127732

Backport 2207e3e32549306bf563c6987f790cabe8d4ea78

Requested by: @frederick-vs-ja

>From 3cf1f7f66db80ed3f74b91574d5ed5604fc74b3f Mon Sep 17 00:00:00 2001
From: "A. Jiang" 
Date: Wed, 19 Feb 2025 09:06:51 +0800
Subject: [PATCH] [libc++] Set feature-test macro `__cpp_lib_atomic_float`
 (#127559)

The corresponding feature was implemented in LLVM 18 (by #67799), but
the feature-test macro (FTM) was not defined until now.
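With the macro defined, portable code can test for floating-point `fetch_add` instead of checking library versions. The following C++ sketch (the function name is hypothetical) uses the P0020R6 member when `__cpp_lib_atomic_float` advertises it and otherwise falls back to a compare-exchange loop. Note that libc++ 18 and 19 ship the feature without the macro, so such code takes the fallback there.

```cpp
#include <atomic>
#include <version>  // provides the library feature-test macros

// Adds x to total and returns the previous value.
double atomicAccumulate(std::atomic<double> &total, double x) {
#if defined(__cpp_lib_atomic_float) && __cpp_lib_atomic_float >= 201711L
  // P0020R6: std::atomic<double> has fetch_add.
  return total.fetch_add(x);
#else
  // Fallback: retry until no other thread modified total in between.
  double old = total.load();
  while (!total.compare_exchange_weak(old, old + x)) {
    // compare_exchange_weak refreshed `old` with the current value; retry.
  }
  return old;
#endif
}
```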

(cherry picked from commit 2207e3e32549306bf563c6987f790cabe8d4ea78)
---
 libcxx/docs/FeatureTestMacroTable.rst |  2 +-
 libcxx/docs/Status/Cxx20Papers.csv|  2 +-
 libcxx/include/version|  2 +-
 .../atomic.version.compile.pass.cpp   | 48 ++-
 .../version.version.compile.pass.cpp  | 48 ++-
 .../generate_feature_test_macro_components.py |  1 -
 6 files changed, 33 insertions(+), 70 deletions(-)

diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index ccaa784ccb088..dcf9838edd74b 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -174,7 +174,7 @@ Status
 -- 
-
 ``__cpp_lib_atomic_flag_test`` ``201907L``
 -- 
-
-``__cpp_lib_atomic_float`` *unimplemented*
+``__cpp_lib_atomic_float`` ``201711L``
 -- 
-
 ``__cpp_lib_atomic_lock_free_type_aliases````201907L``
 -- 
-
diff --git a/libcxx/docs/Status/Cxx20Papers.csv b/libcxx/docs/Status/Cxx20Papers.csv
index 524c6d0ac8be0..b595da3728841 100644
--- a/libcxx/docs/Status/Cxx20Papers.csv
+++ b/libcxx/docs/Status/Cxx20Papers.csv
@@ -2,7 +2,7 @@
 "`P0463R1 `__","Endian just Endian","2017-07 (Toronto)","|Complete|","7",""
 "`P0674R1 `__","Extending make_shared to Support Arrays","2017-07 (Toronto)","|Complete|","15",""
 "","","","","",""
-"`P0020R6 `__","Floating Point Atomic","2017-11 (Albuquerque)","|Complete|","18",""
+"`P0020R6 `__","Floating Point Atomic","2017-11 (Albuquerque)","|Complete|","18","The feature-test macro was not set until LLVM 20."
 "`P0053R7 `__","C++ Synchronized Buffered Ostream","2017-11 (Albuquerque)","|Complete|","18",""
 "`P0202R3 `__","Add constexpr modifiers to functions in  and  Headers","2017-11 (Albuquerque)","|Complete|","12",""
 "`P0415R1 `__","Constexpr for ``std::complex``\ ","2017-11 (Albuquerque)","|Complete|","16",""
diff --git a/libcxx/include/version b/libcxx/include/version
index c5966b90c061d..63ead9fd5d29d 100644
--- a/libcxx/include/version
+++ b/libcxx/include/version
@@ -378,7 +378,7 @@ __cpp_lib_void_t 201411L
 # define __cpp_lib_array_constexpr  201811L
 # define __cpp_lib_assume_aligned   201811L
 # define __cpp_lib_atomic_flag_test 201907L
-// # define __cpp_lib_atomic_float 201711L
+# define __cpp_lib_atomic_float 201711L
 # define __cpp_lib_atomic_lock_free_type_aliases201907L
 # define __cpp_lib_atomic_ref   201806L
 // # define __cpp_lib_atomic_shared_ptr201711L
diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
index 9ed18fbfe19ac..5a21e6320bffe 100644
--- a/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
+++ b/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
@@ -169,17 +169,11 @@
 #   error "__cpp_lib_atomic_flag_test should have the value 201907L in c++20"
 # endif
 
-# if !defined(_LIBCPP_VERSION)
-#   ifndef __cpp_lib_atomic_float
-# error "__cpp_lib_atomic_float should be defined in c++20"
-#   endif
-#   if __cpp_lib_atomic_float != 201711L
-# error "__cpp_lib_atomic_float should have the value 201711L in c++20"
-#   endif
-# else // _LIBCPP_VERSION
-#   ifdef __cpp_lib_atomic_float
-# error "__cpp_lib_atomic_float should not be defined because it is unimplemented in libc++!"
-#   endif
+# ifndef __cpp_lib_atomic_float
+#   error "__cpp_lib_atomic_float should be defined in c++20"
+# endif
+# if __cpp_lib_atomic_float != 201711L
+#   error "__cpp_lib_atomi

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) (PR #127732)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:

@philnik777 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/127732

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) (PR #127732)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport 2207e3e32549306bf563c6987f790cabe8d4ea78

Requested by: @frederick-vs-ja

---
Full diff: https://github.com/llvm/llvm-project/pull/127732.diff


6 Files Affected:

- (modified) libcxx/docs/FeatureTestMacroTable.rst (+1-1) 
- (modified) libcxx/docs/Status/Cxx20Papers.csv (+1-1) 
- (modified) libcxx/include/version (+1-1) 
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp (+15-33)
- (modified) libcxx/test/std/language.support/support.limits/support.limits.general/version.version.compile.pass.cpp (+15-33)
- (modified) libcxx/utils/generate_feature_test_macro_components.py (-1) 


``diff
diff --git a/libcxx/docs/FeatureTestMacroTable.rst b/libcxx/docs/FeatureTestMacroTable.rst
index ccaa784ccb088..dcf9838edd74b 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -174,7 +174,7 @@ Status
 -- 
-
 ``__cpp_lib_atomic_flag_test`` ``201907L``
 -- 
-
-``__cpp_lib_atomic_float`` *unimplemented*
+``__cpp_lib_atomic_float`` ``201711L``
 -- 
-
 ``__cpp_lib_atomic_lock_free_type_aliases````201907L``
 -- 
-
diff --git a/libcxx/docs/Status/Cxx20Papers.csv b/libcxx/docs/Status/Cxx20Papers.csv
index 524c6d0ac8be0..b595da3728841 100644
--- a/libcxx/docs/Status/Cxx20Papers.csv
+++ b/libcxx/docs/Status/Cxx20Papers.csv
@@ -2,7 +2,7 @@
 "`P0463R1 `__","Endian just Endian","2017-07 (Toronto)","|Complete|","7",""
 "`P0674R1 `__","Extending make_shared to Support Arrays","2017-07 (Toronto)","|Complete|","15",""
 "","","","","",""
-"`P0020R6 `__","Floating Point Atomic","2017-11 (Albuquerque)","|Complete|","18",""
+"`P0020R6 `__","Floating Point Atomic","2017-11 (Albuquerque)","|Complete|","18","The feature-test macro was not set until LLVM 20."
 "`P0053R7 `__","C++ Synchronized Buffered Ostream","2017-11 (Albuquerque)","|Complete|","18",""
 "`P0202R3 `__","Add constexpr modifiers to functions in  and  Headers","2017-11 (Albuquerque)","|Complete|","12",""
 "`P0415R1 `__","Constexpr for ``std::complex``\ ","2017-11 (Albuquerque)","|Complete|","16",""
diff --git a/libcxx/include/version b/libcxx/include/version
index c5966b90c061d..63ead9fd5d29d 100644
--- a/libcxx/include/version
+++ b/libcxx/include/version
@@ -378,7 +378,7 @@ __cpp_lib_void_t 201411L
 # define __cpp_lib_array_constexpr  201811L
 # define __cpp_lib_assume_aligned   201811L
 # define __cpp_lib_atomic_flag_test 201907L
-// # define __cpp_lib_atomic_float 201711L
+# define __cpp_lib_atomic_float 201711L
 # define __cpp_lib_atomic_lock_free_type_aliases201907L
 # define __cpp_lib_atomic_ref   201806L
 // # define __cpp_lib_atomic_shared_ptr201711L
diff --git a/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp b/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
index 9ed18fbfe19ac..5a21e6320bffe 100644
--- a/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
+++ b/libcxx/test/std/language.support/support.limits/support.limits.general/atomic.version.compile.pass.cpp
@@ -169,17 +169,11 @@
 #   error "__cpp_lib_atomic_flag_test should have the value 201907L in c++20"
 # endif
 
-# if !defined(_LIBCPP_VERSION)
-#   ifndef __cpp_lib_atomic_float
-# error "__cpp_lib_atomic_float should be defined in c++20"
-#   endif
-#   if __cpp_lib_atomic_float != 201711L
-# error "__cpp_lib_atomic_float should have the value 201711L in c++20"
-#   endif
-# else // _LIBCPP_VERSION
-#   ifdef __cpp_lib_atomic_float
-# error "__cpp_lib_atomic_float should not be defined because it is unimplemented in libc++!"
-#   endif
+# ifndef __cpp_lib_atomic_float
+#   error "__cpp_lib_atomic_float should be defined in c++20"
+# endif
+# if __cpp_lib_atomic_float != 201711L
+#   error "__cpp_lib_atomic_float should have the value 201711L in c++20"
 # endif
 
 # ifndef __cpp_lib_atomic_is_always_lock_free
@@ -262,17 +256,11 @@
 #   error "__cpp_lib_atomic_flag_test should have the value 201907L in c++23"
 # e

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Set feature-test macro `__cpp_lib_atomic_float` (#127559) (PR #127732)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127732

[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP custom mappers in map clause (PR #121001)

2025-02-18 Thread Akash Banerjee via llvm-branch-commits


https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/121001

>From 2121c61420db36438293ae2df8b297f70ab4b61c Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Mon, 23 Dec 2024 21:13:42 +
Subject: [PATCH 1/5] Add flang lowering changes for mapper field in map
 clause.

---
 flang/lib/Lower/OpenMP/ClauseProcessor.cpp  | 32 +
 flang/lib/Lower/OpenMP/ClauseProcessor.h|  3 +-
 flang/test/Lower/OpenMP/Todo/map-mapper.f90 | 16 ---
 flang/test/Lower/OpenMP/map-mapper.f90  | 23 +++
 4 files changed, 52 insertions(+), 22 deletions(-)
 delete mode 100644 flang/test/Lower/OpenMP/Todo/map-mapper.f90
 create mode 100644 flang/test/Lower/OpenMP/map-mapper.f90
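The mapper-name resolution that this patch adds to ClauseProcessor can be summarized with a standalone C++ sketch. It is a simplification, not the flang code: scopes are modelled as strings, and `mangle` is a hypothetical stand-in for `converter.mangleName`. The real code additionally asserts that the resulting name exists in the MLIR symbol table before building the `FlatSymbolRefAttr`.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for converter.mangleName(name, scope).
using Mangler =
    std::function<std::string(const std::string &, const std::string &)>;

// Mirrors the two cases in the patch:
//  * an explicit mapper name is mangled in the scope that owns the mapper symbol;
//  * "default" becomes "<derived type name>.default", mangled in the type's scope.
std::string resolveMapperName(const std::string &requested,
                              const std::string &mapperOwnerScope,
                              const std::string &typeName,
                              const std::string &typeScope,
                              const Mangler &mangle) {
  if (requested == "default")
    return mangle(typeName + ".default", typeScope);
  return mangle(requested, mapperOwnerScope);
}
```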

diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index febc6adcf9d6f..467a0dcebf2b8 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -969,8 +969,10 @@ void ClauseProcessor::processMapObjects(
 llvm::omp::OpenMPOffloadMappingFlags mapTypeBits,
 std::map &parentMemberIndices,
 llvm::SmallVectorImpl &mapVars,
-llvm::SmallVectorImpl &mapSyms) const {
+llvm::SmallVectorImpl &mapSyms,
+std::string mapperIdName) const {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+  mlir::FlatSymbolRefAttr mapperId;
 
   for (const omp::Object &object : objects) {
 llvm::SmallVector bounds;
@@ -1003,6 +1005,20 @@ void ClauseProcessor::processMapObjects(
   }
 }
 
+if (!mapperIdName.empty()) {
+  if (mapperIdName == "default") {
+auto &typeSpec = object.sym()->owner().IsDerivedType()
+ ? *object.sym()->owner().derivedTypeSpec()
+ : object.sym()->GetType()->derivedTypeSpec();
+mapperIdName = typeSpec.name().ToString() + ".default";
+mapperIdName = converter.mangleName(mapperIdName, *typeSpec.GetScope());
+  }
+  assert(converter.getMLIRSymbolTable()->lookup(mapperIdName) &&
+ "mapper not found");
+  mapperId = mlir::FlatSymbolRefAttr::get(&converter.getMLIRContext(),
+  mapperIdName);
+  mapperIdName.clear();
+}
 // Explicit map captures are captured ByRef by default,
 // optimisation passes may alter this to ByCopy or other capture
 // types to optimise
@@ -1016,7 +1032,8 @@ void ClauseProcessor::processMapObjects(
 static_cast<
 std::underlying_type_t>(
 mapTypeBits),
-mlir::omp::VariableCaptureKind::ByRef, baseOp.getType());
+mlir::omp::VariableCaptureKind::ByRef, baseOp.getType(), false,
+mapperId);
 
 if (parentObj.has_value()) {
   parentMemberIndices[parentObj.value()].addChildIndexAndMapToParent(
@@ -1047,6 +1064,7 @@ bool ClauseProcessor::processMap(
 const auto &[mapType, typeMods, mappers, iterator, objects] = clause.t;
 llvm::omp::OpenMPOffloadMappingFlags mapTypeBits =
 llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_NONE;
+std::string mapperIdName;
 // If the map type is specified, then process it else Tofrom is the
 // default.
 Map::MapType type = mapType.value_or(Map::MapType::Tofrom);
@@ -1090,13 +1108,17 @@ bool ClauseProcessor::processMap(
"Support for iterator modifiers is not implemented yet");
 }
 if (mappers) {
-  TODO(currentLocation,
-   "Support for mapper modifiers is not implemented yet");
+  assert(mappers->size() == 1 && "more than one mapper");
+  mapperIdName = mappers->front().v.id().symbol->name().ToString();
+  if (mapperIdName != "default")
+mapperIdName = converter.mangleName(
+mapperIdName, mappers->front().v.id().symbol->owner());
 }
 
 processMapObjects(stmtCtx, clauseLocation,
   std::get(clause.t), mapTypeBits,
-  parentMemberIndices, result.mapVars, *ptrMapSyms);
+  parentMemberIndices, result.mapVars, *ptrMapSyms,
+  mapperIdName);
   };
 
   bool clauseFound = findRepeatableClause(process);
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index e05f66c766684..2b319e890a5ad 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -175,7 +175,8 @@ class ClauseProcessor {
   llvm::omp::OpenMPOffloadMappingFlags mapTypeBits,
   std::map &parentMemberIndices,
   llvm::SmallVectorImpl &mapVars,
-  llvm::SmallVectorImpl &mapSyms) const;
+  llvm::SmallVectorImpl &mapSyms,
+  std::string mapperIdName = "") const;
 
   lower::AbstractConverter &converter;
   semantics::SemanticsContext &semaCtx;
diff --git a/flang/test/Lower/OpenMP/Todo/map-mapper.f90 b/flang/test/Lower/OpenMP/Todo/map-mapper.f90
deleted file mode 100644
index 9554ffd5fda7b..0
--- a/flang/test/Lowe

[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Add conversion support from FIR to LLVM Dialect for OMP DeclareMapper (PR #121005)

2025-02-18 Thread Akash Banerjee via llvm-branch-commits


https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/121005

>From 77e95624798ddc1f07ab7af30962e7a128248bae Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Mon, 23 Dec 2024 21:50:03 +
Subject: [PATCH 1/2] Add OpenMP to LLVM dialect conversion support for
 DeclareMapperOp.

---
 .../Fir/convert-to-llvm-openmp-and-fir.fir| 27 +--
 .../Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp  | 48 +++
 .../OpenMPToLLVM/convert-to-llvmir.mlir   | 13 +
 3 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
index 8e4e1fe824d9f..82f2aea3ad983 100644
--- a/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
+++ b/flang/test/Fir/convert-to-llvm-openmp-and-fir.fir
@@ -936,9 +936,9 @@ func.func @omp_map_info_descriptor_type_conversion(%arg0 : 
!fir.ref>, i32) 
map_clauses(tofrom) capture(ByRef) -> !fir.llvm_ptr> {name = ""}
   // CHECK: %[[DESC_MAP:.*]] = omp.map.info var_ptr(%[[ARG_0]] : !llvm.ptr, 
!llvm.struct<(ptr, i64, i32, i8, i8, i8, i8)>) map_clauses(always, delete) 
capture(ByRef) members(%[[MEMBER_MAP]] : [0] : !llvm.ptr) -> !llvm.ptr {name = 
""}
   %2 = omp.map.info var_ptr(%arg0 : !fir.ref>>, 
!fir.box>) map_clauses(always, delete) capture(ByRef) members(%1 
: [0] : !fir.llvm_ptr>) -> !fir.ref>> 
{name = ""}
-  // CHECK: omp.target_exit_data map_entries(%[[DESC_MAP]] : !llvm.ptr) 
+  // CHECK: omp.target_exit_data map_entries(%[[DESC_MAP]] : !llvm.ptr)
   omp.target_exit_data   map_entries(%2 : !fir.ref>>)
-  return 
+  return
 }
 
 // -
@@ -956,8 +956,8 @@ func.func 
@omp_map_info_derived_type_explicit_member_conversion(%arg0 : !fir.ref
   %3 = fir.field_index real, 
!fir.type<_QFderived_type{real:f32,array:!fir.array<10xi32>,int:i32}>
   %4 = fir.coordinate_of %arg0, %3 : 
(!fir.ref,int:i32}>>,
 !fir.field) -> !fir.ref
   // CHECK: %[[MAP_MEMBER_2:.*]] = omp.map.info var_ptr(%[[GEP_2]] : 
!llvm.ptr, f32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = 
"dtype%real"}
-  %5 = omp.map.info var_ptr(%4 : !fir.ref, f32) map_clauses(tofrom) 
capture(ByRef) -> !fir.ref {name = "dtype%real"}
-  // CHECK: %[[MAP_PARENT:.*]] = omp.map.info var_ptr(%[[ARG_0]] : !llvm.ptr, 
!llvm.struct<"_QFderived_type", (f32, array<10 x i32>, i32)>) 
map_clauses(tofrom) capture(ByRef) members(%[[MAP_MEMBER_1]], %[[MAP_MEMBER_2]] 
: [2], [0] : !llvm.ptr, !llvm.ptr) -> !llvm.ptr {name = "dtype", partial_map = 
true} 
+  %5 = omp.map.info var_ptr(%4 : !fir.ref, f32) map_clauses(tofrom) 
capture(ByRef) -> !fir.ref {name = "dtype%real"}
+  // CHECK: %[[MAP_PARENT:.*]] = omp.map.info var_ptr(%[[ARG_0]] : !llvm.ptr, 
!llvm.struct<"_QFderived_type", (f32, array<10 x i32>, i32)>) 
map_clauses(tofrom) capture(ByRef) members(%[[MAP_MEMBER_1]], %[[MAP_MEMBER_2]] 
: [2], [0] : !llvm.ptr, !llvm.ptr) -> !llvm.ptr {name = "dtype", partial_map = 
true}
   %6 = omp.map.info var_ptr(%arg0 : 
!fir.ref,int:i32}>>,
 !fir.type<_QFderived_type{real:f32,array:!fir.array<10xi32>,int:i32}>) 
map_clauses(tofrom) capture(ByRef) members(%2, %5 : [2], [0] : !fir.ref, 
!fir.ref) -> 
!fir.ref,int:i32}>> 
{name = "dtype", partial_map = true}
   // CHECK: omp.target map_entries(%[[MAP_MEMBER_1]] -> %[[ARG_1:.*]], 
%[[MAP_MEMBER_2]] -> %[[ARG_2:.*]], %[[MAP_PARENT]] -> %[[ARG_3:.*]] : 
!llvm.ptr, !llvm.ptr, !llvm.ptr) {
   omp.target map_entries(%2 -> %arg1, %5 -> %arg2, %6 -> %arg3 : 
!fir.ref, !fir.ref, 
!fir.ref,int:i32}>>)
 {
@@ -1275,3 +1275,22 @@ func.func @map_nested_dtype_alloca_mem2(%arg0 : 
!fir.ref {
+omp.declare_mapper @my_mapper : !fir.type<_QFdeclare_mapperTmy_type{data:i32}> 
{
+// CHECK: ^bb0(%[[VAL_0:.*]]: !llvm.ptr):
+^bb0(%0: !fir.ref>):
+// CHECK:   %[[VAL_1:.*]] = llvm.mlir.constant(0 : i32) : i32
+  %1 = fir.field_index data, !fir.type<_QFdeclare_mapperTmy_type{data:i32}>
+// CHECK:   %[[VAL_2:.*]] = llvm.getelementptr %[[VAL_0]][0, 0] : 
(!llvm.ptr) -> !llvm.ptr, !llvm.struct<"_QFdeclare_mapperTmy_type", (i32)>
+  %2 = fir.coordinate_of %0, %1 : 
(!fir.ref>, !fir.field) -> 
!fir.ref
+// CHECK:   %[[VAL_3:.*]] = omp.map.info var_ptr(%[[VAL_2]] : 
!llvm.ptr, i32) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = 
"var%[[VAL_4:.*]]"}
+  %3 = omp.map.info var_ptr(%2 : !fir.ref, i32) map_clauses(tofrom) 
capture(ByRef) -> !fir.ref {name = "var%data"}
+// CHECK:   %[[VAL_5:.*]] = omp.map.info var_ptr(%[[VAL_0]] : 
!llvm.ptr, !llvm.struct<"_QFdeclare_mapperTmy_type", (i32)>) 
map_clauses(tofrom) capture(ByRef) members(%[[VAL_3]] : [0] : !llvm.ptr) -> 
!llvm.ptr {name = "var", partial_map = true}
+  %4 = omp.map.info var_ptr(%0 : 
!fir.ref>, 
!fir.type<_QFdeclare_mapperTmy_type{data:i32}>) map_clauses(tofrom) 
capture(ByRef) members(%3 : [0] : !fir.ref) -> 
!fir.ref> {name = "var", 
partial_map = true}
+// CHECK:   omp.declare_mapper_info map_entries(%[[VAL_5]], %[[VAL_3]] 
: !llvm.ptr, !llvm.ptr)
+  omp.declare_mappe

[llvm-branch-commits] [clang] [llvm] [mlir] [MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (PR #124746)

2025-02-18 Thread Akash Banerjee via llvm-branch-commits


https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/124746

>From 784c7fb6fb6c8dbbda2838fd2e5dee7b86129b5a Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Tue, 28 Jan 2025 13:38:13 +
Subject: [PATCH 1/8] [MLIR][OpenMP] Add LLVM translation support for OpenMP
 UserDefinedMappers

This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper
directive.

Since both MLIR and Clang now support custom mappers, I've also made the related
parameters (such as the custom mapper callback) required instead of optional.

Depends on #121005
---
 clang/lib/CodeGen/CGOpenMPRuntime.cpp |  11 +-
 .../llvm/Frontend/OpenMP/OMPIRBuilder.h   |  31 +--
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |  70 +++---
 .../Frontend/OpenMPIRBuilderTest.cpp  |  46 ++--
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 215 +++---
 mlir/test/Target/LLVMIR/omptarget-llvm.mlir   | 117 ++
 .../fortran/target-custom-mapper.f90  |  46 
 7 files changed, 437 insertions(+), 99 deletions(-)
 create mode 100644 offload/test/offloading/fortran/target-custom-mapper.f90

diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 9f7db25a15bec..0b322112a1076 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -,8 +,8 @@ static void emitOffloadingArraysAndArgs(
 return MFunc;
   };
   OMPBuilder.emitOffloadingArraysAndArgs(
-  AllocaIP, CodeGenIP, Info, Info.RTArgs, CombinedInfo, IsNonContiguous,
-  ForEndCall, DeviceAddrCB, CustomMapperCB);
+  AllocaIP, CodeGenIP, Info, Info.RTArgs, CombinedInfo, CustomMapperCB,
+  IsNonContiguous, ForEndCall, DeviceAddrCB);
 }
 
 /// Check for inner distribute directive.
@@ -9098,9 +9098,10 @@ void CGOpenMPRuntime::emitUserDefinedMapper(const OMPDeclareMapperDecl *D,
   CGM.getCXXABI().getMangleContext().mangleCanonicalTypeName(Ty, Out);
   std::string Name = getName({"omp_mapper", TyStr, D->getName()});
 
-  auto *NewFn = OMPBuilder.emitUserDefinedMapper(PrivatizeAndGenMapInfoCB,
- ElemTy, Name, CustomMapperCB);
-  UDMMap.try_emplace(D, NewFn);
+  llvm::Expected NewFn = OMPBuilder.emitUserDefinedMapper(
+  PrivatizeAndGenMapInfoCB, ElemTy, Name, CustomMapperCB);
+  assert(NewFn && "Unexpected error in emitUserDefinedMapper");
+  UDMMap.try_emplace(D, *NewFn);
   if (CGF)
 FunctionUDMMap[CGF->CurFn].push_back(D);
 }
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index d25077cae63e4..151bd36aadaf0 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2399,6 +2399,7 @@ class OpenMPIRBuilder {
CurInfo.NonContigInfo.Strides.end());
 }
   };
+  using MapInfosOrErrorTy = Expected;
 
   /// Callback function type for functions emitting the host fallback code that
   /// is executed when the kernel launch fails. It takes an insertion point as
@@ -2475,9 +2476,9 @@ class OpenMPIRBuilder {
   /// including base pointers, pointers, sizes, map types, user-defined mappers.
   void emitOffloadingArrays(
   InsertPointTy AllocaIP, InsertPointTy CodeGenIP, MapInfosTy &CombinedInfo,
-  TargetDataInfo &Info, bool IsNonContiguous = false,
-  function_ref DeviceAddrCB = nullptr,
-  function_ref CustomMapperCB = nullptr);
+  TargetDataInfo &Info, function_ref CustomMapperCB,
+  bool IsNonContiguous = false,
+  function_ref DeviceAddrCB = nullptr);
 
   /// Allocates memory for and populates the arrays required for offloading
   /// (offload_{baseptrs|ptrs|mappers|sizes|maptypes|mapnames}). Then, it
@@ -2488,9 +2489,9 @@ class OpenMPIRBuilder {
   void emitOffloadingArraysAndArgs(
   InsertPointTy AllocaIP, InsertPointTy CodeGenIP, TargetDataInfo &Info,
   TargetDataRTArgs &RTArgs, MapInfosTy &CombinedInfo,
+  function_ref CustomMapperCB,
   bool IsNonContiguous = false, bool ForEndCall = false,
-  function_ref DeviceAddrCB = nullptr,
-  function_ref CustomMapperCB = nullptr);
+  function_ref DeviceAddrCB = nullptr);
 
   /// Creates offloading entry for the provided entry ID \a ID, address \a
   /// Addr, size \a Size, and flags \a Flags.
@@ -2950,12 +2951,12 @@ class OpenMPIRBuilder {
   /// \param FuncName Optional param to specify mapper function name.
   /// \param CustomMapperCB Optional callback to generate code related to
   /// custom mappers.
-  Function *emitUserDefinedMapper(
-  function_ref
+  Expected emitUserDefinedMapper(
+  function_ref
   PrivAndGenMapInfoCB,
   llvm::Type *ElemTy, StringRef FuncName,
-  function_ref CustomMapperCB = nullptr);
+  function_ref CustomMapperCB);
 
   /// Generator for '#omp target data'
   ///
@@ -2969,21 +2970,21 @@ class OpenMPIRBuilder {
   /// \param IfCond Value which corresponds to the if clause

[llvm-branch-commits] [flang] [AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (PR #125827)

2025-02-18 Thread Shilei Tian via llvm-branch-commits


https://github.com/shiltian approved this pull request.


https://github.com/llvm/llvm-project/pull/125827
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits


https://github.com/hekota ready_for_review 
https://github.com/llvm/llvm-project/pull/125807
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] release/20.x: [clang] StmtPrinter: Handle DeclRefExpr to a Decomposition (#125001) (PR #126659)

2025-02-18 Thread Aaron Ballman via llvm-branch-commits


https://github.com/AaronBallman approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/126659

[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port RegAllocPriorityAdvisor analysis to NPM (PR #118462)

2025-02-18 Thread Akshat Oke via llvm-branch-commits


https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/118462

>From 6638037194a6ba920de27d08ffb33f599d601bd5 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 3 Dec 2024 10:12:36 +
Subject: [PATCH 1/9] [CodeGen][NewPM] Port RegAllocPriorityAdvisor analysis to
 NPM

---
 .../llvm}/CodeGen/RegAllocPriorityAdvisor.h   |  78 +++-
 llvm/include/llvm/InitializePasses.h  |   2 +-
 .../llvm/Passes/MachinePassRegistry.def   |   1 +
 llvm/lib/CodeGen/MLRegAllocEvictAdvisor.cpp   |   6 +-
 .../lib/CodeGen/MLRegAllocPriorityAdvisor.cpp | 184 +++---
 llvm/lib/CodeGen/RegAllocEvictionAdvisor.cpp  |   2 +-
 llvm/lib/CodeGen/RegAllocGreedy.cpp   |   9 +-
 llvm/lib/CodeGen/RegAllocGreedy.h |   2 +-
 llvm/lib/CodeGen/RegAllocPriorityAdvisor.cpp  | 155 +++
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 10 files changed, 320 insertions(+), 120 deletions(-)
 rename llvm/{lib => include/llvm}/CodeGen/RegAllocPriorityAdvisor.h (57%)

diff --git a/llvm/lib/CodeGen/RegAllocPriorityAdvisor.h b/llvm/include/llvm/CodeGen/RegAllocPriorityAdvisor.h
similarity index 57%
rename from llvm/lib/CodeGen/RegAllocPriorityAdvisor.h
rename to llvm/include/llvm/CodeGen/RegAllocPriorityAdvisor.h
index 0758743c2b140..a53739fdc3fc4 100644
--- a/llvm/lib/CodeGen/RegAllocPriorityAdvisor.h
+++ b/llvm/include/llvm/CodeGen/RegAllocPriorityAdvisor.h
@@ -9,8 +9,10 @@
 #ifndef LLVM_CODEGEN_REGALLOCPRIORITYADVISOR_H
 #define LLVM_CODEGEN_REGALLOCPRIORITYADVISOR_H
 
+#include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/RegAllocEvictionAdvisor.h"
 #include "llvm/CodeGen/SlotIndexes.h"
+#include "llvm/IR/PassManager.h"
 #include "llvm/Pass.h"
 
 namespace llvm {
@@ -68,12 +70,72 @@ class DummyPriorityAdvisor : public RegAllocPriorityAdvisor {
   unsigned getPriority(const LiveInterval &LI) const override;
 };
 
-class RegAllocPriorityAdvisorAnalysis : public ImmutablePass {
+/// Common provider for getting the priority advisor and logging rewards.
+/// Legacy analysis forwards all calls to this provider.
+/// New analysis serves the provider as the analysis result.
+/// Expensive setup is done in the constructor, so that the advisor can be
+/// created quickly for every machine function.
+/// TODO: Remove once legacy PM support is dropped.
+class RegAllocPriorityAdvisorProvider {
 public:
   enum class AdvisorMode : int { Default, Release, Development, Dummy };
 
-  RegAllocPriorityAdvisorAnalysis(AdvisorMode Mode)
-  : ImmutablePass(ID), Mode(Mode){};
+  RegAllocPriorityAdvisorProvider(AdvisorMode Mode) : Mode(Mode) {}
+
+  virtual ~RegAllocPriorityAdvisorProvider() = default;
+
+  virtual void logRewardIfNeeded(const MachineFunction &MF,
+ llvm::function_ref GetReward) {};
+
+  virtual std::unique_ptr
+  getAdvisor(const MachineFunction &MF, const RAGreedy &RA) = 0;
+
+  void setAnalyses(SlotIndexes *SI) { this->SI = SI; }
+
+  AdvisorMode getAdvisorMode() const { return Mode; }
+
+protected:
+  SlotIndexes *SI;
+
+private:
+  const AdvisorMode Mode;
+};
+
+RegAllocPriorityAdvisorProvider *createReleaseModePriorityAdvisorProvider();
+
+RegAllocPriorityAdvisorProvider *
+createDevelopmentModePriorityAdvisorProvider(LLVMContext &Ctx);
+
+class RegAllocPriorityAdvisorAnalysis
+: public AnalysisInfoMixin {
+  static AnalysisKey Key;
+  friend AnalysisInfoMixin;
+
+public:
+  struct Result {
+// Owned by this analysis.
+RegAllocPriorityAdvisorProvider *Provider;
+
+bool invalidate(MachineFunction &MF, const PreservedAnalyses &PA,
+MachineFunctionAnalysisManager::Invalidator &Inv) {
+  auto PAC = PA.getChecker();
+  return !PAC.preservedWhenStateless() ||
+ Inv.invalidate(MF, PA);
+}
+  };
+
+  Result run(MachineFunction &MF, MachineFunctionAnalysisManager &MFAM);
+
+private:
+  void initializeProvider(LLVMContext &Ctx);
+  std::unique_ptr Provider;
+};
+
+class RegAllocPriorityAdvisorAnalysisLegacy : public ImmutablePass {
+public:
+  using AdvisorMode = RegAllocPriorityAdvisorProvider::AdvisorMode;
+  RegAllocPriorityAdvisorAnalysisLegacy(AdvisorMode Mode)
+  : ImmutablePass(ID), Mode(Mode) {};
   static char ID;
 
   /// Get an advisor for the given context (i.e. machine function, etc)
@@ -81,7 +143,7 @@ class RegAllocPriorityAdvisorAnalysis : public ImmutablePass {
   getAdvisor(const MachineFunction &MF, const RAGreedy &RA) = 0;
   AdvisorMode getAdvisorMode() const { return Mode; }
   virtual void logRewardIfNeeded(const MachineFunction &MF,
- llvm::function_ref GetReward){};
+ llvm::function_ref GetReward) {};
 
 protected:
   // This analysis preserves everything, and subclasses may have additional
@@ -97,11 +159,13 @@ class RegAllocPriorityAdvisorAnalysis : public ImmutablePass {
 
 /// Specialization for the API used by the analysis infrastructure to create
 /// an instance
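The provider comment in this patch describes one object that owns the expensive setup, with the legacy pass forwarding every call to it and the new-PM analysis handing it out as its result. A heavily simplified sketch of that ownership arrangement follows; every class name in it (`Advisor`, `AdvisorProvider`, `DefaultProvider`, `LegacyAnalysisWrapper`, `NewAnalysis`) is hypothetical, and it omits the real `MachineFunction`, `RAGreedy`, `SlotIndexes`, invalidation, and reward-logging plumbing.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical per-function advisor; cheap to create.
struct Advisor {
  std::string Name;
};

// Shared provider interface: hands out advisors for individual functions.
class AdvisorProvider {
public:
  virtual ~AdvisorProvider() = default;
  virtual std::unique_ptr<Advisor> getAdvisor(const std::string &FunctionName) = 0;
};

class DefaultProvider : public AdvisorProvider {
public:
  // Stands in for expensive one-time setup done in the constructor.
  DefaultProvider() : Prefix("default:") {}
  std::unique_ptr<Advisor> getAdvisor(const std::string &FunctionName) override {
    return std::make_unique<Advisor>(Advisor{Prefix + FunctionName});
  }

private:
  std::string Prefix;
};

// Legacy-PM-style wrapper: owns the provider and forwards every call to it.
class LegacyAnalysisWrapper {
public:
  LegacyAnalysisWrapper() : Provider(std::make_unique<DefaultProvider>()) {}
  std::unique_ptr<Advisor> getAdvisor(const std::string &FunctionName) {
    return Provider->getAdvisor(FunctionName);
  }

private:
  std::unique_ptr<AdvisorProvider> Provider;
};

// New-PM-style analysis: the result only exposes a non-owning pointer to the
// provider, which the analysis creates once and keeps alive.
class NewAnalysis {
public:
  struct Result {
    AdvisorProvider *Provider; // owned by NewAnalysis
  };
  Result run() {
    if (!Provider)
      Provider = std::make_unique<DefaultProvider>(); // set up once, reused
    return Result{Provider.get()};
  }

private:
  std::unique_ptr<AdvisorProvider> Provider;
};
```

In this arrangement per-function `getAdvisor` calls stay cheap because the provider's constructor runs only once, and both pass-manager front ends share the same provider type until legacy support is dropped.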

[llvm-branch-commits] [llvm] [CodeGen][NewPM] Port RegAllocGreedy to NPM (PR #119540)

2025-02-18 Thread Akshat Oke via llvm-branch-commits


https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/119540

>From cc0367a02d043a96980843b1ea491779daa2d3c0 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Wed, 11 Dec 2024 08:51:55 +
Subject: [PATCH 1/8] [CodeGen][NewPM] Port RegAllocGreedy to NPM

---
 llvm/include/llvm/CodeGen/MachineFunction.h   |   1 +
 llvm/include/llvm/CodeGen/Passes.h|   2 +-
 llvm/include/llvm/InitializePasses.h  |   2 +-
 .../llvm/Passes/MachinePassRegistry.def   |   9 +
 llvm/lib/CodeGen/CodeGen.cpp  |   2 +-
 llvm/lib/CodeGen/RegAllocGreedy.cpp   | 185 ++
 llvm/lib/CodeGen/RegAllocGreedy.h |  57 +++---
 llvm/lib/Passes/PassBuilder.cpp   |   1 +
 8 files changed, 196 insertions(+), 63 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index f1e595cde54e3..7fd0994883fe8 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -927,6 +927,7 @@ class LLVM_ABI MachineFunction {
 
   /// Run the current MachineFunction through the machine code verifier, useful
   /// for debugger use.
+  /// TODO: Add the param LiveStks
   /// \returns true if no problems were found.
   bool verify(LiveIntervals *LiveInts, SlotIndexes *Indexes,
   const char *Banner = nullptr, raw_ostream *OS = nullptr,
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index b5d2a7e6bf035..0182f21bee5f5 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -171,7 +171,7 @@ namespace llvm {
   extern char &LiveRangeShrinkID;
 
   /// Greedy register allocator.
-  extern char &RAGreedyID;
+  extern char &RAGreedyLegacyID;
 
   /// Basic register allocator.
   extern char &RABasicID;
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 5b30eb53208a8..69c9e14541907 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -248,7 +248,7 @@ void initializeProfileSummaryInfoWrapperPassPass(PassRegistry &);
 void initializePromoteLegacyPassPass(PassRegistry &);
 void initializeRABasicPass(PassRegistry &);
 void initializePseudoProbeInserterPass(PassRegistry &);
-void initializeRAGreedyPass(PassRegistry &);
+void initializeRAGreedyLegacyPass(PassRegistry &);
 void initializeReachingDefAnalysisPass(PassRegistry &);
 void initializeReassociateLegacyPassPass(PassRegistry &);
 void initializeRegAllocEvictionAdvisorAnalysisLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index 373bd047e2395..78b4c8153e26b 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -194,6 +194,15 @@ MACHINE_FUNCTION_PASS_WITH_PARAMS(
   return parseRegAllocFastPassOptions(*PB, Params);
 },
 "filter=reg-filter;no-clear-vregs")
+
+MACHINE_FUNCTION_PASS_WITH_PARAMS(
+"regallocgreedy", "RAGreedy",
+[](RegAllocFilterFunc F) { return RAGreedyPass(F); },
+[PB = this](StringRef Params) {
+  // TODO: parseRegAllocFilter(*PB, Params);
+  return Expected(nullptr);
+}, ""
+)
 #undef MACHINE_FUNCTION_PASS_WITH_PARAMS
 
 // After a pass is converted to new pass manager, its entry should be moved from
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 35df2a479a545..21f76bdb2ad6b 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -112,7 +112,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
   initializeProcessImplicitDefsPass(Registry);
   initializeRABasicPass(Registry);
-  initializeRAGreedyPass(Registry);
+  initializeRAGreedyLegacyPass(Registry);
   initializeRegAllocFastPass(Registry);
   initializeRegUsageInfoCollectorLegacyPass(Registry);
   initializeRegUsageInfoPropagationLegacyPass(Registry);
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index bd81d630f9d1f..f4cc80c751350 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -43,8 +43,10 @@
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
+#include "llvm/CodeGen/MachinePassManager.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegAllocEvictionAdvisor.h"
+#include "llvm/CodeGen/RegAllocGreedyPass.h"
 #include "llvm/CodeGen/RegAllocPriorityAdvisor.h"
 #include "llvm/CodeGen/RegAllocRegistry.h"
 #include "llvm/CodeGen/RegisterClassInfo.h"
@@ -55,6 +57,7 @@
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/CodeGen/VirtRegMap.h"
+#include "llvm/IR/Analysis.h"
 #include "llvm/IR/Debug

[llvm-branch-commits] [clang] release/20.x: [clang] Handle f(no-)strict-overflow, f(no-)wrapv, f(no-)wrapv-pointer like gcc (#126524) (PR #126535)

2025-02-18 Thread Vlad Serebrennikov via llvm-branch-commits


Endilll wrote:

@nico I think that the release note from #122486 doesn't cover the changes in
this PR. Can you expand the existing release notes or add another one that
explains how we handle combinations of those driver arguments?

https://github.com/llvm/llvm-project/pull/126535

[llvm-branch-commits] [clang] release/20.x: [clang] Fix false positive regression for lifetime analysis warning. (#127460) (PR #127618)

2025-02-18 Thread Utkarsh Saxena via llvm-branch-commits


https://github.com/usx95 approved this pull request.


https://github.com/llvm/llvm-project/pull/127618

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits



@@ -286,10 +286,7 @@ void CGHLSLRuntime::emitBufferGlobalsAndMetadata(const HLSLBufferDecl *BufDecl,
   .str( &&
"layout type does not match the converted element type");
 
-// there might be resources inside the used defined structs
-if (VDTy->isStructureType() && VDTy->isHLSLIntangibleType())
-  // FIXME: handle resources in cbuffer structs
-  llvm_unreachable("resources in cbuffer are not supported yet");
+// FIXME: handle resources in cbuffer user-defined structs

hekota wrote:

Yes, this is future work (llvm/wg-hlsl#175). I've removed the
`llvm_unreachable` because we use it in existing tests.

https://github.com/llvm/llvm-project/pull/125807

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits


https://github.com/hekota updated 
https://github.com/llvm/llvm-project/pull/125807

>From 42bb34f66f0030f55e1055c4ee0b362511b7f45b Mon Sep 17 00:00:00 2001
From: Helena Kotas 
Date: Tue, 4 Feb 2025 22:01:49 -0800
Subject: [PATCH 1/5] [HLSL] Implement default constant buffer `$Globals`

All variable declarations in the global scope that are not resources, static,
or empty are implicitly added to the implicit constant buffer `$Globals`.

Fixes #123801
---
 clang/include/clang/AST/Decl.h  | 22 +++
 clang/include/clang/Sema/SemaHLSL.h |  7 ++-
 clang/lib/AST/Decl.cpp  | 41 -
 clang/lib/CodeGen/CGHLSLRuntime.cpp |  7 +--
 clang/lib/CodeGen/CodeGenModule.cpp |  5 ++
 clang/lib/Sema/Sema.cpp |  3 +-
 clang/lib/Sema/SemaHLSL.cpp | 47 +--
 clang/test/AST/HLSL/default_cbuffer.hlsl| 50 
 clang/test/CodeGenHLSL/basic_types.hlsl | 64 ++---
 clang/test/CodeGenHLSL/default_cbuffer.hlsl | 43 ++
 10 files changed, 242 insertions(+), 47 deletions(-)
 create mode 100644 clang/test/AST/HLSL/default_cbuffer.hlsl
 create mode 100644 clang/test/CodeGenHLSL/default_cbuffer.hlsl

diff --git a/clang/include/clang/AST/Decl.h b/clang/include/clang/AST/Decl.h
index 05e56978977f2..f86ddaf89bd9c 100644
--- a/clang/include/clang/AST/Decl.h
+++ b/clang/include/clang/AST/Decl.h
@@ -5038,6 +5038,11 @@ class HLSLBufferDecl final : public NamedDecl, public DeclContext {
   // LayoutStruct - Layout struct for the buffer
   CXXRecordDecl *LayoutStruct;
 
+  // For default (implicit) constant buffer, a list of references of global
+  // decls that belong to the buffer. The decls are already parented by the
+  // translation unit context.
+  SmallVector DefaultBufferDecls;
+
   HLSLBufferDecl(DeclContext *DC, bool CBuffer, SourceLocation KwLoc,
  IdentifierInfo *ID, SourceLocation IDLoc,
  SourceLocation LBrace);
@@ -5047,6 +5052,8 @@ class HLSLBufferDecl final : public NamedDecl, public DeclContext {
 bool CBuffer, SourceLocation KwLoc,
 IdentifierInfo *ID, SourceLocation IDLoc,
 SourceLocation LBrace);
+  static HLSLBufferDecl *CreateDefaultCBuffer(ASTContext &C,
+  DeclContext *LexicalParent);
   static HLSLBufferDecl *CreateDeserialized(ASTContext &C, GlobalDeclID ID);
 
   SourceRange getSourceRange() const override LLVM_READONLY {
@@ -5061,6 +5068,7 @@ class HLSLBufferDecl final : public NamedDecl, public DeclContext {
   bool hasPackoffset() const { return HasPackoffset; }
   const CXXRecordDecl *getLayoutStruct() const { return LayoutStruct; }
   void addLayoutStruct(CXXRecordDecl *LS);
+  void addDefaultBufferDecl(Decl *D);
 
   // Implement isa/cast/dyncast/etc.
   static bool classof(const Decl *D) { return classofKind(D->getKind()); }
@@ -5072,6 +5080,20 @@ class HLSLBufferDecl final : public NamedDecl, public DeclContext {
 return static_cast(const_cast(DC));
   }
 
+  // Iterator for the buffer decls. Concatenates the list of decls parented
+  // by this HLSLBufferDecl with the list of default buffer decls.
+  using buffer_decl_iterator =
+  llvm::concat_iterator::const_iterator,
+decl_iterator>;
+  using buffer_decl_range = llvm::iterator_range;
+
+  buffer_decl_range buffer_decls() const {
+return buffer_decl_range(buffer_decls_begin(), buffer_decls_end());
+  }
+  buffer_decl_iterator buffer_decls_begin() const;
+  buffer_decl_iterator buffer_decls_end() const;
+  bool buffer_decls_empty();
+
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
 };
diff --git a/clang/include/clang/Sema/SemaHLSL.h b/clang/include/clang/Sema/SemaHLSL.h
index f4cd11f423a84..b1cc856975532 100644
--- a/clang/include/clang/Sema/SemaHLSL.h
+++ b/clang/include/clang/Sema/SemaHLSL.h
@@ -103,13 +103,13 @@ class SemaHLSL : public SemaBase {
  HLSLParamModifierAttr::Spelling Spelling);
   void ActOnTopLevelFunction(FunctionDecl *FD);
   void ActOnVariableDeclarator(VarDecl *VD);
+  void ActOnEndOfTranslationUnit(TranslationUnitDecl *TU);
   void CheckEntryPoint(FunctionDecl *FD);
   void CheckSemanticAnnotation(FunctionDecl *EntryPoint, const Decl *Param,
const HLSLAnnotationAttr *AnnotationAttr);
   void DiagnoseAttrStageMismatch(
   const Attr *A, llvm::Triple::EnvironmentType Stage,
   std::initializer_list AllowedStages);
-  void DiagnoseAvailabilityViolations(TranslationUnitDecl *TU);
 
   QualType handleVectorBinOpConversion(ExprResult &LHS, ExprResult &RHS,
QualType LHSType, QualType RHSType,
@@ -159,11 +159,16 @@ class SemaHLSL : public SemaBase {
   // List of all resource bindings
   ResourceBindings Bindings;
 
+  // default constant buffer $Globals
+  HLSL

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits


https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/125807

[llvm-branch-commits] [libcxx] 456bf5e - Revert "[libc++] Add Hagenberg issues and papers to the Status pages (#127510)"

2025-02-18 Thread via llvm-branch-commits


Author: Louis Dionne
Date: 2025-02-18T15:09:43-05:00
New Revision: 456bf5e2d348cb739038265c0310c8bc0322869b

URL: 
https://github.com/llvm/llvm-project/commit/456bf5e2d348cb739038265c0310c8bc0322869b
DIFF: 
https://github.com/llvm/llvm-project/commit/456bf5e2d348cb739038265c0310c8bc0322869b.diff

LOG: Revert "[libc++] Add Hagenberg issues and papers to the Status pages (#127510)"

This reverts commit 3a00c428d903c5857eae83cb5e9dab73614c5ddb.

Added: 


Modified: 
libcxx/docs/Status/Cxx2cIssues.csv
libcxx/docs/Status/Cxx2cPapers.csv

Removed: 




diff  --git a/libcxx/docs/Status/Cxx2cIssues.csv b/libcxx/docs/Status/Cxx2cIssues.csv
index 1ec23dfabd5ea..45faea0568b2e 100644
--- a/libcxx/docs/Status/Cxx2cIssues.csv
+++ b/libcxx/docs/Status/Cxx2cIssues.csv
@@ -111,16 +111,6 @@
 "`LWG4169 `__","``std::atomic``'s default constructor should be constrained","2024-11 (Wrocław)","","",""
 "`LWG4170 `__","``contiguous_iterator`` should require ``to_address(I{})``","2024-11 (Wrocław)","","",""
 "","","","","",""
-"`LWG3578 `__","Iterator SCARYness in the context of associative container merging","2025-02 (Hagenberg)","","",""
-"`LWG3956 `__","``chrono::parse`` uses ``from_stream`` as a customization point","2025-02 (Hagenberg)","","",""
-"`LWG4172 `__","``unique_lock`` self-move-assignment is broken","2025-02 (Hagenberg)","","",""
-"`LWG4175 `__","``get_env()`` specified in terms of ``as_const()`` but this doesn't work with rvalue senders","2025-02 (Hagenberg)","","",""
-"`LWG4179 `__","Wrong range in ``[alg.search]``","2025-02 (Hagenberg)","","",""
-"`LWG4186 `__","``regex_traits::transform_primary`` mistakenly detects ``typeid`` of a function","2025-02 (Hagenberg)","","",""
-"`LWG4189 `__","``cache_latest_view`` should be freestanding","2025-02 (Hagenberg)","","",""
-"`LWG4191 `__","P1467 changed the return type of ``pow(complex, int)``","2025-02 (Hagenberg)","","",""
-"`LWG4196 `__","Complexity of ``inplace_merge()`` is incorrect","2025-02 (Hagenberg)","","",""
-"","","","","",""
 "`LWG3343 `__","Ordering of calls to ``unlock()`` and ``notify_all()`` in Effects element of ``notify_all_at_thread_exit()`` should be reversed","Not Adopted Yet","|Complete|","16",""
 "`LWG4139 `__","§[time.zone.leap] recursive constraint in <=>","Not Adopted Yet","|Complete|","20",""
 "`LWG3456 `__","Pattern used by std::from_chars is underspecified (option B)","Not Adopted Yet","|Complete|","20",""

diff  --git a/libcxx/docs/Status/Cxx2cPapers.csv b/libcxx/docs/Status/Cxx2cPapers.csv
index 1436db6cf2b45..b2bb1d6e9d6c3 100644
--- a/libcxx/docs/Status/Cxx2cPapers.csv
+++ b/libcxx/docs/Status/Cxx2cPapers.csv
@@ -79,6 +79,7 @@
 "`P3136R1 `__","Retiring niebloids","2024-11 (Wrocław)","","",""
 "`P3138R5 `__","``views::cache_latest``","2024-11 (Wrocław)","","",""
 "`P3379R0 `__","Constrain ``std::expected`` equality operators","2024-11 (Wrocław)","","",""
+"`P0472R2 `__","Put ``std::monostate`` in ","2024-11 (Wrocław)","","",""
 "`P2862R1 `__","``text_encoding::name()`` should never return null values","2024-11 (Wrocław)","","",""
 "`P2897R7 `__","``aligned_accessor``: An ``mdspan`` accessor expressing pointer over-alignment","2024-11 (Wrocław)","","",""
 "`P3355R1 `__","Fix ``submdspan`` for C++26","2024-11 (Wrocław)","","",""
@@ -91,29 +92,9 @@
 "`P3369R0 `__","constexpr for ``uninitialized_default_construct``","2024-11 (Wrocław)","","",""
 "`P3370R1 `__","Add new library headers from C23","2024-11 (Wrocław)","","",""
 "`P3309R3 `__","constexpr ``atomic`` and ``atomic_ref``","2024-11 (Wrocław)","","",""
+"`P3019R11 `__","``indirect`` and ``polymorphic``: Vocabulary Types for Composite Class Design","2024-11 (Wrocław)","","",""
 "`P1928R15 `__","``std::simd`` — merge data-parallel types from the Parallelism TS 2","2024-11 (Wrocław)","","",""
 "`P3325R5 `__","A Utility for Creating Execution Environments","2024-11 (Wrocław)","","",""
 "`P3068R6 `__","Allowing exception throwing in constant-evaluation","2024-11 (Wrocław)","","",""
 "`P3247R2 `__","Deprecate the notion of trivial types","2024-11 (Wrocław)","","",""
 "","","","","",""
-"`P3074R7

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits



@@ -5753,6 +5765,30 @@ void HLSLBufferDecl::addLayoutStruct(CXXRecordDecl *LS) {
   addDecl(LS);
 }
 
+void HLSLBufferDecl::addDefaultBufferDecl(Decl *D) {
+  assert(isImplicit() &&
+ "default decls can only be added to the implicit/default constant "
+ "buffer $Globals");
+  DefaultBufferDecls.push_back(D);
+}
+
+HLSLBufferDecl::buffer_decl_iterator
+HLSLBufferDecl::buffer_decls_begin() const {
+  return buffer_decl_iterator(llvm::iterator_range(DefaultBufferDecls.begin(),
+   DefaultBufferDecls.end()),
+  decl_range(decls_begin(), decls_end()));
+}
+
+HLSLBufferDecl::buffer_decl_iterator HLSLBufferDecl::buffer_decls_end() const {
+  return buffer_decl_iterator(
+  llvm::iterator_range(DefaultBufferDecls.end(), DefaultBufferDecls.end()),

hekota wrote:

Yes, it signifies the end of the list. `buffer_decl_iterator` is an
`llvm::concat_iterator`, and it takes a range.
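The reply hinges on how a concatenating iterator represents "end": once both underlying ranges are exhausted, the iterator is in the same state as one built from two empty `(end, end)` ranges. Below is a minimal, hypothetical two-range version over `std::vector<int>`, not `llvm::concat_iterator` itself (whose interface is richer). The first range plays the role of `DefaultBufferDecls` and the second the buffer's own decls, in the order `buffer_decls_begin()` uses; `ConcatIterator`, `concatBegin`, and `concatEnd` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal stand-in for a concatenating iterator over two ranges.
class ConcatIterator {
  using It = std::vector<int>::const_iterator;

public:
  ConcatIterator(It Begin1, It Last1, It Begin2, It Last2)
      : Cur1(Begin1), End1(Last1), Cur2(Begin2), End2(Last2) {}

  // Yield from the first range until it is exhausted, then from the second.
  const int &operator*() const { return Cur1 != End1 ? *Cur1 : *Cur2; }

  ConcatIterator &operator++() {
    if (Cur1 != End1)
      ++Cur1; // still walking the first range
    else
      ++Cur2; // first range exhausted; walk the second
    return *this;
  }

  bool operator!=(const ConcatIterator &Other) const {
    return Cur1 != Other.Cur1 || Cur2 != Other.Cur2;
  }

private:
  It Cur1, End1, Cur2, End2;
};

// Analogue of buffer_decls_begin(): both ranges start at their beginnings.
ConcatIterator concatBegin(const std::vector<int> &Defaults,
                           const std::vector<int> &Decls) {
  return ConcatIterator(Defaults.begin(), Defaults.end(), Decls.begin(),
                        Decls.end());
}

// Analogue of buffer_decls_end(): both ranges are empty (end, end), which is
// exactly the state a begin iterator reaches after visiting every element.
ConcatIterator concatEnd(const std::vector<int> &Defaults,
                         const std::vector<int> &Decls) {
  return ConcatIterator(Defaults.end(), Defaults.end(), Decls.end(),
                        Decls.end());
}
```

Iterating from `concatBegin(D, P)` to `concatEnd(D, P)` visits every element of `D` and then every element of `P`, and it also terminates correctly when either vector is empty.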

https://github.com/llvm/llvm-project/pull/125807

[llvm-branch-commits] [clang] [llvm] [mlir] [MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (PR #124746)

2025-02-18 Thread Sergio Afonso via llvm-branch-commits


https://github.com/skatrak approved this pull request.


https://github.com/llvm/llvm-project/pull/124746

[llvm-branch-commits] [flang] [mlir] [MLIR][OpenMP] Add OMP Mapper field to MapInfoOp (PR #120994)

2025-02-18 Thread Akash Banerjee via llvm-branch-commits


https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/120994

>From e70f3c9aed18b33c07e60f3314722825031d5ed1 Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Mon, 23 Dec 2024 20:53:47 +
Subject: [PATCH 1/2] Add mapper field to mapInfoOp.

---
 flang/lib/Lower/OpenMP/Utils.cpp| 3 ++-
 flang/lib/Lower/OpenMP/Utils.h  | 3 ++-
 flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp  | 5 -
 flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp | 1 +
 mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td   | 2 ++
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp| 2 +-
 mlir/test/Dialect/OpenMP/ops.mlir   | 4 ++--
 7 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/flang/lib/Lower/OpenMP/Utils.cpp b/flang/lib/Lower/OpenMP/Utils.cpp
index 35722fa7d1b12..fa1975dac789b 100644
--- a/flang/lib/Lower/OpenMP/Utils.cpp
+++ b/flang/lib/Lower/OpenMP/Utils.cpp
@@ -125,7 +125,7 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
 llvm::ArrayRef members,
 mlir::ArrayAttr membersIndex, uint64_t mapType,
 mlir::omp::VariableCaptureKind mapCaptureType, mlir::Type retTy,
-bool partialMap) {
+bool partialMap, mlir::FlatSymbolRefAttr mapperId) {
   if (auto boxTy = llvm::dyn_cast(baseAddr.getType())) {
 baseAddr = builder.create(loc, baseAddr);
 retTy = baseAddr.getType();
@@ -144,6 +144,7 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
   mlir::omp::MapInfoOp op = builder.create(
   loc, retTy, baseAddr, varType, varPtrPtr, members, membersIndex, bounds,
   builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
+  mapperId,
   builder.getAttr(mapCaptureType),
   builder.getStringAttr(name), builder.getBoolAttr(partialMap));
   return op;
diff --git a/flang/lib/Lower/OpenMP/Utils.h b/flang/lib/Lower/OpenMP/Utils.h
index f2e378443e5f2..3943eb633b04e 100644
--- a/flang/lib/Lower/OpenMP/Utils.h
+++ b/flang/lib/Lower/OpenMP/Utils.h
@@ -116,7 +116,8 @@ createMapInfoOp(fir::FirOpBuilder &builder, mlir::Location loc,
 llvm::ArrayRef members,
 mlir::ArrayAttr membersIndex, uint64_t mapType,
 mlir::omp::VariableCaptureKind mapCaptureType, mlir::Type retTy,
-bool partialMap = false);
+bool partialMap = false,
+mlir::FlatSymbolRefAttr mapperId = mlir::FlatSymbolRefAttr());
 
 void insertChildMapInfoIntoParent(
 Fortran::lower::AbstractConverter &converter,
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index e7c1d1d9d560f..beea7543e54b3 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -184,6 +184,7 @@ class MapInfoFinalizationPass
 /*members=*/mlir::SmallVector{},
 /*membersIndex=*/mlir::ArrayAttr{}, bounds,
 builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
+/*mapperId*/ mlir::FlatSymbolRefAttr(),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
@@ -329,7 +330,8 @@ class MapInfoFinalizationPass
 builder.getIntegerAttr(
 builder.getIntegerType(64, false),
 getDescriptorMapType(op.getMapType().value_or(0), target)),
-op.getMapCaptureTypeAttr(), op.getNameAttr(),
+/*mapperId*/ mlir::FlatSymbolRefAttr(), op.getMapCaptureTypeAttr(),
+op.getNameAttr(),
 /*partial_map=*/builder.getBoolAttr(false));
 op.replaceAllUsesWith(newDescParentMapOp.getResult());
 op->erase();
@@ -623,6 +625,7 @@ class MapInfoFinalizationPass
   /*members=*/mlir::ValueRange{},
   /*members_index=*/mlir::ArrayAttr{},
   /*bounds=*/bounds, op.getMapTypeAttr(),
+  /*mapperId*/ mlir::FlatSymbolRefAttr(),
   builder.getAttr(
   mlir::omp::VariableCaptureKind::ByRef),
   builder.getStringAttr(op.getNameAttr().strref() + "." +
diff --git a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
index 963ae863c1fc5..97ea463a3c495 100644
--- a/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapsForPrivatizedSymbols.cpp
@@ -91,6 +91,7 @@ class MapsForPrivatizedSymbolsPass
 /*bounds=*/ValueRange{},
 builder.getIntegerAttr(builder.getIntegerType(64, /*isSigned=*/false),
mapTypeTo),
+/*mapperId*/ mlir::FlatSymbolRefAttr(),
 builder.getAttr(
 omp::VariableCaptureKind::ByRef),
 StringAttr(), builder.getBoolAttr(false));
diff --git a/mlir/include/mlir/

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Aaron Ballman via llvm-branch-commits


AaronBallman wrote:

> Without the pointer-tbaa changes (which are now on by default in Clang 20), we 
> would always generate `any pointer`. Without this fix, we will generate 
> different tags for different `void` pointer depths.
> 
> With this fix, we will generate `any pointer` again for `void` pointers. I 
> think this should be very safe to take.

Okay, thank you for the explanation! I'm okay with this (hopefully early rc 
testing will shake out any serious issues if there are any).
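To make the quoted explanation concrete, here is a compilable C++ sketch of the idiom from the commit message (C++ needs the explicit cast that C does not). `count_elements` is the function from the commit message; `count_int_pointers` is a hypothetical caller added only for illustration. The array is filled through `int **` and read back through `void **`. With distinct tags for `int *` and `void *`, those stores and loads could be assumed not to alias; with this fix the `void *` loads use the generic `any pointer` tag again. As the commit message notes, the standards give `void *` no special aliasing exemption, so this remains a technical violation that Clang simply no longer exploits.

```cpp
#include <cassert>

// The idiom from the commit message: walk a null-terminated array of
// pointers through `void **`, whatever the pointee type really is.
int count_elements(void *values) {
  void **seq = static_cast<void **>(values);
  int count;
  for (count = 0; seq && seq[count]; count++)
    ;
  return count;
}

// Hypothetical caller: the slots are written through `int **`, then read
// back through `void **` inside count_elements.
int count_int_pointers() {
  static int a = 1, b = 2, c = 3;
  int *ptrs[4];
  int **slots = ptrs;
  slots[0] = &a;
  slots[1] = &b;
  slots[2] = &c;
  slots[3] = nullptr;  // terminator
  return count_elements(ptrs);
}
```

Here `count_int_pointers()` is expected to return 3, and `count_elements(nullptr)` returns 0 because the null check stops the loop immediately.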

https://github.com/llvm/llvm-project/pull/125206
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Aaron Ballman via llvm-branch-commits


https://github.com/AaronBallman approved this pull request.

LGTM!

https://github.com/llvm/llvm-project/pull/125206

[llvm-branch-commits] [libcxx] release/20.x: [libc++] Avoid including on arbitrary platforms (#125587) (PR #127310)

2025-02-18 Thread Louis Dionne via llvm-branch-commits


ldionne wrote:

Thanks for the heads up. I think we should merge this but also cherry-pick 
https://github.com/llvm/llvm-project/pull/127691 into LLVM 20 to fix that issue.

https://github.com/llvm/llvm-project/pull/127310

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits



@@ -5072,6 +5080,20 @@ class HLSLBufferDecl final : public NamedDecl, public DeclContext {
 return static_cast(const_cast(DC));
   }
 
+  // Iterator for the buffer decls. Concatenates the list of decls parented

hekota wrote:

That is correct. We are not depending on the order, but I like to put default 
buffer decls first and the children decls second because the children decls 
will include an implicit buffer layout struct decl that is created last. I will 
update the comment to make it clearer. 
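The ordering described here (default buffer decls first, then the buffer's own child decls, which end with the implicitly created layout struct) can be illustrated with a minimal C++ sketch. It assumes a simplified model in which decls are plain strings held in two vectors; `ConcatIterator` and `BufferDecls` are hypothetical names and do not reflect the actual `HLSLBufferDecl` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical iterator that walks `first` to the end, then `second`.
class ConcatIterator {
public:
  using Vec = std::vector<std::string>;
  ConcatIterator(const Vec *first, const Vec *second, std::size_t pos)
      : first_(first), second_(second), pos_(pos) {}

  const std::string &operator*() const {
    return pos_ < first_->size() ? (*first_)[pos_]
                                 : (*second_)[pos_ - first_->size()];
  }
  ConcatIterator &operator++() {
    ++pos_;
    return *this;
  }
  bool operator!=(const ConcatIterator &other) const {
    return pos_ != other.pos_;
  }

private:
  const Vec *first_;
  const Vec *second_;
  std::size_t pos_;  // position in the concatenated sequence
};

// Hypothetical container: default-buffer decls first, child decls second.
struct BufferDecls {
  std::vector<std::string> defaultDecls;
  std::vector<std::string> childDecls;  // implicit layout struct comes last
  ConcatIterator begin() const { return {&defaultDecls, &childDecls, 0}; }
  ConcatIterator end() const {
    return {&defaultDecls, &childDecls,
            defaultDecls.size() + childDecls.size()};
  }
};

// Collects names in iteration order.
std::vector<std::string> collect(const BufferDecls &b) {
  std::vector<std::string> out;
  for (const std::string &name : b)
    out.push_back(name);
  return out;
}
```

Because iteration never depends on the relative order for correctness, the choice of "defaults first" is purely a readability convention, matching the comment above.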

https://github.com/llvm/llvm-project/pull/125807

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread Louis Dionne via llvm-branch-commits


https://github.com/ldionne approved this pull request.

LGTM but we need CI to be green

https://github.com/llvm/llvm-project/pull/127677

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127677

Backport 0227396417d4625bc93affdd8957ff8d90c76299

Requested by: @philnik777

>From 5145297a1e4d814933a80037abcb49373628e3d2 Mon Sep 17 00:00:00 2001
From: Nikolas Klauser 
Date: Fri, 7 Feb 2025 15:40:16 +0100
Subject: [PATCH] Revert "[libc++] Reduce std::conjunction overhead (#124259)"

It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4fb9567d23ce065a067e7f4909b310daf50.

(cherry picked from commit 0227396417d4625bc93affdd8957ff8d90c76299)
---
 libcxx/include/__type_traits/conjunction.h | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/libcxx/include/__type_traits/conjunction.h b/libcxx/include/__type_traits/conjunction.h
index 6b6717a50a468..ad9656acd47ec 100644
--- a/libcxx/include/__type_traits/conjunction.h
+++ b/libcxx/include/__type_traits/conjunction.h
@@ -10,6 +10,8 @@
 #define _LIBCPP___TYPE_TRAITS_CONJUNCTION_H
 
 #include <__config>
+#include <__type_traits/conditional.h>
+#include <__type_traits/enable_if.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/is_same.h>
 
@@ -19,29 +21,22 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template 
-struct _AndImpl;
+template 
+using __expand_to_true _LIBCPP_NODEBUG = true_type;
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG =
-  typename _AndImpl::template _Result<_First, _Rest...>;
-};
+template 
+__expand_to_true<__enable_if_t<_Pred::value>...> __and_helper(int);
 
-template <>
-struct _AndImpl {
-  template 
-  using _Result _LIBCPP_NODEBUG = _Res;
-};
+template 
+false_type __and_helper(...);
 
 // _And always performs lazy evaluation of its arguments.
 //
 // However, `_And<_Pred...>` itself will evaluate its result immediately (without having to
 // be instantiated) since it is an alias, unlike `conjunction<_Pred...>`, which is a struct.
 // If you want to defer the evaluation of `_And<_Pred...>` itself, use `_Lazy<_And, _Pred...>`.
-template 
-using _And _LIBCPP_NODEBUG = typename _AndImpl::template _Result;
+template 
+using _And _LIBCPP_NODEBUG = decltype(std::__and_helper<_Pred...>(0));
 
 template 
 struct __all_dummy;
@@ -51,11 +46,22 @@ struct __all : _IsSame<__all_dummy<_Pred...>, __all_dummy<((void)_Pred, true)...
 
 #if _LIBCPP_STD_VER >= 17
 
-template 
-struct _LIBCPP_NO_SPECIALIZATIONS conjunction : _And<_Args...> {};
+template 
+struct _LIBCPP_NO_SPECIALIZATIONS conjunction : true_type {};
+
+_LIBCPP_DIAGNOSTIC_PUSH
+#  if __has_warning("-Winvalid-specialization")
+_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Winvalid-specialization")
+#  endif
+template 
+struct conjunction<_Arg> : _Arg {};
+
+template 
+struct conjunction<_Arg, _Args...> : conditional_t> {};
+_LIBCPP_DIAGNOSTIC_POP
 
 template 
-_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = _And<_Args...>::value;
+_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = conjunction<_Args...>::value;
 
 #endif // _LIBCPP_STD_VER >= 17
 

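The comment kept on both sides of this revert says `_And` evaluates its arguments lazily. The standard `std::conjunction` (C++17) has the same short-circuit contract: evaluation stops at the first argument whose `value` is false, and later arguments' `value` members are never looked up. A small sketch using only the standard trait; `NoValue` and `all_of_traits` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Deliberately has no `value` member, so naming NoValue::value would be a
// compile error.
struct NoValue {};

// True iff every trait in Ts is true, evaluated with short-circuiting.
template <class... Ts>
constexpr bool all_of_traits() {
  return std::conjunction_v<Ts...>;
}

// Compiles only because evaluation stops at std::false_type.
static_assert(!all_of_traits<std::false_type, NoValue>());

// conjunction<B...> derives from the first Bi whose value is false.
static_assert(std::is_base_of_v<
              std::false_type,
              std::conjunction<std::true_type, std::false_type, NoValue>>);

// An empty conjunction is true.
static_assert(all_of_traits<>());
```

An eager formulation such as the fold `(Ts::value && ...)` would instead fail to compile on `NoValue`, which is why the laziness matters for library code.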

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport 0227396417d4625bc93affdd8957ff8d90c76299

Requested by: @philnik777

---
Full diff: https://github.com/llvm/llvm-project/pull/127677.diff


1 Files Affected:

- (modified) libcxx/include/__type_traits/conjunction.h (+24-18) 






https://github.com/llvm/llvm-project/pull/127677

[llvm-branch-commits] [libcxx] release/20.x: Revert "[libc++] Reduce std::conjunction overhead (#124259)" (PR #127677)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127677

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread John McCall via llvm-branch-commits


rjmccall wrote:

Right, I agree with taking this for the release branch. It is essentially a bug 
fix (relaxing a rule that was undesirably strict) for the pointer TBAA feature, 
which is already in the branch.

https://github.com/llvm/llvm-project/pull/125206

[llvm-branch-commits] [clang] [HLSL] Implement default constant buffer `$Globals` (PR #125807)

2025-02-18 Thread Helena Kotas via llvm-branch-commits



@@ -159,11 +159,16 @@ class SemaHLSL : public SemaBase {
   // List of all resource bindings
   ResourceBindings Bindings;
 
+  // default constant buffer $Globals
+  HLSLBufferDecl *DefaultCBuffer;
+
 private:
   void collectResourcesOnVarDecl(VarDecl *D);
   void collectResourcesOnUserRecordDecl(const VarDecl *VD,
 const RecordType *RT);
   void processExplicitBindingsOnDecl(VarDecl *D);
+
+  void diagnoseAvailabilityViolations(TranslationUnitDecl *TU);

hekota wrote:

It used to be called directly from clang Sema. Now clang Sema calls HLSL Sema's 
`ActOnEndOfTranslationUnit` because we need to do more work at the end of 
translation unit. The `diagnoseAvailabilityViolations` is called from 
`SemaHLSL::ActOnEndOfTranslationUnit` so it can be private now.
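The delegation described here can be sketched with heavily simplified, hypothetical stand-ins (`MiniTU`, `MiniSema`, `MiniSemaHLSL` are not clang classes). The main Sema forwards its end-of-translation-unit hook to the HLSL Sema, which does its extra work and then runs the availability diagnostics, so that helper no longer needs to be public. The order of the two steps inside the hook is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a translation unit: records what ran at its end.
struct MiniTU {
  std::vector<std::string> log;
};

// Hypothetical stand-in for SemaHLSL.
class MiniSemaHLSL {
public:
  // Public hook: the main Sema calls this once the translation unit ends.
  void ActOnEndOfTranslationUnit(MiniTU *TU) {
    TU->log.push_back("end-of-TU HLSL work");
    diagnoseAvailabilityViolations(TU);
  }

private:
  // Only reachable through ActOnEndOfTranslationUnit, so it can be private.
  void diagnoseAvailabilityViolations(MiniTU *TU) {
    TU->log.push_back("availability diagnostics");
  }
};

// Hypothetical stand-in for the main Sema, which just delegates.
class MiniSema {
public:
  void ActOnEndOfTranslationUnit(MiniTU *TU) {
    HLSL.ActOnEndOfTranslationUnit(TU);
  }

private:
  MiniSemaHLSL HLSL;
};
```

Callers outside `MiniSemaHLSL` can only reach the diagnostics through the public hook, which is the encapsulation benefit of making the helper private.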

https://github.com/llvm/llvm-project/pull/125807

[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127704

[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127704

Backport 9a584b07d7c29cec65bb446782c4af72e6d8

Requested by: @jhuber6

>From 91ef311d90f040fb0fe062a898bbcbde419b Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Mon, 17 Feb 2025 14:06:24 -0600
Subject: [PATCH] [Clang] Add handlers for 'match_any' and 'match_all' to
 `gpuintrin.h` (#127504)

Summary:
These helpers are very useful but currently absent. They allow the user
to get a bitmask representing the matches within the warp. I have made
an executive decision to drop the `predicate` return from `match_all`
because it's easily testable with `match_all() == __activemask()`.
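The semantics of the two helpers can be modelled on the CPU. This is a hedged sketch, not the `gpuintrin.h` code: lanes are vector entries (at most 64), `active` plays the role of `__lane_mask`, and `match_any`, `match_all`, `lane_limit`, `lowest_lane` are hypothetical names. The `match_any` loop follows the strategy of the portable fallback in the patch: each round, the lowest unfinished lane's value is taken as the leader, and every unfinished lane holding that value receives the group's mask and drops out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::uint32_t;
using std::uint64_t;

// Mask with one bit per lane that actually exists (at most 64 lanes).
uint64_t lane_limit(std::size_t n) {
  return n >= 64 ? ~uint64_t{0} : (uint64_t{1} << n) - 1;
}

// Index of the lowest set bit; `mask` must be non-zero.
std::size_t lowest_lane(uint64_t mask) {
  std::size_t i = 0;
  while (((mask >> i) & 1) == 0)
    ++i;
  return i;
}

// Each participating lane gets the mask of participating lanes holding the
// same value; non-participating lanes get 0.
std::vector<uint64_t> match_any(const std::vector<uint32_t> &values,
                                uint64_t active) {
  assert(values.size() <= 64);
  std::vector<uint64_t> result(values.size(), 0);
  uint64_t pending = active & lane_limit(values.size());
  while (pending != 0) {
    // Like reading the first lane: the lowest pending lane is the leader.
    uint32_t leader = values[lowest_lane(pending)];
    uint64_t group = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
      if (((pending >> i) & 1) && values[i] == leader)
        group |= uint64_t{1} << i;
    for (std::size_t i = 0; i < values.size(); ++i)
      if ((group >> i) & 1)
        result[i] = group;
    pending &= ~group;  // these lanes are done
  }
  return result;
}

// The participating-lane mask if all participating lanes hold the same
// value, otherwise 0.
uint64_t match_all(const std::vector<uint32_t> &values, uint64_t active) {
  assert(values.size() <= 64);
  uint64_t lanes = active & lane_limit(values.size());
  if (lanes == 0)
    return 0;
  uint32_t first = values[lowest_lane(lanes)];
  for (std::size_t i = 0; i < values.size(); ++i)
    if (((lanes >> i) & 1) && values[i] != first)
      return 0;
  return lanes;
}
```

In this model the dropped predicate is recovered as the commit message suggests, by comparing `match_all(values, active)` with the active mask `active & lane_limit(values.size())`.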

(cherry picked from commit 9a584b07d7c29cec65bb446782c4af72e6d8)
---
 clang/lib/Headers/amdgpuintrin.h  | 56 ++
 clang/lib/Headers/nvptxintrin.h   | 74 +++
 libc/src/__support/GPU/utils.h|  8 ++
 .../src/__support/GPU/CMakeLists.txt  |  9 +++
 .../integration/src/__support/GPU/match.cpp   | 35 +
 5 files changed, 182 insertions(+)
 create mode 100644 libc/test/integration/src/__support/GPU/match.cpp

diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 9dad99ffe9439..355e75d0b2d42 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -162,6 +162,62 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__lane_mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+if (!__done) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, __done)) {
+if (!__done) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
 // Returns true if the flat pointer points to AMDGPU 'shared' memory.
 _DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
   return __builtin_amdgcn_is_shared((void [[clang::address_space(0)]] *)((
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 40fa2edebe975..f857a87b5f4c7 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -13,6 +13,10 @@
 #error "This file is intended for NVPTX targets or offloading to NVPTX"
 #endif
 
+#ifndef __CUDA_ARCH__
+#define __CUDA_ARCH__ 0
+#endif
+
 #include 
 
 #if !defined(__cplusplus)
@@ -168,6 +172,76 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+return __nvvm_match_any_sync_i32(__lane_mask, __x);
+
+  uint32_t __match_mask = 0;
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+if (!__done) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the s

[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-x86

Author: None (llvmbot)


Changes

Backport 9a584b07d7c29cec65bb446782c4af72e6d8

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/127704.diff


5 Files Affected:

- (modified) clang/lib/Headers/amdgpuintrin.h (+56) 
- (modified) clang/lib/Headers/nvptxintrin.h (+74) 
- (modified) libc/src/__support/GPU/utils.h (+8) 
- (modified) libc/test/integration/src/__support/GPU/CMakeLists.txt (+9) 
- (added) libc/test/integration/src/__support/GPU/match.cpp (+35) 



[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-libc

Author: None (llvmbot)


Changes

Backport 9a584b07d7c29cec65bb446782c4af72e6d8

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/127704.diff


5 Files Affected:

- (modified) clang/lib/Headers/amdgpuintrin.h (+56) 
- (modified) clang/lib/Headers/nvptxintrin.h (+74) 
- (modified) libc/src/__support/GPU/utils.h (+8) 
- (modified) libc/test/integration/src/__support/GPU/CMakeLists.txt (+9) 
- (added) libc/test/integration/src/__support/GPU/match.cpp (+35) 


``diff
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 9dad99ffe9439..355e75d0b2d42 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -162,6 +162,62 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__lane_mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+if (!__done) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, __done)) {
+if (!__done) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
 // Returns true if the flat pointer points to AMDGPU 'shared' memory.
 _DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
   return __builtin_amdgcn_is_shared((void [[clang::address_space(0)]] *)((
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 40fa2edebe975..f857a87b5f4c7 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -13,6 +13,10 @@
 #error "This file is intended for NVPTX targets or offloading to NVPTX"
 #endif
 
+#ifndef __CUDA_ARCH__
+#define __CUDA_ARCH__ 0
+#endif
+
 #include 
 
 #if !defined(__cplusplus)
@@ -168,6 +172,76 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+return __nvvm_match_any_sync_i32(__lane_mask, __x);
+
+  uint32_t __match_mask = 0;
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+if (!__done) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+return __nvvm_match_any_sync_i64(__lane_mask, __x);
+
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, __done)) {
+if (!__done) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+

[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 9a584b07d7c29cec65bb446782c4af72e6d8

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/127704.diff


5 Files Affected:

- (modified) clang/lib/Headers/amdgpuintrin.h (+56) 
- (modified) clang/lib/Headers/nvptxintrin.h (+74) 
- (modified) libc/src/__support/GPU/utils.h (+8) 
- (modified) libc/test/integration/src/__support/GPU/CMakeLists.txt (+9) 
- (added) libc/test/integration/src/__support/GPU/match.cpp (+35) 


``diff
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 9dad99ffe9439..355e75d0b2d42 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -162,6 +162,62 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t 
__idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__lane_mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+if (!__done) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, __done)) {
+if (!__done) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  if (__first == __x) {
+__match_mask = __gpu_lane_mask();
+__done = 1;
+  }
+}
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
 // Returns true if the flat pointer points to AMDGPU 'shared' memory.
 _DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
   return __builtin_amdgcn_is_shared((void [[clang::address_space(0)]] *)((
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 40fa2edebe975..f857a87b5f4c7 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -13,6 +13,10 @@
 #error "This file is intended for NVPTX targets or offloading to NVPTX"
 #endif
 
+#ifndef __CUDA_ARCH__
+#define __CUDA_ARCH__ 0
+#endif
+
 #include 
 
 #if !defined(__cplusplus)
@@ -168,6 +172,76 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t 
__idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+    return __nvvm_match_any_sync_i32(__lane_mask, __x);
+
+  uint32_t __match_mask = 0;
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+    }
+  }
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+    return __nvvm_match_any_sync_i64(__lane_mask, __x);
+
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+    }
+  }
+

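To make the divergence trick in the fallback loop concrete: on each pass the lowest lane that is still searching broadcasts its `__x`, the lanes holding the same value enter the `if` together, and `__gpu_lane_mask()` inside that branch is exactly the set of lanes that matched. The following host-side C sketch models that loop with plain arrays. It is an illustration under assumptions, not code from `gpuintrin.h`: the name `simulate_match_any`, the 64-lane limit, and representing divergence as explicit bitmasks are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of the __gpu_match_any_* fallback loop.
   values[lane] plays the role of each lane's __x, and out[lane] receives the
   mask that lane would return. Lanes outside lane_mask are treated as
   inactive and their out[] entries are left untouched. */
static void simulate_match_any(uint64_t lane_mask, const uint64_t *values,
                               int num_lanes, uint64_t *out) {
  assert(num_lanes > 0 && num_lanes <= 64);
  bool done[64] = {false};

  for (;;) {
    /* Models __gpu_ballot(__lane_mask, !__done): lanes still searching. */
    uint64_t pending = 0;
    for (int lane = 0; lane < num_lanes; ++lane)
      if (((lane_mask >> lane) & 1) && !done[lane])
        pending |= UINT64_C(1) << lane;
    if (pending == 0)
      break;

    /* Models __gpu_read_first_lane_*: the lowest pending lane broadcasts. */
    int first_lane = 0;
    while (!((pending >> first_lane) & 1))
      ++first_lane;
    uint64_t first = values[first_lane];

    /* The lanes that take `if (__first == __x)` together; inside that branch
       __gpu_lane_mask() reports exactly this set. */
    uint64_t matching = 0;
    for (int lane = 0; lane < num_lanes; ++lane)
      if (((pending >> lane) & 1) && values[lane] == first)
        matching |= UINT64_C(1) << lane;

    for (int lane = 0; lane < num_lanes; ++lane) {
      if ((matching >> lane) & 1) {
        out[lane] = matching;
        done[lane] = true;
      }
    }
  }
}
```

Each pass retires at least the broadcasting lane, so the loop runs at most once per distinct value among the active lanes. When `__CUDA_ARCH__ >= 700`, the NVPTX version skips the loop entirely and calls `__nvvm_match_any_sync_*` instead.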
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: None (llvmbot)


Changes

Backport 9a584b07d7c29cec65bb446782c4af72e6d8

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/127704.diff


5 Files Affected:

- (modified) clang/lib/Headers/amdgpuintrin.h (+56) 
- (modified) clang/lib/Headers/nvptxintrin.h (+74) 
- (modified) libc/src/__support/GPU/utils.h (+8) 
- (modified) libc/test/integration/src/__support/GPU/CMakeLists.txt (+9) 
- (added) libc/test/integration/src/__support/GPU/match.cpp (+35) 


``diff
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 9dad99ffe9439..355e75d0b2d42 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -162,6 +162,62 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t 
__idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__lane_mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+    }
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+    }
+  }
+  __gpu_sync_lane(__lane_mask);
+  return __match_mask;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+  uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+  uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+  uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
+  __gpu_sync_lane(__lane_mask);
+  return __ballot == __gpu_lane_mask() ? __gpu_lane_mask() : 0ull;
+}
+
 // Returns true if the flat pointer points to AMDGPU 'shared' memory.
 _DEFAULT_FN_ATTRS static __inline__ bool __gpu_is_ptr_local(void *ptr) {
   return __builtin_amdgcn_is_shared((void [[clang::address_space(0)]] *)((
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 40fa2edebe975..f857a87b5f4c7 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -13,6 +13,10 @@
 #error "This file is intended for NVPTX targets or offloading to NVPTX"
 #endif
 
+#ifndef __CUDA_ARCH__
+#define __CUDA_ARCH__ 0
+#endif
+
 #include 
 
 #if !defined(__cplusplus)
@@ -168,6 +172,76 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t 
__idx, uint64_t __x,
  ((uint64_t)__gpu_shuffle_idx_u32(__mask, __idx, __lo, __width));
 }
 
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+    return __nvvm_match_any_sync_i32(__lane_mask, __x);
+
+  uint32_t __match_mask = 0;
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+    }
+  }
+  return __match_mask;
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+  // Newer targets can use the dedicated CUDA support.
+  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+    return __nvvm_match_any_sync_i64(__lane_mask, __x);
+
+  uint64_t __match_mask = 0;
+
+  bool __done = 0;
+  while (__gpu_ballot(__lane_mask, !__done)) {
+    if (!__done) {
+      uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+      if (__first == __x) {
+        __match_mask = __gpu_lane_mask();
+        __done = 1;
+      }
+

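`__gpu_match_all_*`, by contrast, needs no loop: a single ballot on `__x == __first` reveals whether every active lane agrees with the first one. A minimal host-side C model of that check is sketched below; `simulate_match_all` is a hypothetical name, and the model assumes the active lanes are exactly `lane_mask`, so it also stands in for `__gpu_lane_mask()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __gpu_match_all_*: returns lane_mask when every lane
   in lane_mask holds the same value, and 0 otherwise. The active lanes are
   assumed to be exactly lane_mask, so it also stands in for
   __gpu_lane_mask(). */
static uint64_t simulate_match_all(uint64_t lane_mask, const uint64_t *values,
                                   int num_lanes) {
  assert(num_lanes > 0 && num_lanes <= 64);
  assert(num_lanes == 64 || (lane_mask >> num_lanes) == 0);
  if (lane_mask == 0)
    return 0;

  /* Models __gpu_read_first_lane_*: the value held by the lowest active lane. */
  int first_lane = 0;
  while (!((lane_mask >> first_lane) & 1))
    ++first_lane;
  uint64_t first = values[first_lane];

  /* Models __gpu_ballot(__lane_mask, __x == __first). */
  uint64_t ballot = 0;
  for (int lane = 0; lane < num_lanes; ++lane)
    if (((lane_mask >> lane) & 1) && values[lane] == first)
      ballot |= UINT64_C(1) << lane;

  return ballot == lane_mask ? lane_mask : 0;
}
```

Restricting `lane_mask` changes the answer: lanes outside it do not take part in the ballot, so a disagreeing inactive lane does not prevent a match.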
[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:

@jhuber6 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/127704
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [libc] release/20.x: [Clang] Add handlers for 'match_any' and 'match_all' to `gpuintrin.h` (#127504) (PR #127704)

2025-02-18 Thread Joseph Huber via llvm-branch-commits


https://github.com/jhuber6 approved this pull request.

Approving my own patch feels like a conflict of interest.

https://github.com/llvm/llvm-project/pull/127704

[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/127634

>From 4c63b2a9dc2e8c1147a38249eae7bf8244469a00 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 06:17:17 -0600
Subject: [PATCH] [flang][OpenMP] Extend `do concurrent` mapping to multi-range
 loops

Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  |  29 
 .../OpenMP/DoConcurrentConversion.cpp | 141 +-
 .../multiple_iteration_ranges.f90 |  72 +
 3 files changed, 240 insertions(+), 2 deletions(-)
 create mode 100644 
flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index 914ace0813f0e..e7665a7751035 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -173,6 +173,35 @@ omp.parallel {
 
 
 
+### Multi-range loops
+
+The pass currently supports multi-range loops as well. Given the following
+example:
+
+```fortran
+  do concurrent(i=1:n, j=1:m)
+    a(i,j) = i * j
+  end do
+```
+
+The generated `omp.loop_nest` operation looks like:
+
+```
+omp.loop_nest (%arg0, %arg1)
+    : index = (%17, %19) to (%18, %20)
+    inclusive step (%c1_2, %c1_4) {
+  fir.store %arg0 to %private_i#1 : !fir.ref
+  fir.store %arg1 to %private_j#1 : !fir.ref
+  ...
+  omp.yield
+}
+```
+
+It is worth noting that we have privatized versions for both iteration
+variables: `i` and `j`. These are locally allocated inside the parallel/target
+OpenMP region, similar to what the single-range example in the previous section
+shows.
+

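For readers more at home in C, the host mapping of the two-range loop above is roughly analogous to an OpenMP `parallel for` with `collapse(2)`, where both iteration variables are private. The sketch below is such an analogy, not the code flang generates; the function name `fill_products`, the flat column-major layout, and the hand-translated 1-based indices are assumptions. Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of
     do concurrent(i=1:n, j=1:m)
       a(i,j) = i * j
     end do
   where `a` is an n x m array stored column-major, as in Fortran. */
void fill_products(int *a, int n, int m) {
  assert(n >= 0 && m >= 0);
  /* collapse(2) fuses both ranges into one iteration space, much like a
     single omp.loop_nest with two induction variables. Declaring i and j in
     the loop headers keeps a private copy per thread. */
#pragma omp parallel for collapse(2)
  for (int i = 1; i <= n; ++i)
    for (int j = 1; j <= m; ++j)
      a[(size_t)(i - 1) + (size_t)(j - 1) * (size_t)n] = i * j;
}
```

`collapse` requires the two loops to be perfectly nested with independent bounds, which matches the "prepare a loop nest for collapsing" step described in the commit message.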
[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/127635

>From 3d1c2e67f4a462787c3326a8ac85c3951370aafa Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 06:40:19 -0600
Subject: [PATCH] [flang][OpenMP] Handle "loop-local values" in `do concurrent`
 nests

Extends `do concurrent` mapping to handle "loop-local values". A loop-local
value is one that is used exclusively inside the loop but allocated outside
of it. This usually corresponds to temporary values that are used inside the
loop body for initializing other variables for example. After collecting these
values, the pass localizes them to the loop nest by moving their allocations.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  | 51 ++
 .../OpenMP/DoConcurrentConversion.cpp | 68 ++-
 .../DoConcurrent/locally_destroyed_temp.f90   | 62 +
 3 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 
flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index e7665a7751035..66e12ebc021a5 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside 
the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.
 
+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` in case of mapping to `host`, or mapped into the
+`target` region using a `map` clause in case of mapping to `device`. The only
+exceptions to this are:
+  1. The iteration variables (IVs) of **perfect** loop nests. In that
+     case, for each IV, we allocate a local copy as shown by the mapping
+     examples above.
+  1. Any values that are allocated outside the loop nest and used
+     exclusively inside of it. In such cases, a local privatized copy is
+     created in the OpenMP region to prevent multiple teams of threads from
+     accessing and destroying the same memory block, which would otherwise
+     cause runtime issues. For an example of such cases, see
+     `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited, and work to make it smarter is underway both for OpenMP in general
+and for `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with
+the Fortran specification. In particular, consider the following snippets
+from the spec (version 2023):
+
+> § 3.35
+> --
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> --
+>  A variable that appears as an index-name in a FORALL or DO CONCURRENT
+>  construct [...] is a construct entity. A variable that has LOCAL or
+>  LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause). This means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that, as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+

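Localizing a loop-local value, which the commit message above describes as moving its allocation into the loop nest, has a direct C analogue. The sketch below is illustrative only: the function names and arithmetic are hypothetical, and the actual pass rewrites FIR allocations rather than C source (see `locally_destroyed_temp.f90` for the real test case).

```c
#include <assert.h>

/* Hypothetical "before" shape: `tmp` is allocated outside the loop but only
   used inside it. Parallelizing this loop as-is would leave one `tmp` shared
   by all threads, so it is kept serial here. */
void scale_before(const int *in, int *out, int n) {
  assert(n >= 0);
  int tmp;
  for (int k = 0; k < n; ++k) {
    tmp = in[k] * 2;
    out[k] = tmp + 1;
  }
}

/* Hypothetical "after" shape: the allocation of `tmp` has been moved into the
   loop body, so every iteration, and therefore every thread, gets its own. */
void scale_after(const int *in, int *out, int n) {
  assert(n >= 0);
#pragma omp parallel for
  for (int k = 0; k < n; ++k) {
    int tmp = in[k] * 2;
    out[k] = tmp + 1;
  }
}
```

Because `tmp` is never read before being written in an iteration, moving the allocation does not change the results; it only removes the sharing.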
[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy updated 
https://github.com/llvm/llvm-project/pull/127633

>From 06bf9bcf103167ad1ca2f45f4700bf563fb4a07c Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 02:50:46 -0600
Subject: [PATCH] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP
 host constructs

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR adds support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. Towards that end, we
have to collect more information about loop nests, for which we add new
utils in the `looputils` namespace.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  |  47 
 .../OpenMP/DoConcurrentConversion.cpp | 211 +-
 .../Transforms/DoConcurrent/basic_host.f90|  14 +-
 .../Transforms/DoConcurrent/basic_host.mlir   |  62 +
 .../DoConcurrent/non_const_bounds.f90 |  45 
 .../DoConcurrent/not_perfectly_nested.f90 |  45 
 6 files changed, 405 insertions(+), 19 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.mlir
 create mode 100644 flang/test/Transforms/DoConcurrent/non_const_bounds.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/not_perfectly_nested.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index de2525dd8b57d..914ace0813f0e 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -126,6 +126,53 @@ see the "Data environment" section below.
 See `flang/test/Transforms/DoConcurrent/loop_nest_test.f90` for more examples
 of what is and is not detected as a perfect loop nest.
 
+### Single-range loops
+
+Given the following loop:
+```fortran
+  do concurrent(i=1:n)
+    a(i) = i * i
+  end do
+```
+
+#### Mapping to `host`
+
+Mapping this loop to the `host` generates MLIR operations of the following
+structure:
+
+```
+%4 = fir.address_of(@_QFEa) ...
+%6:2 = hlfir.declare %4 ...
+
+omp.parallel {
+  // Allocate private copy for `i`.
+  // TODO Use delayed privatization.
+  %19 = fir.alloca i32 {bindc_name = "i"}
+  %20:2 = hlfir.declare %19 {uniq_name = "_QFEi"} ...
+
+  omp.wsloop {
+    omp.loop_nest (%arg0) : index = (%21) to (%22) inclusive step (%c1_2) {
+      %23 = fir.convert %arg0 : (index) -> i32
+      // Use the privatized version of `i`.
+      fir.store %23 to %20#1 : !fir.ref
+      ...
+
+      // Use "shared" SSA value of `a`.
+      %42 = hlfir.designate %6#0
+      hlfir.assign %35 to %42
+      ...
+      omp.yield
+    }
+    omp.terminator
+  }
+  omp.terminator
+}
+```
+
+#### Mapping to `device`
+
+
+

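In C terms, the MLIR above has roughly the shape of a `parallel` region containing a worksharing `for` loop: `i` is allocated inside the region, so each thread has its own copy, while `a` remains shared. The following is a hedged analogue only; the name `fill_squares` and the shift to 0-based C indexing are assumptions.

```c
#include <assert.h>

/* Hypothetical C analogue of
     do concurrent(i=1:n)
       a(i) = i * i
     end do
   mapped to the host. */
void fill_squares(int *a, int n) {
  assert(n >= 0);
#pragma omp parallel
  {
    /* Like the fir.alloca inside omp.parallel: each thread allocates its own
       copy of i, while `a` stays shared. */
    int i;
#pragma omp for
    for (i = 1; i <= n; ++i)
      a[i - 1] = i * i;
  }
}
```

Splitting `parallel` and `for` into two pragmas mirrors the separate `omp.parallel` and `omp.wsloop` operations; a combined `parallel for` would behave the same here.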
[llvm-branch-commits] [llvm] AMDGPU: Fix overly conservative immediate operand check (PR #127563)

2025-02-18 Thread Anshil Gandhi via llvm-branch-commits


https://github.com/gandhi56 approved this pull request.


https://github.com/llvm/llvm-project/pull/127563

[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/127634

[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)


Changes

Extends `do concurrent` mapping to handle "loop-local values". A loop-local 
value is one that is used exclusively inside the loop but allocated outside of 
it. This usually corresponds to temporary values that are used inside the loop 
body for initializing other variables for example. After collecting these 
values, the pass localizes them to the loop nest by moving their allocations.

---
Full diff: https://github.com/llvm/llvm-project/pull/127635.diff


3 Files Affected:

- (modified) flang/docs/DoConcurrentConversionToOpenMP.md (+51) 
- (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+67-1) 
- (added) flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90 (+62) 


``diff
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index e7665a7751035..66e12ebc021a5 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside 
the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.
 
+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` in case of mapping to `host`, or mapped into the
+`target` region using a `map` clause in case of mapping to `device`. The only
+exceptions to this are:
+  1. The iteration variables (IVs) of **perfect** loop nests. In that
+     case, for each IV, we allocate a local copy as shown by the mapping
+     examples above.
+  1. Any values that are allocated outside the loop nest and used
+     exclusively inside of it. In such cases, a local privatized copy is
+     created in the OpenMP region to prevent multiple teams of threads from
+     accessing and destroying the same memory block, which would otherwise
+     cause runtime issues. For an example of such cases, see
+     `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited, and work to make it smarter is underway both for OpenMP in general
+and for `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with
+the Fortran specification. In particular, consider the following snippets
+from the spec (version 2023):
+
+> § 3.35
+> --
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> --
+>  A variable that appears as an index-name in a FORALL or DO CONCURRENT
+>  construct [...] is a construct entity. A variable that has LOCAL or
+>  LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause). This means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that, as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+

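The concern about IVs of non-perfectly-nested loops is easy to see in a C analogue, where a shared inner loop variable would be a data race between threads. The sketch below is hypothetical (names, bounds, and the column-major layout are assumptions) and shows the privatized form that treating the inner IV as a construct entity would suggest; per the text above, this is not what the pass currently generates.

```c
#include <assert.h>

/* Hypothetical C analogue of a non-perfectly-nested loop:
     do concurrent(i=1:n)
       b(i) = i
       do concurrent(j=1:m)
         a(i,j) = b(i) * j
       end do
     end do
   Only the outer loop is parallelized; `a` is n x m, column-major. */
void fill_nest(int *a, int *b, int n, int m) {
  assert(n >= 0 && m >= 0);
#pragma omp parallel for
  for (int i = 1; i <= n; ++i) {
    b[i - 1] = i; /* this statement is what breaks perfect nesting */
    /* j is declared here, so each thread has its own copy. A single j
       declared outside the parallel loop would be shared and race. */
    for (int j = 1; j <= m; ++j)
      a[(i - 1) + (j - 1) * n] = b[i - 1] * j;
  }
}
```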
[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/127633

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Aaron Ballman via llvm-branch-commits


AaronBallman wrote:

> @AaronBallman Isn't the pointer TBAA support new in Clang 20?

It changed code in 
https://github.com/llvm/llvm-project/blame/27fe2c95ee067ee013b947040538224187b3adb7/clang/lib/CodeGen/CodeGenTBAA.cpp#L117
 which is ~15 years old.

https://github.com/llvm/llvm-project/pull/125206

[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host const… (PR #127633)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/127633


Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. 
This PR adds support for converting simple loops to the equivalent OpenMP 
constructs on the host: `omp parallel do`. Towards that end, we have to collect 
more information about loop nests, for which we add new utils in the `looputils` 
namespace.

>From 0ecf2e2f0f7d87e9aff50240c84ab9196a42d64f Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 02:50:46 -0600
Subject: [PATCH] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP
 host constructs

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping
pass. This PR adds support for converting simple loops to the equivalent
OpenMP constructs on the host: `omp parallel do`. Towards that end, we
have to collect more information about loop nests, for which we add new
utils in the `looputils` namespace.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  |  47 
 .../OpenMP/DoConcurrentConversion.cpp | 256 +-
 .../Transforms/DoConcurrent/basic_host.f90|  14 +-
 .../Transforms/DoConcurrent/basic_host.mlir   |  62 +
 .../DoConcurrent/non_const_bounds.f90 |  45 +++
 .../DoConcurrent/not_perfectly_nested.f90 |  45 +++
 6 files changed, 450 insertions(+), 19 deletions(-)
 create mode 100644 flang/test/Transforms/DoConcurrent/basic_host.mlir
 create mode 100644 flang/test/Transforms/DoConcurrent/non_const_bounds.f90
 create mode 100644 flang/test/Transforms/DoConcurrent/not_perfectly_nested.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index de2525dd8b57d..914ace0813f0e 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -126,6 +126,53 @@ see the "Data environment" section below.
 See `flang/test/Transforms/DoConcurrent/loop_nest_test.f90` for more examples
 of what is and is not detected as a perfect loop nest.
 
+### Single-range loops
+
+Given the following loop:
+```fortran
+  do concurrent(i=1:n)
+    a(i) = i * i
+  end do
+```
+
+#### Mapping to `host`
+
+Mapping this loop to the `host` generates MLIR operations of the following
+structure:
+
+```
+%4 = fir.address_of(@_QFEa) ...
+%6:2 = hlfir.declare %4 ...
+
+omp.parallel {
+  // Allocate private copy for `i`.
+  // TODO Use delayed privatization.
+  %19 = fir.alloca i32 {bindc_name = "i"}
+  %20:2 = hlfir.declare %19 {uniq_name = "_QFEi"} ...
+
+  omp.wsloop {
+    omp.loop_nest (%arg0) : index = (%21) to (%22) inclusive step (%c1_2) {
+      %23 = fir.convert %arg0 : (index) -> i32
+      // Use the privatized version of `i`.
+      fir.store %23 to %20#1 : !fir.ref
+      ...
+
+      // Use "shared" SSA value of `a`.
+      %42 = hlfir.designate %6#0
+      hlfir.assign %35 to %42
+      ...
+      omp.yield
+    }
+    omp.terminator
+  }
+  omp.terminator
+}
+```
+
+#### Mapping to `device`
+
+
+

[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)


Changes

Upstreams one more part of the ROCm `do concurrent` to OpenMP mapping pass. 
This PR adds support for converting simple loops to the equivalent OpenMP 
constructs on the host: `omp parallel do`. Towards that end, we have to collect 
more information about loop nests, for which we add new utils in the `looputils` 
namespace.

---

Patch is 21.94 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/127633.diff


6 Files Affected:

- (modified) flang/docs/DoConcurrentConversionToOpenMP.md (+47) 
- (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+246-10) 
- (modified) flang/test/Transforms/DoConcurrent/basic_host.f90 (+5-9) 
- (added) flang/test/Transforms/DoConcurrent/basic_host.mlir (+62) 
- (added) flang/test/Transforms/DoConcurrent/non_const_bounds.f90 (+45) 
- (added) flang/test/Transforms/DoConcurrent/not_perfectly_nested.f90 (+45) 


``diff
diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index de2525dd8b57d..914ace0813f0e 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -126,6 +126,53 @@ see the "Data environment" section below.
 See `flang/test/Transforms/DoConcurrent/loop_nest_test.f90` for more examples
 of what is and is not detected as a perfect loop nest.
 
+### Single-range loops
+
+Given the following loop:
+```fortran
+  do concurrent(i=1:n)
+    a(i) = i * i
+  end do
+```
+
+#### Mapping to `host`
+
+Mapping this loop to the `host` generates MLIR operations of the following
+structure:
+
+```
+%4 = fir.address_of(@_QFEa) ...
+%6:2 = hlfir.declare %4 ...
+
+omp.parallel {
+  // Allocate private copy for `i`.
+  // TODO Use delayed privatization.
+  %19 = fir.alloca i32 {bindc_name = "i"}
+  %20:2 = hlfir.declare %19 {uniq_name = "_QFEi"} ...
+
+  omp.wsloop {
+    omp.loop_nest (%arg0) : index = (%21) to (%22) inclusive step (%c1_2) {
+      %23 = fir.convert %arg0 : (index) -> i32
+      // Use the privatized version of `i`.
+      fir.store %23 to %20#1 : !fir.ref
+      ...
+
+      // Use "shared" SSA value of `a`.
+      %42 = hlfir.designate %6#0
+      hlfir.assign %35 to %42
+      ...
+      omp.yield
+    }
+    omp.terminator
+  }
+  omp.terminator
+}
+```
+
+#### Mapping to `device`
+
+
+

[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/127633

[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/127635

Extends `do concurrent` mapping to handle "loop-local values". A loop-local 
value is one that is used exclusively inside the loop but allocated outside of 
it. This usually corresponds to temporary values that are used inside the loop 
body for initializing other variables for example. After collecting these 
values, the pass localizes them to the loop nest by moving their allocations.

>From 99369b7b57361d84564639bfed055b24191dec55 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 06:40:19 -0600
Subject: [PATCH] [flang][OpenMP] Handle "loop-local values" in `do concurrent`
 nests

Extends `do concurrent` mapping to handle "loop-local values". A loop-local
value is one that is used exclusively inside the loop but allocated outside
of it. This usually corresponds to temporary values that are used inside the
loop body for initializing other variables for example. After collecting these
values, the pass localizes them to the loop nest by moving their allocations.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  | 51 ++
 .../OpenMP/DoConcurrentConversion.cpp | 68 ++-
 .../DoConcurrent/locally_destroyed_temp.f90   | 62 +
 3 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 
flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md 
b/flang/docs/DoConcurrentConversionToOpenMP.md
index e7665a7751035..66e12ebc021a5 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -202,6 +202,57 @@ variables: `i` and `j`. These are locally allocated inside 
the parallel/target
 OpenMP region similar to what the single-range example in previous section
 shows.
 
+### Data environment
+
+By default, variables that are used inside a `do concurrent` loop nest are
+either treated as `shared` in case of mapping to `host`, or mapped into the
+`target` region using a `map` clause in case of mapping to `device`. The only
+exceptions to this are:
+  1. The iteration variables (IVs) of **perfect** loop nests. In that
+     case, for each IV, we allocate a local copy as shown by the mapping
+     examples above.
+  1. Any values that are allocated outside the loop nest and used
+     exclusively inside of it. In such cases, a local privatized copy is
+     created in the OpenMP region to prevent multiple teams of threads from
+     accessing and destroying the same memory block, which would otherwise
+     cause runtime issues. For an example of such cases, see
+     `flang/test/Transforms/DoConcurrent/locally_destroyed_temp.f90`.
+
+Implicit mapping detection (for mapping to the target device) is still quite
+limited, and work to make it smarter is underway both for OpenMP in general
+and for `do concurrent` mapping.
+
+#### Non-perfectly-nested loops' IVs
+
+For non-perfectly-nested loops, the IVs are still treated as `shared` or
+`map` entries as pointed out above. This **might not** be consistent with
+the Fortran specification. In particular, consider the following snippets
+from the spec (version 2023):
+
+> § 3.35
+> --
+> construct entity
+> entity whose identifier has the scope of a construct
+
+> § 19.4
+> --
+>  A variable that appears as an index-name in a FORALL or DO CONCURRENT
+>  construct [...] is a construct entity. A variable that has LOCAL or
+>  LOCAL_INIT locality in a DO CONCURRENT construct is a construct entity.
+> [...]
+> The name of a variable that appears as an index-name in a DO CONCURRENT
+> construct, FORALL statement, or FORALL construct has a scope of the statement
+> or construct. A variable that has LOCAL or LOCAL_INIT locality in a DO
+> CONCURRENT construct has the scope of that construct.
+
+From the above quotes, it seems there is an equivalence between the IV of a `do
+concurrent` loop and a variable with a `LOCAL` locality specifier (equivalent
+to OpenMP's `private` clause). This means that we should probably
+localize/privatize a `do concurrent` loop's IV even if it is not perfectly
+nested in the nest we are parallelizing. For now, however, we **do not** do
+that, as pointed out previously. In the near future, we propose a middle-ground
+solution (see the Next steps section for more details).
+

[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy created 
https://github.com/llvm/llvm-project/pull/127634

Adds support for converting multi-range loops to OpenMP (on the host only for 
now). The changes here "prepare" a loop nest for collapsing by sinking 
iteration variables to the innermost `fir.do_loop` op in the nest.

>From 6d040c8d450084739a9a9b8cb1c4cfb93e1c2943 Mon Sep 17 00:00:00 2001
From: ergawy 
Date: Tue, 18 Feb 2025 06:17:17 -0600
Subject: [PATCH] [flang][OpenMP] Extend `do concurrent` mapping to multi-range
 loops

Adds support for converting multi-range loops to OpenMP (on the host
only for now). The changes here "prepare" a loop nest for collapsing by
sinking iteration variables to the innermost `fir.do_loop` op in the
nest.
---
 flang/docs/DoConcurrentConversionToOpenMP.md  | 29 ++
 .../OpenMP/DoConcurrentConversion.cpp | 91 +++
 .../multiple_iteration_ranges.f90 | 72 +++
 3 files changed, 192 insertions(+)
 create mode 100644 flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90

diff --git a/flang/docs/DoConcurrentConversionToOpenMP.md b/flang/docs/DoConcurrentConversionToOpenMP.md
index 914ace0813f0e..e7665a7751035 100644
--- a/flang/docs/DoConcurrentConversionToOpenMP.md
+++ b/flang/docs/DoConcurrentConversionToOpenMP.md
@@ -173,6 +173,35 @@ omp.parallel {
 
 
 
+### Multi-range loops
+
+The pass currently supports multi-range loops as well. Given the following
+example:
+
+```fortran
+   do concurrent(i=1:n, j=1:m)
+   a(i,j) = i * j
+   end do
+```
+
+The generated `omp.loop_nest` operation looks like:
+
+```
+omp.loop_nest (%arg0, %arg1)
+: index = (%17, %19) to (%18, %20)
+inclusive step (%c1_2, %c1_4) {
+  fir.store %arg0 to %private_i#1 : !fir.ref
+  fir.store %arg1 to %private_j#1 : !fir.ref
+  ...
+  omp.yield
+}
+```
+
+It is worth noting that we have privatized versions for both iteration
+variables: `i` and `j`. These are locally allocated inside the parallel/target
+OpenMP region, similar to what the single-range example in the previous
+section shows.
+
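Conceptually, collapsing the two ranges of `do concurrent(i=1:n, j=1:m)` turns the nest into one iteration space from which every iteration recovers its own `i` and `j`, which is why both IVs get privatized copies. The C++ sketch below illustrates only that idea; it is not how flang or the OpenMP runtime implement collapsing, and the name `collapsed_product` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: fill a(i,j) = i * j for i in 1..n, j in 1..m
// using one collapsed induction variable k instead of two nested loops.
// The result is stored column-major, as Fortran would lay out a(n, m).
std::vector<long> collapsed_product(long n, long m) {
  assert(n >= 0 && m >= 0);
  const long total = n * m;
  std::vector<long> a(static_cast<std::size_t>(total), 0);
  for (long k = 0; k < total; ++k) {
    // Per-iteration (i.e. private) copies of the original IVs, derived from k.
    const long i = 1 + k % n;
    const long j = 1 + k / n;
    a[static_cast<std::size_t>((i - 1) + (j - 1) * n)] = i * j;
  }
  return a;
}
```

Because each iteration derives `i` and `j` from `k` alone, iterations can be split across threads without any thread sharing the IVs.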

[llvm-branch-commits] [flang] [flang][OpenMP] Extend `do concurrent` mapping to multi-range loops (PR #127634)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)


Changes

Adds support for converting multi-range loops to OpenMP (on the host only for
now). The changes here "prepare" a loop nest for collapsing by sinking
iteration variables to the innermost `fir.do_loop` op in the nest.

---
Full diff: https://github.com/llvm/llvm-project/pull/127634.diff


3 Files Affected:

- (modified) flang/docs/DoConcurrentConversionToOpenMP.md (+29) 
- (modified) flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp (+91) 
- (added) flang/test/Transforms/DoConcurrent/multiple_iteration_ranges.f90 
(+72) 



[llvm-branch-commits] [flang] [flang][OpenMP] Handle "loop-local values" in `do concurrent` nests (PR #127635)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/127635
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [flang] [flang][OpenMP] Map simple `do concurrent` loops to OpenMP host constructs (PR #127633)

2025-02-18 Thread Kareem Ergawy via llvm-branch-commits


https://github.com/ergawy edited 
https://github.com/llvm/llvm-project/pull/127633

[llvm-branch-commits] [clang] release/20.x: [clang] Fix false positive regression for lifetime analysis warning. (#127460) (PR #127618)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127618

[llvm-branch-commits] [clang] release/20.x: [clang] Fix false positive regression for lifetime analysis warning. (#127460) (PR #127618)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 9c49b188b8e1434eb774ee8422124ad3e8870dce

Requested by: @hokein

---
Full diff: https://github.com/llvm/llvm-project/pull/127618.diff


3 Files Affected:

- (modified) clang/lib/Sema/CheckExprLifetime.cpp (+3-2) 
- (modified) clang/test/Sema/Inputs/lifetime-analysis.h (+2) 
- (modified) clang/test/Sema/warn-lifetime-analysis-nocfg.cpp (+24) 






https://github.com/llvm/llvm-project/pull/127618

[llvm-branch-commits] [clang] release/20.x: [clang] Fix false positive regression for lifetime analysis warning. (#127460) (PR #127618)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:

@hokein What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/127618

[llvm-branch-commits] [clang] release/20.x: [clang] Fix false positive regression for lifetime analysis warning. (#127460) (PR #127618)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127618

Backport 9c49b188b8e1434eb774ee8422124ad3e8870dce

Requested by: @hokein

>From 31928865025e07919548e21abf657c9f5aeab429 Mon Sep 17 00:00:00 2001
From: Haojian Wu 
Date: Mon, 17 Feb 2025 14:40:31 +0100
Subject: [PATCH] [clang] Fix false positive regression for lifetime analysis
 warning. (#127460)

This fixes a false positive caused by #114044.

For `GSLPointer*` types, it's less clear whether the lifetime issue is
about the GSLPointer object itself or the owner it points to. To avoid
false positives, we take a conservative approach in our heuristic.

Fixes #127195

(This will be backported to release 20).

(cherry picked from commit 9c49b188b8e1434eb774ee8422124ad3e8870dce)
---
 clang/lib/Sema/CheckExprLifetime.cpp  |  5 ++--
 clang/test/Sema/Inputs/lifetime-analysis.h|  2 ++
 .../Sema/warn-lifetime-analysis-nocfg.cpp | 24 +++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/clang/lib/Sema/CheckExprLifetime.cpp b/clang/lib/Sema/CheckExprLifetime.cpp
index 8963cad86dbca..1f87001f35b57 100644
--- a/clang/lib/Sema/CheckExprLifetime.cpp
+++ b/clang/lib/Sema/CheckExprLifetime.cpp
@@ -1239,11 +1239,12 @@ static AnalysisResult analyzePathForGSLPointer(const IndirectLocalPath &Path,
 }
 // Check the return type, e.g.
 //   const GSLOwner& func(const Foo& foo [[clang::lifetimebound]])
+//   GSLOwner* func(cosnt Foo& foo [[clang::lifetimebound]])
 //   GSLPointer func(const Foo& foo [[clang::lifetimebound]])
 if (FD &&
-((FD->getReturnType()->isReferenceType() &&
+((FD->getReturnType()->isPointerOrReferenceType() &&
  isRecordWithAttr(FD->getReturnType()->getPointeeType())) ||
- isPointerLikeType(FD->getReturnType(
+ isGLSPointerType(FD->getReturnType(
   return Report;
 
 return Abandon;
diff --git a/clang/test/Sema/Inputs/lifetime-analysis.h b/clang/test/Sema/Inputs/lifetime-analysis.h
index d318033ff0cc4..2072e4603cead 100644
--- a/clang/test/Sema/Inputs/lifetime-analysis.h
+++ b/clang/test/Sema/Inputs/lifetime-analysis.h
@@ -61,6 +61,7 @@ struct basic_string_view {
   basic_string_view();
   basic_string_view(const T *);
   const T *begin() const;
+  const T *data() const;
 };
 using string_view = basic_string_view;
 
@@ -80,6 +81,7 @@ struct basic_string {
   const T *c_str() const;
   operator basic_string_view () const;
   using const_iterator = iter;
+  const T *data() const;
 };
 using string = basic_string;
 
diff --git a/clang/test/Sema/warn-lifetime-analysis-nocfg.cpp b/clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
index 04bb1330ded4c..66a2a19ceb321 100644
--- a/clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
+++ b/clang/test/Sema/warn-lifetime-analysis-nocfg.cpp
@@ -852,3 +852,27 @@ struct Test {
 };
 
 } // namespace GH120543
+
+namespace GH127195 {
+template 
+struct StatusOr {
+  T* operator->() [[clang::lifetimebound]];
+  T* value() [[clang::lifetimebound]];
+};
+
+const char* foo() {
+  StatusOr s;
+  return s->data(); // expected-warning {{address of stack memory associated with local variable}}
+  
+  StatusOr s2;
+  return s2->data();
+
+  StatusOr> s3;
+  return s3.value()->value()->data();
+
+  // FIXME: nested cases are not supported now.
+  StatusOr> s4;
+  return s4.value()->value()->data();
+}
+
+} // namespace GH127195
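To show why a `T*` returned through a lifetime-bound accessor is ambiguous when `T` is itself a pointer-like (GSL pointer) type, here is a minimal runnable sketch. It assumes a simplified, hypothetical `StatusOr` holding a `std::string_view`: the characters returned by `data()` belong to whatever the view refers to (a string literal here), not to the local wrapper, so warning on this pattern would be a false positive.

```cpp
#include <cassert>
#include <string_view>

// Simplified, hypothetical stand-in for the test's StatusOr<T>; the test
// version marks operator-> as [[clang::lifetimebound]], omitted here so the
// sketch compiles with any C++17 compiler.
template <typename T>
struct StatusOr {
  T value_;
  T* operator->() { return &value_; }
};

// The wrapper is local, but s->data() points into the string literal that
// the string_view refers to, so the returned pointer does not dangle.
const char* view_data() {
  StatusOr<std::string_view> s{std::string_view("hello")};
  return s->data();
}
```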


[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Florian Hahn via llvm-branch-commits


fhahn wrote:

Without the pointer-tbaa changes (which are newly enabled by default in Clang 20), we
would always generate `any pointer`. Without this fix, we will generate
different tags for different `void` pointer depths.

With this fix, we will generate `any pointer` again for `void` pointers. I 
think this should be very safe to take.
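For reference, the idiom that motivated #122116 (the `count_elements` example in its commit message) can be written in C++ as below. The sketch stores `void *` elements so the test itself is well-defined; as the commit message describes, distinct tags for `void` pointers could break such loops when the array actually holds other pointer types and is walked through `void **`.

```cpp
#include <cassert>

// Counts the entries of a null-terminated array of pointers, accessed
// through `void **` as in the commit message's C example.
int count_elements(void *values) {
  void **seq = static_cast<void **>(values);
  int count;
  for (count = 0; seq && seq[count]; count++) {
  }
  return count;
}
```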

https://github.com/llvm/llvm-project/pull/125206

[llvm-branch-commits] [llvm] AMDGPU: Handle subregister uses in SIFoldOperands constant folding (PR #127485)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

### Merge activity

* **Feb 18, 5:11 AM EST**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/127485).


https://github.com/llvm/llvm-project/pull/127485

[llvm-branch-commits] [llvm] AMDGPU: Fix overly conservative immediate operand check (PR #127563)

2025-02-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/127563

>From 2903c242a24011555b7406334414fcfd4352061e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 17 Feb 2025 22:31:48 +0700
Subject: [PATCH] AMDGPU: Fix overly conservative immediate operand check

The real legality check is performed later anyway, so this was
unnecessarily blocking immediate folds in handled cases.

This also stops folding s_fmac_f32 to s_fmamk_f32 in a few tests,
but that seems better. The globalisel changes look suspicious,
it may be mishandling constants for VOP3P instructions.
---
 llvm/lib/Target/AMDGPU/SIFoldOperands.cpp|  3 ++-
 llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll | 16 
 llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll  | 16 
 llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll  |  4 +---
 llvm/test/CodeGen/AMDGPU/bug-cselect-b64.ll  |  6 ++
 llvm/test/CodeGen/AMDGPU/constrained-shift.ll|  6 ++
 .../CodeGen/AMDGPU/fold-operands-scalar-fmac.mir |  4 ++--
 llvm/test/CodeGen/AMDGPU/global-saddr-load.ll|  5 +
 llvm/test/CodeGen/AMDGPU/packed-fp32.ll  | 10 +-
 llvm/test/CodeGen/AMDGPU/scalar-float-sop2.ll|  4 ++--
 10 files changed, 25 insertions(+), 49 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 8248592545201..4dc7b1481746c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -830,7 +830,8 @@ bool SIFoldOperandsImpl::tryToFoldACImm(
   if (UseOpIdx >= Desc.getNumOperands())
 return false;
 
-  if (!AMDGPU::isSISrcInlinableOperand(Desc, UseOpIdx))
+  // Filter out unhandled pseudos.
+  if (!AMDGPU::isSISrcOperand(Desc, UseOpIdx))
 return false;
 
   uint8_t OpTy = Desc.operands()[UseOpIdx].OperandType;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 4be00fedb972e..89078f20f1d47 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -920,9 +920,7 @@ define amdgpu_ps i64 @s_andn2_v4i16(<4 x i16> inreg %src0, <4 x i16> inreg %src1
 ; GFX6-NEXT:s_lshl_b32 s3, s9, 16
 ; GFX6-NEXT:s_and_b32 s4, s8, 0x
 ; GFX6-NEXT:s_or_b32 s3, s3, s4
-; GFX6-NEXT:s_mov_b32 s4, -1
-; GFX6-NEXT:s_mov_b32 s5, s4
-; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], -1
 ; GFX6-NEXT:s_and_b64 s[0:1], s[0:1], s[2:3]
 ; GFX6-NEXT:; return to shader part epilog
 ;
@@ -962,9 +960,7 @@ define amdgpu_ps i64 @s_andn2_v4i16_commute(<4 x i16> inreg %src0, <4 x i16> inr
 ; GFX6-NEXT:s_lshl_b32 s3, s9, 16
 ; GFX6-NEXT:s_and_b32 s4, s8, 0x
 ; GFX6-NEXT:s_or_b32 s3, s3, s4
-; GFX6-NEXT:s_mov_b32 s4, -1
-; GFX6-NEXT:s_mov_b32 s5, s4
-; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], -1
 ; GFX6-NEXT:s_and_b64 s[0:1], s[2:3], s[0:1]
 ; GFX6-NEXT:; return to shader part epilog
 ;
@@ -1004,9 +1000,7 @@ define amdgpu_ps { i64, i64 } @s_andn2_v4i16_multi_use(<4 x i16> inreg %src0, <4
 ; GFX6-NEXT:s_lshl_b32 s3, s9, 16
 ; GFX6-NEXT:s_and_b32 s4, s8, 0x
 ; GFX6-NEXT:s_or_b32 s3, s3, s4
-; GFX6-NEXT:s_mov_b32 s4, -1
-; GFX6-NEXT:s_mov_b32 s5, s4
-; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], -1
 ; GFX6-NEXT:s_and_b64 s[0:1], s[0:1], s[2:3]
 ; GFX6-NEXT:; return to shader part epilog
 ;
@@ -1060,9 +1054,7 @@ define amdgpu_ps { i64, i64 } @s_andn2_v4i16_multi_foldable_use(<4 x i16> inreg
 ; GFX6-NEXT:s_lshl_b32 s5, s13, 16
 ; GFX6-NEXT:s_and_b32 s6, s12, 0x
 ; GFX6-NEXT:s_or_b32 s5, s5, s6
-; GFX6-NEXT:s_mov_b32 s6, -1
-; GFX6-NEXT:s_mov_b32 s7, s6
-; GFX6-NEXT:s_xor_b64 s[4:5], s[4:5], s[6:7]
+; GFX6-NEXT:s_xor_b64 s[4:5], s[4:5], -1
 ; GFX6-NEXT:s_and_b64 s[0:1], s[0:1], s[4:5]
 ; GFX6-NEXT:s_and_b64 s[2:3], s[2:3], s[4:5]
 ; GFX6-NEXT:; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
index e7119c89ac06c..065fadf3b5ef3 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
@@ -919,9 +919,7 @@ define amdgpu_ps i64 @s_orn2_v4i16(<4 x i16> inreg %src0, <4 x i16> inreg %src1)
 ; GFX6-NEXT:s_lshl_b32 s3, s9, 16
 ; GFX6-NEXT:s_and_b32 s4, s8, 0x
 ; GFX6-NEXT:s_or_b32 s3, s3, s4
-; GFX6-NEXT:s_mov_b32 s4, -1
-; GFX6-NEXT:s_mov_b32 s5, s4
-; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT:s_xor_b64 s[2:3], s[2:3], -1
 ; GFX6-NEXT:s_or_b64 s[0:1], s[0:1], s[2:3]
 ; GFX6-NEXT:; return to shader part epilog
 ;
@@ -961,9 +959,7 @@ define amdgpu_ps i64 @s_orn2_v4i16_commute(<4 x i16> inreg %src0, <4 x i16> inre
 ; GFX6-NEXT:s_lshl_b32 s3, s9, 16
 ; GFX6-NEXT:

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Aaron Ballman via llvm-branch-commits


https://github.com/AaronBallman commented:

I'm on the fence. On the one hand, it's a small amount of code to change and it 
fixes miscompiles, so it's a good candidate. On the other hand, there's not 
been much bake time and it's changing behavior in Clang that's worked that way 
for about 15 years as best I can tell, so I'm not certain we know if there's 
negative fallout.

@rjmccall @fhahn do you think this should have more bake time?

https://github.com/llvm/llvm-project/pull/125206

[llvm-branch-commits] [clang] release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) (PR #125206)

2025-02-18 Thread Nikita Popov via llvm-branch-commits


nikic wrote:

@AaronBallman Isn't the pointer TBAA support new in Clang 20?

https://github.com/llvm/llvm-project/pull/125206
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-18 Thread Stanislav Mekhanoshin via llvm-branch-commits


https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/127142

>From b574a4b4afbf4cd0a6e128ea5d1e1579698124bc Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 13 Feb 2025 14:46:37 -0800
Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize()

---
 llvm/lib/Target/AMDGPU/SIProgramInfo.cpp  |  6 ++
 .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++
 2 files changed, 95 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 1123696509818..b4d740422b94a 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -212,6 +212,12 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
   uint64_t CodeSize = 0;
 
   for (const MachineBasicBlock &MBB : MF) {
+// The amount of padding to align code can be both underestimated and
+// overestimated. In case of inline asm used getInstSizeInBytes() will
+// return a maximum size of a single instruction, where the real size may
+// differ. At this point CodeSize may be already off.
+CodeSize = alignTo(CodeSize, MBB.getAlignment());
+
 for (const MachineInstr &MI : MBB) {
   // TODO: CodeSize should account for multiple functions.
 
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 76eaf350301e4..9ae536af6f0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -31,3 +31,92 @@ body: |
 
   WAVE_BARRIER
 ...
+
+# CHECK: align4: ; @align4
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 2
+# CHECK: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 16
+
+---
+name: align4
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 4):
+    S_ENDPGM 0
+...
+
+# CHECK: align8: ; @align8
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 3
+# CHECK: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align8
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 8):
+    S_ENDPGM 0
+...
+
+# CHECK: align16: ; @align16
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 4
+# CHECK: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align16
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 16):
+    S_ENDPGM 0
+...
+
+# CHECK: align32: ; @align32
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 5
+# CHECK: s_endpgm ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 36
+---
+name: align32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 32):
+    S_ENDPGM 0
+...
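The `codeLenInByte` values in the new test can be reproduced by hand. Assuming every instruction here encodes to 4 bytes (as the `encoding:` lines show), the three instructions before `bb.2` occupy 12 bytes, which are padded up to `bb.2`'s alignment before the 4-byte `s_endpgm` is added: 16, 20, 20 and 36 bytes for alignments 4, 8, 16 and 32. The C++ sketch below models only that arithmetic; `align_to`, `Block` and `estimate_code_size` are hypothetical names, and it ignores inline asm and the other details of `getFunctionCodeSize()`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round value up to the next multiple of align (a power of two),
// the same result llvm::alignTo gives for a block alignment.
std::uint64_t align_to(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) / align * align;
}

struct Block {
  std::uint64_t alignment;           // block alignment in bytes (1 = none)
  std::vector<std::uint64_t> sizes;  // instruction sizes in bytes
};

// Pad the running size to each block's alignment, then add its instructions.
std::uint64_t estimate_code_size(const std::vector<Block>& blocks) {
  std::uint64_t code_size = 0;
  for (const Block& b : blocks) {
    code_size = align_to(code_size, b.alignment);
    for (std::uint64_t s : b.sizes) code_size += s;
  }
  return code_size;
}
```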


[llvm-branch-commits] [clang] release/20.x: [clang] StmtPrinter: Handle DeclRefExpr to a Decomposition (#125001) (PR #126659)

2025-02-18 Thread via llvm-branch-commits


https://github.com/cor3ntin approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/126659

[llvm-branch-commits] [clang] release/20.x: [clang-format] Fix a bug in annotating StartOfName (#127545) (PR #127591)

2025-02-18 Thread kadir çetinkaya via llvm-branch-commits


https://github.com/kadircet approved this pull request.


https://github.com/llvm/llvm-project/pull/127591

[llvm-branch-commits] [libclc] release/20.x: [libclc] Disable external-calls testing for clspv targets (#127529) (PR #127597)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:

@mgorny What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/127597

[llvm-branch-commits] [libclc] release/20.x: [libclc] Disable external-calls testing for clspv targets (#127529) (PR #127597)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127597

Backport 9fec0a0942f5a11f4dcfec20aa485a8513661720

Requested by: @nikic

>From 1df060a8aa0da99ba5c3da3c4981be2ac5bd2bb2 Mon Sep 17 00:00:00 2001
From: Fraser Cormack 
Date: Tue, 18 Feb 2025 09:14:04 +
Subject: [PATCH] [libclc] Disable external-calls testing for clspv targets
 (#127529)

These targets don't include all OpenCL builtins, so there will always be
external calls in the final bytecode module.

Fixes #127316.

(cherry picked from commit 9fec0a0942f5a11f4dcfec20aa485a8513661720)
---
 libclc/cmake/modules/AddLibclc.cmake | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libclc/cmake/modules/AddLibclc.cmake b/libclc/cmake/modules/AddLibclc.cmake
index b520626c6ffd1..717121abb8c98 100644
--- a/libclc/cmake/modules/AddLibclc.cmake
+++ b/libclc/cmake/modules/AddLibclc.cmake
@@ -345,8 +345,9 @@ function(add_libclc_builtin_set)
   add_custom_target( prepare-${obj_suffix} ALL DEPENDS ${obj_suffix} )
  set_target_properties( "prepare-${obj_suffix}" PROPERTIES FOLDER "libclc/Device IR/Prepare" )
 
-  # nvptx-- targets don't include workitem builtins
-  if( NOT ARG_TRIPLE MATCHES ".*ptx.*--$" )
+  # nvptx-- targets don't include workitem builtins, and clspv targets don't
+  # include all OpenCL builtins
+  if( NOT ARG_ARCH MATCHES "^(nvptx|clspv)(64)?$" )
 add_test( NAME external-calls-${obj_suffix}
  COMMAND ./check_external_calls.sh ${CMAKE_CURRENT_BINARY_DIR}/${obj_suffix} ${LLVM_TOOLS_BINARY_DIR}
   WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} )
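The new condition skips the test whenever `ARG_ARCH` is exactly `nvptx`, `nvptx64`, `clspv` or `clspv64`. The C++ sketch below checks the same anchored pattern with `std::regex`; it assumes the ECMAScript grammar treats this simple pattern the same way CMake's regex engine does, and the arch strings in the test are only examples.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Mirrors `if( NOT ARG_ARCH MATCHES "^(nvptx|clspv)(64)?$" )`:
// returns true when the external-calls test would be registered.
bool adds_external_calls_test(const std::string& arch) {
  static const std::regex skip_pattern("^(nvptx|clspv)(64)?$");
  return !std::regex_search(arch, skip_pattern);
}
```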


[llvm-branch-commits] [libclc] release/20.x: [libclc] Disable external-calls testing for clspv targets (#127529) (PR #127597)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127597

[llvm-branch-commits] [libclc] release/20.x: [libclc] Disable external-calls testing for clspv targets (#127529) (PR #127597)

2025-02-18 Thread Michał Górny via llvm-branch-commits


https://github.com/mgorny approved this pull request.


https://github.com/llvm/llvm-project/pull/127597

[llvm-branch-commits] [flang] 8261d59 - Revert "[flang][Lower][OpenMP] Don't read moldarg for static sized array (#12…"

2025-02-18 Thread via llvm-branch-commits


Author: Tom Eccles
Date: 2025-02-18T09:12:08Z
New Revision: 8261d59fcb025db2fdecc2f4497b3314dd7d992e

URL: https://github.com/llvm/llvm-project/commit/8261d59fcb025db2fdecc2f4497b3314dd7d992e
DIFF: https://github.com/llvm/llvm-project/commit/8261d59fcb025db2fdecc2f4497b3314dd7d992e.diff

LOG: Revert "[flang][Lower][OpenMP] Don't read moldarg for static sized array (#12…"

This reverts commit 88dd372d673c7e6967c93aa2879f0ef04fc7ac20.

Added: 


Modified: 
flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
flang/lib/Lower/OpenMP/PrivateReductionUtils.h
flang/test/Lower/OpenMP/delayed-privatization-array.f90

Removed: 




diff  --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index d725dfd3e94f3..d13f101f516e7 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -508,8 +508,6 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
 
   lower::SymbolBox hsb = converter.lookupOneLevelUpSymbol(*sym);
   assert(hsb && "Host symbol box not found");
-  hlfir::Entity entity{hsb.getAddr()};
-  bool cannotHaveNonDefaultLowerBounds = !entity.mayHaveNonDefaultLowerBounds();
 
   mlir::Location symLoc = hsb.getAddr().getLoc();
   std::string privatizerName = sym->name().ToString() + ".privatizer";
@@ -530,6 +528,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
   // an alloca for a fir.array type there. Get around this by boxing all
   // arrays.
   if (mlir::isa(allocType)) {
+hlfir::Entity entity{hsb.getAddr()};
 entity = genVariableBox(symLoc, firOpBuilder, entity);
 privVal = entity.getBase();
 allocType = privVal.getType();
@@ -591,7 +590,7 @@ void DataSharingProcessor::doPrivatize(const semantics::Symbol *sym,
   result.getDeallocRegion(),
   isFirstPrivate ? DeclOperationKind::FirstPrivate
  : DeclOperationKind::Private,
-  sym, cannotHaveNonDefaultLowerBounds);
+  sym);
   // TODO: currently there are false positives from dead uses of the mold
   // arg
   if (!result.getInitMoldArg().getUses().empty())

diff  --git a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
index 21ade77d82d37..22cd0679050db 100644
--- a/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
+++ b/flang/lib/Lower/OpenMP/PrivateReductionUtils.cpp
@@ -122,40 +122,25 @@ static void createCleanupRegion(Fortran::lower::AbstractConverter &converter,
   typeError();
 }
 
-fir::ShapeShiftOp
-Fortran::lower::omp::getShapeShift(fir::FirOpBuilder &builder,
-   mlir::Location loc, mlir::Value box,
-   bool cannotHaveNonDefaultLowerBounds) {
+fir::ShapeShiftOp Fortran::lower::omp::getShapeShift(fir::FirOpBuilder &builder,
+ mlir::Location loc,
+ mlir::Value box) {
   fir::SequenceType sequenceType = mlir::cast(
   hlfir::getFortranElementOrSequenceType(box.getType()));
   const unsigned rank = sequenceType.getDimension();
-
   llvm::SmallVector lbAndExtents;
   lbAndExtents.reserve(rank * 2);
-  mlir::Type idxTy = builder.getIndexType();
 
-  if (cannotHaveNonDefaultLowerBounds && !sequenceType.hasDynamicExtents()) {
-// We don't need fir::BoxDimsOp if all of the extents are statically known
-// and we can assume default lower bounds. This helps avoids reads from the
-// mold arg.
-mlir::Value one = builder.createIntegerConstant(loc, idxTy, 1);
-for (int64_t extent : sequenceType.getShape()) {
-  assert(extent != sequenceType.getUnknownExtent());
-  mlir::Value extentVal = builder.createIntegerConstant(loc, idxTy, extent);
-  lbAndExtents.push_back(one);
-  lbAndExtents.push_back(extentVal);
-}
-  } else {
-for (unsigned i = 0; i < rank; ++i) {
-  // TODO: ideally we want to hoist box reads out of the critical section.
-  // We could do this by having box dimensions in block arguments like
-  // OpenACC does
-  mlir::Value dim = builder.createIntegerConstant(loc, idxTy, i);
-  auto dimInfo =
-  builder.create(loc, idxTy, idxTy, idxTy, box, dim);
-  lbAndExtents.push_back(dimInfo.getLowerBound());
-  lbAndExtents.push_back(dimInfo.getExtent());
-}
+  mlir::Type idxTy = builder.getIndexType();
+  for (unsigned i = 0; i < rank; ++i) {
+// TODO: ideally we want to hoist box reads out of the critical section.
+// We could do this by having box dimensions in block arguments like
+// OpenACC does
+mlir::Value dim = builder.createIntegerConstant(loc, idxTy, i);
+auto dimInfo =
+builder.create(loc, idxTy, idxTy, idxTy, box, dim);
+

[llvm-branch-commits] [flang] [mlir] release/20.x: [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842) (PR #127589)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: None (llvmbot)


Changes

Backport 82bd148a3f25439d7f52a32422dc1bcd2da03803

Requested by: @nikic

---
Full diff: https://github.com/llvm/llvm-project/pull/127589.diff


3 Files Affected:

- (modified) flang/CMakeLists.txt (+4-3) 
- (modified) flang/include/flang/Optimizer/Dialect/CMakeLists.txt (+1-1) 
- (modified) mlir/cmake/modules/MLIRConfig.cmake.in (-4) 


``diff
diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index b619553ef8302..b24b177cc21cc 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -79,6 +79,8 @@ if(CMAKE_SIZEOF_VOID_P EQUAL 4)
   message(FATAL_ERROR "flang isn't supported on 32 bit CPUs")
 endif()
 
+set(MLIR_MAIN_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../mlir" CACHE PATH "Path to MLIR source tree")
+
 if (FLANG_STANDALONE_BUILD)
   set(FLANG_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
   set(CMAKE_INCLUDE_CURRENT_DIR ON)
@@ -240,10 +242,9 @@ else()
 set(FLANG_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
   endif()
 
-  set(MLIR_MAIN_SRC_DIR ${LLVM_MAIN_SRC_DIR}/../mlir ) # --src-root
-  set(MLIR_INCLUDE_DIR ${MLIR_MAIN_SRC_DIR}/include ) # --includedir
+  set(MLIR_INCLUDE_DIRS ${MLIR_MAIN_SRC_DIR}/include ) # --includedir
   set(MLIR_TABLEGEN_OUTPUT_DIR ${CMAKE_BINARY_DIR}/tools/mlir/include)
-  include_directories(SYSTEM ${MLIR_INCLUDE_DIR})
+  include_directories(SYSTEM ${MLIR_INCLUDE_DIRS})
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()
 
diff --git a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
index 10ab213b30b02..73f388cbab6c9 100644
--- a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
@@ -37,7 +37,7 @@ set_target_properties(flang-doc PROPERTIES FOLDER "Flang/Docs")
 set(dialect_doc_filename "FIRLangRef")
 
 set(LLVM_TARGET_DEFINITIONS FIROps.td)
-tablegen(MLIR ${dialect_doc_filename}.md -gen-op-doc "-I${MLIR_INCLUDE_DIR}")
+tablegen(MLIR ${dialect_doc_filename}.md -gen-op-doc)
 set(GEN_DOC_FILE ${FLANG_BINARY_DIR}/docs/Dialect/${dialect_doc_filename}.md)
 add_custom_command(
 OUTPUT ${GEN_DOC_FILE}
diff --git a/mlir/cmake/modules/MLIRConfig.cmake.in b/mlir/cmake/modules/MLIRConfig.cmake.in
index 7076d94a32f2b..c695b5787af66 100644
--- a/mlir/cmake/modules/MLIRConfig.cmake.in
+++ b/mlir/cmake/modules/MLIRConfig.cmake.in
@@ -16,10 +16,6 @@ set(MLIR_INSTALL_AGGREGATE_OBJECTS "@MLIR_INSTALL_AGGREGATE_OBJECTS@")
 set(MLIR_ENABLE_BINDINGS_PYTHON "@MLIR_ENABLE_BINDINGS_PYTHON@")
 set(MLIR_ENABLE_EXECUTION_ENGINE "@MLIR_ENABLE_EXECUTION_ENGINE@")
 
-# For mlir_tablegen()
-set(MLIR_INCLUDE_DIR "@MLIR_INCLUDE_DIR@")
-set(MLIR_MAIN_SRC_DIR "@MLIR_MAIN_SRC_DIR@")
-
 set_property(GLOBAL PROPERTY MLIR_ALL_LIBS "@MLIR_ALL_LIBS@")
 set_property(GLOBAL PROPERTY MLIR_DIALECT_LIBS "@MLIR_DIALECT_LIBS@")
 set_property(GLOBAL PROPERTY MLIR_CONVERSION_LIBS "@MLIR_CONVERSION_LIBS@")

``




https://github.com/llvm/llvm-project/pull/127589

[llvm-branch-commits] [flang] [mlir] release/20.x: [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842) (PR #127589)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/127589

Backport 82bd148a3f25439d7f52a32422dc1bcd2da03803

Requested by: @nikic

>From ac12ac1b351bd9749c23c4cd8eac6231727cad5d Mon Sep 17 00:00:00 2001
From: Nikita Popov 
Date: Tue, 11 Feb 2025 14:32:30 +0100
Subject: [PATCH] [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and
 MLIR_INCLUDE_DIR (#125842)

MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR point to the source directory,
which is not installed. As such, the installed MLIRConfig.cmake also
should not reference it.

The comment indicates that these are needed for mlir_tablegen(), but I
don't see any related uses.

The motivation for this is the use in flang, where we end up inheriting
a meaningless MLIR_MAIN_SRC_DIR from a previous MLIR build, whose source
directory doesn't exist anymore, and that cannot be overridden with the
correct path, because it's not a cached variable.

Instead do what all the other projects do for LLVM_MAIN_SRC_DIR and
initialize MLIR_MAIN_SRC_DIR to CMAKE_CURRENT_SOURCE_DIR/../mlir.

For MLIR_INCLUDE_DIR there already is an exported MLIR_INCLUDE_DIRS,
which can be used instead.

(cherry picked from commit 82bd148a3f25439d7f52a32422dc1bcd2da03803)
---
 flang/CMakeLists.txt | 7 ---
 flang/include/flang/Optimizer/Dialect/CMakeLists.txt | 2 +-
 mlir/cmake/modules/MLIRConfig.cmake.in   | 4 
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index b619553ef8302..b24b177cc21cc 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -79,6 +79,8 @@ if(CMAKE_SIZEOF_VOID_P EQUAL 4)
   message(FATAL_ERROR "flang isn't supported on 32 bit CPUs")
 endif()
 
+set(MLIR_MAIN_SRC_DIR "${CMAKE_CURRENT_SOURCE_DIR}/../mlir" CACHE PATH "Path to MLIR source tree")
+
 if (FLANG_STANDALONE_BUILD)
   set(FLANG_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
   set(CMAKE_INCLUDE_CURRENT_DIR ON)
@@ -240,10 +242,9 @@ else()
 set(FLANG_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
   endif()
 
-  set(MLIR_MAIN_SRC_DIR ${LLVM_MAIN_SRC_DIR}/../mlir ) # --src-root
-  set(MLIR_INCLUDE_DIR ${MLIR_MAIN_SRC_DIR}/include ) # --includedir
+  set(MLIR_INCLUDE_DIRS ${MLIR_MAIN_SRC_DIR}/include ) # --includedir
   set(MLIR_TABLEGEN_OUTPUT_DIR ${CMAKE_BINARY_DIR}/tools/mlir/include)
-  include_directories(SYSTEM ${MLIR_INCLUDE_DIR})
+  include_directories(SYSTEM ${MLIR_INCLUDE_DIRS})
   include_directories(SYSTEM ${MLIR_TABLEGEN_OUTPUT_DIR})
 endif()
 
diff --git a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
index 10ab213b30b02..73f388cbab6c9 100644
--- a/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
+++ b/flang/include/flang/Optimizer/Dialect/CMakeLists.txt
@@ -37,7 +37,7 @@ set_target_properties(flang-doc PROPERTIES FOLDER "Flang/Docs")
 set(dialect_doc_filename "FIRLangRef")
 
 set(LLVM_TARGET_DEFINITIONS FIROps.td)
-tablegen(MLIR ${dialect_doc_filename}.md -gen-op-doc "-I${MLIR_INCLUDE_DIR}")
+tablegen(MLIR ${dialect_doc_filename}.md -gen-op-doc)
 set(GEN_DOC_FILE ${FLANG_BINARY_DIR}/docs/Dialect/${dialect_doc_filename}.md)
 add_custom_command(
 OUTPUT ${GEN_DOC_FILE}
diff --git a/mlir/cmake/modules/MLIRConfig.cmake.in b/mlir/cmake/modules/MLIRConfig.cmake.in
index 7076d94a32f2b..c695b5787af66 100644
--- a/mlir/cmake/modules/MLIRConfig.cmake.in
+++ b/mlir/cmake/modules/MLIRConfig.cmake.in
@@ -16,10 +16,6 @@ set(MLIR_INSTALL_AGGREGATE_OBJECTS "@MLIR_INSTALL_AGGREGATE_OBJECTS@")
 set(MLIR_ENABLE_BINDINGS_PYTHON "@MLIR_ENABLE_BINDINGS_PYTHON@")
 set(MLIR_ENABLE_EXECUTION_ENGINE "@MLIR_ENABLE_EXECUTION_ENGINE@")
 
-# For mlir_tablegen()
-set(MLIR_INCLUDE_DIR "@MLIR_INCLUDE_DIR@")
-set(MLIR_MAIN_SRC_DIR "@MLIR_MAIN_SRC_DIR@")
-
 set_property(GLOBAL PROPERTY MLIR_ALL_LIBS "@MLIR_ALL_LIBS@")
 set_property(GLOBAL PROPERTY MLIR_DIALECT_LIBS "@MLIR_DIALECT_LIBS@")
 set_property(GLOBAL PROPERTY MLIR_CONVERSION_LIBS "@MLIR_CONVERSION_LIBS@")


[llvm-branch-commits] [flang] [mlir] release/20.x: [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842) (PR #127589)

2025-02-18 Thread via llvm-branch-commits


llvmbot wrote:

@joker-eph What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/127589

[llvm-branch-commits] [flang] [mlir] release/20.x: [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842) (PR #127589)

2025-02-18 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/127589

[llvm-branch-commits] [flang] [mlir] release/20.x: [mlir][cmake] Do not export MLIR_MAIN_SRC_DIR and MLIR_INCLUDE_DIR (#125842) (PR #127589)

2025-02-18 Thread Nikita Popov via llvm-branch-commits


nikic wrote:

Requesting this for backport, because it unbreaks the flang standalone build 
for us.

https://github.com/llvm/llvm-project/pull/127589

[llvm-branch-commits] [llvm] AMDGPU: Fix overly conservative immediate operand check (PR #127563)

2025-02-18 Thread Christudasan Devadasan via llvm-branch-commits


https://github.com/cdevadas approved this pull request.


https://github.com/llvm/llvm-project/pull/127563

[llvm-branch-commits] [flang] [AMDGPU] Add missing gfx architectures to AddFlangOffloadRuntime.cmake (PR #125827)

2025-02-18 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/125827

>From 876dba72b049a1c84fceb42f8d3fff772cd6aa9f Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Wed, 5 Feb 2025 04:45:26 -0500
Subject: [PATCH] [AMDGPU] Add missing gfx architectures to
 AddFlangOffloadRuntime.cmake

---
 flang/cmake/modules/AddFlangOffloadRuntime.cmake | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/flang/cmake/modules/AddFlangOffloadRuntime.cmake b/flang/cmake/modules/AddFlangOffloadRuntime.cmake
index f1f6eb57c5d6c..eb0e964559ed5 100644
--- a/flang/cmake/modules/AddFlangOffloadRuntime.cmake
+++ b/flang/cmake/modules/AddFlangOffloadRuntime.cmake
@@ -98,10 +98,10 @@ macro(enable_omp_offload_compilation files)
 
   set(all_amdgpu_architectures
 "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906"
-"gfx908;gfx90a;gfx90c;gfx942;gfx1010;gfx1030"
+"gfx908;gfx90a;gfx90c;gfx942;gfx950;gfx1010;gfx1030"
 "gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036"
 "gfx1100;gfx1101;gfx1102;gfx1103;gfx1150;gfx1151"
-"gfx1152;gfx1153"
+"gfx1152;gfx1153;gfx1200;gfx1201"
 )
   set(all_nvptx_architectures
 "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62"


[llvm-branch-commits] [mlir] [AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in MLIR (PR #125836)

2025-02-18 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/125836

>From 6a184d4af1ab15e105155aa0d3463a467e16c89c Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Wed, 5 Feb 2025 05:50:12 -0500
Subject: [PATCH 1/2] [AMDGPU][MLIR] Replace gfx940 and gfx941 with gfx942 in
 MLIR

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

For SWDEV-512631
---
 mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td |  2 +-
 mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td  |  8 +++
 .../AMDGPUToROCDL/AMDGPUToROCDL.cpp   | 22 +--
 .../ArithToAMDGPU/ArithToAMDGPU.cpp   |  2 +-
 .../AMDGPU/Transforms/EmulateAtomics.cpp  |  8 +--
 .../AMDGPUToROCDL/8-bit-floats.mlir   |  2 +-
 mlir/test/Conversion/AMDGPUToROCDL/mfma.mlir  |  2 +-
 .../ArithToAMDGPU/8-bit-float-saturation.mlir |  2 +-
 .../ArithToAMDGPU/8-bit-floats.mlir   |  2 +-
 .../Dialect/AMDGPU/AMDGPUUtilsTest.cpp| 20 +++--
 10 files changed, 30 insertions(+), 40 deletions(-)

diff --git a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
index 69745addfd748..24f541587cba8 100644
--- a/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
+++ b/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td
@@ -602,7 +602,7 @@ def AMDGPU_MFMAOp :
 order (that is, v[0] will go to arg[7:0], v[1] to arg[15:8] and so on).
 
 The negateA, negateB, and negateC flags are only supported for double-precision
-operations on gfx940+.
+operations on gfx942+.
   }];
   let assemblyFormat = [{
 $sourceA `*` $sourceB `+` $destC
diff --git a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
index 7efa4ffa2aa6f..77401bd6de4bd 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/ROCDLOps.td
@@ -348,11 +348,11 @@ def ROCDL_mfma_f32_16x16x4bf16_1k : ROCDL_Mfma_IntrOp<"mfma.f32.16x16x4bf16.1k">
 def ROCDL_mfma_f32_4x4x4bf16_1k : ROCDL_Mfma_IntrOp<"mfma.f32.4x4x4bf16.1k">;
 def ROCDL_mfma_f32_32x32x8bf16_1k : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x8bf16.1k">;
 def ROCDL_mfma_f32_16x16x16bf16_1k : ROCDL_Mfma_IntrOp<"mfma.f32.16x16x16bf16.1k">;
-// Note: in gfx940, unlike in gfx90a, the f64 xdlops use the "blgp" argument as a
-// NEG bitfield. See IntrinsicsAMDGPU.td for more info.
+// Note: in gfx942, unlike in gfx90a, the f64 xdlops use the "blgp" argument as
+// a NEG bitfield. See IntrinsicsAMDGPU.td for more info.
 def ROCDL_mfma_f64_16x16x4f64 : ROCDL_Mfma_IntrOp<"mfma.f64.16x16x4f64">;
 def ROCDL_mfma_f64_4x4x4f64 : ROCDL_Mfma_IntrOp<"mfma.f64.4x4x4f64">;
-// New in gfx940.
+// New in gfx942.
 def ROCDL_mfma_i32_16x16x32_i8 : ROCDL_Mfma_IntrOp<"mfma.i32.16x16x32.i8">;
 def ROCDL_mfma_i32_32x32x16_i8 : ROCDL_Mfma_IntrOp<"mfma.i32.32x32x16.i8">;
 def ROCDL_mfma_f32_16x16x8_xf32 : ROCDL_Mfma_IntrOp<"mfma.f32.16x16x8.xf32">;
@@ -375,7 +375,7 @@ def ROCDL_mfma_f32_32x32x16_f16 : ROCDL_Mfma_IntrOp<"mfma.f32.32x32x16.f16">;
 def ROCDL_mfma_scale_f32_16x16x128_f8f6f4 : ROCDL_Mfma_OO_IntrOp<"mfma.scale.f32.16x16x128.f8f6f4", [0,1]>;
 def ROCDL_mfma_scale_f32_32x32x64_f8f6f4 : ROCDL_Mfma_OO_IntrOp<"mfma.scale.f32.32x32x64.f8f6f4", [0,1]>;
 
-// 2:4 Sparsity ops (GFX940)
+// 2:4 Sparsity ops (GFX942)
 def ROCDL_smfmac_f32_16x16x32_f16 : ROCDL_Mfma_IntrOp<"smfmac.f32.16x16x32.f16">;
 def ROCDL_smfmac_f32_32x32x16_f16 : ROCDL_Mfma_IntrOp<"smfmac.f32.32x32x16.f16">;
 def ROCDL_smfmac_f32_16x16x32_bf16 : ROCDL_Mfma_IntrOp<"smfmac.f32.16x16x32.bf16">;
diff --git a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
index c62314e504dcc..36fbdbed4ae2f 100644
--- a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+++ b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -80,7 +80,7 @@ namespace {
 // Define commonly used chipsets versions for convenience.
 constexpr Chipset kGfx908 = Chipset(9, 0, 8);
 constexpr Chipset kGfx90a = Chipset(9, 0, 0xa);
-constexpr Chipset kGfx940 = Chipset(9, 4, 0);
+constexpr Chipset kGfx942 = Chipset(9, 4, 2);
 
 /// Define lowering patterns for raw buffer ops
 template 
@@ -483,7 +483,7 @@ static std::optional mfmaOpToIntrinsic(MFMAOp mfma,
 destElem = destType.getElementType();
 
   if (sourceElem.isF32() && destElem.isF32()) {
-if (mfma.getReducePrecision() && chipset >= kGfx940) {
+if (mfma.getReducePrecision() && chipset >= kGfx942) {
   if (m == 32 && n == 32 && k == 4 && b == 1)
 return ROCDL::mfma_f32_32x32x4_xf32::getOperationName();
   if (m == 16 && n == 16 && k == 8 && b == 1)
@@ -551,9 +551,9 @@ static std::optional mfmaOpToIntrinsic(MFMAOp mfma,
   return ROCDL::mfma_i32_32x32x8i8::getOperationName();
 if (m == 16 && n == 16 && k == 16 && b == 1)
   return ROCDL::mfma_i32_16x16x16i8::getOperationName();
-if (m == 32 && n == 32 && k == 16 && b == 1 && chipset >= kGfx940)
+

[llvm-branch-commits] [llvm] [AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs (PR #126887)

2025-02-18 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/126887

>From ca4a62030e2586a928e56efc2c71583b63787819 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Wed, 12 Feb 2025 05:45:01 -0500
Subject: [PATCH] [AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in
 llvm/docs

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all documentation occurrences of gfx940/gfx941 except
for the gfx940 ISA description, which will be the subject of a separate
PR.

For SWDEV-512631
---
 llvm/docs/AMDGPUOperandSyntax.rst |  4 +-
 llvm/docs/AMDGPUUsage.rst | 97 ++-
 2 files changed, 34 insertions(+), 67 deletions(-)

diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst
index ff6ec6cf71ff2..e8a76322fe76a 100644
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -63,7 +63,7 @@ Note: *N* and *K* must satisfy the following conditions:
 * 0 <= *K* <= 255.
 * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
 
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
 pairs of *vector* registers must be even-aligned
 (first register must be even).
 
@@ -183,7 +183,7 @@ Note: *N* and *K* must satisfy the following conditions:
 * 0 <= *K* <= 255.
 * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
 
-GFX90A and GFX940 have an additional alignment requirement:
+GFX90A and GFX942 have an additional alignment requirement:
 pairs of *accumulator* registers must be even-aligned
 (first register must be even).
 
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d4742bb1eaf09..3b57ea91282ec 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -323,7 +323,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following

 Add product

 names.
 
- **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX940-GFX942-CDNA3]_
+ **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX942-CDNA3]_
  
---
  ``gfx900``  ``amdgcn``   dGPU  - xnack   - 
Absolute  - *rocm-amdhsa* - Radeon Vega
 flat   
   - *pal-amdhsa*Frontier Edition
@@ -378,20 +378,6 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following

   - Ryzen 3 Pro 4350G

   - Ryzen 3 Pro 4350GE
 
- ``gfx940``  ``amdgcn``   dGPU  - sramecc - 
Architected   *TBA*
-- tgsplit   flat
-- xnack 
scratch   .. TODO::
-- kernarg preload - Packed
-
work-item   Add product
-IDs
 names.
-
- ``gfx941``  ``amdgcn``   dGPU  - sramecc - 
Architected   *TBA*
-- tgsplit   flat
-- xnack 
scratch   .. TODO::
-- kernarg preload - Packed
-
work-item   Add product
-IDs
 names.
-
  ``gfx942``  ``amdgcn``   dGPU  - sramecc - 
Architected   - AMD Instinct MI300X
 - tgsplit   flat   
   - AMD Instinct MI300A
 - xnack scratch
@@ -583,10 +569,10 @@ Generic processor code objects are versioned. See :ref:`amdgpu-generic-processor

[llvm-branch-commits] [clang] [AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (PR #126762)

2025-02-18 Thread Fabian Ritter via llvm-branch-commits


https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/126762

>From ae35c1af2d85a8914a4aa01df3f5dfc64b0561f4 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Tue, 11 Feb 2025 08:52:55 -0500
Subject: [PATCH] [AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in
 clang

gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
---
 clang/include/clang/Basic/Cuda.h  |   2 -
 clang/lib/Basic/Cuda.cpp  |   2 -
 clang/lib/Basic/Targets/NVPTX.cpp |   2 -
 clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp  |   2 -
 clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu   |   2 +-
 clang/test/CodeGenOpenCL/amdgpu-features.cl   |   4 -
 .../test/CodeGenOpenCL/builtins-amdgcn-fp8.cl |   2 +-
 ...cn-gfx940.cl => builtins-amdgcn-gfx942.cl} |   2 +-
 .../builtins-amdgcn-gfx950-err.cl |   2 +-
 .../builtins-amdgcn-gws-insts.cl  |   2 +-
 .../CodeGenOpenCL/builtins-amdgcn-mfma.cl | 110 +-
 ...fx940.cl => builtins-fp-atomics-gfx942.cl} |  34 +++---
 clang/test/Driver/amdgpu-macros.cl|   2 -
 clang/test/Driver/amdgpu-mcpu.cl  |   4 -
 clang/test/Driver/cuda-bad-arch.cu|   2 +-
 clang/test/Driver/hip-macros.hip  |  10 +-
 .../test/Misc/target-invalid-cpu-note/nvptx.c |   2 -
 ... => builtins-amdgcn-error-gfx942-param.cl} |   2 +-
 .../builtins-amdgcn-error-gfx950.cl   |   2 +-
 ...0-err.cl => builtins-amdgcn-gfx942-err.cl} |  14 +--
 20 files changed, 91 insertions(+), 113 deletions(-)
 rename clang/test/CodeGenOpenCL/{builtins-amdgcn-gfx940.cl => builtins-amdgcn-gfx942.cl} (98%)
 rename clang/test/CodeGenOpenCL/{builtins-fp-atomics-gfx940.cl => builtins-fp-atomics-gfx942.cl} (84%)
 rename clang/test/SemaOpenCL/{builtins-amdgcn-error-gfx940-param.cl => builtins-amdgcn-error-gfx942-param.cl} (99%)
 rename clang/test/SemaOpenCL/{builtins-amdgcn-gfx940-err.cl => builtins-amdgcn-gfx942-err.cl} (81%)

diff --git a/clang/include/clang/Basic/Cuda.h b/clang/include/clang/Basic/Cuda.h
index f33ba46233a7a..793cab1f4e84a 100644
--- a/clang/include/clang/Basic/Cuda.h
+++ b/clang/include/clang/Basic/Cuda.h
@@ -106,8 +106,6 @@ enum class OffloadArch {
   GFX90a,
   GFX90c,
   GFX9_4_GENERIC,
-  GFX940,
-  GFX941,
   GFX942,
   GFX950,
   GFX10_1_GENERIC,
diff --git a/clang/lib/Basic/Cuda.cpp b/clang/lib/Basic/Cuda.cpp
index 1bfec0b37c5ee..f45fb0eca3714 100644
--- a/clang/lib/Basic/Cuda.cpp
+++ b/clang/lib/Basic/Cuda.cpp
@@ -124,8 +124,6 @@ static const OffloadArchToStringMap arch_names[] = {
 GFX(90a),  // gfx90a
 GFX(90c),  // gfx90c
 {OffloadArch::GFX9_4_GENERIC, "gfx9-4-generic", "compute_amdgcn"},
-GFX(940),  // gfx940
-GFX(941),  // gfx941
 GFX(942),  // gfx942
 GFX(950),  // gfx950
 {OffloadArch::GFX10_1_GENERIC, "gfx10-1-generic", "compute_amdgcn"},
diff --git a/clang/lib/Basic/Targets/NVPTX.cpp b/clang/lib/Basic/Targets/NVPTX.cpp
index 7d13c1f145440..547cf3dfa2be7 100644
--- a/clang/lib/Basic/Targets/NVPTX.cpp
+++ b/clang/lib/Basic/Targets/NVPTX.cpp
@@ -211,8 +211,6 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts,
   case OffloadArch::GFX90a:
   case OffloadArch::GFX90c:
   case OffloadArch::GFX9_4_GENERIC:
-  case OffloadArch::GFX940:
-  case OffloadArch::GFX941:
   case OffloadArch::GFX942:
   case OffloadArch::GFX950:
   case OffloadArch::GFX10_1_GENERIC:
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index c13928f61a748..826ec4da8ea28 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -2302,8 +2302,6 @@ void CGOpenMPRuntimeGPU::processRequiresDirective(const OMPRequiresDecl *D) {
   case OffloadArch::GFX90a:
   case OffloadArch::GFX90c:
   case OffloadArch::GFX9_4_GENERIC:
-  case OffloadArch::GFX940:
-  case OffloadArch::GFX941:
   case OffloadArch::GFX942:
   case OffloadArch::GFX950:
   case OffloadArch::GFX10_1_GENERIC:
diff --git a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
index 47fa3967fe237..37fca614c3111 100644
--- a/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
@@ -11,7 +11,7 @@
 // RUN:   -fnative-half-arguments-and-returns | FileCheck -check-prefix=SAFE %s
 
 // RUN: %clang_cc1 -x hip %s -O3 -S -o - -triple=amdgcn-amd-amdhsa \
-// RUN:   -fcuda-is-device -target-cpu gfx940 -fnative-half-type \
+// RUN:   -fcuda-is-device -target-cpu gfx942 -fnative-half-type \
 // RUN:   -fnative-half-arguments-and-retur
