date:20250414

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to svusmmla (PR #135634)

2025-04-14 Thread Andrzej Warzyński via llvm-branch-commits


https://github.com/banach-space edited 
https://github.com/llvm/llvm-project/pull/135634
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to svusmmla (PR #135634)

2025-04-14 Thread Andrzej Warzyński via llvm-branch-commits


https://github.com/banach-space approved this pull request.

LGTM % nit

Thanks!

https://github.com/llvm/llvm-project/pull/135634

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to svusmmla (PR #135634)

2025-04-14 Thread Andrzej Warzyński via llvm-branch-commits



@@ -273,6 +273,34 @@ def UmmlaOp : ArmSVE_Op<"ummla",
 "$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($dst)";
 }
 
+def UsmmlaOp : ArmSVE_Op<"usmmla", [Pure,
+AllTypesMatch<["src1", "src2"]>,
+AllTypesMatch<["acc", "dst"]>]> {

banach-space wrote:

This indentation is inconsistent with the other ops, but the existing 
indentation feels a bit ad-hoc. I like yours much more.

Would you mind updating other definitions so that we do maintain consistency? 
Probably as a separate PR to keep the history clean. Updating this PR instead 
would also be fine with me.

https://github.com/llvm/llvm-project/pull/135634

[llvm-branch-commits] [llvm] [LV] An attempt to cherry-pick the fix PR #132691 (cherry-pick from the main branch to the release/20.x branch) (PR #135231)

2025-04-14 Thread Paul Osmialowski via llvm-branch-commits


pawosm-arm wrote:

Thanks @fhahn for your comment and your patch. It is a good reason for closing 
down this one.


https://github.com/llvm/llvm-project/pull/135231

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===--===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2 dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.
+
+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;

redstar wrote:

Ok, I removed this. However, it feels like there is an inconsistency in 
the documentation / implementations. The following HLASM code
```
stdin#C CSECT
C_WSA64 CATTR ALIGN(4),DEFLOAD,NOTEXECUTABLE,PART(a),RMODE(64)
DC  0X
END
```
results in `RMODE(64)` set at the ED symbol, and `AMODE(64)` set at the PR 
symbol.
Setting the Amode on a PR symbol makes sense to me because it is not possible 
to add an LD symbol to the part; doing so causes binder errors. To reference 
the part from a different compilation unit, I have to use `a`, so the PR also 
has some symbol semantics.

https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -2759,6 +2762,29 @@ MCSection *TargetLoweringObjectFileXCOFF::getSectionForLSDA(
 
//===--===//
 TargetLoweringObjectFileGOFF::TargetLoweringObjectFileGOFF() = default;
 
+void TargetLoweringObjectFileGOFF::getModuleMetadata(Module &M) {
+  // Construct the default names for the root SD and the ADA PR symbol.
+  StringRef FileName = sys::path::stem(M.getSourceFileName());
+  if (FileName.size() > 1 && FileName.starts_with('<') &&
+      FileName.ends_with('>'))
+    FileName = FileName.substr(1, FileName.size() - 2);
+  DefaultRootSDName = Twine(FileName).concat("#C").str();

redstar wrote:

Yes, my plan is to request an update to the documentation. 

https://github.com/llvm/llvm-project/pull/133799
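
The default-name construction quoted in this message can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `defaultRootSDName` is a hypothetical helper, `std::string_view` stands in for `llvm::StringRef`, and the argument is assumed to already be the file name stem.

```cpp
#include <string>
#include <string_view>

// Hypothetical standalone helper mirroring the quoted logic; it is not part
// of the patch. `stem` is assumed to be the source file name stem.
std::string defaultRootSDName(std::string_view stem) {
  // Strip one pair of enclosing angle brackets, e.g. "<stdin>" -> "stdin".
  if (stem.size() > 1 && stem.front() == '<' && stem.back() == '>')
    stem = stem.substr(1, stem.size() - 2);
  // Append the "#C" suffix used for the root SD symbol.
  std::string name(stem);
  name += "#C";
  return name;
}
```

With this sketch, a stem of `<stdin>` yields `stdin#C`, which is the section name that appears in the HLASM example earlier in the thread.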

[llvm-branch-commits] [lldb] release/20.x: [lldb] Respect LaunchInfo::SetExecutable in ProcessLauncherPosixFork (#133093) (PR #134079)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


tstellar wrote:

Ping @JDevlieghere 

https://github.com/llvm/llvm-project/pull/134079

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -0,0 +1,145 @@
+//===- MCSectionGOFF.cpp - GOFF Code Section Representation ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "llvm/MC/MCSectionGOFF.h"
+#include "llvm/BinaryFormat/GOFF.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace llvm;
+
+namespace {
+void emitRMode(raw_ostream &OS, GOFF::ESDRmode Rmode, bool UseParenthesis) {
+  if (Rmode != GOFF::ESD_RMODE_None) {
+    OS << "RMODE" << (UseParenthesis ? '(' : ' ');
+    switch (Rmode) {
+    case GOFF::ESD_RMODE_24:
+      OS << "24";
+      break;
+    case GOFF::ESD_RMODE_31:
+      OS << "31";
+      break;
+    case GOFF::ESD_RMODE_64:
+      OS << "64";
+      break;
+    case GOFF::ESD_RMODE_None:
+      break;
+    }
+    if (UseParenthesis)
+      OS << ')';
+  }
+}
+
+void emitCATTR(raw_ostream &OS, StringRef Name, StringRef ParentName,
+               bool EmitAmodeAndRmode, GOFF::ESDAmode Amode,
+               GOFF::ESDRmode Rmode, GOFF::ESDAlignment Alignment,
+               GOFF::ESDLoadingBehavior LoadBehavior,
+               GOFF::ESDExecutable Executable, bool IsReadOnly,
+               StringRef PartName) {
+  if (EmitAmodeAndRmode && Amode != GOFF::ESD_AMODE_None) {
+    OS << ParentName << " AMODE ";
+    switch (Amode) {
+    case GOFF::ESD_AMODE_24:
+      OS << "24";
+      break;
+    case GOFF::ESD_AMODE_31:
+      OS << "31";
+      break;
+    case GOFF::ESD_AMODE_ANY:
+      OS << "ANY";
+      break;
+    case GOFF::ESD_AMODE_64:
+      OS << "64";
+      break;
+    case GOFF::ESD_AMODE_MIN:
+      OS << "ANY64";
+      break;
+    case GOFF::ESD_AMODE_None:
+      break;
+    }
+    OS << "\n";
+  }
+  if (EmitAmodeAndRmode && Rmode != GOFF::ESD_RMODE_None) {
+    OS << ParentName << ' ';
+    emitRMode(OS, Rmode, /*UseParenthesis=*/false);

redstar wrote:

I changed it, only the `CATTR RMODE` is now emitted.

https://github.com/llvm/llvm-project/pull/133799
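
To make the two output shapes of the quoted `emitRMode` helper concrete, here is a minimal standalone sketch. The `Rmode` enum and `formatRMode` are hypothetical stand-ins for `GOFF::ESDRmode` and the stream-based helper in the diff, and the function returns a `std::string` instead of writing to a `raw_ostream`.

```cpp
#include <string>

// Hypothetical stand-in for GOFF::ESDRmode.
enum class Rmode { None, R24, R31, R64 };

// Returns "RMODE(<n>)" when useParenthesis is true (the form used inside a
// CATTR operand list) and "RMODE <n>" otherwise (the standalone statement
// form). Returns an empty string when no RMODE is set.
std::string formatRMode(Rmode rmode, bool useParenthesis) {
  if (rmode == Rmode::None)
    return "";
  std::string out = "RMODE";
  out += useParenthesis ? '(' : ' ';
  switch (rmode) {
  case Rmode::R24: out += "24"; break;
  case Rmode::R31: out += "31"; break;
  case Rmode::R64: out += "64"; break;
  case Rmode::None: break;
  }
  if (useParenthesis)
    out += ')';
  return out;
}
```

Under these assumptions, `formatRMode(Rmode::R64, true)` gives the `RMODE(64)` spelling seen in the `CATTR` line of the HLASM example, which is the form that remains after the change described in the comment.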

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE `*MMLA` instructions (PR #135636)

2025-04-14 Thread Andrzej Warzyński via llvm-branch-commits


https://github.com/banach-space commented:

Thanks! This one is a bit longer, so I may need to wait till Thursday before I 
can review.

One high-level question - would sharing some code between NEON and SVE be 
possible?

https://github.com/llvm/llvm-project/pull/135636

[llvm-branch-commits] [llvm] [LV] An attempt to cherry-pick the fix PR #132691 (cherry-pick from the main branch to the release/20.x branch) (PR #135231)

2025-04-14 Thread Paul Osmialowski via llvm-branch-commits


https://github.com/pawosm-arm closed 
https://github.com/llvm/llvm-project/pull/135231

[llvm-branch-commits] [clang] 2e7710e - [clang] Introduce "binary" StringLiteral for #embed data (#127629)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


Author: Mariya Podchishchaeva
Date: 2025-04-14T12:26:02-07:00
New Revision: 2e7710eaffddcbb6094e32826ec6e69bb4cb1799

URL: 
https://github.com/llvm/llvm-project/commit/2e7710eaffddcbb6094e32826ec6e69bb4cb1799
DIFF: 
https://github.com/llvm/llvm-project/commit/2e7710eaffddcbb6094e32826ec6e69bb4cb1799.diff

LOG: [clang] Introduce "binary" StringLiteral for #embed data (#127629)

StringLiteral is used as internal data of EmbedExpr and we directly use
it as an initializer if a single EmbedExpr appears in the initializer
list of a char array. It is fast and convenient, but it causes
problems when string literal character values are checked, because #embed
data values are within the range [0, 2^(char width) - 1] while an ordinary
StringLiteral may have a signed char type.
This PR introduces a new kind of StringLiteral to hold binary data coming
from an embedded resource to mitigate these problems. The new kind of
StringLiteral is not assumed to have signed char type. The new kind of
StringLiteral also helps to prevent crashes when trying to find
StringLiteral token locations, since these simply do not exist for binary
data.

Fixes https://github.com/llvm/llvm-project/issues/119256
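
The signedness mismatch described in the log can be illustrated with a small standalone sketch. It assumes an 8-bit char; `codeUnitValue` is a hypothetical helper and not part of the clang API.

```cpp
#include <cstdint>

// Hypothetical helper: interpret one 8-bit code unit either as raw binary
// data (always 0..255) or as a signed 8-bit character (-128..127).
std::int64_t codeUnitValue(std::uint8_t byte, bool treatAsSigned) {
  if (!treatAsSigned)
    return byte;
  // Sign-extend manually so the result does not depend on whether plain
  // char is signed on the host.
  return byte >= 0x80 ? static_cast<std::int64_t>(byte) - 256 : byte;
}
```

In this sketch the byte `0xFF` reads as 255 when treated as binary data and as -1 when treated as a signed character, which is the kind of discrepancy that matters when initializer values are range-checked.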

Added: 
clang/test/Preprocessor/embed_constexpr.c

Modified: 
clang/include/clang/AST/Expr.h
clang/lib/AST/Expr.cpp
clang/lib/Parse/ParseInit.cpp
clang/lib/Sema/SemaInit.cpp

Removed: 




diff  --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 7be4022649329..06ac0f1704aa9 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,14 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  // Binary kind of string literal is used for the data coming via #embed
+  // directive. File's binary contents is transformed to a special kind of
+  // string literal that in some cases may be used directly as an initializer
+  // and some features of classic string literals are not applicable to this
+  // kind of a string literal, for example finding a particular byte's source
+  // location for better diagnosing.
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -1884,6 +1891,8 @@ class StringLiteral final
   int64_t getCodeUnitS(size_t I, uint64_t BitWidth) const {
 int64_t V = getCodeUnit(I);
 if (isOrdinary() || isWide()) {
+  // Ordinary and wide string literals have types that can be signed.
+  // It is important for checking C23 constexpr initializers.
   unsigned Width = getCharByteWidth() * BitWidth;
   llvm::APInt AInt(Width, (uint64_t)V);
   V = AInt.getSExtValue();
@@ -4965,9 +4974,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.

diff  --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index aa7e14329a21b..8571b617c70eb 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
 CharByteWidth = Target.getCharWidth();
 break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
 break; // no prefix.
   case StringLiteralKind::Wide:
 OS << 'L';
@@ -1332,6 +1334,11 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
  const LangOptions &Features,
  const TargetInfo &Target, unsigned *StartToken,
  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+    return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
   getKind() == StringLiteralKind::UTF8 ||
   getKind() == StringLiteralKind::Unevaluated) &&

diff  --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Parse/ParseInit.cpp
index 63b1d7bd9db53..471b3eaf

[llvm-branch-commits] [clang] [X86] Backport saturate-convert intrinsics renaming & YMM rounding intrinsics removal in AVX10.2 (PR #135549)

2025-04-14 Thread via llvm-branch-commits


github-actions[bot] wrote:

@phoebewang (or anyone else): if you would like to add a note about this fix in 
the release notes (completely optional), please reply to this comment with a 
one- or two-sentence description of the fix. When you are done, please add the 
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/135549

[llvm-branch-commits] [llvm] release/20.x: [LLVM][MemCpyOpt] Unify alias tags if we optimize allocas (#129537) (PR #135615)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


tstellar wrote:

@nikic was an approver for the PR in main.

https://github.com/llvm/llvm-project/pull/135615

[llvm-branch-commits] [clang] release/20.x: [Clang] Fix a lambda pattern comparison mismatch after ecc7e6ce4 (#133863) (PR #134194)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


tstellar wrote:

Has this fix had enough time in main yet?

https://github.com/llvm/llvm-project/pull/134194

[llvm-branch-commits] [llvm] release/20.x: [llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352) (PR #134775)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/134775

[llvm-branch-commits] [llvm] release/20.x: [SCEV] Use ashr to adjust constant multipliers (#135534) (PR #135543)

2025-04-14 Thread via llvm-branch-commits


https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/135543

From 0dd4235473d4f5a99c46ea631351616d62e9b32e Mon Sep 17 00:00:00 2001
From: Yingwei Zheng 
Date: Sun, 13 Apr 2025 20:22:48 +0800
Subject: [PATCH] [SCEV] Use ashr to adjust constant multipliers (#135534)

SCEV converts "-2 *nsw (i32 V)" into "2147483647 *nsw (i32 V)". But we
cannot preserve the nsw flag when the constant multiplier is negative.
This patch changes lshr to ashr so that we can preserve both nsw and nuw
flags.

Alive2 proof: https://alive2.llvm.org/ce/z/LZVSEa
Closes https://github.com/llvm/llvm-project/issues/135531.

(cherry picked from commit bb9580a02b393683ff0b6c360df684f33c715a1f)
---
 llvm/lib/Analysis/ScalarEvolution.cpp |  2 +-
 .../test/Analysis/ScalarEvolution/pr135531.ll | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/Analysis/ScalarEvolution/pr135531.ll

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b8069df4e6598..36fe036aa9e9f 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -7854,7 +7854,7 @@ const SCEV *ScalarEvolution::createSCEV(Value *V) {
   unsigned GCD = std::min(MulZeros, TZ);
   APInt DivAmt = APInt::getOneBitSet(BitWidth, TZ - GCD);
   SmallVector MulOps;
-  MulOps.push_back(getConstant(OpC->getAPInt().lshr(GCD)));
+  MulOps.push_back(getConstant(OpC->getAPInt().ashr(GCD)));
   append_range(MulOps, LHSMul->operands().drop_front());
   auto *NewMul = getMulExpr(MulOps, LHSMul->getNoWrapFlags());
   ShiftedLHS = getUDivExpr(NewMul, getConstant(DivAmt));
diff --git a/llvm/test/Analysis/ScalarEvolution/pr135531.ll b/llvm/test/Analysis/ScalarEvolution/pr135531.ll
new file mode 100644
index 0..e172d56d3a515
--- /dev/null
+++ b/llvm/test/Analysis/ScalarEvolution/pr135531.ll
@@ -0,0 +1,19 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -disable-output -passes='print<scalar-evolution>' < %s 2>&1 | FileCheck %s
+
+define i32 @pr135511(i32 %x) {
+; CHECK-LABEL: 'pr135511'
+; CHECK-NEXT:  Classifying expressions for: @pr135511
+; CHECK-NEXT:    %and = and i32 %x, 16382
+; CHECK-NEXT:    --> (2 * (zext i13 (trunc i32 (%x /u 2) to i13) to i32)) U: [0,16383) S: [0,16383)
+; CHECK-NEXT:    %neg = sub nsw i32 0, %and
+; CHECK-NEXT:    --> (-2 * (zext i13 (trunc i32 (%x /u 2) to i13) to i32)) U: [0,-1) S: [-16382,1)
+; CHECK-NEXT:    %res = and i32 %neg, 268431360
+; CHECK-NEXT:    --> (4096 * (zext i16 (trunc i32 ((-1 * (zext i13 (trunc i32 (%x /u 2) to i13) to i32)) /u 2048) to i16) to i32)) U: [0,268431361) S: [0,268431361)
+; CHECK-NEXT:  Determining loop execution counts for: @pr135511
+;
+  %and = and i32 %x, 16382
+  %neg = sub nsw i32 0, %and
+  %res = and i32 %neg, 268431360
+  ret i32 %res
+}
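
The effect of switching from `lshr` to `ashr` on a negative constant multiplier can be checked with a small standalone sketch. `lshr32` and `ashr32` are hypothetical helpers that operate on raw 32-bit patterns rather than on `llvm::APInt`, and the shift amount is assumed to be less than 32.

```cpp
#include <cstdint>

// Logical shift right on a 32-bit pattern: vacated high bits become zero.
std::uint32_t lshr32(std::uint32_t bits, unsigned amount) {
  return bits >> amount;
}

// Arithmetic shift right: vacated high bits copy the sign bit. Written with
// unsigned arithmetic so it does not rely on signed right-shift behaviour.
std::uint32_t ashr32(std::uint32_t bits, unsigned amount) {
  bool negative = (bits & 0x80000000u) != 0;
  return negative ? ~(~bits >> amount) : bits >> amount;
}
```

For the bit pattern of -2 (`0xFFFFFFFE`) shifted by one, the logical shift gives `0x7FFFFFFF`, a large positive value, while the arithmetic shift gives `0xFFFFFFFF`, that is -1. That matches the `-1 *` multiplier in the last CHECK line of the test above.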


[llvm-branch-commits] [SPARC] Use lzcnt to implement CTLZ when we have VIS3 (PR #135715)

2025-04-14 Thread Sergei Barannikov via llvm-branch-commits



@@ -0,0 +1,171 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=sparcv9 | FileCheck %s -check-prefix=V9
+; RUN: llc < %s -mtriple=sparcv9 -mattr=popc | FileCheck %s -check-prefix=POPC
+; RUN: llc < %s -mtriple=sparcv9 -mattr=vis3 | FileCheck %s -check-prefix=VIS3
+
+define i32 @f(i32 %x) nounwind {
+; V9-LABEL: f:
+; V9:   ! %bb.0: ! %entry
+; V9-NEXT:srl %o0, 1, %o1
+; V9-NEXT:or %o0, %o1, %o1
+; V9-NEXT:srl %o1, 2, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 4, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 8, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 16, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:xor %o1, -1, %o1
+; V9-NEXT:srl %o1, 1, %o2
+; V9-NEXT:sethi 1398101, %o3
+; V9-NEXT:or %o3, 341, %o3
+; V9-NEXT:and %o2, %o3, %o2
+; V9-NEXT:sub %o1, %o2, %o1
+; V9-NEXT:sethi 838860, %o2
+; V9-NEXT:or %o2, 819, %o2
+; V9-NEXT:and %o1, %o2, %o3
+; V9-NEXT:srl %o1, 2, %o1
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:add %o3, %o1, %o1
+; V9-NEXT:srl %o1, 4, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sethi 246723, %o2
+; V9-NEXT:or %o2, 783, %o2
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:sll %o1, 8, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sll %o1, 16, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:srl %o1, 24, %o1
+; V9-NEXT:cmp %o0, 0
+; V9-NEXT:move %icc, 0, %o1
+; V9-NEXT:retl
+; V9-NEXT:mov %o1, %o0
+;
+; POPC-LABEL: f:
+; POPC:   ! %bb.0: ! %entry
+; POPC-NEXT:srl %o0, 1, %o1
+; POPC-NEXT:or %o0, %o1, %o1
+; POPC-NEXT:srl %o1, 2, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 4, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 8, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 16, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:xor %o1, -1, %o1
+; POPC-NEXT:srl %o1, 0, %o1
+; POPC-NEXT:popc %o1, %o1
+; POPC-NEXT:cmp %o0, 0
+; POPC-NEXT:move %icc, 0, %o1
+; POPC-NEXT:retl
+; POPC-NEXT:mov %o1, %o0
+;
+; VIS3-LABEL: f:
+; VIS3:   ! %bb.0: ! %entry
+; VIS3-NEXT:srl %o0, 0, %o1
+; VIS3-NEXT:lzcnt %o1, %o1
+; VIS3-NEXT:add %o1, -32, %o1
+; VIS3-NEXT:cmp %o0, 0
+; VIS3-NEXT:move %icc, 0, %o1
+; VIS3-NEXT:retl
+; VIS3-NEXT:mov %o1, %o0
+entry:
+  %0 = call i32 @llvm.ctlz.i32(i32 %x, i1 true)
+  %1 = icmp eq i32 %x, 0
+  %2 = select i1 %1, i32 0, i32 %0
+  %3 = trunc i32 %2 to i8
+  %conv = zext i8 %3 to i32
+  ret i32 %conv
+}
+
+define i64 @g(i64 %x) nounwind {
+; V9-LABEL: g:
+; V9:   ! %bb.0: ! %entry
+; V9-NEXT:srlx %o0, 1, %o1
+; V9-NEXT:or %o0, %o1, %o1
+; V9-NEXT:srlx %o1, 2, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 4, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 8, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 16, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 32, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:xor %o1, -1, %o1
+; V9-NEXT:srlx %o1, 1, %o2
+; V9-NEXT:sethi 1398101, %o3
+; V9-NEXT:or %o3, 341, %o3
+; V9-NEXT:sllx %o3, 32, %o4
+; V9-NEXT:or %o4, %o3, %o3
+; V9-NEXT:and %o2, %o3, %o2
+; V9-NEXT:sub %o1, %o2, %o1
+; V9-NEXT:sethi 838860, %o2
+; V9-NEXT:or %o2, 819, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:and %o1, %o2, %o3
+; V9-NEXT:srlx %o1, 2, %o1
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:add %o3, %o1, %o1
+; V9-NEXT:srlx %o1, 4, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sethi 246723, %o2
+; V9-NEXT:or %o2, 783, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:sethi 16448, %o2
+; V9-NEXT:or %o2, 257, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:mulx %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 56, %o1
+; V9-NEXT:movrz %o0, 0, %o1
+; V9-NEXT:retl
+; V9-NEXT:mov %o1, %o0
+;
+; POPC-LABEL: g:
+; POPC:   ! %bb.0: ! %entry
+; POPC-NEXT:srlx %o0, 1, %o1
+; POPC-NEXT:or %o0, %o1, %o1
+; POPC-NEXT:srlx %o1, 2, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 4, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 8, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 16, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 32, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:xor %o1, -1, %o1
+; POPC-NEXT:popc %o1, %o1
+; POPC-NEXT:movrz %o0, 0, %o1
+; POPC-NEXT:retl
+; POPC-NEXT:mov %o1, %o0
+;
+; VIS3-LABEL: g:
+; VIS3:   ! %bb.0: ! %entry
+; VIS3-NEXT:lzcnt %o0, %o1
+; VIS3-NEXT:movrz %o0, 0, %o1
+; VIS3-NEXT:retl
+; VIS3-NEXT:mov %o1, %o0
+entry:
+  %0 = call i64 @llvm.ctlz.i64(i64 %x, i1 true)
+  %1 = icmp eq i64 %x, 0
+  %

[llvm-branch-commits] [SPARC] Use lzcnt to implement CTLZ when we have VIS3 (PR #135715)

2025-04-14 Thread Sergei Barannikov via llvm-branch-commits



@@ -303,4 +303,10 @@ def : Pat<(i64 (mulhs i64:$lhs, i64:$rhs)),
   (SUBrr (UMULXHI $lhs, $rhs),
  (ADDrr (ANDrr (SRAXri $lhs, 63), $rhs),
 (ANDrr (SRAXri $rhs, 63), $lhs)))>;
+
+def : Pat<(i64 (ctlz i64:$src)), (LZCNT $src)>;
+// 32-bit LZCNT.
+// The zero extension will leave us with 32 extra leading zeros,
+// so we need to compensate for it.
+def : Pat<(i32 (ctlz i32:$src)), (ADDri (LZCNT (SRLri $src, 0)), (i32 -32))>;

s-barannikov wrote:

It may make sense to `Promote` 32-bit ctlz. IIUC, the DAG type legalizer does 
the same expansion, but has the benefit of potentially optimizing `shr` and 
`add` with outer instructions during the DAG combining phase.


https://github.com/llvm/llvm-project/pull/135715
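
The compensation described in the pattern's comment (zero-extend, count leading zeros in 64 bits, subtract 32) can be sketched in plain C++. `lzcnt64` is a hypothetical software stand-in for the 64-bit instruction and is assumed to return 64 for a zero input; `ctlz32ViaLzcnt64` is likewise an illustrative name.

```cpp
#include <cstdint>

// Software stand-in for a 64-bit leading-zero count (assumed to return 64
// when the input is zero).
unsigned lzcnt64(std::uint64_t value) {
  unsigned count = 0;
  for (std::uint64_t mask = 1ull << 63; mask != 0 && (value & mask) == 0;
       mask >>= 1)
    ++count;
  return count;
}

// 32-bit count via the 64-bit primitive: zero extension adds exactly 32
// leading zeros, so subtract them again.
unsigned ctlz32ViaLzcnt64(std::uint32_t value) {
  std::uint64_t widened = value; // zero extension, like `srl %o0, 0, %o1`
  return lzcnt64(widened) - 32;
}
```

Whether this expansion is written as a selection pattern or obtained by promoting the operation is the design question raised in the comment; the arithmetic is the same either way.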

[llvm-branch-commits] [SPARC] Use lzcnt to implement CTLZ when we have VIS3 (PR #135715)

2025-04-14 Thread Sergei Barannikov via llvm-branch-commits



@@ -0,0 +1,171 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=sparcv9 | FileCheck %s -check-prefix=V9
+; RUN: llc < %s -mtriple=sparcv9 -mattr=popc | FileCheck %s -check-prefix=POPC
+; RUN: llc < %s -mtriple=sparcv9 -mattr=vis3 | FileCheck %s -check-prefix=VIS3
+
+define i32 @f(i32 %x) nounwind {
+; V9-LABEL: f:
+; V9:   ! %bb.0: ! %entry
+; V9-NEXT:srl %o0, 1, %o1
+; V9-NEXT:or %o0, %o1, %o1
+; V9-NEXT:srl %o1, 2, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 4, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 8, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srl %o1, 16, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:xor %o1, -1, %o1
+; V9-NEXT:srl %o1, 1, %o2
+; V9-NEXT:sethi 1398101, %o3
+; V9-NEXT:or %o3, 341, %o3
+; V9-NEXT:and %o2, %o3, %o2
+; V9-NEXT:sub %o1, %o2, %o1
+; V9-NEXT:sethi 838860, %o2
+; V9-NEXT:or %o2, 819, %o2
+; V9-NEXT:and %o1, %o2, %o3
+; V9-NEXT:srl %o1, 2, %o1
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:add %o3, %o1, %o1
+; V9-NEXT:srl %o1, 4, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sethi 246723, %o2
+; V9-NEXT:or %o2, 783, %o2
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:sll %o1, 8, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sll %o1, 16, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:srl %o1, 24, %o1
+; V9-NEXT:cmp %o0, 0
+; V9-NEXT:move %icc, 0, %o1
+; V9-NEXT:retl
+; V9-NEXT:mov %o1, %o0
+;
+; POPC-LABEL: f:
+; POPC:   ! %bb.0: ! %entry
+; POPC-NEXT:srl %o0, 1, %o1
+; POPC-NEXT:or %o0, %o1, %o1
+; POPC-NEXT:srl %o1, 2, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 4, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 8, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srl %o1, 16, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:xor %o1, -1, %o1
+; POPC-NEXT:srl %o1, 0, %o1
+; POPC-NEXT:popc %o1, %o1
+; POPC-NEXT:cmp %o0, 0
+; POPC-NEXT:move %icc, 0, %o1
+; POPC-NEXT:retl
+; POPC-NEXT:mov %o1, %o0
+;
+; VIS3-LABEL: f:
+; VIS3:   ! %bb.0: ! %entry
+; VIS3-NEXT:srl %o0, 0, %o1
+; VIS3-NEXT:lzcnt %o1, %o1
+; VIS3-NEXT:add %o1, -32, %o1
+; VIS3-NEXT:cmp %o0, 0
+; VIS3-NEXT:move %icc, 0, %o1
+; VIS3-NEXT:retl
+; VIS3-NEXT:mov %o1, %o0
+entry:
+  %0 = call i32 @llvm.ctlz.i32(i32 %x, i1 true)
+  %1 = icmp eq i32 %x, 0
+  %2 = select i1 %1, i32 0, i32 %0
+  %3 = trunc i32 %2 to i8
+  %conv = zext i8 %3 to i32
+  ret i32 %conv
+}
+
+define i64 @g(i64 %x) nounwind {
+; V9-LABEL: g:
+; V9:   ! %bb.0: ! %entry
+; V9-NEXT:srlx %o0, 1, %o1
+; V9-NEXT:or %o0, %o1, %o1
+; V9-NEXT:srlx %o1, 2, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 4, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 8, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 16, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 32, %o2
+; V9-NEXT:or %o1, %o2, %o1
+; V9-NEXT:xor %o1, -1, %o1
+; V9-NEXT:srlx %o1, 1, %o2
+; V9-NEXT:sethi 1398101, %o3
+; V9-NEXT:or %o3, 341, %o3
+; V9-NEXT:sllx %o3, 32, %o4
+; V9-NEXT:or %o4, %o3, %o3
+; V9-NEXT:and %o2, %o3, %o2
+; V9-NEXT:sub %o1, %o2, %o1
+; V9-NEXT:sethi 838860, %o2
+; V9-NEXT:or %o2, 819, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:and %o1, %o2, %o3
+; V9-NEXT:srlx %o1, 2, %o1
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:add %o3, %o1, %o1
+; V9-NEXT:srlx %o1, 4, %o2
+; V9-NEXT:add %o1, %o2, %o1
+; V9-NEXT:sethi 246723, %o2
+; V9-NEXT:or %o2, 783, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:and %o1, %o2, %o1
+; V9-NEXT:sethi 16448, %o2
+; V9-NEXT:or %o2, 257, %o2
+; V9-NEXT:sllx %o2, 32, %o3
+; V9-NEXT:or %o3, %o2, %o2
+; V9-NEXT:mulx %o1, %o2, %o1
+; V9-NEXT:srlx %o1, 56, %o1
+; V9-NEXT:movrz %o0, 0, %o1
+; V9-NEXT:retl
+; V9-NEXT:mov %o1, %o0
+;
+; POPC-LABEL: g:
+; POPC:   ! %bb.0: ! %entry
+; POPC-NEXT:srlx %o0, 1, %o1
+; POPC-NEXT:or %o0, %o1, %o1
+; POPC-NEXT:srlx %o1, 2, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 4, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 8, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 16, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:srlx %o1, 32, %o2
+; POPC-NEXT:or %o1, %o2, %o1
+; POPC-NEXT:xor %o1, -1, %o1
+; POPC-NEXT:popc %o1, %o1
+; POPC-NEXT:movrz %o0, 0, %o1
+; POPC-NEXT:retl
+; POPC-NEXT:mov %o1, %o0
+;
+; VIS3-LABEL: g:
+; VIS3:   ! %bb.0: ! %entry
+; VIS3-NEXT:lzcnt %o0, %o1
+; VIS3-NEXT:movrz %o0, 0, %o1
+; VIS3-NEXT:retl
+; VIS3-NEXT:mov %o1, %o0
+entry:
+  %0 = call i64 @llvm.ctlz.i64(i64 %x, i1 true)

s-barannikov

[llvm-branch-commits] [SPARC] Use umulxhi to do extending 64x64->128 multiply when we have VIS3 (PR #135714)

2025-04-14 Thread Sergei Barannikov via llvm-branch-commits



@@ -294,4 +294,13 @@ def : Pat<(f32 fpnegimm0), (FNEGS (FZEROS))>;
 // VIS3 instruction patterns.
 let Predicates = [HasVIS3] in {
 def : Pat<(i64 (adde i64:$lhs, i64:$rhs)), (ADDXCCC $lhs, $rhs)>;
+
+def : Pat<(i64 (mulhu i64:$lhs, i64:$rhs)), (UMULXHI $lhs, $rhs)>;
+// Signed "MULXHI".
+// Based on the formula presented in OSA2011 §7.140, but with bitops to select
+// the values to be added.
+def : Pat<(i64 (mulhs i64:$lhs, i64:$rhs)),
+  (SUBrr (UMULXHI $lhs, $rhs),
+ (ADDrr (ANDrr (SRAXri $lhs, 63), $rhs),
+(ANDrr (SRAXri $rhs, 63), $lhs)))>;

s-barannikov wrote:

Does it produce better code than setting MULHS to Expand?
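
As a sanity check on the pattern itself (not on the code-quality question), the identity it relies on can be verified with a small model. This is a sketch under the assumption that the pattern computes `umulxhi(a, b) - ((a >>s 63) & b) - ((b >>s 63) & a)` modulo 2^64; all function names below are illustrative.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def umulxhi(a: int, b: int) -> int:
    """High 64 bits of the unsigned 64x64 product."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def sra63(x: int) -> int:
    """Arithmetic shift right by 63: all ones if the sign bit is set, else 0."""
    return MASK64 if (x >> 63) & 1 else 0


def mulhs_via_umulxhi(a: int, b: int) -> int:
    """Signed high multiply derived from the unsigned one, as in the pattern."""
    a &= MASK64
    b &= MASK64
    correction = (sra63(a) & b) + (sra63(b) & a)
    return (umulxhi(a, b) - correction) & MASK64


def mulhs_reference(a: int, b: int) -> int:
    """Signed high multiply computed directly on signed values."""
    return ((to_signed64(a) * to_signed64(b)) >> 64) & MASK64
```

Each AND selects the other operand only when the shifted operand is negative, which is the "bitops to select the values to be added" mentioned in the comment.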

https://github.com/llvm/llvm-project/pull/135714

[llvm-branch-commits] [llvm] [llvm][IR] Treat memcmp and bcmp as libcalls (PR #135706)

2025-04-14 Thread Paul Kirth via llvm-branch-commits


https://github.com/ilovepi created 
https://github.com/llvm/llvm-project/pull/135706

Since the backend may emit calls to these functions, they should be
treated like other libcalls. If we don't, then it is possible to
have their definitions removed during LTO because they are dead, only to
have a later transform introduce calls to them.

See 
https://discourse.llvm.org/t/rfc-addressing-deficiencies-in-llvm-s-lto-implementation/84999
for more information.
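
To make the failure mode concrete, here is a toy model of the ordering problem. It is purely illustrative: the helper names are hypothetical and none of this is LLVM's LTO API. It assumes a module that defines its own `bcmp`, calls only `memcmp`, and is later rewritten by a transform that introduces a `bcmp` call.

```python
# Toy model only: these helpers are hypothetical and are not LLVM's LTO API.

def strip_dead_definitions(defined, referenced, libcalls):
    """Keep a definition if something references it or it is a known libcall."""
    return {name for name in defined if name in referenced or name in libcalls}


def unresolved_after_late_transform(surviving, introduced_calls):
    """Calls introduced late whose definitions were already removed."""
    return sorted(set(introduced_calls) - set(surviving))


defined = {"foo", "bcmp"}        # the module provides its own bcmp
referenced = {"foo", "memcmp"}   # foo calls memcmp; nothing calls bcmp yet
late_calls = ["bcmp"]            # e.g. an equality-only memcmp turned into bcmp

old_libcalls = {"memcpy", "memmove", "memset"}
new_libcalls = old_libcalls | {"memcmp", "bcmp"}

without_fix = strip_dead_definitions(defined, referenced, old_libcalls)
with_fix = strip_dead_definitions(defined, referenced, new_libcalls)
```

With the old set the late `bcmp` call has no definition left to bind to; with the extended set the definition survives.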

>From af02216b9358166b635335491934ff44cdcc89a5 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Mon, 14 Apr 2025 08:25:15 -0700
Subject: [PATCH] [llvm][IR] Treat memcmp and bcmp as libcalls

Since the backend may emit calls to these functions, they should be
treated like other libcalls. If we don't, then it is possible to
have their definitions removed during LTO because they are dead, only to
have a later transform introduce calls to them.

See 
https://discourse.llvm.org/t/rfc-addressing-deficiencies-in-llvm-s-lto-implementation/84999
for more information.
---
 llvm/include/llvm/IR/RuntimeLibcalls.def   | 2 ++
 llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def 
b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 2545aebc73391..2c72bc8c012cc 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -513,6 +513,8 @@ HANDLE_LIBCALL(UO_PPCF128, "__gcc_qunord")
 HANDLE_LIBCALL(MEMCPY, "memcpy")
 HANDLE_LIBCALL(MEMMOVE, "memmove")
 HANDLE_LIBCALL(MEMSET, "memset")
+HANDLE_LIBCALL(MEMCMP, "memcmp")
+HANDLE_LIBCALL(BCMP, "bcmp")
 // DSEPass can emit calloc if it finds a pair of malloc/memset
 HANDLE_LIBCALL(CALLOC, "calloc")
 HANDLE_LIBCALL(BZERO, nullptr)
diff --git a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll 
b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
index 4c6bebf69a074..80421cd9350c8 100644
--- a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
+++ b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
@@ -29,8 +29,7 @@ define i1 @foo(ptr %0, [2 x i32] %1) {
 declare i32 @memcmp(ptr, ptr, i32)
 
 ;; Ensure bcmp is removed from module. Follow up patches can address this.
-; INTERNALIZE-NOT: declare{{.*}}i32 @bcmp
-; INTERNALIZE-NOT: define{{.*}}i32 @bcmp
+; INTERNALIZE: define{{.*}}i32 @bcmp
 define i32 @bcmp(ptr %0, ptr %1, i32 %2) {
   ret i32 0
 }


[llvm-branch-commits] [llvm] [llvm][IR] Treat memcmp and bcmp as libcalls (PR #135706)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-lto

Author: Paul Kirth (ilovepi)


Changes

Since the backend may emit calls to these functions, they should be
treated like other libcalls. If we don't, then it is possible to
have their definitions removed during LTO because they are dead, only to
have a later transform introduce calls to them.

See 
https://discourse.llvm.org/t/rfc-addressing-deficiencies-in-llvm-s-lto-implementation/84999
for more information.

---
Full diff: https://github.com/llvm/llvm-project/pull/135706.diff


2 Files Affected:

- (modified) llvm/include/llvm/IR/RuntimeLibcalls.def (+2) 
- (modified) llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll (+1-2) 


```diff
diff --git a/llvm/include/llvm/IR/RuntimeLibcalls.def b/llvm/include/llvm/IR/RuntimeLibcalls.def
index 2545aebc73391..2c72bc8c012cc 100644
--- a/llvm/include/llvm/IR/RuntimeLibcalls.def
+++ b/llvm/include/llvm/IR/RuntimeLibcalls.def
@@ -513,6 +513,8 @@ HANDLE_LIBCALL(UO_PPCF128, "__gcc_qunord")
 HANDLE_LIBCALL(MEMCPY, "memcpy")
 HANDLE_LIBCALL(MEMMOVE, "memmove")
 HANDLE_LIBCALL(MEMSET, "memset")
+HANDLE_LIBCALL(MEMCMP, "memcmp")
+HANDLE_LIBCALL(BCMP, "bcmp")
 // DSEPass can emit calloc if it finds a pair of malloc/memset
 HANDLE_LIBCALL(CALLOC, "calloc")
 HANDLE_LIBCALL(BZERO, nullptr)
diff --git a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
index 4c6bebf69a074..80421cd9350c8 100644
--- a/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
+++ b/llvm/test/LTO/Resolution/RISCV/bcmp-libcall.ll
@@ -29,8 +29,7 @@ define i1 @foo(ptr %0, [2 x i32] %1) {
 declare i32 @memcmp(ptr, ptr, i32)
 
 ;; Ensure bcmp is removed from module. Follow up patches can address this.
-; INTERNALIZE-NOT: declare{{.*}}i32 @bcmp
-; INTERNALIZE-NOT: define{{.*}}i32 @bcmp
+; INTERNALIZE: define{{.*}}i32 @bcmp
 define i32 @bcmp(ptr %0, ptr %1, i32 %2) {
   ret i32 0
 }

```




https://github.com/llvm/llvm-project/pull/135706

[llvm-branch-commits] [llvm] [llvm][IR] Treat memcmp and bcmp as libcalls (PR #135706)

2025-04-14 Thread Paul Kirth via llvm-branch-commits


https://github.com/ilovepi ready_for_review 
https://github.com/llvm/llvm-project/pull/135706

[llvm-branch-commits] [compiler-rt] [llvm] Reentry (PR #135656)

2025-04-14 Thread via llvm-branch-commits


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:



You can test this locally with the following command:


```bash
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp llvm/include/llvm/ProfileData/CtxInstrContextNode.h llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
```





View the diff from clang-format here.


```diff
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 2e26541c1..0261bab53 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -55,7 +55,7 @@ void onFunctionExited(void *Address) {
 }
 
 // Returns true if it was entered the first time
-bool rootEnterIsFirst(void* Address) {
+bool rootEnterIsFirst(void *Address) {
   bool Ret = true;
   if (!EnteredContextAddress) {
 EnteredContextAddress = Address;
@@ -67,14 +67,13 @@ bool rootEnterIsFirst(void* Address) {
 }
 
 // Return true if this also exits the root.
-bool exitsRoot(void* Address) {
+bool exitsRoot(void *Address) {
   onFunctionExited(Address);
   if (UnderContextRefCount == 0) {
 EnteredContextAddress = nullptr;
 return true;
   }
   return false;
-
 }
 
 bool hasEnteredARoot() { return UnderContextRefCount > 0; }
@@ -367,8 +366,7 @@ ContextNode *getOrStartContextOutsideCollection(FunctionData &Data,
 
   // If we didn't start profiling, or if we are under a context, just not
   // collecting, return the scratch buffer.
-  if (hasEnteredARoot() ||
-  !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
+  if (hasEnteredARoot() || !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
 return TheScratchContext;
   return markAsScratch(
   onContextEnter(*getFlatProfile(Data, Callee, Guid, NumCounters)));
diff --git a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
index 39a225ac1..064608d28 100644
--- a/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
+++ b/compiler-rt/lib/ctx_profile/tests/CtxInstrProfilingTest.cpp
@@ -276,7 +276,7 @@ TEST_F(ContextTest, RootEntersOtherRoot) {
   __llvm_ctx_profile_release_context(&Root);
   EXPECT_EQ(__llvm_ctx_profile_current_context_root, Root.CtxRoot);
   __llvm_ctx_profile_release_context(&Root);
-  EXPECT_EQ(__llvm_ctx_profile_current_context_root, nullptr);  
+  EXPECT_EQ(__llvm_ctx_profile_current_context_root, nullptr);
 }
 
 TEST_F(ContextTest, NeedMoreMemory) {

```




https://github.com/llvm/llvm-project/pull/135656

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread via llvm-branch-commits



@@ -0,0 +1,34 @@
+; RUN: opt %s -dxil-embed -dxil-globals -S -o - | FileCheck %s
+; RUN: llc %s --filetype=obj -o - | obj2yaml | FileCheck %s --check-prefix=DXC
+
+target triple = "dxil-unknown-shadermodel6.0-compute"
+
+; CHECK: @dx.rts0 = private constant [48 x i8]  c"{{.*}}", section "RTS0", align 4
+
+define void @main() #0 {
+entry:
+  ret void
+}
+attributes #0 = { "hlsl.numthreads"="1,1,1" "hlsl.shader"="compute" }
+
+
+!dx.rootsignatures = !{!2} ; list of function/root signature pairs
+!2 = !{ ptr @main, !3 } ; function, root signature
+!3 = !{ !4, !5 } ; list of root signature elements
+!4 = !{ !"RootFlags", i32 1 } ; 1 = allow_input_assembler_input_layout
+!5 = !{ !"RootConstants", i32 0, i32 1, i32 2, i32 3 }
+
+; DXC:  - Name:RTS0
+; DXC-NEXT:Size:48
+; DXC-NEXT:RootSignature:
+; DXC-NEXT:  Version: 2
+; DXC-NEXT:  NumStaticSamplers: 0
+; DXC-NEXT:  StaticSamplersOffset: 0

joaosaffran wrote:

The difference comes down to the fact that `obj2yaml` has no support for static 
samplers yet, so those tools do not handle this value consistently. This 
will be fixed once static sampler support is added to `obj2yaml`.

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [clang] [llvm] release/20.x: [X86][AVX10] Remove VAES and VPCLMULQDQ feature from AVX10.1 (#135489) (PR #135577)

2025-04-14 Thread via llvm-branch-commits


https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/135577

>From 0c30835a63db5bb309abe8533a9c57b3b00a15ed Mon Sep 17 00:00:00 2001
From: Phoebe Wang 
Date: Mon, 14 Apr 2025 08:54:10 +0800
Subject: [PATCH] [X86][AVX10] Remove VAES and VPCLMULQDQ feature from AVX10.1
 (#135489)

According to SDM, they require both VAES/VPCLMULQDQ and AVX10.1 CPUID
bits.

Fixes: #135394
(cherry picked from commit ebba554a3211b0b98d3ae33ba70f9d6ceaab6ad4)
---
 clang/test/CodeGen/attr-target-x86.c  | 8 
 llvm/lib/Target/X86/X86.td| 2 +-
 llvm/lib/TargetParser/X86TargetParser.cpp | 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/clang/test/CodeGen/attr-target-x86.c 
b/clang/test/CodeGen/attr-target-x86.c
index c92aad633082f..e5067c1c3b075 100644
--- a/clang/test/CodeGen/attr-target-x86.c
+++ b/clang/test/CodeGen/attr-target-x86.c
@@ -56,7 +56,7 @@ void f_default2(void) {
 __attribute__((target("avx,  sse4.2,  arch=   ivybridge")))
 void f_avx_sse4_2_ivybridge_2(void) {}
 
-// CHECK: [[f_no_aes_ivybridge]] = {{.*}}"target-cpu"="ivybridge" 
"target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-amx-avx512,-avx10.1-256,-avx10.1-512,-avx10.2-256,-avx10.2-512,-vaes"
+// CHECK: [[f_no_aes_ivybridge]] = {{.*}}"target-cpu"="ivybridge" 
"target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-vaes"
 __attribute__((target("no-aes, arch=ivybridge")))
 void f_no_aes_ivybridge(void) {}
 
@@ -98,11 +98,11 @@ void f_x86_64_v3(void) {}
 __attribute__((target("arch=x86-64-v4")))
 void f_x86_64_v4(void) {}
 
-// CHECK: [[f_avx10_1_256]] = {{.*}}"target-cpu"="i686" 
"target-features"="+aes,+avx,+avx10.1-256,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+f16c,+fma,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+vaes,+vpclmulqdq,+x87,+xsave,-amx-avx512,-avx10.1-512,-avx10.2-512,-evex512"
+// CHECK: [[f_avx10_1_256]] = {{.*}}"target-cpu"="i686" 
"target-features"="+avx,+avx10.1-256,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,-amx-avx512,-avx10.1-512,-avx10.2-512,-evex512"
 __attribute__((target("avx10.1-256")))
 void f_avx10_1_256(void) {}
 
-// CHECK: [[f_avx10_1_512]] = {{.*}}"target-cpu"="i686" 
"target-features"="+aes,+avx,+avx10.1-256,+avx10.1-512,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+vaes,+vpclmulqdq,+x87,+xsave"
+// CHECK: [[f_avx10_1_512]] = {{.*}}"target-cpu"="i686" 
"target-features"="+avx,+avx10.1-256,+avx10.1-512,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
 __attribute__((target("avx10.1-512")))
 void f_avx10_1_512(void) {}
 
@@ -112,4 +112,4 @@ void f_prefer_256_bit(void) {}
 
 // CHECK: [[f_no_prefer_256_bit]] = 
{{.*}}"target-features"="{{.*}}-prefer-256-bit
 __attribute__((target("no-prefer-256-bit")))
-void f_no_prefer_256_bit(void) {}
\ No newline at end of file
+void f_no_prefer_256_bit(void) {}
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 38761e1fd7eec..577428cad6d61 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -338,7 +338,7 @@ def FeatureAVX10_1 : SubtargetFeature<"avx10.1-256", "HasAVX10_1", "true",
   "Support AVX10.1 up to 256-bit instruction",
   [FeatureCDI, FeatureVBMI, FeatureIFMA, FeatureVNNI,
    FeatureBF16, FeatureVPOPCNTDQ, FeatureVBMI2, FeatureBITALG,
-   FeatureVAES, FeatureVPCLMULQDQ, FeatureFP16]>;
+   FeatureFP16]>;
 def FeatureAVX10_1_512 : SubtargetFeature<"avx10.1-512", "HasAVX10_1_512", "true",
   "Support AVX10.1 up to 512-bit instruction",
   [FeatureAVX10_1, FeatureEVEX512]>;
diff --git a/llvm/lib/TargetParser/X86TargetParser.cpp 
b/llvm/lib/TargetParser/X86TargetParser.cpp
index e4b7ed7cf9b61..2ae6dd6b3d1ef 100644
--- a/llvm/lib/TargetParser/X86TargetParser.cpp
+++ b/llvm/lib/TargetParser/X86T

[llvm-branch-commits] [clang] 0c30835 - [X86][AVX10] Remove VAES and VPCLMULQDQ feature from AVX10.1 (#135489)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


Author: Phoebe Wang
Date: 2025-04-14T12:45:29-07:00
New Revision: 0c30835a63db5bb309abe8533a9c57b3b00a15ed

URL: 
https://github.com/llvm/llvm-project/commit/0c30835a63db5bb309abe8533a9c57b3b00a15ed
DIFF: 
https://github.com/llvm/llvm-project/commit/0c30835a63db5bb309abe8533a9c57b3b00a15ed.diff

LOG: [X86][AVX10] Remove VAES and VPCLMULQDQ feature from AVX10.1 (#135489)

According to SDM, they require both VAES/VPCLMULQDQ and AVX10.1 CPUID
bits.

Fixes: #135394
(cherry picked from commit ebba554a3211b0b98d3ae33ba70f9d6ceaab6ad4)
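
For illustration only, the effect on feature expansion can be modeled as a transitive closure over an "implies" table. The sketch below uses a trimmed table whose feature names are taken from the CHECK lines in this change; `expand_features` and both tables are hypothetical helpers, not the TargetParser implementation.

```python
def expand_features(requested, implies):
    """Return the requested features plus everything they transitively imply."""
    enabled = set()
    worklist = list(requested)
    while worklist:
        feature = worklist.pop()
        if feature in enabled:
            continue
        enabled.add(feature)
        worklist.extend(implies.get(feature, ()))
    return enabled


# Trimmed list of what avx10.1-256 implies; the real list is longer.
AVX10_1_BASE = ["avx512cd", "avx512vbmi", "avx512ifma", "avx512vnni",
                "avx512bf16", "avx512vpopcntdq", "avx512vbmi2",
                "avx512bitalg", "avx512fp16"]

IMPLIES_BEFORE = {
    "avx10.1-512": ["avx10.1-256", "evex512"],
    "avx10.1-256": AVX10_1_BASE + ["vaes", "vpclmulqdq"],
}
IMPLIES_AFTER = {
    "avx10.1-512": ["avx10.1-256", "evex512"],
    "avx10.1-256": AVX10_1_BASE,
}

before = expand_features(["avx10.1-512"], IMPLIES_BEFORE)
after = expand_features(["avx10.1-512"], IMPLIES_AFTER)
explicit = expand_features(["avx10.1-512", "vaes"], IMPLIES_AFTER)
```

After the change, `vaes` and `vpclmulqdq` are enabled only when requested explicitly alongside AVX10.1, which matches the updated `target-features` strings in the test.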

Added: 


Modified: 
clang/test/CodeGen/attr-target-x86.c
llvm/lib/Target/X86/X86.td
llvm/lib/TargetParser/X86TargetParser.cpp

Removed: 




diff  --git a/clang/test/CodeGen/attr-target-x86.c 
b/clang/test/CodeGen/attr-target-x86.c
index c92aad633082f..e5067c1c3b075 100644
--- a/clang/test/CodeGen/attr-target-x86.c
+++ b/clang/test/CodeGen/attr-target-x86.c
@@ -56,7 +56,7 @@ void f_default2(void) {
 __attribute__((target("avx,  sse4.2,  arch=   ivybridge")))
 void f_avx_sse4_2_ivybridge_2(void) {}
 
-// CHECK: [[f_no_aes_ivybridge]] = {{.*}}"target-cpu"="ivybridge" 
"target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-amx-avx512,-avx10.1-256,-avx10.1-512,-avx10.2-256,-avx10.2-512,-vaes"
+// CHECK: [[f_no_aes_ivybridge]] = {{.*}}"target-cpu"="ivybridge" 
"target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-vaes"
 __attribute__((target("no-aes, arch=ivybridge")))
 void f_no_aes_ivybridge(void) {}
 
@@ -98,11 +98,11 @@ void f_x86_64_v3(void) {}
 __attribute__((target("arch=x86-64-v4")))
 void f_x86_64_v4(void) {}
 
-// CHECK: [[f_avx10_1_256]] = {{.*}}"target-cpu"="i686" 
"target-features"="+aes,+avx,+avx10.1-256,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+f16c,+fma,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+vaes,+vpclmulqdq,+x87,+xsave,-amx-avx512,-avx10.1-512,-avx10.2-512,-evex512"
+// CHECK: [[f_avx10_1_256]] = {{.*}}"target-cpu"="i686" 
"target-features"="+avx,+avx10.1-256,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,-amx-avx512,-avx10.1-512,-avx10.2-512,-evex512"
 __attribute__((target("avx10.1-256")))
 void f_avx10_1_256(void) {}
 
-// CHECK: [[f_avx10_1_512]] = {{.*}}"target-cpu"="i686" 
"target-features"="+aes,+avx,+avx10.1-256,+avx10.1-512,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+mmx,+pclmul,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+vaes,+vpclmulqdq,+x87,+xsave"
+// CHECK: [[f_avx10_1_512]] = {{.*}}"target-cpu"="i686" 
"target-features"="+avx,+avx10.1-256,+avx10.1-512,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512fp16,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+cmov,+crc32,+cx8,+evex512,+f16c,+fma,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
 __attribute__((target("avx10.1-512")))
 void f_avx10_1_512(void) {}
 
@@ -112,4 +112,4 @@ void f_prefer_256_bit(void) {}
 
 // CHECK: [[f_no_prefer_256_bit]] = 
{{.*}}"target-features"="{{.*}}-prefer-256-bit
 __attribute__((target("no-prefer-256-bit")))
-void f_no_prefer_256_bit(void) {}
\ No newline at end of file
+void f_no_prefer_256_bit(void) {}

diff  --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index 38761e1fd7eec..577428cad6d61 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -338,7 +338,7 @@ def FeatureAVX10_1 : SubtargetFeature<"avx10.1-256", "HasAVX10_1", "true",
   "Support AVX10.1 up to 256-bit instruction",
   [FeatureCDI, FeatureVBMI, FeatureIFMA, FeatureVNNI,
    FeatureBF16, FeatureVPOPCNTDQ, FeatureVBMI2, FeatureBITALG,
-   FeatureVAES, FeatureVPCLMULQDQ, FeatureFP16]>;
+   FeatureFP16]>;
 def FeatureAVX10_1_512 : SubtargetFeature<"avx10.1-512", "HasAVX10_1_512", "true",
   "Support AVX10.1 up to 512-bit instruction",
   [FeatureAVX10_1, FeatureEVEX512]>;

diff  --git a/llvm/lib/TargetParser/X86TargetParser.cpp 
b/llvm/lib/TargetParser/X86TargetParser.cpp
index e4b7ed7cf9b

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===--===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2 dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.
+
+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;
+  GOFF::ESDRmode Rmode;
+  GOFF::ESDNameSpaceId NameSpace = GOFF::ESD_NS_NormalName;
+  GOFF::ESDTextStyle TextStyle = GOFF::ESD_TS_ByteOriented;
+  GOFF::ESDBindingAlgorithm BindAlgorithm = GOFF::ESD_BA_Concatenate;
+  GOFF::ESDLoadingBehavior LoadBehavior = GOFF::ESD_LB_Initial;
+  GOFF::ESDReserveQwords ReservedQwords = GOFF::ESD_RQ_0;
+  GOFF::ESDAlignment Alignment = GOFF::ESD_ALIGN_Doubleword;
+};
+
+// Attributes for LD symbols.
+struct LDAttr {
+  bool IsRenamable = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDNameSpaceId NameSpace = GOFF::ESD_NS_NormalName;
+  GOFF::ESDBindingStrength BindingStrength = GOFF::ESD_BST_Strong;
+  GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink;
+  GOFF::ESDAmode Amode;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for PR symbols.
+struct PRAttr {
+  bool IsRenamable = false;
+  bool IsReadOnly = false; //  Not documented.
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDNameSpaceId NameSpace = GOFF::ESD_NS_NormalName;
+  GOFF::ESDLinkageType Linkage = GOFF::ESD_LT_XPLink;
+  GOFF::ESDAmode Amode;

redstar wrote:

Removed.
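
For anyone new to GOFF reading this thread, the layering described in the header comment quoted above (SD, then ED, then optionally LD or PR) can be pictured with a small model. This is a hypothetical sketch: `Symbol` and `section_symbols` are invented names for illustration, not the classes added by the patch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Symbol:
    kind: str                        # "SD", "ED", "LD" or "PR"
    name: str
    parent: Optional["Symbol"] = None


def section_symbols(sd_name: str, class_name: str,
                    leaf_kind: Optional[str] = None,
                    leaf_name: str = "") -> List[Symbol]:
    """Build the symbol chain that stands in for one ELF-style section.

    SD and ED are always present; the leaf is an LD (label into the
    ED's data) or a PR (part that owns its own data), or is omitted.
    """
    sd = Symbol("SD", sd_name)
    ed = Symbol("ED", class_name, sd)
    chain = [sd, ed]
    if leaf_kind is not None:
        if leaf_kind not in ("LD", "PR"):
            raise ValueError("leaf must be LD or PR")
        chain.append(Symbol(leaf_kind, leaf_name, ed))
    return chain
```

For example, `section_symbols("app", "C_CODE64", "LD", "main")` gives the SD/ED/LD triple, while omitting the leaf models sections that only need SD/ED.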

https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [clang] [llvm] [HLSL] Allow resource annotations to specify only register space (PR #135287)

2025-04-14 Thread Sarah Spall via llvm-branch-commits



@@ -163,11 +163,16 @@ void Parser::ParseHLSLAnnotations(ParsedAttributes &Attrs,
 SourceLocation SlotLoc = Tok.getLocation();
 ArgExprs.push_back(ParseIdentifierLoc());
 
-// Add numeric_constant for fix-it.
-if (SlotStr.size() == 1 && Tok.is(tok::numeric_constant))
+if (SlotStr.size() == 1) {
+  if (!Tok.is(tok::numeric_constant)) {
+Diag(Tok.getLocation(), diag::err_expected) << tok::numeric_constant;
+SkipUntil(tok::r_paren, StopAtSemi); // skip through )

spall wrote:

I'm unfamiliar with the parsing code so this might be a dumb question, but why 
do you SkipUntil here? What happens after the code returns?
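
As general background on the technique being asked about, skipping to a synchronizing token after a diagnostic lets the caller resume at a known position instead of reporting follow-on errors. The toy parser below illustrates that idea only; its token kinds and function names are invented, and it assumes (as the "skip through )" comment suggests) that the closing parenthesis is consumed while a semicolon is not.

```python
def skip_until(tokens, pos, kind):
    """Advance past the next `kind` token, but stop in front of a ';'."""
    while pos < len(tokens):
        if tokens[pos][0] == kind:
            return pos + 1           # consume the synchronizing token
        if tokens[pos][0] == "semi":
            return pos               # leave ';' for the caller
        pos += 1
    return pos


def parse_slot(tokens, pos, diags):
    """Parse `ident [number]`; return (slot text or None, next position)."""
    kind, text = tokens[pos]
    assert kind == "ident"
    pos += 1
    if len(text) == 1:               # bare register class such as 't'
        if pos >= len(tokens) or tokens[pos][0] != "number":
            diags.append("expected numeric_constant")
            return None, skip_until(tokens, pos, "r_paren")
        text += tokens[pos][1]
        pos += 1
    return text, pos
```

In the error case the function returns with the position just after `)`, so whatever follows the annotation is parsed normally.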

https://github.com/llvm/llvm-project/pull/135287

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE `*MMLA` instructions (PR #135636)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mlir-sve

Author: Momchil Velikov (momchil-velikov)


Changes

Supersedes https://github.com/llvm/llvm-project/pull/135359

---

Patch is 77.36 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/135636.diff


16 Files Affected:

- (modified) mlir/include/mlir/Conversion/Passes.td (+4) 
- (modified) mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h (+3) 
- (modified) mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt (+1) 
- (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp (+7) 
- (modified) mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp (+4-1) 
- (modified) mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt (+1) 
- (added) mlir/lib/Dialect/ArmSVE/Transforms/LowerContractionToSVEI8MMPattern.cpp (+304) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-smmla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-summla.mlir (+85) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-ummla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-usmmla.mlir (+95) 
- (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-4x8x4.mlir (+117) 
- (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-8x8x8-vs2.mlir (+159) 
- (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-summla-4x8x4.mlir (+118) 
- (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-ummla-4x8x4.mlir (+119) 
- (added) mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-usmmla-4x8x4.mlir (+117) 


``diff
diff --git a/mlir/include/mlir/Conversion/Passes.td 
b/mlir/include/mlir/Conversion/Passes.td
index bbba495e613b2..930d8b44abca0 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -1406,6 +1406,10 @@ def ConvertVectorToLLVMPass : Pass<"convert-vector-to-llvm"> {
"bool", /*default=*/"false",
"Enables the use of ArmSVE dialect while lowering the vector "
"dialect.">,
+Option<"armI8MM", "enable-arm-i8mm",
+   "bool", /*default=*/"false",
+   "Enables the use of Arm FEAT_I8MM instructions while lowering "
+   "the vector dialect.">,
 Option<"x86Vector", "enable-x86vector",
"bool", /*default=*/"false",
"Enables the use of X86Vector dialect while lowering the vector "
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h 
b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
index 8665c8224cc45..232e2be29e574 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
@@ -20,6 +20,9 @@ class RewritePatternSet;
 void populateArmSVELegalizeForLLVMExportPatterns(
 const LLVMTypeConverter &converter, RewritePatternSet &patterns);
 
+void populateLowerContractionToSVEI8MMPatternPatterns(
+RewritePatternSet &patterns);
+
 /// Configure the target to support lowering ArmSVE ops to ops that map to LLVM
 /// intrinsics.
 void configureArmSVELegalizeForExportTarget(LLVMConversionTarget &target);
diff --git a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt 
b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
index 330474a718e30..8e2620029c354 100644
--- a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
+++ b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_mlir_conversion_library(MLIRVectorToLLVMPass
   MLIRVectorToLLVM
 
   MLIRArmNeonDialect
+  MLIRArmNeonTransforms
   MLIRArmSVEDialect
   MLIRArmSVETransforms
   MLIRAMXDialect
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp 
b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
index 7082b92c95d1d..1e6c8122b1d0e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Dialect/AMX/Transforms.h"
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
+#include "mlir/Dialect/ArmNeon/Transforms.h"
 #include "mlir/Dialect/ArmSVE/IR/ArmSVEDialect.h"
 #include "mlir/Dialect/ArmSVE/Transforms/Transforms.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
@@ -82,6 +83,12 @@ void ConvertVectorToLLVMPass::runOnOperation() {
 populateVectorStepLoweringPatterns(patterns);
 populateVectorRankReducingFMAPattern(patterns);
 populateVectorGatherLoweringPatterns(patterns);
+if (armI8MM) {
+  if (armNeon)
+arm_neon::populateLowerContractionToSMMLAPatternPatterns(patterns);
+  if (armSVE)
+populateLowerContractionToSVEI8MMPatternPatterns(patterns);
+}
 (void)applyPatternsGreedily(getOperation(), std::move(patterns));
   }
 
diff --git 
a/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp 
b/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp
index 2a1271dfd6bdf..e

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE `*MMLA` instructions (PR #135636)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mlir-neon

Author: Momchil Velikov (momchil-velikov)


Changes

Supersedes https://github.com/llvm/llvm-project/pull/135359

---

Patch is 77.36 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/135636.diff


16 Files Affected:

- (modified) mlir/include/mlir/Conversion/Passes.td (+4) 
- (modified) mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h (+3) 
- (modified) mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt (+1) 
- (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp (+7) 
- (modified) 
mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp (+4-1) 
- (modified) mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt (+1) 
- (added) 
mlir/lib/Dialect/ArmSVE/Transforms/LowerContractionToSVEI8MMPattern.cpp (+304) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-smmla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-summla.mlir (+85) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-ummla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-usmmla.mlir (+95) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-4x8x4.mlir 
(+117) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-8x8x8-vs2.mlir
 (+159) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-summla-4x8x4.mlir 
(+118) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-ummla-4x8x4.mlir 
(+119) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-usmmla-4x8x4.mlir 
(+117) 


``diff
diff --git a/mlir/include/mlir/Conversion/Passes.td 
b/mlir/include/mlir/Conversion/Passes.td
index bbba495e613b2..930d8b44abca0 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -1406,6 +1406,10 @@ def ConvertVectorToLLVMPass : 
Pass<"convert-vector-to-llvm"> {
"bool", /*default=*/"false",
"Enables the use of ArmSVE dialect while lowering the vector "
"dialect.">,
+Option<"armI8MM", "enable-arm-i8mm",
+   "bool", /*default=*/"false",
+   "Enables the use of Arm FEAT_I8MM instructions while lowering "
+   "the vector dialect.">,
 Option<"x86Vector", "enable-x86vector",
"bool", /*default=*/"false",
"Enables the use of X86Vector dialect while lowering the vector "
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h 
b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
index 8665c8224cc45..232e2be29e574 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
@@ -20,6 +20,9 @@ class RewritePatternSet;
 void populateArmSVELegalizeForLLVMExportPatterns(
 const LLVMTypeConverter &converter, RewritePatternSet &patterns);
 
+void populateLowerContractionToSVEI8MMPatternPatterns(
+RewritePatternSet &patterns);
+
 /// Configure the target to support lowering ArmSVE ops to ops that map to LLVM
 /// intrinsics.
 void configureArmSVELegalizeForExportTarget(LLVMConversionTarget &target);
diff --git a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt 
b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
index 330474a718e30..8e2620029c354 100644
--- a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
+++ b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_mlir_conversion_library(MLIRVectorToLLVMPass
   MLIRVectorToLLVM
 
   MLIRArmNeonDialect
+  MLIRArmNeonTransforms
   MLIRArmSVEDialect
   MLIRArmSVETransforms
   MLIRAMXDialect
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp 
b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
index 7082b92c95d1d..1e6c8122b1d0e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Dialect/AMX/Transforms.h"
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
+#include "mlir/Dialect/ArmNeon/Transforms.h"
 #include "mlir/Dialect/ArmSVE/IR/ArmSVEDialect.h"
 #include "mlir/Dialect/ArmSVE/Transforms/Transforms.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
@@ -82,6 +83,12 @@ void ConvertVectorToLLVMPass::runOnOperation() {
 populateVectorStepLoweringPatterns(patterns);
 populateVectorRankReducingFMAPattern(patterns);
 populateVectorGatherLoweringPatterns(patterns);
+if (armI8MM) {
+  if (armNeon)
+arm_neon::populateLowerContractionToSMMLAPatternPatterns(patterns);
+  if (armSVE)
+populateLowerContractionToSVEI8MMPatternPatterns(patterns);
+}
 (void)applyPatternsGreedily(getOperation(), std::move(patterns));
   }
 
diff --git 
a/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp 
b/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp
index 2a1271dfd6bdf..

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to svusmmla (PR #135634)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mlir-sve

Author: Momchil Velikov (momchil-velikov)


Changes

Supersedes https://github.com/llvm/llvm-project/pull/135358

---
Full diff: https://github.com/llvm/llvm-project/pull/135634.diff


5 Files Affected:

- (modified) mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td (+32) 
- (modified) mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp (+4) 
- (modified) mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir (+12) 
- (modified) mlir/test/Dialect/ArmSVE/roundtrip.mlir (+11) 
- (modified) mlir/test/Target/LLVMIR/arm-sve.mlir (+12) 


``diff
diff --git a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td 
b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
index 1a59062ccc93d..da2a8f89b4cfd 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
+++ b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
@@ -273,6 +273,34 @@ def UmmlaOp : ArmSVE_Op<"ummla",
 "$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($dst)";
 }
 
+def UsmmlaOp : ArmSVE_Op<"usmmla", [Pure,
+AllTypesMatch<["src1", "src2"]>,
+AllTypesMatch<["acc", "dst"]>]> {
+  let summary = "Matrix-matrix multiply and accumulate op";
+  let description = [{
+USMMLA: Unsigned by signed integer matrix multiply-accumulate.
+
+The unsigned by signed integer matrix multiply-accumulate operation
+multiplies the 2×8 matrix of unsigned 8-bit integer values held in
+the first source vector by the 8×2 matrix of signed 8-bit integer
+values in the second source vector. The resulting 2×2 widened 32-bit
+integer matrix product is then added to the 32-bit integer matrix
+accumulator.
+
+Source:
+https://developer.arm.com/documentation/100987/
+  }];
+  // Supports (vector<16xi8>, vector<16xi8>) -> (vector<4xi32>)
+  let arguments = (ins
+  ScalableVectorOfLengthAndType<[4], [I32]>:$acc,
+  ScalableVectorOfLengthAndType<[16], [I8]>:$src1,
+  ScalableVectorOfLengthAndType<[16], [I8]>:$src2
+  );
+  let results = (outs ScalableVectorOfLengthAndType<[4], [I32]>:$dst);
+  let assemblyFormat =
+"$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($dst)";
+}
+
 class SvboolTypeConstraint : TypesMatchWith<
   "expected corresponding svbool type widened to [16]xi1",
   lhsArg, rhsArg,
@@ -568,6 +596,10 @@ def SmmlaIntrOp :
   ArmSVE_IntrBinaryOverloadedOp<"smmla">,
   Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank)>;
 
+def UsmmlaIntrOp :
+  ArmSVE_IntrBinaryOverloadedOp<"usmmla">,
+  Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank)>;
+
 def SdotIntrOp :
   ArmSVE_IntrBinaryOverloadedOp<"sdot">,
   Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank)>;
diff --git a/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp 
b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
index fe13ed03356b2..b1846e15196fc 100644
--- a/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
+++ b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
@@ -24,6 +24,7 @@ using SdotOpLowering = OneToOneConvertToLLVMPattern;
 using SmmlaOpLowering = OneToOneConvertToLLVMPattern;
 using UdotOpLowering = OneToOneConvertToLLVMPattern;
 using UmmlaOpLowering = OneToOneConvertToLLVMPattern;
+using UsmmlaOpLowering = OneToOneConvertToLLVMPattern;
 using DupQLaneLowering =
 OneToOneConvertToLLVMPattern;
 using ScalableMaskedAddIOpLowering =
@@ -194,6 +195,7 @@ void mlir::populateArmSVELegalizeForLLVMExportPatterns(
SmmlaOpLowering,
UdotOpLowering,
UmmlaOpLowering,
+   UsmmlaOpLowering,
DupQLaneLowering,
ScalableMaskedAddIOpLowering,
ScalableMaskedAddFOpLowering,
@@ -222,6 +224,7 @@ void mlir::configureArmSVELegalizeForExportTarget(
 SmmlaIntrOp,
 UdotIntrOp,
 UmmlaIntrOp,
+UsmmlaIntrOp,
 DupQLaneIntrOp,
 ScalableMaskedAddIIntrOp,
 ScalableMaskedAddFIntrOp,
@@ -242,6 +245,7 @@ void mlir::configureArmSVELegalizeForExportTarget(
   SmmlaOp,
   UdotOp,
   UmmlaOp,
+  UsmmlaOp,
   DupQLaneOp,
   ScalableMaskedAddIOp,
   ScalableMaskedAddFOp,
diff --git a/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir 
b/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
index 5d044517e0ea8..47587aa26506c 100644
--- a/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
+++ b/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
@@ -48,6 +48,18 @@ func.func @arm_sve_ummla(%a: vector<[16]xi8>,
 
 // -
 
+func.func @arm_sve_usmmla(%a: vector<[16]xi8>,
+
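
Note on the op semantics: the `UsmmlaOp` description above (2×8 unsigned by 8×2 signed, accumulated into a 2×2 i32 matrix) can be modelled in scalar C++ for a single 128-bit segment. This is only a sketch and not part of the patch. The function name `usmmlaSegment` is hypothetical, and the operand layout (first source row-major, second source stored as two 8-element columns, accumulator row-major) is an assumption based on how the `*MMLA` instructions are usually described; a scalable vector would repeat this per 128-bit segment.

```cpp
#include <array>
#include <cstdint>

// Scalar model of USMMLA on one 128-bit segment (illustrative only).
//   lhs: 2x8 matrix of unsigned 8-bit values, row-major.
//   rhs: 8x2 matrix of signed 8-bit values, stored as two 8-element columns
//        (assumed layout).
//   acc: 2x2 matrix of 32-bit integers, row-major.
std::array<int32_t, 4> usmmlaSegment(std::array<int32_t, 4> acc,
                                     const std::array<uint8_t, 16> &lhs,
                                     const std::array<int8_t, 16> &rhs) {
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      // Widen both operands to 32 bits before multiplying.
      int32_t sum = 0;
      for (int k = 0; k < 8; ++k)
        sum += static_cast<int32_t>(lhs[8 * i + k]) *
               static_cast<int32_t>(rhs[8 * j + k]);
      // Accumulate with wrap-around, avoiding signed overflow.
      acc[2 * i + j] = static_cast<int32_t>(
          static_cast<uint32_t>(acc[2 * i + j]) + static_cast<uint32_t>(sum));
    }
  }
  return acc;
}
```

The mixed signedness is the point of the op: the first operand is zero-extended and the second sign-extended, so an `lhs` byte of 0xFF contributes 255 while an `rhs` byte of 0xFF contributes -1.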

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135651

>From e88504d840c675fbd196dc75c981098abc2c970d Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 13 ++
 3 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 6b2f25a585ec3..71d54f98d26e1 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -323,6 +323,18 @@ define void @does_not_return() noreturn {
 ;
   unreachable
 }
+
+define void @unreachable() {
+; INSTRUMENT-LABEL: define void @unreachable() {
+; INSTRUMENT-NEXT:call void @llvm.instrprof.increment(ptr @unreachable, i64 742261418966908927, i32 1, i32 0)
+; INSTRUMENT-NEXT:unreachable
+;
+; LOWERING-LABEL: define void @unreachable(
+; LOWERING-SAME: ) !guid [[META9:![0-9]+]] {
+; LOWERING-NEXT:unreachable
+;
+  unreachable
+}
 ;.
 ; LOWERING: attributes #[[ATTR0]
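
Note on the change above: the patch swaps `F.doesNotReturn()` for `!llvm::canReturn(F)`, so a function such as `@unreachable` (no `noreturn` attribute, but no reachable `ret`) is also treated as non-returning. The definition of `llvm::canReturn` is not shown in this thread. As a rough sketch of the idea, assuming it means "some returning block is reachable from the entry block", here is a toy model; `ToyBlock` and `toyCanReturn` are hypothetical names and do not use LLVM's API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block: successor indices plus a flag
// saying whether the block's terminator is a return.
struct ToyBlock {
  std::vector<std::size_t> successors;
  bool endsInReturn = false;
};

// Returns true if any block ending in a return is reachable from block 0
// (the entry). Successor indices are assumed to be in range.
bool toyCanReturn(const std::vector<ToyBlock> &blocks) {
  if (blocks.empty())
    return false;
  std::vector<bool> visited(blocks.size(), false);
  std::vector<std::size_t> worklist{0};
  visited[0] = true;
  while (!worklist.empty()) {
    std::size_t current = worklist.back();
    worklist.pop_back();
    if (blocks[current].endsInReturn)
      return true;
    for (std::size_t succ : blocks[current].successors) {
      if (!visited[succ]) {
        visited[succ] = true;
        worklist.push_back(succ);
      }
    }
  }
  return false;
}
```

Under this reading, a body consisting only of `unreachable` has one block with no successors and no return, so the check fails regardless of whether the function carries the `noreturn` attribute, which matches the two new test inputs `unreachable.ll` and `noreturn.ll`.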

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Mircea Trofin (mtrofin)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/135651.diff


3 Files Affected:

- (modified) llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp (+12-7) 
- (modified) 
llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll (+15-10) 
- (modified) llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll (+13) 


``diff
diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 6b2f25a585ec3..71d54f98d26e1 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -323,6 +323,18 @@ define void @does_not_return() noreturn {
 ;
   unreachable
 }
+
+define void @unreachable() {
+; INSTRUMENT-LABEL: define void @unreachable() {
+; INSTRUMENT-NEXT:call void @llvm.instrprof.increment(ptr @unreachable, i64 742261418966908927, i32 1, i32 0)
+; INSTRUMENT-NEXT:unreachable
+;
+; LOWERING-LABEL: define void @unreachable(
+; LOWERING-SAME: ) !guid [[META9:![0-9]+]] {
+; LOWERING-NEXT:unreachable
+;
+  unreachable
+}
 ;.
 ; LOWERING: attributes #[[ATTR0]] = { noreturn }
 ; LOWERING: attributes #[[ATTR1:[0-9]+]] = { nounwind }
@@ -340,4

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin ready_for_review 
https://github.com/llvm/llvm-project/pull/135651

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits



@@ -94,10 +144,56 @@ static bool parse(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
+static bool verifyShaderVisibility(dxbc::ShaderVisibility Flags) {
+  switch (Flags) {
+
+  case dxbc::ShaderVisibility::All:
+  case dxbc::ShaderVisibility::Vertex:
+  case dxbc::ShaderVisibility::Hull:
+  case dxbc::ShaderVisibility::Domain:
+  case dxbc::ShaderVisibility::Geometry:
+  case dxbc::ShaderVisibility::Pixel:
+  case dxbc::ShaderVisibility::Amplification:
+  case dxbc::ShaderVisibility::Mesh:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyParameterType(dxbc::RootParameterType Flags) {
+  switch (Flags) {
+  case dxbc::RootParameterType::Constants32Bit:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyVersion(uint32_t Version) {
+  return (Version == 1 || Version == 2);
+}
+
 static bool validate(LLVMContext *Ctx, const mcdxbc::RootSignatureDesc &RSD) {
+
+  if (!verifyVersion(RSD.Header.Version)) {
+return reportValueError(Ctx, "Version", RSD.Header.Version);
+  }
+
   if (!verifyRootFlag(RSD.Header.Flags)) {
-return reportError(Ctx, "Invalid Root Signature flag value");
+return reportValueError(Ctx, "RootFlags", RSD.Header.Flags);
+  }
+
+  for (const auto &P : RSD.Parameters) {
+if (!verifyShaderVisibility(P.Header.ShaderVisibility))
+  return reportValueError(Ctx, "ShaderVisibility",
+  (uint32_t)P.Header.ShaderVisibility);
+
+if (!verifyParameterType(P.Header.ParameterType))
+  return reportValueError(Ctx, "ParameterType",

inbelic wrote:

There isn't a test case demonstrating this

https://github.com/llvm/llvm-project/pull/135085
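
Note on the validation logic quoted above: the accepted and rejected values can be illustrated with a standalone sketch. This is not the patch's code and not a substitute for the missing test case. The function names are hypothetical, and the assumption that shader visibility occupies the raw range 0 to 7 (All through Mesh) is mine; the root-flag mask and the version check follow the quoted `verifyRootFlag` and `verifyVersion`.

```cpp
#include <cstdint>

// Mirrors the quoted verifyRootFlag: only the low 12 bits may be set.
bool rootFlagsValid(uint32_t flags) { return (flags & ~0xfffu) == 0; }

// Mirrors the quoted verifyVersion: versions 1 and 2 are accepted.
bool versionValid(uint32_t version) { return version == 1 || version == 2; }

// Assumption: the eight visibility enumerators (All .. Mesh) map to the raw
// values 0..7, so anything above 7 would fall through the quoted switch.
bool shaderVisibilityValid(uint32_t raw) { return raw <= 7; }
```

A test for the `ParameterType` path would presumably need metadata carrying a value other than the one enumerator the switch accepts; what such an input looks like is not shown in this thread.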

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits





inbelic wrote:

I think we should change the name of the file to be more descriptive

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/135651

None

>From 41540073aaa8adc96e6f7889df66e1791cb4dbc9 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 13 ++
 3 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index a314457423819..603022d94838c 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -102,6 +103,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -144,12 +151,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError(
-  "The function " + Fname +
-  " was indicated as a context root, but it features musttail "
-  "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -210,11 +213,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 8f72711a9c8b1..6afa37ef286f5 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -323,6 +323,18 @@ define void @does_not_return() noreturn {
 ;
   unreachable
 }
+
+define void @unreachable() {
+; INSTRUMENT-LABEL: define void @unreachable() {
+; INSTRUMENT-NEXT:call void @llvm.instrprof.increment(ptr @unreachable, i64 742261418966908927, i32 1, i32 0)
+; INSTRUMENT-NEXT:unreachable
+;
+; LOWERING-LABEL: define void @unreachable(
+; LOWERING-SAME: ) !guid [[META9:![0-9]+]] {
+; LOWERING-NEXT:unreachable
+;
+  unreachable
+}
 ;.
 ; LOWERING: attributes #[[ATTR0]] = { noreturn }
 ; LOWERING: attributes #[[ATTR1:[0-9]+]]

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


mtrofin wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/135651?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135651** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135651?utm_source=stack-comment-view-in-graphite)
* **#135650** (https://app.graphite.dev/github/pr/llvm/llvm-project/135650?utm_source=stack-comment-icon)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn
more about stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/135651

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135651

>From e88504d840c675fbd196dc75c981098abc2c970d Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 13 ++
 3 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 6b2f25a585ec3..71d54f98d26e1 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -323,6 +323,18 @@ define void @does_not_return() noreturn {
 ;
   unreachable
 }
+
+define void @unreachable() {
+; INSTRUMENT-LABEL: define void @unreachable() {
+; INSTRUMENT-NEXT:call void @llvm.instrprof.increment(ptr @unreachable, i64 742261418966908927, i32 1, i32 0)
+; INSTRUMENT-NEXT:unreachable
+;
+; LOWERING-LABEL: define void @unreachable(
+; LOWERING-SAME: ) !guid [[META9:![0-9]+]] {
+; LOWERING-NEXT:unreachable
+;
+  unreachable
+}
 ;.
 ; LOWERING: attributes #[[ATTR0]

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread via llvm-branch-commits


https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/135085

>From 9b59d0108f6b23c039e2c417247216862073cd4b Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Wed, 9 Apr 2025 21:05:58 +
Subject: [PATCH 1/9] adding support for root constants in metadata generation

---
 llvm/lib/Target/DirectX/DXILRootSignature.cpp | 120 +-
 llvm/lib/Target/DirectX/DXILRootSignature.h   |   6 +-
 .../RootSignature-Flags-Validation-Error.ll   |   7 +-
 .../RootSignature-RootConstants.ll|  34 +
 ...ature-ShaderVisibility-Validation-Error.ll |  20 +++
 5 files changed, 182 insertions(+), 5 deletions(-)
 create mode 100644 llvm/test/CodeGen/DirectX/ContainerData/RootSignature-RootConstants.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ContainerData/RootSignature-ShaderVisibility-Validation-Error.ll

diff --git a/llvm/lib/Target/DirectX/DXILRootSignature.cpp b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
index 412ab7765a7ae..7686918b0fc75 100644
--- a/llvm/lib/Target/DirectX/DXILRootSignature.cpp
+++ b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
@@ -40,6 +40,13 @@ static bool reportError(LLVMContext *Ctx, Twine Message,
   return true;
 }
 
+static bool reportValueError(LLVMContext *Ctx, Twine ParamName, uint32_t Value,
+ DiagnosticSeverity Severity = DS_Error) {
+  Ctx->diagnose(DiagnosticInfoGeneric(
+  "Invalid value for " + ParamName + ": " + Twine(Value), Severity));
+  return true;
+}
+
 static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
MDNode *RootFlagNode) {
 
@@ -52,6 +59,45 @@ static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
   return false;
 }
 
+static bool extractMdValue(uint32_t &Value, MDNode *Node, unsigned int OpId) {
+
+  auto *CI = mdconst::extract<ConstantInt>(Node->getOperand(OpId));
+  if (CI == nullptr)
+return true;
+
+  Value = CI->getZExtValue();
+  return false;
+}
+
+static bool parseRootConstants(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
+   MDNode *RootFlagNode) {
+
+  if (RootFlagNode->getNumOperands() != 5)
+return reportError(Ctx, "Invalid format for RootConstants Element");
+
+  mcdxbc::RootParameter NewParameter;
+  NewParameter.Header.ParameterType = dxbc::RootParameterType::Constants32Bit;
+
+  uint32_t SV;
+  if (extractMdValue(SV, RootFlagNode, 1))
+return reportError(Ctx, "Invalid value for ShaderVisibility");
+
+  NewParameter.Header.ShaderVisibility = (dxbc::ShaderVisibility)SV;
+
+  if (extractMdValue(NewParameter.Constants.ShaderRegister, RootFlagNode, 2))
+return reportError(Ctx, "Invalid value for ShaderRegister");
+
+  if (extractMdValue(NewParameter.Constants.RegisterSpace, RootFlagNode, 3))
+return reportError(Ctx, "Invalid value for RegisterSpace");
+
+  if (extractMdValue(NewParameter.Constants.Num32BitValues, RootFlagNode, 4))
+return reportError(Ctx, "Invalid value for Num32BitValues");
+
+  RSD.Parameters.push_back(NewParameter);
+
+  return false;
+}
+
 static bool parseRootSignatureElement(LLVMContext *Ctx,
   mcdxbc::RootSignatureDesc &RSD,
   MDNode *Element) {
@@ -62,12 +108,16 @@ static bool parseRootSignatureElement(LLVMContext *Ctx,
   RootSignatureElementKind ElementKind =
   StringSwitch<RootSignatureElementKind>(ElementText->getString())
   .Case("RootFlags", RootSignatureElementKind::RootFlags)
+  .Case("RootConstants", RootSignatureElementKind::RootConstants)
   .Default(RootSignatureElementKind::Error);
 
   switch (ElementKind) {
 
   case RootSignatureElementKind::RootFlags:
 return parseRootFlags(Ctx, RSD, Element);
+  case RootSignatureElementKind::RootConstants:
+return parseRootConstants(Ctx, RSD, Element);
+break;
   case RootSignatureElementKind::Error:
 return reportError(Ctx, "Invalid Root Signature Element: " +
 ElementText->getString());
@@ -94,10 +144,56 @@ static bool parse(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
+static bool verifyShaderVisibility(dxbc::ShaderVisibility Flags) {
+  switch (Flags) {
+
+  case dxbc::ShaderVisibility::All:
+  case dxbc::ShaderVisibility::Vertex:
+  case dxbc::ShaderVisibility::Hull:
+  case dxbc::ShaderVisibility::Domain:
+  case dxbc::ShaderVisibility::Geometry:
+  case dxbc::ShaderVisibility::Pixel:
+  case dxbc::ShaderVisibility::Amplification:
+  case dxbc::ShaderVisibility::Mesh:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyParameterType(dxbc::RootParameterType Flags) {
+  switch (Flags) {
+  case dxbc::RootParameterType::Constants32Bit:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyVersion(uint32_t Version) {
+  return (Version == 1 || Version == 2);
+}
+
 static bool validate(LLVMContext *Ctx, const mcdxbc::RootSignatureDesc

[llvm-branch-commits] [llvm] release/20.x: [HEXAGON] Fix corner cases for hwloops pass (#135439) (PR #135657)

2025-04-14 Thread via llvm-branch-commits


https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/135657

Backport da8ce56c53fe6e34809ba0b310fa90257e230a89

Requested by: @androm3da

>From e2d8942a7829df234bdd0f505008d1db215e4bf8 Mon Sep 17 00:00:00 2001
From: aankit-ca 
Date: Mon, 14 Apr 2025 11:03:10 -0700
Subject: [PATCH] [HEXAGON] Fix corner cases for hwloops pass (#135439)

Add checks to the Hexagon hardware loops pass to make sure that Dist > 0 or
Dist < 0, as appropriate for the comparison kind. The change modifies the
HexagonHardwareLoops pass to add runtime checks that make sure that
end_value > initial_value for less-than comparisons and end_value <
initial_value for greater-than comparisons.
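
To illustrate why such a runtime check matters, here is a minimal sketch of
the less-than case only. It assumes a bottom-tested loop with a positive
increment; the function names are hypothetical and this is a simplified
model, not the code the pass emits. When the distance is not positive, the
guarded version selects a trip count of 1 (the body of a bottom-tested loop
still runs once), whereas the unguarded version wraps around to a huge
unsigned count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model; not code from HexagonHardwareLoops.cpp.
// Trip count for a bottom-tested loop equivalent to
//   i = Start; do { ...; i += Bump; } while (i < End);   with Bump > 0.
// If End <= Start the body still runs exactly once, so the guarded
// count falls back to 1 instead of a wrapped-around huge value.
uint32_t guardedTripCount(int32_t Start, int32_t End, int32_t Bump) {
  assert(Bump > 0 && "this sketch only models the less-than case");
  int64_t Dist = static_cast<int64_t>(End) - static_cast<int64_t>(Start);
  if (Dist <= 0)
    return 1; // the distance check failed: select the constant 1
  // Round up: number of Bump-sized steps needed to reach or pass End.
  return static_cast<uint32_t>((Dist + Bump - 1) / Bump);
}

// The same computation without the guard, to show the failure mode.
uint32_t unguardedTripCount(int32_t Start, int32_t End, int32_t Bump) {
  int64_t Dist = static_cast<int64_t>(End) - static_cast<int64_t>(Start);
  return static_cast<uint32_t>((Dist + Bump - 1) / Bump);
}
```

The greater-than case would mirror this with the sign of the distance
reversed.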

Fix for https://github.com/llvm/llvm-project/issues/133241

@androm3da @iajbar PTAL

-

Co-authored-by: aankit-quic 
(cherry picked from commit da8ce56c53fe6e34809ba0b310fa90257e230a89)
---
 .../Target/Hexagon/HexagonHardwareLoops.cpp   |  46 ++-
 .../CodeGen/Hexagon/hwloop-dist-check.mir | 277 ++
 llvm/test/CodeGen/Hexagon/swp-phi-start.ll|   5 +-
 3 files changed, 325 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir

diff --git a/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp b/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
index 9334746349240..dd4b240455126 100644
--- a/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
@@ -731,6 +731,11 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop *Loop,
Register IVReg,
int64_t IVBump,
Comparison::Kind Cmp) const {
+  LLVM_DEBUG(llvm::dbgs() << "Loop: " << *Loop << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Initial Value: " << *Start << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "End Value: " << *End << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Inc/Dec Value: " << IVBump << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Comparison: " << Cmp << "\n");
   // Cannot handle comparison EQ, i.e. while (A == B).
   if (Cmp == Comparison::EQ)
 return nullptr;
@@ -846,6 +851,7 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop *Loop,
   if (IVBump < 0) {
 std::swap(Start, End);
 IVBump = -IVBump;
+std::swap(CmpLess, CmpGreater);
   }
   // Cmp may now have a wrong direction, e.g.  LEs may now be GEs.
   // Signedness, and "including equality" are preserved.
@@ -989,7 +995,45 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop *Loop,
 CountSR = 0;
   }
 
-  return new CountValue(CountValue::CV_Register, CountR, CountSR);
+  const TargetRegisterClass *PredRC = &Hexagon::PredRegsRegClass;
+  Register MuxR = CountR;
+  unsigned MuxSR = CountSR;
+  // For the loop count to be valid unsigned number, CmpLess should imply
+  // Dist >= 0. Similarly, CmpGreater should imply Dist < 0. We can skip the
+  // check if the initial distance is zero and the comparison is LTu || LTEu.
+  if (!(Start->isImm() && StartV == 0 && Comparison::isUnsigned(Cmp) &&
+CmpLess) &&
+  (CmpLess || CmpGreater)) {
+// Generate:
+//   DistCheck = CMP_GT DistR,  0   --> CmpLess
+//   DistCheck = CMP_GT DistR, -1   --> CmpGreater
+Register DistCheckR = MRI->createVirtualRegister(PredRC);
+const MCInstrDesc &DistCheckD = TII->get(Hexagon::C2_cmpgti);
+BuildMI(*PH, InsertPos, DL, DistCheckD, DistCheckR)
+.addReg(DistR, 0, DistSR)
+.addImm((CmpLess) ? 0 : -1);
+
+// Generate:
+//   MUXR = MUX DistCheck, CountR, 1   --> CmpLess
+//   MUXR = MUX DistCheck, 1, CountR   --> CmpGreater
+MuxR = MRI->createVirtualRegister(IntRC);
+if (CmpLess) {
+  const MCInstrDesc &MuxD = TII->get(Hexagon::C2_muxir);
+  BuildMI(*PH, InsertPos, DL, MuxD, MuxR)
+  .addReg(DistCheckR)
+  .addReg(CountR, 0, CountSR)
+  .addImm(1);
+} else {
+  const MCInstrDesc &MuxD = TII->get(Hexagon::C2_muxri);
+  BuildMI(*PH, InsertPos, DL, MuxD, MuxR)
+  .addReg(DistCheckR)
+  .addImm(1)
+  .addReg(CountR, 0, CountSR);
+}
+MuxSR = 0;
+  }
+
+  return new CountValue(CountValue::CV_Register, MuxR, MuxSR);
 }
 
 /// Return true if the operation is invalid within hardware loop.
diff --git a/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir b/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir
new file mode 100644
index 0..9f8c14a314309
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir
@@ -0,0 +1,277 @@
+# RUN: llc --mtriple=hexagon -run-pass=hwloops %s -o - | FileCheck %s
+
+# CHECK-LABEL: name: f
+# CHECK: [[R1:%[0-9]+]]:predregs = C2_cmpgti [[R0:%[0-9]+]], 0
+# CHECK: [[R3:%[0-9]+]]:intregs = C2_muxir [[R1:%[0-9]+]], [[R2:%[0-9]+]], 1
+# CHECK-LABEL: name: g
+# CHECK: [[R1:%[0-9]+]]:predregs = C2_cmpgti [[R0:%[0-9]+]], 0
+# CHECK: [[R3:%[0-9]+]]:intregs = C2_muxir [[R1:%[0-9]+]], [[R2:%[0-9]+]], 1
+--- |
+  @a = dso_

[llvm-branch-commits] [flang] [llvm] [Github][CI] Upload .ninja_log as an artifact (PR #135539)

2025-04-14 Thread Aiden Grossman via llvm-branch-commits


https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/135539

>From 109923e35d854d63faa5b9599f5fd128bcfe5c79 Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Sun, 13 Apr 2025 11:26:06 +
Subject: [PATCH 1/3] testing

Created using spr 1.3.4
---
 .ci/monolithic-linux.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.ci/monolithic-linux.sh b/.ci/monolithic-linux.sh
index 6461c9d40ad59..8c1ad5d80da51 100755
--- a/.ci/monolithic-linux.sh
+++ b/.ci/monolithic-linux.sh
@@ -34,6 +34,7 @@ function at-exit {
   mkdir -p artifacts
   ccache --print-stats > artifacts/ccache_stats.txt
   cp "${BUILD_DIR}"/.ninja_log artifacts/.ninja_log
+  ls artifacts/
 
   # If building fails there will be no results files.
   shopt -s nullglob

>From 78c42b3aed24e533d53b2f701f5a0abd5f611e2a Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Sun, 13 Apr 2025 11:43:16 +
Subject: [PATCH 2/3] cleanup

Created using spr 1.3.4
---
 flang/CMakeLists.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/flang/CMakeLists.txt b/flang/CMakeLists.txt
index 236b4644404ec..76eb13295eb07 100644
--- a/flang/CMakeLists.txt
+++ b/flang/CMakeLists.txt
@@ -1,4 +1,3 @@
-# testing
 cmake_minimum_required(VERSION 3.20.0)
 set(LLVM_SUBPROJECT_TITLE "Flang")
 

>From cb1924f997f4e35014df8bc072f038992b7a6bca Mon Sep 17 00:00:00 2001
From: Aiden Grossman 
Date: Mon, 14 Apr 2025 07:36:51 +
Subject: [PATCH 3/3] fix

Created using spr 1.3.4
---
 .ci/monolithic-linux.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.ci/monolithic-linux.sh b/.ci/monolithic-linux.sh
index 8c1ad5d80da51..6461c9d40ad59 100755
--- a/.ci/monolithic-linux.sh
+++ b/.ci/monolithic-linux.sh
@@ -34,7 +34,6 @@ function at-exit {
   mkdir -p artifacts
   ccache --print-stats > artifacts/ccache_stats.txt
   cp "${BUILD_DIR}"/.ninja_log artifacts/.ninja_log
-  ls artifacts/
 
   # If building fails there will be no results files.
   shopt -s nullglob


[llvm-branch-commits] [clang] [llvm] release/20.x: [X86][AVX10] Remove VAES and VPCLMULQDQ feature from AVX10.1 (#135489) (PR #135577)

2025-04-14 Thread Simon Pilgrim via llvm-branch-commits


https://github.com/RKSimon approved this pull request.

LGTM - cheers

https://github.com/llvm/llvm-project/pull/135577

[llvm-branch-commits] [RISCV][NFC] Use bitmasks generated by TableGen (PR #135600)

2025-04-14 Thread Piyou Chen via llvm-branch-commits


https://github.com/BeMg approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/135600

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread via llvm-branch-commits


https://github.com/AidoP edited https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [compiler-rt] [llvm] Reentry (PR #135656)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin created 
https://github.com/llvm/llvm-project/pull/135656

None

>From e7b5f81cf9bd3237b0b1ee2f51f105b99f1061b7 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 07:19:58 -0700
Subject: [PATCH] Reentry

---
 .../lib/ctx_profile/CtxInstrProfiling.cpp | 151 --
 .../tests/CtxInstrProfilingTest.cpp   | 115 -
 .../llvm/ProfileData/CtxInstrContextNode.h|   6 +-
 .../Instrumentation/PGOCtxProfLowering.cpp|  82 ++
 .../PGOProfile/ctx-instrumentation.ll |   4 +-
 5 files changed, 269 insertions(+), 89 deletions(-)

diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 2d173f0fcb19a..2e26541c1acea 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -41,7 +41,44 @@ Arena *FlatCtxArena = nullptr;
 
 // Set to true when we enter a root, and false when we exit - regardless if this
 // thread collects a contextual profile for that root.
-__thread bool IsUnderContext = false;
+__thread int UnderContextRefCount = 0;
+__thread void *volatile EnteredContextAddress = 0;
+
+void onFunctionEntered(void *Address) {
+  UnderContextRefCount += (Address == EnteredContextAddress);
+  assert(UnderContextRefCount > 0);
+}
+
+void onFunctionExited(void *Address) {
+  UnderContextRefCount -= (Address == EnteredContextAddress);
+  assert(UnderContextRefCount >= 0);
+}
+
+// Returns true if it was entered the first time
+bool rootEnterIsFirst(void* Address) {
+  bool Ret = true;
+  if (!EnteredContextAddress) {
+EnteredContextAddress = Address;
+assert(UnderContextRefCount == 0);
+Ret = true;
+  }
+  onFunctionEntered(Address);
+  return Ret;
+}
+
+// Return true if this also exits the root.
+bool exitsRoot(void* Address) {
+  onFunctionExited(Address);
+  if (UnderContextRefCount == 0) {
+EnteredContextAddress = nullptr;
+return true;
+  }
+  return false;
+
+}
+
+bool hasEnteredARoot() { return UnderContextRefCount > 0; }
+
 __sanitizer::atomic_uint8_t ProfilingStarted = {};
 
 __sanitizer::atomic_uintptr_t RootDetector = {};
@@ -287,62 +324,65 @@ ContextRoot *FunctionData::getOrAllocateContextRoot() {
   return Root;
 }
 
-ContextNode *tryStartContextGivenRoot(ContextRoot *Root, GUID Guid,
-  uint32_t Counters, uint32_t Callsites)
-SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
-  IsUnderContext = true;
-  __sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
-__sanitizer::memory_order_relaxed);
+ContextNode *tryStartContextGivenRoot(
+ContextRoot *Root, void *EntryAddress, GUID Guid, uint32_t Counters,
+uint32_t Callsites) SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
+
+  if (rootEnterIsFirst(EntryAddress))
+__sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
+  __sanitizer::memory_order_relaxed);
   if (!Root->FirstMemBlock) {
 setupContext(Root, Guid, Counters, Callsites);
   }
   if (Root->Taken.TryLock()) {
+assert(__llvm_ctx_profile_current_context_root == nullptr);
 __llvm_ctx_profile_current_context_root = Root;
 onContextEnter(*Root->FirstNode);
 return Root->FirstNode;
   }
   // If this thread couldn't take the lock, return scratch context.
-  __llvm_ctx_profile_current_context_root = nullptr;
   return TheScratchContext;
 }
 
+ContextNode *getOrStartContextOutsideCollection(FunctionData &Data,
+ContextRoot *OwnCtxRoot,
+void *Callee, GUID Guid,
+uint32_t NumCounters,
+uint32_t NumCallsites) {
+  // This must only be called when __llvm_ctx_profile_current_context_root is
+  // null.
+  assert(__llvm_ctx_profile_current_context_root == nullptr);
+  // OwnCtxRoot is Data.CtxRoot. Since it's volatile, and is used by the caller,
+  // pre-load it.
+  assert(Data.CtxRoot == OwnCtxRoot);
+  // If we have a root detector, try sampling.
+  // Otherwise - regardless if we started profiling or not, if Data.CtxRoot is
+  // allocated, try starting a context tree - basically, as-if
+  // __llvm_ctx_profile_start_context were called.
+  if (auto *RAD = getRootDetector())
+RAD->sample();
+  else if (reinterpret_cast<uintptr_t>(OwnCtxRoot) > 1)
+return tryStartContextGivenRoot(OwnCtxRoot, Data.EntryAddress, Guid,
+NumCounters, NumCallsites);
+
+  // If we didn't start profiling, or if we are under a context, just not
+  // collecting, return the scratch buffer.
+  if (hasEnteredARoot() ||
+  !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
+return TheScratchContext;
+  return markAsScratch(
+  onContextEnter(*getFlatProfile(Data, Callee, Guid, NumCounters)));
+}
+
 ContextNode *getUnhandledContext(FunctionData &Data,

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE `*MMLA` instructions (PR #135636)

2025-04-14 Thread Momchil Velikov via llvm-branch-commits


https://github.com/momchil-velikov created 
https://github.com/llvm/llvm-project/pull/135636

Supersedes https://github.com/llvm/llvm-project/pull/135359

>From 2e61d3ee7b9ac88ae1be8ca248dad1a0880ccff4 Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Tue, 8 Apr 2025 14:43:54 +
Subject: [PATCH] [MLIR][ArmSVE] Add initial lowering of `vector.contract` to
 SVE `*MMLA` instructions

---
 mlir/include/mlir/Conversion/Passes.td|   4 +
 .../Dialect/ArmSVE/Transforms/Transforms.h|   3 +
 .../Conversion/VectorToLLVM/CMakeLists.txt|   1 +
 .../VectorToLLVM/ConvertVectorToLLVMPass.cpp  |   7 +
 .../LowerContractionToSMMLAPattern.cpp|   5 +-
 .../Dialect/ArmSVE/Transforms/CMakeLists.txt  |   1 +
 .../LowerContractionToSVEI8MMPattern.cpp  | 304 ++
 .../Vector/CPU/ArmSVE/vector-smmla.mlir   |  94 ++
 .../Vector/CPU/ArmSVE/vector-summla.mlir  |  85 +
 .../Vector/CPU/ArmSVE/vector-ummla.mlir   |  94 ++
 .../Vector/CPU/ArmSVE/vector-usmmla.mlir  |  95 ++
 .../CPU/ArmSVE/contraction-smmla-4x8x4.mlir   | 117 +++
 .../ArmSVE/contraction-smmla-8x8x8-vs2.mlir   | 159 +
 .../CPU/ArmSVE/contraction-summla-4x8x4.mlir  | 118 +++
 .../CPU/ArmSVE/contraction-ummla-4x8x4.mlir   | 119 +++
 .../CPU/ArmSVE/contraction-usmmla-4x8x4.mlir  | 117 +++
 16 files changed, 1322 insertions(+), 1 deletion(-)
 create mode 100644 mlir/lib/Dialect/ArmSVE/Transforms/LowerContractionToSVEI8MMPattern.cpp
 create mode 100644 mlir/test/Dialect/Vector/CPU/ArmSVE/vector-smmla.mlir
 create mode 100644 mlir/test/Dialect/Vector/CPU/ArmSVE/vector-summla.mlir
 create mode 100644 mlir/test/Dialect/Vector/CPU/ArmSVE/vector-ummla.mlir
 create mode 100644 mlir/test/Dialect/Vector/CPU/ArmSVE/vector-usmmla.mlir
 create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-4x8x4.mlir
 create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-8x8x8-vs2.mlir
 create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-summla-4x8x4.mlir
 create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-ummla-4x8x4.mlir
 create mode 100644 mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-usmmla-4x8x4.mlir

diff --git a/mlir/include/mlir/Conversion/Passes.td b/mlir/include/mlir/Conversion/Passes.td
index bbba495e613b2..930d8b44abca0 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -1406,6 +1406,10 @@ def ConvertVectorToLLVMPass : Pass<"convert-vector-to-llvm"> {
"bool", /*default=*/"false",
"Enables the use of ArmSVE dialect while lowering the vector "
"dialect.">,
+Option<"armI8MM", "enable-arm-i8mm",
+   "bool", /*default=*/"false",
+   "Enables the use of Arm FEAT_I8MM instructions while lowering "
+   "the vector dialect.">,
 Option<"x86Vector", "enable-x86vector",
"bool", /*default=*/"false",
"Enables the use of X86Vector dialect while lowering the vector "
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
index 8665c8224cc45..232e2be29e574 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
@@ -20,6 +20,9 @@ class RewritePatternSet;
 void populateArmSVELegalizeForLLVMExportPatterns(
 const LLVMTypeConverter &converter, RewritePatternSet &patterns);
 
+void populateLowerContractionToSVEI8MMPatternPatterns(
+RewritePatternSet &patterns);
+
 /// Configure the target to support lowering ArmSVE ops to ops that map to LLVM
 /// intrinsics.
 void configureArmSVELegalizeForExportTarget(LLVMConversionTarget &target);
diff --git a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
index 330474a718e30..8e2620029c354 100644
--- a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
+++ b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_mlir_conversion_library(MLIRVectorToLLVMPass
   MLIRVectorToLLVM
 
   MLIRArmNeonDialect
+  MLIRArmNeonTransforms
   MLIRArmSVEDialect
   MLIRArmSVETransforms
   MLIRAMXDialect
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
index 7082b92c95d1d..1e6c8122b1d0e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Dialect/AMX/Transforms.h"
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
+#include "mlir/Dialect/ArmNeon/Transforms.h"
 #include "mlir/Dialect/ArmSVE/IR/ArmSVEDialect.h"
 #include "mlir/Dialect/ArmSVE/Transforms/Transforms.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
@@ -82,6 +83,12 @@ void ConvertVectorToLLVMPass::

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/135662

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
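
The two-run scheme described above can be sketched as follows. This is only
an illustration under stated assumptions: `Issue`, `BriefIssue`,
`DetailedIssue`, `firstRun` and `secondRun` are hypothetical stand-ins, not
the BOLT classes, and the "analysis" is a toy rule over integers. The point
it tries to show is that the first run returns an issue plus an optional
request for details, and only issues carrying such a request go through the
slower second run.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// All names below are hypothetical stand-ins, not the BOLT classes.
struct Issue {
  std::string Message;
};

// Result of the fast first run: the issue plus an optional request
// (here simply a register number) for details to compute later.
struct BriefIssue {
  std::shared_ptr<Issue> TheIssue;
  std::optional<int> RequestedReg;
};

// Result of the second run: the same issue with details attached,
// or a null Details pointer when nothing was requested.
struct DetailedIssue {
  std::shared_ptr<Issue> TheIssue;
  std::shared_ptr<std::string> Details;
};

// First run: negative inputs yield a free-form warning with no request;
// odd inputs yield an issue that asks for details about that register.
std::vector<BriefIssue> firstRun(const std::vector<int> &Regs) {
  std::vector<BriefIssue> Out;
  for (int R : Regs) {
    if (R < 0)
      Out.push_back(
          {std::make_shared<Issue>(Issue{"malformed input"}), std::nullopt});
    else if (R % 2 != 0)
      Out.push_back({std::make_shared<Issue>(Issue{"unsafe register"}), R});
  }
  return Out;
}

// Second run: only issues that requested details pay for the slow path.
std::vector<DetailedIssue> secondRun(const std::vector<BriefIssue> &Briefs) {
  std::vector<DetailedIssue> Out;
  for (const BriefIssue &B : Briefs) {
    std::shared_ptr<std::string> Details;
    if (B.RequestedReg)
      Details = std::make_shared<std::string>(
          "clobbered r" + std::to_string(*B.RequestedReg));
    Out.push_back({B.TheIssue, Details});
  }
  return Out;
}
```

Keeping the issue object shared between both results means the brief report
never has to be mutated after the fact, which appears to be the motivation
for dropping the setter-style methods from the base class.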

>From c82cab53c33623fa9d6384b58946eaaab7807270 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 15:08:54 +0300
Subject: [PATCH] [BOLT] Gadget scanner: refactor issue reporting

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h | 102 ++---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 200 ++
 .../AArch64/gs-pauth-debug-output.s   |   8 +-
 3 files changed, 187 insertions(+), 123 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 3e39b64e59e0f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual const ArrayRef<MCPhysReg> getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(const ArrayRef<MCInstReference> Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector<MCPhysReg> AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in different
-  // preceding basic blocks.
-  SmallVector<MCInstReference> OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  const ArrayRef<MCPhysReg> getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(const ArrayRef<MCInstReference> Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector<MCInstReference> ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef<MCInstReference> Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template <typename T> struct BriefReport {
+  BriefReport(std::shared_ptr<Report> Issue,
+  const std::optional<T> RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr<Report> Issue;
+  std::optional<T> RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr<Report> Issue,
+ std::shared_ptr<ExtraInfo> Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr<Report> Issue;
+  std::shared_ptr<ExtraInfo> Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector<std::shared_ptr<Report>> Diagnostics;
+  std::vector<DetailedReport> Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &B

[llvm-branch-commits] [clang] release/20.x: [clang] Introduce "binary" StringLiteral for #embed data (#127629) (PR #133460)

2025-04-14 Thread via llvm-branch-commits


github-actions[bot] wrote:

@Fznamznon (or anyone else): if you would like to add a note about this fix in 
the release notes (completely optional), please reply to this comment with a 
one- or two-sentence description of the fix. When you are done, please add the 
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/133460

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/135663

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
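
As a rough illustration of why a backward walk fits this problem, the toy
model below scans a single basic block in reverse. It is a sketch under
loud assumptions and not the algorithm from this patch: `Kind`, `Inst` and
`findOracles` are hypothetical, an instruction is reduced to one of four
kinds acting on one register, and a register is treated as safe only if a
later "check" is reached before any "leak". At the end of the block nothing
is assumed to be checked, which is deliberately conservative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, heavily simplified instruction model.
enum class Kind { Auth, Check, Leak, Other };

struct Inst {
  Kind K;
  int Reg; // the single register this instruction defines or uses
};

// Returns indices of Auth instructions whose result may be observed
// before it is checked. Walks the block backwards, tracking the set of
// registers that are known to be checked before any leak.
std::vector<std::size_t> findOracles(const std::vector<Inst> &BB) {
  std::set<int> CheckedLater;
  std::vector<std::size_t> Oracles;
  for (std::size_t I = BB.size(); I-- > 0;) {
    const Inst &In = BB[I];
    switch (In.K) {
    case Kind::Check:
      CheckedLater.insert(In.Reg); // before this point, Reg will be checked
      break;
    case Kind::Leak:
      CheckedLater.erase(In.Reg); // the leak comes first on this path
      break;
    case Kind::Auth:
      if (CheckedLater.count(In.Reg) == 0)
        Oracles.push_back(I);
      CheckedLater.erase(In.Reg); // Reg is redefined here
      break;
    case Kind::Other:
      break;
    }
  }
  std::reverse(Oracles.begin(), Oracles.end()); // report in program order
  return Oracles;
}
```

A forward walk would have to remember every pending authentication until
its fate is known; walking backwards, the state at each instruction already
summarizes what happens to each register afterwards.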

>From 3e21f98012fa793d62d8ed588b2551e4ef757498 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Sat, 5 Apr 2025 14:54:01 +0300
Subject: [PATCH] [BOLT] Gadget scanner: detect authentication oracles

Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.

As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
---
 bolt/include/bolt/Core/MCPlusBuilder.h|   3 +
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  12 +
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 537 ++
 .../AArch64/gs-pauth-authentication-oracles.s | 675 ++
 .../AArch64/gs-pauth-debug-output.s   |  78 ++
 5 files changed, 1305 insertions(+)
 create mode 100644 bolt/test/binary-analysis/AArch64/gs-pauth-authentication-oracles.s

diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 9d50036b8083b..4f895fb5f9cc5 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -622,6 +622,9 @@ class MCPlusBuilder {
   /// controlled, provided InReg and executable code are not. Please note that
   /// registers other than InReg as well as the contents of memory which is
   /// writable by the process should be considered attacker-controlled.
+  ///
+  /// The instruction should not write any values derived from InReg anywhere,
+  /// except for OutReg.
   virtual std::optional>
   analyzeAddressArithmeticsForPtrAuth(const MCInst &Inst) const {
 llvm_unreachable("not implemented");
diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 3b6c1f6af94a0..2b923e362941f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -261,6 +261,15 @@ class ClobberingInfo : public ExtraInfo {
   void print(raw_ostream &OS, const MCInstReference Location) const override;
 };
 
+class LeakageInfo : public ExtraInfo {
+  SmallVector LeakingInstrs;
+
+public:
+  LeakageInfo(const ArrayRef Instrs) : LeakingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
 /// A brief version of a report that can be further augmented with the details.
 ///
 /// It is common for a particular type of gadget detector to be tied to some
@@ -302,6 +311,9 @@ class FunctionAnalysis {
   void findUnsafeUses(SmallVector> &Reports);
   void augmentUnsafeUseReports(const ArrayRef> Reports);
 
+  void findUnsafeDefs(SmallVector> &Reports);
+  void augmentUnsafeDefReports(const ArrayRef> Reports);
+
 public:
   FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocatorId,
bool PacRetGadgetsOnly)
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index b3081f034e8ee..f403caddf3fd8 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -712,6 +712,460 @@ SrcSafetyAnalysis::create(BinaryFunction &BF,
RegsToTrackInstsFor);
 }
 
+/// A state representing which registers are safe to be used as the destination
+/// operand of an authentication instruction.
+///
+/// Similar to SrcState, it is the analysis that should take register aliasing
+/// into account.
+///
+/// Depending on the implementation, it may be possible that an authentication
+/// instruction returns an invalid pointer on failure instead of terminating
+/// the program immediately (assuming the program will crash as soon as that
+/// pointer is dereferenced). To prevent brute-forcing the correct signature,
+/// it should be impossible for an attacker to test if a pointer is correctly
+/// signed - either the program should be terminated on authentication failure
+/// or it should be impossible to tell whether authentication succeeded or not.
+///
+/// For that reason, a restricted set of operations is allowed on any register
+/// containing a value derived from the result of an authentication instruction
+/// until that register is either wiped or checked not to contain a result of a
+/// failed authentication.
+///
+/// Specifically, the safety property for a register is computed by iterating
+/// the instructions in backward order: the source register Xn of an instruction
+/// Inst is safe if at least one of the following is true:
+///

[llvm-branch-commits] [clang] release/20.x: [modules] Handle friend function that was a definition but became only a declaration during AST deserialization (#132214) (PR #134232)

2025-04-14 Thread Dmitry Polukhin via llvm-branch-commits


https://github.com/dmpolukhin closed 
https://github.com/llvm/llvm-project/pull/134232

[llvm-branch-commits] [clang] release/20.x: [clang] Introduce "binary" StringLiteral for #embed data (#127629) (PR #133460)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar updated 
https://github.com/llvm/llvm-project/pull/133460

From 2e7710eaffddcbb6094e32826ec6e69bb4cb1799 Mon Sep 17 00:00:00 2001
From: Mariya Podchishchaeva 
Date: Thu, 20 Mar 2025 13:02:29 +0100
Subject: [PATCH 1/2] [clang] Introduce "binary" StringLiteral for #embed data
 (#127629)

StringLiteral is used as the internal data of EmbedExpr, and we use it
directly as an initializer if a single EmbedExpr appears in the initializer
list of a char array. This is fast and convenient, but it causes problems
when string literal character values are checked: #embed data values lie in
the range [0, 2^(char width) - 1], whereas an ordinary StringLiteral may have
a signed char type.
This PR introduces a new kind of StringLiteral that holds binary data coming
from an embedded resource to mitigate these problems. The new kind of
StringLiteral is not assumed to have a signed char type. It also helps to
prevent crashes when trying to find StringLiteral token locations, since
these simply do not exist for binary data.
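
The signedness mismatch can be shown with a small standalone sketch. It does
not use Clang's classes: `LiteralKind` and `codeUnitValue` are hypothetical
names, and the sketch assumes an 8-bit char that is signed for ordinary
literals. The same byte reads back as a negative value when sign-extended,
but as its raw value when treated as binary data.

```cpp
#include <cstdint>

// Hypothetical stand-in for the literal kinds discussed above.
enum class LiteralKind { Ordinary, Binary };

// Returns the value of one code unit as a signed 64-bit integer.
// Assumption: char is 8 bits wide and signed for ordinary literals.
inline std::int64_t codeUnitValue(std::uint8_t Byte, LiteralKind Kind) {
  std::int64_t V = Byte;
  // Ordinary literals are sign-extended from 8 bits; binary data is not.
  if (Kind == LiteralKind::Ordinary && V >= 128)
    V -= 256;
  return V;
}
```

Under these assumptions, the byte 0xFF yields -1 for an ordinary literal and
255 for a binary one, which is why a range check on #embed data can fail if
the data is stored as an ordinary literal.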

Fixes https://github.com/llvm/llvm-project/issues/119256
---
 clang/include/clang/AST/Expr.h| 15 ---
 clang/lib/AST/Expr.cpp|  7 +++
 clang/lib/Parse/ParseInit.cpp |  2 +-
 clang/lib/Sema/SemaInit.cpp   |  1 +
 clang/test/Preprocessor/embed_constexpr.c | 21 +
 5 files changed, 42 insertions(+), 4 deletions(-)
 create mode 100644 clang/test/Preprocessor/embed_constexpr.c

diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index 7be4022649329..06ac0f1704aa9 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -1752,7 +1752,14 @@ enum class StringLiteralKind {
   UTF8,
   UTF16,
   UTF32,
-  Unevaluated
+  Unevaluated,
+  // Binary kind of string literal is used for the data coming via #embed
+  // directive. File's binary contents is transformed to a special kind of
+  // string literal that in some cases may be used directly as an initializer
+  // and some features of classic string literals are not applicable to this
+  // kind of a string literal, for example finding a particular byte's source
+  // location for better diagnosing.
+  Binary
 };
 
 /// StringLiteral - This represents a string literal expression, e.g. "foo"
@@ -1884,6 +1891,8 @@ class StringLiteral final
   int64_t getCodeUnitS(size_t I, uint64_t BitWidth) const {
 int64_t V = getCodeUnit(I);
 if (isOrdinary() || isWide()) {
+  // Ordinary and wide string literals have types that can be signed.
+  // It is important for checking C23 constexpr initializers.
   unsigned Width = getCharByteWidth() * BitWidth;
   llvm::APInt AInt(Width, (uint64_t)V);
   V = AInt.getSExtValue();
@@ -4965,9 +4974,9 @@ class EmbedExpr final : public Expr {
   assert(EExpr && CurOffset != ULLONG_MAX &&
  "trying to dereference an invalid iterator");
   IntegerLiteral *N = EExpr->FakeChildNode;
-  StringRef DataRef = EExpr->Data->BinaryData->getBytes();
   N->setValue(*EExpr->Ctx,
-  llvm::APInt(N->getValue().getBitWidth(), DataRef[CurOffset],
+  llvm::APInt(N->getValue().getBitWidth(),
+  EExpr->Data->BinaryData->getCodeUnit(CurOffset),
   N->getType()->isSignedIntegerType()));
   // We want to return a reference to the fake child node in the
   // EmbedExpr, not the local variable N.
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index aa7e14329a21b..8571b617c70eb 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -1104,6 +1104,7 @@ unsigned StringLiteral::mapCharByteWidth(TargetInfo const &Target,
   switch (SK) {
   case StringLiteralKind::Ordinary:
   case StringLiteralKind::UTF8:
+  case StringLiteralKind::Binary:
 CharByteWidth = Target.getCharWidth();
 break;
   case StringLiteralKind::Wide:
@@ -1216,6 +1217,7 @@ void StringLiteral::outputString(raw_ostream &OS) const {
   switch (getKind()) {
   case StringLiteralKind::Unevaluated:
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
 break; // no prefix.
   case StringLiteralKind::Wide:
 OS << 'L';
@@ -1332,6 +1334,11 @@ StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,
                                  const LangOptions &Features,
                                  const TargetInfo &Target, unsigned *StartToken,
                                  unsigned *StartTokenByteOffset) const {
+  // No source location of bytes for binary literals since they don't come from
+  // source.
+  if (getKind() == StringLiteralKind::Binary)
+return getStrTokenLoc(0);
+
   assert((getKind() == StringLiteralKind::Ordinary ||
   getKind() == StringLiteralKind::UTF8 ||
   getKind() == StringLiteralKind::Unevaluated) &&
diff --git a/clang/lib/Parse/ParseInit.cpp b/clang/lib/Par

[llvm-branch-commits] [clang] release/20.x: [clang] Introduce "binary" StringLiteral for #embed data (#127629) (PR #133460)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


https://github.com/tstellar closed 
https://github.com/llvm/llvm-project/pull/133460

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/135663?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135663** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135663?utm_source=stack-comment-view-in-graphite)
* **#135662**
* **#135661**
* **#134146**
* **#133461**
* **#135073**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/135663

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/135662?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135663**
* **#135662** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135662?utm_source=stack-comment-view-in-graphite)
* **#135661**
* **#134146**
* **#133461**
* **#135073**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/135662

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/135661

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller

From 51373db0c000ad32a91eb4097ccc4404a6e54d25 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 14 Apr 2025 14:35:56 +0300
Subject: [PATCH] [BOLT] Gadget scanner: use more appropriate types (NFC)

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller
---
 bolt/include/bolt/Passes/PAuthGadgetScanner.h |  8 +---
 bolt/lib/Passes/PAuthGadgetScanner.cpp| 39 ---
 2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index ad47bdff753c8..ed89471cbb8d3 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -348,7 +347,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


atrosinenko wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/135661?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135663**
* **#135662**
* **#135661** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135661?utm_source=stack-comment-view-in-graphite)
* **#134146**
* **#133461**
* **#135073**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/135661

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)


Changes

Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, make `Report` always represent the
brief version of the report. When an issue is detected on the first run
of the analysis, return an optional request for extra details to attach
to the report on the second run.
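
The two-run scheme described above can be sketched as follows. This is a
simplified illustration, not the code from the patch: the struct names follow
the patch, but the template arguments, the `Reg` type, the `augmentReports`
helper and the message text are assumptions made for the example.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for the classes named in the patch.
struct Report {
  std::string Message;
};
struct ExtraInfo {
  std::string Text;
};

// Result of the first (fast) run: the issue plus an optional request
// describing which details the second run should collect.
template <typename ReqT> struct BriefReport {
  std::shared_ptr<Report> Issue;
  std::optional<ReqT> RequestedDetails;
};

// Result of the second (slower) run: the same issue, possibly with details.
struct DetailedReport {
  std::shared_ptr<Report> Issue;
  std::shared_ptr<ExtraInfo> Details; // null if nothing was requested
};

using Reg = unsigned; // assumed request type: the register to investigate

// Hypothetical second pass: attaches details only where they were requested.
inline std::vector<DetailedReport>
augmentReports(const std::vector<BriefReport<Reg>> &Reports) {
  std::vector<DetailedReport> Result;
  for (const BriefReport<Reg> &R : Reports) {
    std::shared_ptr<ExtraInfo> Details;
    if (R.RequestedDetails)
      Details = std::make_shared<ExtraInfo>(ExtraInfo{
          "clobbered register " + std::to_string(*R.RequestedDetails)});
    Result.push_back(DetailedReport{R.Issue, Details});
  }
  return Result;
}
```

Keeping the request separate from the report lets free-form warnings pass
through the second run untouched, while gadget reports get their extra
information attached in an analysis-specific way.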

---

Patch is 21.59 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/135662.diff


3 Files Affected:

- (modified) bolt/include/bolt/Passes/PAuthGadgetScanner.h (+71-31) 
- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+112-88) 
- (modified) bolt/test/binary-analysis/AArch64/gs-pauth-debug-output.s (+4-4) 


``diff
diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 3e39b64e59e0f..3b6c1f6af94a0 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -219,11 +219,6 @@ struct Report {
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const = 0;
 
-  // The two methods below are called by Analysis::computeDetailedInfo when
-  // iterating over the reports.
-  virtual const ArrayRef getAffectedRegisters() const { return {}; }
-  virtual void setOverwritingInstrs(const ArrayRef Instrs) {}
-
   void printBasicInfo(raw_ostream &OS, const BinaryContext &BC,
   StringRef IssueKind) const;
 };
@@ -231,27 +226,11 @@ struct Report {
 struct GadgetReport : public Report {
   // The particular kind of gadget that is detected.
   const GadgetKind &Kind;
-  // The set of registers related to this gadget report (possibly empty).
-  SmallVector AffectedRegisters;
-  // The instructions that clobber the affected registers.
-  // There is no one-to-one correspondence with AffectedRegisters: for example,
-  // the same register can be overwritten by different instructions in different
-  // preceding basic blocks.
-  SmallVector OverwritingInstrs;
-
-  GadgetReport(const GadgetKind &Kind, MCInstReference Location,
-   MCPhysReg AffectedRegister)
-  : Report(Location), Kind(Kind), AffectedRegisters({AffectedRegister}) {}
-
-  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 
-  const ArrayRef getAffectedRegisters() const override {
-return AffectedRegisters;
-  }
+  GadgetReport(const GadgetKind &Kind, MCInstReference Location)
+  : Report(Location), Kind(Kind) {}
 
-  void setOverwritingInstrs(const ArrayRef Instrs) override {
-OverwritingInstrs.assign(Instrs.begin(), Instrs.end());
-  }
+  void generateReport(raw_ostream &OS, const BinaryContext &BC) const override;
 };
 
 /// Report with a free-form message attached.
@@ -263,8 +242,75 @@ struct GenericReport : public Report {
   const BinaryContext &BC) const override;
 };
 
+/// An information about an issue collected on the slower, detailed,
+/// run of an analysis.
+class ExtraInfo {
+public:
+  virtual void print(raw_ostream &OS, const MCInstReference Location) const = 0;
+
+  virtual ~ExtraInfo() {}
+};
+
+class ClobberingInfo : public ExtraInfo {
+  SmallVector ClobberingInstrs;
+
+public:
+  ClobberingInfo(const ArrayRef Instrs)
+  : ClobberingInstrs(Instrs) {}
+
+  void print(raw_ostream &OS, const MCInstReference Location) const override;
+};
+
+/// A brief version of a report that can be further augmented with the details.
+///
+/// It is common for a particular type of gadget detector to be tied to some
+/// specific kind of analysis. If an issue is returned by that detector, it may
+/// be further augmented with the detailed info in an analysis-specific way,
+/// or just be left as-is (f.e. if a free-form warning was reported).
+template  struct BriefReport {
+  BriefReport(std::shared_ptr Issue,
+  const std::optional RequestedDetails)
+  : Issue(Issue), RequestedDetails(RequestedDetails) {}
+
+  std::shared_ptr Issue;
+  std::optional RequestedDetails;
+};
+
+/// A detailed version of a report.
+struct DetailedReport {
+  DetailedReport(std::shared_ptr Issue,
+ std::shared_ptr Details)
+  : Issue(Issue), Details(Details) {}
+
+  std::shared_ptr Issue;
+  std::shared_ptr Details;
+};
+
 struct FunctionAnalysisResult {
-  std::vector> Diagnostics;
+  std::vector Diagnostics;
+};
+
+/// A helper class storing per-function context to be instantiated by Analysis.
+class FunctionAnalysis {
+  BinaryContext &BC;
+  BinaryFunction &BF;
+  MCPlusBuilder::AllocatorIdTy AllocatorId;
+  FunctionAnalysisResult Result;
+
+  bool PacRetGadgetsOnly;
+
+  void findUnsafeUses(SmallVector> &Reports);
+  void augmentUnsafeUseReports(const ArrayRef> Reports);
+
+public:
+  FunctionAnalysis(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocatorId,
+   bool PacRetGadgetsOnly)
+  :

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)


Changes

* use more flexible `const ArrayRef` and `StringRef` types instead of
  `const std::vector &` and `const std::string &`, correspondingly,
  for function arguments
* return plain `const SrcState &` instead of `ErrorOr`
  from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
  handled gracefully by any caller

---
Full diff: https://github.com/llvm/llvm-project/pull/135661.diff


2 Files Affected:

- (modified) bolt/include/bolt/Passes/PAuthGadgetScanner.h (+2-6) 
- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+17-22) 


``diff
diff --git a/bolt/include/bolt/Passes/PAuthGadgetScanner.h b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
index 6765e2aff414f..3e39b64e59e0f 100644
--- a/bolt/include/bolt/Passes/PAuthGadgetScanner.h
+++ b/bolt/include/bolt/Passes/PAuthGadgetScanner.h
@@ -12,7 +12,6 @@
 #include "bolt/Core/BinaryContext.h"
 #include "bolt/Core/BinaryFunction.h"
 #include "bolt/Passes/BinaryPasses.h"
-#include "llvm/ADT/SmallSet.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
@@ -199,9 +198,6 @@ raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &);
 
 namespace PAuthGadgetScanner {
 
-class SrcSafetyAnalysis;
-struct SrcState;
-
 /// Description of a gadget kind that can be detected. Intended to be
 /// statically allocated to be attached to reports by reference.
 class GadgetKind {
@@ -210,7 +206,7 @@ class GadgetKind {
 public:
   GadgetKind(const char *Description) : Description(Description) {}
 
-  const StringRef getDescription() const { return Description; }
+  StringRef getDescription() const { return Description; }
 };
 
 /// Base report located at some instruction, without any additional information.
@@ -261,7 +257,7 @@ struct GadgetReport : public Report {
 /// Report with a free-form message attached.
 struct GenericReport : public Report {
   std::string Text;
-  GenericReport(MCInstReference Location, const std::string &Text)
+  GenericReport(MCInstReference Location, StringRef Text)
   : Report(Location), Text(Text) {}
   virtual void generateReport(raw_ostream &OS,
   const BinaryContext &BC) const override;
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index ad47bdff753c8..ed89471cbb8d3 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -91,14 +91,14 @@ class TrackedRegisters {
   const std::vector Registers;
   std::vector RegToIndexMapping;
 
-  static size_t getMappingSize(const std::vector &RegsToTrack) {
+  static size_t getMappingSize(const ArrayRef RegsToTrack) {
 if (RegsToTrack.empty())
   return 0;
 return 1 + *llvm::max_element(RegsToTrack);
   }
 
 public:
-  TrackedRegisters(const std::vector &RegsToTrack)
+  TrackedRegisters(const ArrayRef RegsToTrack)
   : Registers(RegsToTrack),
 RegToIndexMapping(getMappingSize(RegsToTrack), NoIndex) {
 for (unsigned I = 0; I < RegsToTrack.size(); ++I)
@@ -234,7 +234,7 @@ struct SrcState {
 
 static void printLastInsts(
 raw_ostream &OS,
-const std::vector> &LastInstWritingReg) {
+const ArrayRef> LastInstWritingReg) {
   OS << "Insts: ";
   for (unsigned I = 0; I < LastInstWritingReg.size(); ++I) {
 auto &Set = LastInstWritingReg[I];
@@ -295,7 +295,7 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const {
 class SrcSafetyAnalysis {
 public:
   SrcSafetyAnalysis(BinaryFunction &BF,
-const std::vector &RegsToTrackInstsFor)
+const ArrayRef RegsToTrackInstsFor)
   : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
 RegsToTrackInstsFor(RegsToTrackInstsFor) {}
 
@@ -303,11 +303,10 @@ class SrcSafetyAnalysis {
 
   static std::shared_ptr
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- const std::vector &RegsToTrackInstsFor);
+ const ArrayRef RegsToTrackInstsFor);
 
   virtual void run() = 0;
-  virtual ErrorOr
-  getStateBefore(const MCInst &Inst) const = 0;
+  virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
 
 protected:
   BinaryContext &BC;
@@ -348,7 +347,7 @@ class SrcSafetyAnalysis {
   }
 
   BitVector getClobberedRegs(const MCInst &Point) const {
-BitVector Clobbered(NumRegs, false);
+BitVector Clobbered(NumRegs);
 // Assume a call can clobber all registers, including callee-saved
 // registers. There's a good chance that callee-saved registers will be
 // saved on the stack at some point during execution of the callee.
@@ -409,8 +408,7 @@ class SrcSafetyAnalysis {
 
   // FirstCheckerInst should belong to the same basic block, meaning
   // it was deterministically processed a few steps before this instruction.
-  const SrcState &StateBeforeChecker =
-  getStateBefore(*FirstCheckerInst).get();
+  const SrcState &

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: use more appropriate types (NFC) (PR #135661)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko ready_for_review 
https://github.com/llvm/llvm-project/pull/135661

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===--===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2 dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.
+
+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;

redstar wrote:

I had to do a bit of research here. The Amode at the ED symbol acts as a 
default when no Amode is present at the LD/ER. The Amode at the PR does not 
seem to be necessary. However, I need to check whether removing it results in 
binder errors.
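
The fallback described above can be sketched in a few lines. This is only an illustration, assuming that an unset Amode is represented by a `None` enumerator; the `Amode` enum and the `effectiveAmode` helper are hypothetical names and are not the types from `llvm/BinaryFormat/GOFF.h`.

```cpp
#include <cassert>

// Hypothetical stand-in for the addressing-mode attribute of a GOFF symbol.
enum class Amode { None, A24, A31, Any, A64 };

// The label (LD) or external reference (ER) wins when it carries an Amode;
// otherwise the Amode of the owning element definition (ED) is the default.
inline Amode effectiveAmode(Amode SymbolAmode, Amode ElementAmode) {
  return SymbolAmode != Amode::None ? SymbolAmode : ElementAmode;
}
```

Under this model, a PR symbol would only need its own Amode if the binder rejects the ED default, which is the open question in the comment.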

https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)

2025-04-14 Thread Petar Avramovic via llvm-branch-commits



@@ -179,8 +174,7 @@ body: |
 ; CHECK: liveins: $sgpr0

petar-avramovic wrote:

[Re: line +159]
> This change is a code quality regression: the input has G_ANYEXT, so the high 
> half can be undefined.

fixed
See this comment inline on Graphite:
https://app.graphite.dev/github/pr/llvm/llvm-project/132383?utm_source=unchanged-line-comment

https://github.com/llvm/llvm-project/pull/132383

[llvm-branch-commits] [lldb] Draft: test (PR #135630)

2025-04-14 Thread Matheus Izvekov via llvm-branch-commits


https://github.com/mizvekov updated 
https://github.com/llvm/llvm-project/pull/135630

>From 99af05b3d8227ffe9a43e67e700d3850e6bec665 Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Mon, 14 Apr 2025 11:56:01 -0300
Subject: [PATCH] Draft: test

With change:
1) 4m46s - 
https://buildkite.com/llvm-project/github-pull-requests/builds/168411#_
---
 lldb/DELETE.ME | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 lldb/DELETE.ME

diff --git a/lldb/DELETE.ME b/lldb/DELETE.ME
new file mode 100644
index 0..e69de29bb2d1d


[llvm-branch-commits] [lldb] Draft: test (PR #135630)

2025-04-14 Thread Matheus Izvekov via llvm-branch-commits


https://github.com/mizvekov created 
https://github.com/llvm/llvm-project/pull/135630

None

>From e0418788c2384e1d7bd190baa52b9bfc0035ec2f Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Mon, 14 Apr 2025 11:56:01 -0300
Subject: [PATCH] Draft: test

---
 lldb/DELETE.ME | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 lldb/DELETE.ME

diff --git a/lldb/DELETE.ME b/lldb/DELETE.ME
new file mode 100644
index 0..e69de29bb2d1d


[llvm-branch-commits] [llvm] [AArch64AsmPrinter]Place jump tables into hot/unlikely-prefixed data sections for aarch64 (PR #126018)

2025-04-14 Thread Mingming Liu via llvm-branch-commits


https://github.com/mingmingl-llvm edited 
https://github.com/llvm/llvm-project/pull/126018

[llvm-branch-commits] [compiler-rt] [llvm] Reentry (PR #135656)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


mtrofin wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/135656?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#135656** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/135656?utm_source=stack-comment-view-in-graphite)
* **#135651**
* **#135650**
* `main`

This stack of pull requests is managed by Graphite 
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking: 
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/135656

[llvm-branch-commits] [llvm] release/20.x: [HEXAGON] Fix corner cases for hwloops pass (#135439) (PR #135657)

2025-04-14 Thread via llvm-branch-commits


https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/135657

[llvm-branch-commits] [llvm] release/20.x: [HEXAGON] Fix corner cases for hwloops pass (#135439) (PR #135657)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)


Changes

Backport da8ce56c53fe6e34809ba0b310fa90257e230a89

Requested by: @androm3da

---
Full diff: https://github.com/llvm/llvm-project/pull/135657.diff


3 Files Affected:

- (modified) llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp (+45-1) 
- (added) llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir (+277) 
- (modified) llvm/test/CodeGen/Hexagon/swp-phi-start.ll (+3-2) 
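
As a rough scalar model of the guard that this backport adds to `computeCount`, the compare-and-mux pair can be read as selecting between the computed trip count and a fallback of 1. The sketch below is only an illustration: `Direction` and `guardedTripCount` are hypothetical names, the real code emits `C2_cmpgti` and `C2_muxir`/`C2_muxri` machine instructions in the preheader instead of evaluating anything at compile time, and the case where the check is skipped (immediate start of 0 with an unsigned less-than comparison) is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical direction of the loop-exit comparison after normalisation.
enum class Direction { Less, Greater };

// Model of the emitted instructions:
//   Less:    P = (Dist > 0);   Result = P ? Count : 1
//   Greater: P = (Dist > -1);  Result = P ? 1 : Count
inline uint32_t guardedTripCount(int32_t Dist, uint32_t Count, Direction Dir) {
  if (Dir == Direction::Less)
    return Dist > 0 ? Count : 1;
  return Dist > -1 ? 1 : Count;
}
```

In both directions the fallback value is 1, so a distance with the wrong sign no longer turns into a very large unsigned loop count.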


``diff
diff --git a/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp 
b/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
index 9334746349240..dd4b240455126 100644
--- a/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonHardwareLoops.cpp
@@ -731,6 +731,11 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop 
*Loop,
Register IVReg,
int64_t IVBump,
Comparison::Kind Cmp) const {
+  LLVM_DEBUG(llvm::dbgs() << "Loop: " << *Loop << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Initial Value: " << *Start << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "End Value: " << *End << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Inc/Dec Value: " << IVBump << "\n");
+  LLVM_DEBUG(llvm::dbgs() << "Comparison: " << Cmp << "\n");
   // Cannot handle comparison EQ, i.e. while (A == B).
   if (Cmp == Comparison::EQ)
 return nullptr;
@@ -846,6 +851,7 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop 
*Loop,
   if (IVBump < 0) {
 std::swap(Start, End);
 IVBump = -IVBump;
+std::swap(CmpLess, CmpGreater);
   }
   // Cmp may now have a wrong direction, e.g.  LEs may now be GEs.
   // Signedness, and "including equality" are preserved.
@@ -989,7 +995,45 @@ CountValue *HexagonHardwareLoops::computeCount(MachineLoop 
*Loop,
 CountSR = 0;
   }
 
-  return new CountValue(CountValue::CV_Register, CountR, CountSR);
+  const TargetRegisterClass *PredRC = &Hexagon::PredRegsRegClass;
+  Register MuxR = CountR;
+  unsigned MuxSR = CountSR;
+  // For the loop count to be valid unsigned number, CmpLess should imply
+  // Dist >= 0. Similarly, CmpGreater should imply Dist < 0. We can skip the
+  // check if the initial distance is zero and the comparison is LTu || LTEu.
+  if (!(Start->isImm() && StartV == 0 && Comparison::isUnsigned(Cmp) &&
+CmpLess) &&
+  (CmpLess || CmpGreater)) {
+// Generate:
+//   DistCheck = CMP_GT DistR,  0   --> CmpLess
+//   DistCheck = CMP_GT DistR, -1   --> CmpGreater
+Register DistCheckR = MRI->createVirtualRegister(PredRC);
+const MCInstrDesc &DistCheckD = TII->get(Hexagon::C2_cmpgti);
+BuildMI(*PH, InsertPos, DL, DistCheckD, DistCheckR)
+.addReg(DistR, 0, DistSR)
+.addImm((CmpLess) ? 0 : -1);
+
+// Generate:
+//   MUXR = MUX DistCheck, CountR, 1   --> CmpLess
+//   MUXR = MUX DistCheck, 1, CountR   --> CmpGreater
+MuxR = MRI->createVirtualRegister(IntRC);
+if (CmpLess) {
+  const MCInstrDesc &MuxD = TII->get(Hexagon::C2_muxir);
+  BuildMI(*PH, InsertPos, DL, MuxD, MuxR)
+  .addReg(DistCheckR)
+  .addReg(CountR, 0, CountSR)
+  .addImm(1);
+} else {
+  const MCInstrDesc &MuxD = TII->get(Hexagon::C2_muxri);
+  BuildMI(*PH, InsertPos, DL, MuxD, MuxR)
+  .addReg(DistCheckR)
+  .addImm(1)
+  .addReg(CountR, 0, CountSR);
+}
+MuxSR = 0;
+  }
+
+  return new CountValue(CountValue::CV_Register, MuxR, MuxSR);
 }
 
 /// Return true if the operation is invalid within hardware loop.
diff --git a/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir 
b/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir
new file mode 100644
index 0..9f8c14a314309
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/hwloop-dist-check.mir
@@ -0,0 +1,277 @@
+# RUN: llc --mtriple=hexagon -run-pass=hwloops %s -o - | FileCheck %s
+
+# CHECK-LABEL: name: f
+# CHECK: [[R1:%[0-9]+]]:predregs = C2_cmpgti [[R0:%[0-9]+]], 0
+# CHECK: [[R3:%[0-9]+]]:intregs = C2_muxir [[R1:%[0-9]+]], [[R2:%[0-9]+]], 1
+# CHECK-LABEL: name: g
+# CHECK: [[R1:%[0-9]+]]:predregs = C2_cmpgti [[R0:%[0-9]+]], 0
+# CHECK: [[R3:%[0-9]+]]:intregs = C2_muxir [[R1:%[0-9]+]], [[R2:%[0-9]+]], 1
+--- |
+  @a = dso_local global [255 x ptr] zeroinitializer, align 8
+
+  ; Function Attrs: minsize nofree norecurse nosync nounwind optsize memory(write, argmem: none, inaccessiblemem: none)
+  define dso_local void @f(i32 noundef %m) local_unnamed_addr #0 {
+  entry:
+%cond = tail call i32 @llvm.smax.i32(i32 %m, i32 2)
+%0 = add nsw i32 %cond, -4
+%1 = shl i32 %cond, 3
+%cgep = getelementptr i8, ptr @a, i32 %1
+%cgep36 = bitcast ptr @a to ptr
+br label %do.body
+
+  do.body:  ; preds = %do.body, %entry
+%lsr.iv1 = phi ptr [ %cgep4, %do.body ], [ %cgep, %entry ]
+%lsr.iv = phi i32 [ %lsr.iv.next, %do.body ], [ %0, %entry ]
+%sh.

[llvm-branch-commits] [llvm] release/20.x: [HEXAGON] Fix corner cases for hwloops pass (#135439) (PR #135657)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:

@iajbar What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/135657

[llvm-branch-commits] [lldb] Draft: test (PR #135630)

2025-04-14 Thread Matheus Izvekov via llvm-branch-commits


https://github.com/mizvekov updated 
https://github.com/llvm/llvm-project/pull/135630

>From fbe0a6651e8e506a25f0f16d34f3995f138d15ba Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Mon, 14 Apr 2025 11:56:01 -0300
Subject: [PATCH] Draft: test

With change:
1) 4m46s - 
https://buildkite.com/llvm-project/github-pull-requests/builds/168411#_
2) 4m36s - 
https://buildkite.com/llvm-project/github-pull-requests/builds/168431#01963503-57a4-4934-9de8-f298abe3c432
---
 lldb/DELETE.ME | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 lldb/DELETE.ME

diff --git a/lldb/DELETE.ME b/lldb/DELETE.ME
new file mode 100644
index 0..e69de29bb2d1d


[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits





inbelic wrote:

The test name implies that we are only testing root constants. Can we either 
change the name or remove the root flags?

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits



@@ -52,6 +59,45 @@ static bool parseRootFlags(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
   return false;
 }
 
+static bool extractMdValue(uint32_t &Value, MDNode *Node, unsigned int OpId) {

inbelic wrote:

We could use this for `parseRootFlags` as well to make it consistent.
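
A minimal sketch of what sharing the helper could look like, using a stand-in node type instead of `llvm::MDNode` so that it compiles on its own. Everything here is hypothetical: `Operand`, `Node`, `extractValue` and `parseRootFlagsSketch` are illustration-only names, and the "return true on error" convention is an assumption rather than something this hunk confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a metadata node: each operand is either a string
// tag or a 32-bit integer constant.
using Operand = std::variant<std::string, uint32_t>;
using Node = std::vector<Operand>;

// Shared helper: reads operand OpId as an integer.
// Returns true on error and false on success (assumed convention).
static bool extractValue(uint32_t &Value, const Node &N, unsigned OpId) {
  if (OpId >= N.size())
    return true;
  if (const uint32_t *V = std::get_if<uint32_t>(&N[OpId])) {
    Value = *V;
    return false;
  }
  return true;
}

// A root-flags parser built on the same helper, so that both element kinds
// report a malformed operand in the same way.
static bool parseRootFlagsSketch(uint32_t &Flags, const Node &N) {
  if (N.size() != 2)
    return true;
  return extractValue(Flags, N, 1);
}
```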

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits



@@ -0,0 +1,34 @@
+; RUN: opt %s -dxil-embed -dxil-globals -S -o - | FileCheck %s
+; RUN: llc %s --filetype=obj -o - | obj2yaml | FileCheck %s --check-prefix=DXC
+
+target triple = "dxil-unknown-shadermodel6.0-compute"
+
+; CHECK: @dx.rts0 = private constant [48 x i8]  c"{{.*}}", section "RTS0", 
align 4
+
+define void @main() #0 {
+entry:
+  ret void
+}
+attributes #0 = { "hlsl.numthreads"="1,1,1" "hlsl.shader"="compute" }
+
+
+!dx.rootsignatures = !{!2} ; list of function/root signature pairs
+!2 = !{ ptr @main, !3 } ; function, root signature
+!3 = !{ !4, !5 } ; list of root signature elements
+!4 = !{ !"RootFlags", i32 1 } ; 1 = allow_input_assembler_input_layout
+!5 = !{ !"RootConstants", i32 0, i32 1, i32 2, i32 3 }
+
+; DXC:  - Name:RTS0
+; DXC-NEXT:Size:48
+; DXC-NEXT:RootSignature:
+; DXC-NEXT:  Version: 2
+; DXC-NEXT:  NumStaticSamplers: 0
+; DXC-NEXT:  StaticSamplersOffset: 0

inbelic wrote:

This has a different value from the test above (0 vs 48). Presumably they 
should be the same?

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits



@@ -94,10 +144,56 @@ static bool parse(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
+static bool verifyShaderVisibility(dxbc::ShaderVisibility Flags) {
+  switch (Flags) {
+
+  case dxbc::ShaderVisibility::All:
+  case dxbc::ShaderVisibility::Vertex:
+  case dxbc::ShaderVisibility::Hull:
+  case dxbc::ShaderVisibility::Domain:
+  case dxbc::ShaderVisibility::Geometry:
+  case dxbc::ShaderVisibility::Pixel:
+  case dxbc::ShaderVisibility::Amplification:
+  case dxbc::ShaderVisibility::Mesh:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyParameterType(dxbc::RootParameterType Flags) {
+  switch (Flags) {
+  case dxbc::RootParameterType::Constants32Bit:

inbelic wrote:

Shouldn't root flags also be here?

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits





inbelic wrote:

I think you meant this to be a new test file and not to edit the root flags 
validation.

Can we also remove the "RootFlags" member to reduce noise?

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] llvm-reduce: Preserve uselistorder when writing thinlto bitcode (PR #133369)

2025-04-14 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/133369

>From f056830c8eda3fab39cf73dad981b7b7091bdeb6 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 28 Mar 2025 10:46:08 +0700
Subject: [PATCH 1/2] llvm-reduce: Preserve uselistorder when writing thinlto
 bitcode

Fixes #63621
---
 .../thinlto-preserve-uselistorder.ll  | 19 +++
 llvm/tools/llvm-reduce/ReducerWorkItem.cpp| 11 ---
 2 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 llvm/test/tools/llvm-reduce/thinlto-preserve-uselistorder.ll

diff --git a/llvm/test/tools/llvm-reduce/thinlto-preserve-uselistorder.ll 
b/llvm/test/tools/llvm-reduce/thinlto-preserve-uselistorder.ll
new file mode 100644
index 0..2332f2d632911
--- /dev/null
+++ b/llvm/test/tools/llvm-reduce/thinlto-preserve-uselistorder.ll
@@ -0,0 +1,19 @@
+; RUN: opt --thinlto-bc --thinlto-split-lto-unit %s -o %t.0
+; RUN: llvm-reduce -write-tmp-files-as-bitcode --delta-passes=instructions %t.0 -o %t.1 \
+; RUN: --test %python --test-arg %p/Inputs/llvm-dis-and-filecheck.py --test-arg llvm-dis --test-arg FileCheck --test-arg --check-prefix=INTERESTING --test-arg %s
+; RUN: llvm-dis --preserve-ll-uselistorder %t.1 -o %t.2
+; RUN: FileCheck --check-prefix=RESULT %s < %t.2
+
+define i32 @func(i32 %arg0, i32 %arg1) {
+entry:
+  %add0 = add i32 %arg0, 0
+  %add1 = add i32 %add0, 0
+  %add2 = add i32 %add1, 0
+  %add3 = add i32 %arg1, 0
+  %add4 = add i32 %add2, %add3
+  ret i32 %add4
+}
+
+; INTERESTING: uselistorder i32 0
+; RESULT: uselistorder i32 0, { 0, 2, 1 }
+uselistorder i32 0, { 3, 2, 1, 0 }
diff --git a/llvm/tools/llvm-reduce/ReducerWorkItem.cpp 
b/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
index 8d2675c685038..9af2e5f5fdd23 100644
--- a/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
+++ b/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
@@ -776,7 +776,11 @@ void ReducerWorkItem::readBitcode(MemoryBufferRef Data, 
LLVMContext &Ctx,
 }
 
 void ReducerWorkItem::writeBitcode(raw_ostream &OutStream) const {
+  const bool ShouldPreserveUseListOrder = true;
+
   if (LTOInfo && LTOInfo->IsThinLTO && LTOInfo->EnableSplitLTOUnit) {
+// FIXME: This should not depend on the pass manager. There are hidden
+// transforms that may happen inside ThinLTOBitcodeWriterPass
 PassBuilder PB;
 LoopAnalysisManager LAM;
 FunctionAnalysisManager FAM;
@@ -788,7 +792,8 @@ void ReducerWorkItem::writeBitcode(raw_ostream &OutStream) 
const {
 PB.registerLoopAnalyses(LAM);
 PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
 ModulePassManager MPM;
-MPM.addPass(ThinLTOBitcodeWriterPass(OutStream, nullptr));
+MPM.addPass(ThinLTOBitcodeWriterPass(OutStream, nullptr,
+ ShouldPreserveUseListOrder));
 MPM.run(*M, MAM);
   } else {
 std::unique_ptr Index;
@@ -797,8 +802,8 @@ void ReducerWorkItem::writeBitcode(raw_ostream &OutStream) 
const {
   Index = std::make_unique(
   buildModuleSummaryIndex(*M, nullptr, &PSI));
 }
-WriteBitcodeToFile(getModule(), OutStream,
-   /*ShouldPreserveUseListOrder=*/true, Index.get());
+WriteBitcodeToFile(getModule(), OutStream, ShouldPreserveUseListOrder,
+   Index.get());
   }
 }
 

>From f324dce27a48741d8381843ba7478cf8f78b6c06 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 14 Apr 2025 16:19:11 +0200
Subject: [PATCH 2/2] Remove fixme

---
 llvm/tools/llvm-reduce/ReducerWorkItem.cpp | 2 --
 1 file changed, 2 deletions(-)

diff --git a/llvm/tools/llvm-reduce/ReducerWorkItem.cpp 
b/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
index 9af2e5f5fdd23..67da8bf1fd2bf 100644
--- a/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
+++ b/llvm/tools/llvm-reduce/ReducerWorkItem.cpp
@@ -779,8 +779,6 @@ void ReducerWorkItem::writeBitcode(raw_ostream &OutStream) 
const {
   const bool ShouldPreserveUseListOrder = true;
 
   if (LTOInfo && LTOInfo->IsThinLTO && LTOInfo->EnableSplitLTOUnit) {
-// FIXME: This should not depend on the pass manager. There are hidden
-// transforms that may happen inside ThinLTOBitcodeWriterPass
 PassBuilder PB;
 LoopAnalysisManager LAM;
 FunctionAnalysisManager FAM;


[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135651

>From d02d00bef28047f16f37454576050fcf15a87814 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 15 ++-
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it 
features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 6b2f25a585ec3..6afa37ef286f5 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -18,7 +18,7 @@ declare void @bar()
 ; LOWERING: @[[GLOB4:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
 ; LOWERING: @[[GLOB5:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
 ; LOWERING: @[[GLOB6:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
-; LOWERING: @[[GLOB7:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } { 
ptr null, ptr null, ptr inttoptr (i64 1 to ptr), ptr null, i8 0 }
+; LOWERING: @[[GLOB7:[0-9]+]] = intern

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135651

>From 9642055c61eb43c7d821924fbbe180bc52a3d0d6 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../llvm/Transforms/IPO/FunctionAttrs.h   |  2 --
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 15 ++-
 4 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/llvm/include/llvm/Transforms/IPO/FunctionAttrs.h 
b/llvm/include/llvm/Transforms/IPO/FunctionAttrs.h
index 3a2c09afbebd3..6a21ff616d506 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionAttrs.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionAttrs.h
@@ -30,8 +30,6 @@ class Module;
 /// Returns the memory access properties of this copy of the function.
 MemoryEffects computeFunctionBodyMemoryAccess(Function &F, AAResults &AAR);
 
-bool canReturn(const Function &F);
-
 /// Propagate function attributes for function summaries along the index's
 /// callgraph during thinlink
 bool thinLTOPropagateFunctionAttrs(
diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower 
-profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it 
features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ct

[llvm-branch-commits] [llvm] [ctxprof] Extend the notion of "cannot return" (PR #135651)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135651

>From ea629230ae6202ed34122cecb7ebce20ccffad19 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 10:03:55 -0700
Subject: [PATCH] [ctxprof] Extend the notion of "cannot return"

---
 .../Instrumentation/PGOCtxProfLowering.cpp| 19 --
 .../ctx-instrumentation-invalid-roots.ll  | 25 +++
 .../PGOProfile/ctx-instrumentation.ll | 15 ++-
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp 
b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
index f99d7b9d03e02..136225ab27cdc 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOCtxProfLowering.cpp
@@ -9,6 +9,7 @@
 
 #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/CtxProfAnalysis.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/IR/Analysis.h"
@@ -105,6 +106,12 @@ std::pair 
getNumCountersAndCallsites(const Function &F) {
   }
   return {NumCounters, NumCallsites};
 }
+
+void emitUnsupportedRoot(const Function &F, StringRef Reason) {
+  F.getContext().emitError("[ctxprof] The function " + F.getName() +
+   " was indicated as context root but " + Reason +
+   ", which is not supported.");
+}
 } // namespace
 
 // set up tie-in with compiler-rt.
@@ -164,12 +171,8 @@ 
CtxInstrumentationLowerer::CtxInstrumentationLowerer(Module &M,
   for (const auto &BB : *F)
 for (const auto &I : BB)
   if (const auto *CB = dyn_cast<CallBase>(&I))
-if (CB->isMustTailCall()) {
-  M.getContext().emitError("The function " + Fname +
-   " was indicated as a context root, "
-   "but it features musttail "
-   "calls, which is not supported.");
-}
+if (CB->isMustTailCall())
+  emitUnsupportedRoot(*F, "it features musttail calls");
 }
   }
 
@@ -230,11 +233,13 @@ bool CtxInstrumentationLowerer::lowerFunction(Function 
&F) {
 
   // Probably pointless to try to do anything here, unlikely to be
   // performance-affecting.
-  if (F.doesNotReturn()) {
+  if (!llvm::canReturn(F)) {
 for (auto &BB : F)
   for (auto &I : make_early_inc_range(BB))
 if (isa<InstrProfCntrInstBase>(&I))
   I.eraseFromParent();
+if (ContextRootSet.contains(&F))
+  emitUnsupportedRoot(F, "it does not return");
 return true;
   }
 
diff --git 
a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
index 454780153b823..b5ceb4602c60b 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation-invalid-roots.ll
@@ -1,17 +1,22 @@
-; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=good \
-; RUN:   -profile-context-root=bad \
-; RUN:   -S < %s 2>&1 | FileCheck %s
+; RUN: split-file %s %t
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/musttail.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/unreachable.ll -o - 2>&1 | FileCheck %s
+; RUN: not opt -passes=ctx-instr-gen,ctx-instr-lower -profile-context-root=the_func -S %t/noreturn.ll -o - 2>&1 | FileCheck %s
 
+;--- musttail.ll
 declare void @foo()
 
-define void @good() {
-  call void @foo()
-  ret void
-}
-
-define void @bad() {
+define void @the_func() {
   musttail call void @foo()
   ret void
 }
+;--- unreachable.ll
+define void @the_func() {
+  unreachable
+}
+;--- noreturn.ll
+define void @the_func() noreturn {
+  unreachable
+}
 
-; CHECK: error: The function bad was indicated as a context root, but it features musttail calls, which is not supported.
+; CHECK: error: [ctxprof] The function the_func was indicated as context root
diff --git a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll 
b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
index 6b2f25a585ec3..6afa37ef286f5 100644
--- a/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
+++ b/llvm/test/Transforms/PGOProfile/ctx-instrumentation.ll
@@ -18,7 +18,7 @@ declare void @bar()
 ; LOWERING: @[[GLOB4:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
 ; LOWERING: @[[GLOB5:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
 ; LOWERING: @[[GLOB6:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } 
zeroinitializer
-; LOWERING: @[[GLOB7:[0-9]+]] = internal global { ptr, ptr, ptr, ptr, i8 } { 
ptr null, ptr null, ptr inttoptr (i64 1 to ptr), ptr null, i8 0 }
+; LOWERING: @[[GLOB7:[0-9]+]] = intern
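
The patch above routes both the musttail case and the new "does not return" case through a single `emitUnsupportedRoot` helper, and the updated test only checks the shared message prefix. As a minimal sketch of how that message is assembled, the standalone function below mirrors the string concatenation without any LLVM types. The name `unsupportedRootMessage` and the use of `std::string` are assumptions made for illustration; the real helper takes a `Function` and a `StringRef` and reports through `LLVMContext::emitError`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for emitUnsupportedRoot: it only builds the text,
// it does not report anything. The wording follows the patch above.
std::string unsupportedRootMessage(const std::string &FnName,
                                   const std::string &Reason) {
  return "[ctxprof] The function " + FnName +
         " was indicated as context root but " + Reason +
         ", which is not supported.";
}
```

Because every reason shares the same prefix, a single `CHECK` line can probably cover all three split-file inputs in the test, which seems to be why the new `CHECK` stops at "was indicated as context root".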

[llvm-branch-commits] [compiler-rt] [llvm] Reentry (PR #135656)

2025-04-14 Thread Mircea Trofin via llvm-branch-commits


https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/135656

>From 978d61c0a92cd2a66c64c8f5daa1a3f30c18df77 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 14 Apr 2025 07:19:58 -0700
Subject: [PATCH] Reentry

---
 .../lib/ctx_profile/CtxInstrProfiling.cpp | 151 --
 .../tests/CtxInstrProfilingTest.cpp   | 115 -
 .../llvm/ProfileData/CtxInstrContextNode.h|   6 +-
 .../Instrumentation/PGOCtxProfLowering.cpp|  82 ++
 .../PGOProfile/ctx-instrumentation.ll |   4 +-
 5 files changed, 269 insertions(+), 89 deletions(-)

diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp 
b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 2d173f0fcb19a..2e26541c1acea 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -41,7 +41,44 @@ Arena *FlatCtxArena = nullptr;
 
 // Set to true when we enter a root, and false when we exit - regardless if this
 // thread collects a contextual profile for that root.
-__thread bool IsUnderContext = false;
+__thread int UnderContextRefCount = 0;
+__thread void *volatile EnteredContextAddress = 0;
+
+void onFunctionEntered(void *Address) {
+  UnderContextRefCount += (Address == EnteredContextAddress);
+  assert(UnderContextRefCount > 0);
+}
+
+void onFunctionExited(void *Address) {
+  UnderContextRefCount -= (Address == EnteredContextAddress);
+  assert(UnderContextRefCount >= 0);
+}
+
+// Returns true if it was entered the first time
+bool rootEnterIsFirst(void* Address) {
+  bool Ret = true;
+  if (!EnteredContextAddress) {
+EnteredContextAddress = Address;
+assert(UnderContextRefCount == 0);
+Ret = true;
+  }
+  onFunctionEntered(Address);
+  return Ret;
+}
+
+// Return true if this also exits the root.
+bool exitsRoot(void* Address) {
+  onFunctionExited(Address);
+  if (UnderContextRefCount == 0) {
+EnteredContextAddress = nullptr;
+return true;
+  }
+  return false;
+
+}
+
+bool hasEnteredARoot() { return UnderContextRefCount > 0; }
+
 __sanitizer::atomic_uint8_t ProfilingStarted = {};
 
 __sanitizer::atomic_uintptr_t RootDetector = {};
@@ -287,62 +324,65 @@ ContextRoot *FunctionData::getOrAllocateContextRoot() {
   return Root;
 }
 
-ContextNode *tryStartContextGivenRoot(ContextRoot *Root, GUID Guid,
-  uint32_t Counters, uint32_t Callsites)
-SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
-  IsUnderContext = true;
-  __sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
-__sanitizer::memory_order_relaxed);
+ContextNode *tryStartContextGivenRoot(
+ContextRoot *Root, void *EntryAddress, GUID Guid, uint32_t Counters,
+uint32_t Callsites) SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
+
+  if (rootEnterIsFirst(EntryAddress))
+__sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
+  __sanitizer::memory_order_relaxed);
   if (!Root->FirstMemBlock) {
 setupContext(Root, Guid, Counters, Callsites);
   }
   if (Root->Taken.TryLock()) {
+assert(__llvm_ctx_profile_current_context_root == nullptr);
 __llvm_ctx_profile_current_context_root = Root;
 onContextEnter(*Root->FirstNode);
 return Root->FirstNode;
   }
   // If this thread couldn't take the lock, return scratch context.
-  __llvm_ctx_profile_current_context_root = nullptr;
   return TheScratchContext;
 }
 
+ContextNode *getOrStartContextOutsideCollection(FunctionData &Data,
+ContextRoot *OwnCtxRoot,
+void *Callee, GUID Guid,
+uint32_t NumCounters,
+uint32_t NumCallsites) {
+  // This must only be called when __llvm_ctx_profile_current_context_root is
+  // null.
+  assert(__llvm_ctx_profile_current_context_root == nullptr);
+  // OwnCtxRoot is Data.CtxRoot. Since it's volatile, and is used by the caller,
+  // pre-load it.
+  assert(Data.CtxRoot == OwnCtxRoot);
+  // If we have a root detector, try sampling.
+  // Otherwise - regardless if we started profiling or not, if Data.CtxRoot is
+  // allocated, try starting a context tree - basically, as-if
+  // __llvm_ctx_profile_start_context were called.
+  if (auto *RAD = getRootDetector())
+RAD->sample();
+  else if (reinterpret_cast<uintptr_t>(OwnCtxRoot) > 1)
+return tryStartContextGivenRoot(OwnCtxRoot, Data.EntryAddress, Guid,
+NumCounters, NumCallsites);
+
+  // If we didn't start profiling, or if we are under a context, just not
+  // collecting, return the scratch buffer.
+  if (hasEnteredARoot() ||
+  !__sanitizer::atomic_load_relaxed(&ProfilingStarted))
+return TheScratchContext;
+  return markAsScratch(
+  onContextEnter(*getFlatProfile(Data, Callee, Guid, NumCounters)));
+}
+
 ContextNode *getUnhandledContext(FunctionData &Data, void
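
The patch replaces the boolean `IsUnderContext` with a per-thread reference count keyed on the address of the root that was entered first, so that a root calling itself recursively does not look like a fresh entry. A simplified sketch of that idea follows. It is not the runtime's code: the names `enterRoot`, `exitRoot`, `RefCount` and `EnteredRoot` are hypothetical, `thread_local` stands in for `__thread`, and the "first entry" flag starts as `false`, which is an assumption based on the comment "Returns true if it was entered the first time".

```cpp
#include <cassert>

// Hypothetical, simplified model of per-thread root reentry tracking.
thread_local int RefCount = 0;
thread_local const void *EnteredRoot = nullptr;

// Returns true only for the outermost entry into a root on this thread.
bool enterRoot(const void *Address) {
  bool IsFirst = false;
  if (!EnteredRoot) {
    EnteredRoot = Address;
    IsFirst = true;
  }
  // Only the function that opened the context changes the count.
  if (Address == EnteredRoot)
    ++RefCount;
  return IsFirst;
}

// Returns true when this exit leaves the outermost frame of the root.
bool exitRoot(const void *Address) {
  if (Address == EnteredRoot)
    --RefCount;
  if (RefCount == 0) {
    EnteredRoot = nullptr;
    return true;
  }
  return false;
}

bool hasEnteredARoot() { return RefCount > 0; }
```

In this model a recursive call into the same root bumps the count without reporting a new entry, and a different root entered while one is active leaves the count untouched, so only the matching outermost exit clears the state.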

[llvm-branch-commits] [llvm] llvm-reduce: Preserve uselistorder when writing thinlto bitcode (PR #133369)

2025-04-14 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

### Merge activity

* **Apr 14, 2:29 PM EDT**: A user started a stack merge that includes this pull 
request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/133369).


https://github.com/llvm/llvm-project/pull/133369

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to svusmmla (PR #135634)

2025-04-14 Thread Momchil Velikov via llvm-branch-commits


https://github.com/momchil-velikov created 
https://github.com/llvm/llvm-project/pull/135634

Supersedes https://github.com/llvm/llvm-project/pull/135358

>From 71e2f13ad5922bf93961c5d81fd9d1f5899c80b0 Mon Sep 17 00:00:00 2001
From: Momchil Velikov 
Date: Thu, 10 Apr 2025 14:38:27 +
Subject: [PATCH] [MLIR][ArmSVE] Add an ArmSVE dialect operation which maps to
 `svusmmla`

---
 mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td | 32 +++
 .../Transforms/LegalizeForLLVMExport.cpp  |  4 +++
 .../Dialect/ArmSVE/legalize-for-llvm.mlir | 12 +++
 mlir/test/Dialect/ArmSVE/roundtrip.mlir   | 11 +++
 mlir/test/Target/LLVMIR/arm-sve.mlir  | 12 +++
 5 files changed, 71 insertions(+)

diff --git a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td 
b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
index 1a59062ccc93d..da2a8f89b4cfd 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
+++ b/mlir/include/mlir/Dialect/ArmSVE/IR/ArmSVE.td
@@ -273,6 +273,34 @@ def UmmlaOp : ArmSVE_Op<"ummla",
 "$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($dst)";
 }
 
+def UsmmlaOp : ArmSVE_Op<"usmmla", [Pure,
+AllTypesMatch<["src1", "src2"]>,
+AllTypesMatch<["acc", "dst"]>]> {
+  let summary = "Matrix-matrix multiply and accumulate op";
+  let description = [{
+USMMLA: Unsigned by signed integer matrix multiply-accumulate.
+
+The unsigned by signed integer matrix multiply-accumulate operation
+multiplies the 2×8 matrix of unsigned 8-bit integer values held in
+the first source vector by the 8×2 matrix of signed 8-bit integer
+values in the second source vector. The resulting 2×2 widened 32-bit
+integer matrix product is then added to the 32-bit integer matrix
+accumulator.
+
+Source:
+https://developer.arm.com/documentation/100987/
+  }];
+  // Supports (vector<16xi8>, vector<16xi8>) -> (vector<4xi32>)
+  let arguments = (ins
+  ScalableVectorOfLengthAndType<[4], [I32]>:$acc,
+  ScalableVectorOfLengthAndType<[16], [I8]>:$src1,
+  ScalableVectorOfLengthAndType<[16], [I8]>:$src2
+  );
+  let results = (outs ScalableVectorOfLengthAndType<[4], [I32]>:$dst);
+  let assemblyFormat =
+"$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($dst)";
+}
+
 class SvboolTypeConstraint : TypesMatchWith<
   "expected corresponding svbool type widened to [16]xi1",
   lhsArg, rhsArg,
@@ -568,6 +596,10 @@ def SmmlaIntrOp :
   ArmSVE_IntrBinaryOverloadedOp<"smmla">,
   Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, 
AnyScalableVectorOfAnyRank)>;
 
+def UsmmlaIntrOp :
+  ArmSVE_IntrBinaryOverloadedOp<"usmmla">,
+  Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, 
AnyScalableVectorOfAnyRank)>;
+
 def SdotIntrOp :
   ArmSVE_IntrBinaryOverloadedOp<"sdot">,
   Arguments<(ins AnyScalableVectorOfAnyRank, AnyScalableVectorOfAnyRank, 
AnyScalableVectorOfAnyRank)>;
diff --git a/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp 
b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
index fe13ed03356b2..b1846e15196fc 100644
--- a/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
+++ b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeForLLVMExport.cpp
@@ -24,6 +24,7 @@ using SdotOpLowering = OneToOneConvertToLLVMPattern;
 using SmmlaOpLowering = OneToOneConvertToLLVMPattern<SmmlaOp, SmmlaIntrOp>;
 using UdotOpLowering = OneToOneConvertToLLVMPattern<UdotOp, UdotIntrOp>;
 using UmmlaOpLowering = OneToOneConvertToLLVMPattern<UmmlaOp, UmmlaIntrOp>;
+using UsmmlaOpLowering = OneToOneConvertToLLVMPattern<UsmmlaOp, UsmmlaIntrOp>;
 using DupQLaneLowering =
     OneToOneConvertToLLVMPattern<DupQLaneOp, DupQLaneIntrOp>;
 using ScalableMaskedAddIOpLowering =
@@ -194,6 +195,7 @@ void mlir::populateArmSVELegalizeForLLVMExportPatterns(
SmmlaOpLowering,
UdotOpLowering,
UmmlaOpLowering,
+   UsmmlaOpLowering,
DupQLaneLowering,
ScalableMaskedAddIOpLowering,
ScalableMaskedAddFOpLowering,
@@ -222,6 +224,7 @@ void mlir::configureArmSVELegalizeForExportTarget(
 SmmlaIntrOp,
 UdotIntrOp,
 UmmlaIntrOp,
+UsmmlaIntrOp,
 DupQLaneIntrOp,
 ScalableMaskedAddIIntrOp,
 ScalableMaskedAddFIntrOp,
@@ -242,6 +245,7 @@ void mlir::configureArmSVELegalizeForExportTarget(
   SmmlaOp,
   UdotOp,
   UmmlaOp,
+  UsmmlaOp,
   DupQLaneOp,
   ScalableMaskedAddIOp,
   ScalableMaskedAddFOp,
diff --git a/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir 
b/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
index 5d044517e0ea8..47587aa26506c 100644
--- a/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
+++ b/mlir/test/Dialect/ArmSVE/legalize-for-llvm.mlir
@@ -48,6 +48,18 @@ f
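
The op description in the patch says a 2×8 unsigned matrix is multiplied by an 8×2 signed matrix and the 2×2 product is added to a 32-bit accumulator. A scalar reference model for one 128-bit segment can make that arithmetic concrete. This is a sketch under assumptions: the function name `usmmlaSegment` is hypothetical, and the byte layout (first operand stored row by row, second operand stored column by column, eight bytes each) is my reading of the Arm description rather than something stated in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference model of USMMLA on one 128-bit segment (layout is an assumption):
//   Src1 holds a 2x8 unsigned matrix, row I at bytes [8*I, 8*I + 8).
//   Src2 holds an 8x2 signed matrix, column J at bytes [8*J, 8*J + 8).
//   Acc holds the 2x2 accumulator, element (I, J) at index 2*I + J.
std::array<int32_t, 4> usmmlaSegment(std::array<int32_t, 4> Acc,
                                     const std::array<uint8_t, 16> &Src1,
                                     const std::array<int8_t, 16> &Src2) {
  for (int I = 0; I < 2; ++I)
    for (int J = 0; J < 2; ++J) {
      int32_t Sum = 0;
      for (int K = 0; K < 8; ++K)
        // Widen before multiplying: unsigned by signed, both to 32 bits.
        Sum += int32_t(Src1[8 * I + K]) * int32_t(Src2[8 * J + K]);
      Acc[2 * I + J] += Sum;
    }
  return Acc;
}
```

A scalable vector would apply the same computation independently to each 128-bit segment, which is consistent with the `[16]xi8` sources and `[4]xi32` accumulator in the op definition.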

[llvm-branch-commits] [mlir] [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE `*MMLA` instructions (PR #135636)

2025-04-14 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Momchil Velikov (momchil-velikov)


Changes

Supersedes https://github.com/llvm/llvm-project/pull/135359

---

Patch is 77.36 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/135636.diff


16 Files Affected:

- (modified) mlir/include/mlir/Conversion/Passes.td (+4) 
- (modified) mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h (+3) 
- (modified) mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt (+1) 
- (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp (+7) 
- (modified) 
mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp (+4-1) 
- (modified) mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt (+1) 
- (added) 
mlir/lib/Dialect/ArmSVE/Transforms/LowerContractionToSVEI8MMPattern.cpp (+304) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-smmla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-summla.mlir (+85) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-ummla.mlir (+94) 
- (added) mlir/test/Dialect/Vector/CPU/ArmSVE/vector-usmmla.mlir (+95) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-4x8x4.mlir 
(+117) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-smmla-8x8x8-vs2.mlir
 (+159) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-summla-4x8x4.mlir 
(+118) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-ummla-4x8x4.mlir 
(+119) 
- (added) 
mlir/test/Integration/Dialect/Vector/CPU/ArmSVE/contraction-usmmla-4x8x4.mlir 
(+117) 


``diff
diff --git a/mlir/include/mlir/Conversion/Passes.td 
b/mlir/include/mlir/Conversion/Passes.td
index bbba495e613b2..930d8b44abca0 100644
--- a/mlir/include/mlir/Conversion/Passes.td
+++ b/mlir/include/mlir/Conversion/Passes.td
@@ -1406,6 +1406,10 @@ def ConvertVectorToLLVMPass : 
Pass<"convert-vector-to-llvm"> {
"bool", /*default=*/"false",
"Enables the use of ArmSVE dialect while lowering the vector "
"dialect.">,
+Option<"armI8MM", "enable-arm-i8mm",
+   "bool", /*default=*/"false",
+   "Enables the use of Arm FEAT_I8MM instructions while lowering "
+   "the vector dialect.">,
 Option<"x86Vector", "enable-x86vector",
"bool", /*default=*/"false",
"Enables the use of X86Vector dialect while lowering the vector "
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h 
b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
index 8665c8224cc45..232e2be29e574 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Transforms.h
@@ -20,6 +20,9 @@ class RewritePatternSet;
 void populateArmSVELegalizeForLLVMExportPatterns(
 const LLVMTypeConverter &converter, RewritePatternSet &patterns);
 
+void populateLowerContractionToSVEI8MMPatternPatterns(
+RewritePatternSet &patterns);
+
 /// Configure the target to support lowering ArmSVE ops to ops that map to LLVM
 /// intrinsics.
 void configureArmSVELegalizeForExportTarget(LLVMConversionTarget &target);
diff --git a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt 
b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
index 330474a718e30..8e2620029c354 100644
--- a/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
+++ b/mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt
@@ -35,6 +35,7 @@ add_mlir_conversion_library(MLIRVectorToLLVMPass
   MLIRVectorToLLVM
 
   MLIRArmNeonDialect
+  MLIRArmNeonTransforms
   MLIRArmSVEDialect
   MLIRArmSVETransforms
   MLIRAMXDialect
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp 
b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
index 7082b92c95d1d..1e6c8122b1d0e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp
@@ -14,6 +14,7 @@
 #include "mlir/Dialect/AMX/Transforms.h"
 #include "mlir/Dialect/Arith/IR/Arith.h"
 #include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
+#include "mlir/Dialect/ArmNeon/Transforms.h"
 #include "mlir/Dialect/ArmSVE/IR/ArmSVEDialect.h"
 #include "mlir/Dialect/ArmSVE/Transforms/Transforms.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
@@ -82,6 +83,12 @@ void ConvertVectorToLLVMPass::runOnOperation() {
 populateVectorStepLoweringPatterns(patterns);
 populateVectorRankReducingFMAPattern(patterns);
 populateVectorGatherLoweringPatterns(patterns);
+if (armI8MM) {
+  if (armNeon)
+arm_neon::populateLowerContractionToSMMLAPatternPatterns(patterns);
+  if (armSVE)
+populateLowerContractionToSVEI8MMPatternPatterns(patterns);
+}
 (void)applyPatternsGreedily(getOperation(), std::move(patterns));
   }
 
diff --git 
a/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp 
b/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractionToSMMLAPattern.cpp
index 2a1271dfd6bdf..e807b

[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread Finn Plummer via llvm-branch-commits



@@ -52,6 +59,45 @@ static bool parseRootFlags(LLVMContext *Ctx, 
mcdxbc::RootSignatureDesc &RSD,
   return false;
 }
 
+static bool extractMdValue(uint32_t &Value, MDNode *Node, unsigned int OpId) {

inbelic wrote:

Maybe we could rename this to `extractMdIntValue` or the like? We will 
eventually have `float` args

https://github.com/llvm/llvm-project/pull/135085

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread via llvm-branch-commits



@@ -2759,6 +2762,29 @@ MCSection 
*TargetLoweringObjectFileXCOFF::getSectionForLSDA(
 
//===--===//
 TargetLoweringObjectFileGOFF::TargetLoweringObjectFileGOFF() = default;
 
+void TargetLoweringObjectFileGOFF::getModuleMetadata(Module &M) {
+  // Construct the default names for the root SD and the ADA PR symbol.
+  StringRef FileName = sys::path::stem(M.getSourceFileName());
+  if (FileName.size() > 1 && FileName.starts_with('<') &&
+  FileName.ends_with('>'))
+FileName = FileName.substr(1, FileName.size() - 2);
+  DefaultRootSDName = Twine(FileName).concat("#C").str();

AidoP wrote:

Thank you, that's very interesting. The documentation seems to suggest that the 
binding scope attribute only applies to LDs. Interestingly AMBLIST doesn't seem 
to display it (N/A is shown).

It seems like there are a few undocumented fields and behaviours being relied 
upon now. Is there anything being done or are there any plans for IBM to update 
the doc?

https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [flang] [llvm] [Github][CI] Upload .ninja_log as an artifact (PR #135539)

2025-04-14 Thread Aiden Grossman via llvm-branch-commits



@@ -33,6 +33,8 @@ function at-exit {
 
   mkdir -p artifacts
   ccache --print-stats > artifacts/ccache_stats.txt
+  cp "${BUILD_DIR}"/.ninja_log artifacts/.ninja_log
+  ls artifacts/

boomanaiden154 wrote:

Leftover from testing. I've removed it. The GitHub Actions workflow should 
print out the file list.

https://github.com/llvm/llvm-project/pull/135539

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for extends and trunc (PR #132383)

2025-04-14 Thread Petar Avramovic via llvm-branch-commits



@@ -215,8 +207,8 @@ body: |
 ; CHECK: liveins: $sgpr0
 ; CHECK-NEXT: {{  $}}
 ; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
-; CHECK-NEXT: [[TRUNC:%[0-9]+]]:sgpr(s1) = G_TRUNC [[COPY]](s32)
-; CHECK-NEXT: [[ANYEXT:%[0-9]+]]:sgpr(s64) = G_ANYEXT [[TRUNC]](s1)
+; CHECK-NEXT: [[DEF:%[0-9]+]]:sgpr(s32) = G_IMPLICIT_DEF
+; CHECK-NEXT: [[MV:%[0-9]+]]:sgpr(s64) = G_MERGE_VALUES [[COPY]](s32), [[DEF]](s32)

petar-avramovic wrote:

Not sure if this is the correct spot for the original comment, but:

> Isn't this a correctness regression? I'm not entirely certain because I 
> remember there was some weirdness around what G_TRUNC means semantically. Can 
> you explain why there is no need for a trunc or bitwise and or something like 
> that in the output?

G_TRUNC and G_ANYEXT are no-ops, except when one operand is vcc. Here 
we have a uniform S1, so trunc + anyext is a no-op.
Trunc to vcc is: clear the high bits, then compare.
Anyext from vcc is a select.

> Note that anyext_s1_to_s32_vgpr does leave a G_AND, so either that test shows 
> a code quality issue or this test is incorrect.

In anyext_s1_to_s32_vgpr we need to lower the vgpr trunc to vcc. The G_AND 
comes from clearing the high bits for the icmp.
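
To make the argument above concrete, here is a small standalone model (plain C++, not LLVM code; all helper names are hypothetical). It assumes that consumers of a uniform s1 only ever look at bit 0, and that the bits added by G_ANYEXT are unspecified. Under those assumptions, merging the original s32 with an arbitrary high half is an acceptable result for trunc-to-s1 followed by anyext-to-s64:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of G_MERGE_VALUES: build an s64 from two s32 halves.
uint64_t mergeValues(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}

// Hypothetical model of G_TRUNC s32 -> s1: only bit 0 survives.
bool truncToS1(uint32_t V) { return (V & 1u) != 0; }

// Assumption: the only defined bit of an any-extended s1 is bit 0.
bool definedBit(uint64_t AnyExtended) { return (AnyExtended & 1u) != 0; }

// True if merging Src with an arbitrary high half keeps the s1 value intact.
bool lowBitPreserved(uint32_t Src, uint32_t UndefHi) {
  return definedBit(mergeValues(Src, UndefHi)) == truncToS1(Src);
}

// Convenience wrapper that aborts if the property does not hold.
void checkUniformS1(uint32_t Src, uint32_t UndefHi) {
  assert(lowBitPreserved(Src, UndefHi));
}
```

Whatever value stands in for the G_IMPLICIT_DEF high half, bit 0 of the merged value matches the truncated s1, which is why no G_AND is required in the uniform case in this model.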

https://github.com/llvm/llvm-project/pull/132383

[llvm-branch-commits] [clang] release/20.x: [modules] Handle friend function that was a definition but became only a declaration during AST deserialization (#132214) (PR #134232)

2025-04-14 Thread Dmitry Polukhin via llvm-branch-commits

dmpolukhin wrote:

> > @nikic what do you mean by ABI change in this case? It doesn't change the ABI 
> > of generated code; moreover, it doesn't even change the PCM serialized 
> > format, because it is an in-memory-only field and attribute.
> 
> It changes the ABI of libclang-cpp, by changing the layout of an exported 
> type.

This seems a rather strange requirement to me, and it significantly reduces the 
ability to fix regressions in release compilers. But if it is a release 
requirement, this fix cannot be cherry-picked to clang-20, so I am abandoning 
this PR.

https://github.com/llvm/llvm-project/pull/134232

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts and sext-inreg (PR #132385)

2025-04-14 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/132385

From 2fdf172213d449b78bc6de1ac20d493adda29dbc Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 14 Apr 2025 16:35:19 +0200
Subject: [PATCH] AMDGPU/GlobalISel: add RegBankLegalize rules for bit shifts
 and sext-inreg

Uniform S16 shifts have to be extended to S32 using appropriate Extend
before lowering to S32 instruction.
Uniform packed V2S16 are lowered to SGPR S32 instructions,
other option is to use VALU packed V2S16 and ReadAnyLane.
For uniform S32 and S64 and divergent S16, S32, S64 and V2S16 there are
instructions available.
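
The unpack-and-shift idea for uniform packed V2S16 can be sketched outside LLVM with plain 32-bit integers. This is only an illustration under stated assumptions: the function names mirror the helpers in the patch, but the bodies are a scalar model, the repacking step is an assumed truncate-and-combine, and shift amounts are assumed to be below 16.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Low and high 16-bit lanes, each widened to 32 bits.
using Halves = std::pair<uint32_t, uint32_t>;

// Zero-extend both lanes: mask the low lane, logical shift for the high lane.
Halves unpackZExt(uint32_t Packed) {
  return {Packed & 0xffffu, Packed >> 16};
}

// Sign-extend both lanes by going through int16_t.
Halves unpackSExt(uint32_t Packed) {
  int32_t Lo = static_cast<int16_t>(Packed & 0xffffu);
  int32_t Hi = static_cast<int16_t>(Packed >> 16);
  return {static_cast<uint32_t>(Lo), static_cast<uint32_t>(Hi)};
}

// Assumed repacking step: truncate each 32-bit result to 16 bits and combine.
uint32_t repack(uint32_t Lo, uint32_t Hi) {
  return (Lo & 0xffffu) | (Hi << 16);
}

// Logical shift right of each lane, done with 32-bit scalar operations.
uint32_t lshrV2S16(uint32_t Val, uint32_t Amt) {
  Halves V = unpackZExt(Val);
  Halves A = unpackZExt(Amt);
  assert(A.first < 16 && A.second < 16 && "shift amount out of range");
  return repack(V.first >> A.first, V.second >> A.second);
}

// Arithmetic shift right of each lane; the value must be sign-extended first.
uint32_t ashrV2S16(uint32_t Val, uint32_t Amt) {
  Halves V = unpackSExt(Val);
  Halves A = unpackZExt(Amt);
  assert(A.first < 16 && A.second < 16 && "shift amount out of range");
  int32_t Lo = static_cast<int32_t>(V.first) >> A.first;
  int32_t Hi = static_cast<int32_t>(V.second) >> A.second;
  return repack(static_cast<uint32_t>(Lo), static_cast<uint32_t>(Hi));
}
```

The choice of extension matters because a right shift pulls the upper bits of the widened value into the 16-bit result: a logical shift needs zeros there, an arithmetic shift needs copies of the sign bit.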
---
 .../Target/AMDGPU/AMDGPURegBankLegalize.cpp   |   2 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 107 ++
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   5 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  43 +++-
 .../AMDGPU/AMDGPURegBankLegalizeRules.h   |  11 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |  10 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll   | 187 +-
 .../AMDGPU/GlobalISel/regbankselect-ashr.mir  |   6 +-
 .../AMDGPU/GlobalISel/regbankselect-lshr.mir  |  17 +-
 .../GlobalISel/regbankselect-sext-inreg.mir   |  24 +--
 .../AMDGPU/GlobalISel/regbankselect-shl.mir   |   6 +-
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   |  34 ++--
 llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll|  10 +-
 13 files changed, 311 insertions(+), 151 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
index 9544c9f43eeaf..15584f16a0638 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalize.cpp
@@ -310,7 +310,7 @@ bool AMDGPURegBankLegalize::runOnMachineFunction(MachineFunction &MF) {
 // Opcodes that support pretty much all combinations of reg banks and LLTs
 // (except S1). There is no point in writing rules for them.
 if (Opc == AMDGPU::G_BUILD_VECTOR || Opc == AMDGPU::G_UNMERGE_VALUES ||
-Opc == AMDGPU::G_MERGE_VALUES) {
+Opc == AMDGPU::G_MERGE_VALUES || Opc == G_BITCAST) {
   RBLHelper.applyMappingTrivial(*MI);
   continue;
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 59cd23847311c..9f240c8e6a7a7 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -14,11 +14,13 @@
 #include "AMDGPURegBankLegalizeHelper.h"
 #include "AMDGPUGlobalISelUtils.h"
 #include "AMDGPUInstrInfo.h"
+#include "AMDGPURegBankLegalizeRules.h"
 #include "AMDGPURegisterBankInfo.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
+#include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineUniformityAnalysis.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/Support/AMDGPUAddrSpace.h"
@@ -166,6 +168,59 @@ void RegBankLegalizeHelper::lowerVccExtToSel(MachineInstr &MI) {
   MI.eraseFromParent();
 }
 
+std::pair<Register, Register> RegBankLegalizeHelper::unpackZExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Mask = B.buildConstant(SgprRB_S32, 0x0000ffff);
+  auto Lo = B.buildAnd(SgprRB_S32, PackedS32, Mask);
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair<Register, Register> RegBankLegalizeHelper::unpackSExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = B.buildSExtInReg(SgprRB_S32, PackedS32, 16);
+  auto Hi = B.buildAShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+std::pair<Register, Register> RegBankLegalizeHelper::unpackAExt(Register Reg) {
+  auto PackedS32 = B.buildBitcast(SgprRB_S32, Reg);
+  auto Lo = PackedS32;
+  auto Hi = B.buildLShr(SgprRB_S32, PackedS32, B.buildConstant(SgprRB_S32, 16));
+  return {Lo.getReg(0), Hi.getReg(0)};
+}
+
+void RegBankLegalizeHelper::lowerUnpack(MachineInstr &MI) {
+  Register Lo, Hi;
+  switch (MI.getOpcode()) {
+  case AMDGPU::G_SHL: {
+auto [Val0, Val1] = unpackAExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackAExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_LSHR: {
+auto [Val0, Val1] = unpackZExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackZExt(MI.getOperand(2).getReg());
+Lo = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val0, Amt0}).getReg(0);
+Hi = B.buildInstr(MI.getOpcode(), {SgprRB_S32}, {Val1, Amt1}).getReg(0);
+break;
+  }
+  case AMDGPU::G_ASHR: {
+auto [Val0, Val1] = unpackSExt(MI.getOperand(1).getReg());
+auto [Amt0, Amt1] = unpackSExt(MI.get

[llvm-branch-commits] [clang] 7034995 - [clang] Handle Binary StingLiteral kind in one more place (#132201)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


Author: Mariya Podchishchaeva
Date: 2025-04-14T12:26:02-07:00
New Revision: 7034995f102967c6f28c2d7d04913608853050ac

URL: 
https://github.com/llvm/llvm-project/commit/7034995f102967c6f28c2d7d04913608853050ac
DIFF: 
https://github.com/llvm/llvm-project/commit/7034995f102967c6f28c2d7d04913608853050ac.diff

LOG: [clang] Handle Binary StingLiteral kind in one more place (#132201)

The bots are upset by https://github.com/llvm/llvm-project/pull/127629 .
Fix that.

Added: 


Modified: 
clang/lib/Sema/SemaExprCXX.cpp

Removed: 




diff  --git a/clang/lib/Sema/SemaExprCXX.cpp b/clang/lib/Sema/SemaExprCXX.cpp
index 1e39d69e8b230..c6621402adfc9 100644
--- a/clang/lib/Sema/SemaExprCXX.cpp
+++ b/clang/lib/Sema/SemaExprCXX.cpp
@@ -4143,6 +4143,7 @@ Sema::IsStringLiteralToNonConstPointerConversion(Expr *From, QualType ToType) {
 // We don't allow UTF literals to be implicitly converted
 break;
   case StringLiteralKind::Ordinary:
+  case StringLiteralKind::Binary:
 return (ToPointeeType->getKind() == BuiltinType::Char_U ||
 ToPointeeType->getKind() == BuiltinType::Char_S);
   case StringLiteralKind::Wide:




[llvm-branch-commits] [llvm] [HLSL] Adding support for Root Constants in LLVM Metadata (PR #135085)

2025-04-14 Thread via llvm-branch-commits


https://github.com/joaosaffran updated 
https://github.com/llvm/llvm-project/pull/135085

From 9b59d0108f6b23c039e2c417247216862073cd4b Mon Sep 17 00:00:00 2001
From: joaosaffran 
Date: Wed, 9 Apr 2025 21:05:58 +
Subject: [PATCH 1/3] adding support for root constants in metadata generation

---
 llvm/lib/Target/DirectX/DXILRootSignature.cpp | 120 +-
 llvm/lib/Target/DirectX/DXILRootSignature.h   |   6 +-
 .../RootSignature-Flags-Validation-Error.ll   |   7 +-
 .../RootSignature-RootConstants.ll|  34 +
 ...ature-ShaderVisibility-Validation-Error.ll |  20 +++
 5 files changed, 182 insertions(+), 5 deletions(-)
 create mode 100644 llvm/test/CodeGen/DirectX/ContainerData/RootSignature-RootConstants.ll
 create mode 100644 llvm/test/CodeGen/DirectX/ContainerData/RootSignature-ShaderVisibility-Validation-Error.ll

diff --git a/llvm/lib/Target/DirectX/DXILRootSignature.cpp b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
index 412ab7765a7ae..7686918b0fc75 100644
--- a/llvm/lib/Target/DirectX/DXILRootSignature.cpp
+++ b/llvm/lib/Target/DirectX/DXILRootSignature.cpp
@@ -40,6 +40,13 @@ static bool reportError(LLVMContext *Ctx, Twine Message,
   return true;
 }
 
+static bool reportValueError(LLVMContext *Ctx, Twine ParamName, uint32_t Value,
+ DiagnosticSeverity Severity = DS_Error) {
+  Ctx->diagnose(DiagnosticInfoGeneric(
+  "Invalid value for " + ParamName + ": " + Twine(Value), Severity));
+  return true;
+}
+
 static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
MDNode *RootFlagNode) {
 
@@ -52,6 +59,45 @@ static bool parseRootFlags(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
   return false;
 }
 
+static bool extractMdValue(uint32_t &Value, MDNode *Node, unsigned int OpId) {
+
+  auto *CI = mdconst::extract<ConstantInt>(Node->getOperand(OpId));
+  if (CI == nullptr)
+return true;
+
+  Value = CI->getZExtValue();
+  return false;
+}
+
+static bool parseRootConstants(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
+                               MDNode *RootFlagNode) {
+
+  if (RootFlagNode->getNumOperands() != 5)
+return reportError(Ctx, "Invalid format for RootConstants Element");
+
+  mcdxbc::RootParameter NewParameter;
+  NewParameter.Header.ParameterType = dxbc::RootParameterType::Constants32Bit;
+
+  uint32_t SV;
+  if (extractMdValue(SV, RootFlagNode, 1))
+return reportError(Ctx, "Invalid value for ShaderVisibility");
+
+  NewParameter.Header.ShaderVisibility = (dxbc::ShaderVisibility)SV;
+
+  if (extractMdValue(NewParameter.Constants.ShaderRegister, RootFlagNode, 2))
+return reportError(Ctx, "Invalid value for ShaderRegister");
+
+  if (extractMdValue(NewParameter.Constants.RegisterSpace, RootFlagNode, 3))
+return reportError(Ctx, "Invalid value for RegisterSpace");
+
+  if (extractMdValue(NewParameter.Constants.Num32BitValues, RootFlagNode, 4))
+return reportError(Ctx, "Invalid value for Num32BitValues");
+
+  RSD.Parameters.push_back(NewParameter);
+
+  return false;
+}
+
 static bool parseRootSignatureElement(LLVMContext *Ctx,
   mcdxbc::RootSignatureDesc &RSD,
   MDNode *Element) {
@@ -62,12 +108,16 @@ static bool parseRootSignatureElement(LLVMContext *Ctx,
   RootSignatureElementKind ElementKind =
   StringSwitch<RootSignatureElementKind>(ElementText->getString())
   .Case("RootFlags", RootSignatureElementKind::RootFlags)
+  .Case("RootConstants", RootSignatureElementKind::RootConstants)
   .Default(RootSignatureElementKind::Error);
 
   switch (ElementKind) {
 
   case RootSignatureElementKind::RootFlags:
 return parseRootFlags(Ctx, RSD, Element);
+  case RootSignatureElementKind::RootConstants:
+return parseRootConstants(Ctx, RSD, Element);
+break;
   case RootSignatureElementKind::Error:
 return reportError(Ctx, "Invalid Root Signature Element: " +
 ElementText->getString());
@@ -94,10 +144,56 @@ static bool parse(LLVMContext *Ctx, mcdxbc::RootSignatureDesc &RSD,
 
 static bool verifyRootFlag(uint32_t Flags) { return (Flags & ~0xfff) == 0; }
 
+static bool verifyShaderVisibility(dxbc::ShaderVisibility Flags) {
+  switch (Flags) {
+
+  case dxbc::ShaderVisibility::All:
+  case dxbc::ShaderVisibility::Vertex:
+  case dxbc::ShaderVisibility::Hull:
+  case dxbc::ShaderVisibility::Domain:
+  case dxbc::ShaderVisibility::Geometry:
+  case dxbc::ShaderVisibility::Pixel:
+  case dxbc::ShaderVisibility::Amplification:
+  case dxbc::ShaderVisibility::Mesh:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyParameterType(dxbc::RootParameterType Flags) {
+  switch (Flags) {
+  case dxbc::RootParameterType::Constants32Bit:
+return true;
+  }
+
+  return false;
+}
+
+static bool verifyVersion(uint32_t Version) {
+  return (Version == 1 || Version == 2);
+}
+
 static bool validate(LLVMContext *Ctx, const mcdxbc::RootSignatureDesc

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: refactor issue reporting (PR #135662)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko ready_for_review 
https://github.com/llvm/llvm-project/pull/135662

[llvm-branch-commits] [llvm] [GOFF] Add writing of section symbols (PR #133799)

2025-04-14 Thread Kai Nacke via llvm-branch-commits



@@ -0,0 +1,113 @@
+//===- MCGOFFAttributes.h - Attributes of GOFF symbols ===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Defines the various attribute collections defining GOFF symbols.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_MC_MCGOFFATTRIBUTES_H
+#define LLVM_MC_MCGOFFATTRIBUTES_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/GOFF.h"
+
+namespace llvm {
+namespace GOFF {
+// An "External Symbol Definition" in the GOFF file has a type, and depending on
+// the type a different subset of the fields is used.
+//
+// Unlike other formats, a 2 dimensional structure is used to define the
+// location of data. For example, the equivalent of the ELF .text section is
+// made up of a Section Definition (SD) and a class (Element Definition; ED).
+// The name of the SD symbol depends on the application, while the class has the
+// predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.
+//
+// Data can be placed into this structure in 2 ways. First, the data (in a text
+// record) can be associated with an ED symbol. To refer to data, a Label
+// Definition (LD) is used to give an offset into the data a name. When binding,
+// the whole data is pulled into the resulting executable, and the addresses
+// given by the LD symbols are resolved.
+//
+// The alternative is to use a Part Definition (PR). In this case, the data (in
+// a text record) is associated with the part. When binding, only the data of
+// referenced PRs is pulled into the resulting binary.
+//
+// Both approaches are used, which means that the equivalent of a section in ELF
+// results in 3 GOFF symbols, either SD/ED/LD or SD/ED/PR. Moreover, certain
+// sections are fine with just defining SD/ED symbols. The SymbolMapper takes
+// care of all those details.
+
+// Attributes for SD symbols.
+struct SDAttr {
+  GOFF::ESDTaskingBehavior TaskingBehavior = GOFF::ESD_TA_Unspecified;
+  GOFF::ESDBindingScope BindingScope = GOFF::ESD_BSC_Unspecified;
+};
+
+// Attributes for ED symbols.
+struct EDAttr {
+  bool IsReadOnly = false;
+  GOFF::ESDExecutable Executable = GOFF::ESD_EXE_Unspecified;
+  GOFF::ESDAmode Amode;

redstar wrote:

No binding problems, so it seems safe to make this change. That said, AMBLIST 
shows the Amode on PR and ED symbols.

https://github.com/llvm/llvm-project/pull/133799

[llvm-branch-commits] [llvm] [BOLT] Gadget scanner: detect authentication oracles (PR #135663)

2025-04-14 Thread Anatoly Trosinenko via llvm-branch-commits


https://github.com/atrosinenko ready_for_review 
https://github.com/llvm/llvm-project/pull/135663

[llvm-branch-commits] [llvm] a141e58 - [llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352)

2025-04-14 Thread Tom Stellard via llvm-branch-commits


Author: Hua Tian
Date: 2025-04-14T12:34:31-07:00
New Revision: a141e58685fd8d07b13751dcce0d2fca27b93848

URL: 
https://github.com/llvm/llvm-project/commit/a141e58685fd8d07b13751dcce0d2fca27b93848
DIFF: 
https://github.com/llvm/llvm-project/commit/a141e58685fd8d07b13751dcce0d2fca27b93848.diff

LOG: [llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352)

Some new registers are reused when replacing old ones in certain
use cases of ModuloScheduleExpander. It is necessary to avoid
repeated interval calculations for these registers.

(cherry picked from commit 7e65944292278cc245e36cc6ca971654d584012d)
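
As a toy illustration of the hazard described above (not the ModuloScheduleExpander code; every name here is hypothetical), consider a list of "registers needing intervals" in which a reused register can appear more than once. A naive pass over the list computes once per entry, while a pass that tracks what it has already seen computes once per distinct register. The patch itself takes a different route and drops the NoIntervalRegs bookkeeping and calculateIntervals() entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Naive model: one interval computation per recorded entry, duplicates included.
std::size_t naiveComputations(const std::vector<unsigned> &Recorded) {
  return Recorded.size();
}

// Deduplicating model: compute at most once per distinct register number.
std::size_t distinctComputations(const std::vector<unsigned> &Recorded) {
  std::set<unsigned> Seen;
  std::size_t Count = 0;
  for (unsigned Reg : Recorded) {
    // insert() reports whether Reg was newly added to the set.
    if (Seen.insert(Reg).second)
      ++Count;
  }
  assert(Count <= naiveComputations(Recorded));
  return Count;
}
```

With the recorded list {5, 7, 5}, the naive pass would touch register 5 twice, whereas the deduplicating pass handles it once.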

Added: 
llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir

Modified: 
llvm/include/llvm/CodeGen/ModuloSchedule.h
llvm/lib/CodeGen/ModuloSchedule.cpp

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/ModuloSchedule.h b/llvm/include/llvm/CodeGen/ModuloSchedule.h
index e2fb1daf9fb4e..64598ce449a44 100644
--- a/llvm/include/llvm/CodeGen/ModuloSchedule.h
+++ b/llvm/include/llvm/CodeGen/ModuloSchedule.h
@@ -188,9 +188,6 @@ class ModuloScheduleExpander {
   /// Instructions to change when emitting the final schedule.
   InstrChangesTy InstrChanges;
 
-  /// Record the registers that need to compute live intervals.
-  SmallVector<Register> NoIntervalRegs;
-
   void generatePipelinedLoop();
   void generateProlog(unsigned LastStage, MachineBasicBlock *KernelBB,
   ValueMapTy *VRMap, MBBVectorTy &PrologBBs);
@@ -214,7 +211,6 @@ class ModuloScheduleExpander {
   void addBranches(MachineBasicBlock &PreheaderBB, MBBVectorTy &PrologBBs,
MachineBasicBlock *KernelBB, MBBVectorTy &EpilogBBs,
ValueMapTy *VRMap);
-  void calculateIntervals();
   bool computeDelta(MachineInstr &MI, unsigned &Delta);
   void updateMemOperands(MachineInstr &NewMI, MachineInstr &OldMI,
  unsigned Num);

diff  --git a/llvm/lib/CodeGen/ModuloSchedule.cpp b/llvm/lib/CodeGen/ModuloSchedule.cpp
index fcfb3c1f7073d..7792a0eaa285b 100644
--- a/llvm/lib/CodeGen/ModuloSchedule.cpp
+++ b/llvm/lib/CodeGen/ModuloSchedule.cpp
@@ -181,10 +181,6 @@ void ModuloScheduleExpander::generatePipelinedLoop() {
   // Add branches between prolog and epilog blocks.
   addBranches(*Preheader, PrologBBs, KernelBB, EpilogBBs, VRMap);
 
-  // The intervals of newly created virtual registers are calculated after the
-  // kernel expansion.
-  calculateIntervals();
-
   delete[] VRMap;
   delete[] VRMapPhi;
 }
@@ -546,10 +542,8 @@ void ModuloScheduleExpander::generateExistingPhis(
   if (VRMap[LastStageNum - np - 1].count(LoopVal))
 PhiOp2 = VRMap[LastStageNum - np - 1][LoopVal];
 
-  if (IsLast && np == NumPhis - 1) {
+  if (IsLast && np == NumPhis - 1)
 replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-NoIntervalRegs.push_back(NewReg);
-  }
   continue;
 }
   }
@@ -589,10 +583,8 @@ void ModuloScheduleExpander::generateExistingPhis(
   // Check if we need to rename any uses that occurs after the loop. The
   // register to replace depends on whether the Phi is scheduled in the
   // epilog.
-  if (IsLast && np == NumPhis - 1) {
+  if (IsLast && np == NumPhis - 1)
 replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-NoIntervalRegs.push_back(NewReg);
-  }
 
   // In the kernel, a dependent Phi uses the value from this Phi.
   if (InKernel)
@@ -612,10 +604,8 @@ void ModuloScheduleExpander::generateExistingPhis(
 if (NumStages == 0 && IsLast) {
   auto &CurStageMap = VRMap[CurStageNum];
   auto It = CurStageMap.find(LoopVal);
-  if (It != CurStageMap.end()) {
+  if (It != CurStageMap.end())
 replaceRegUsesAfterLoop(Def, It->second, BB, MRI);
-NoIntervalRegs.push_back(It->second);
-  }
 }
   }
 }
@@ -735,10 +725,8 @@ void ModuloScheduleExpander::generatePhis(
 rewriteScheduledInstr(NewBB, InstrMap, CurStageNum, np, &*BBI, Def,
   NewReg);
 }
-if (IsLast && np == NumPhis - 1) {
+if (IsLast && np == NumPhis - 1)
   replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-  NoIntervalRegs.push_back(NewReg);
-}
   }
 }
   }
@@ -950,14 +938,6 @@ void ModuloScheduleExpander::addBranches(MachineBasicBlock &PreheaderBB,
   }
 }
 
-/// Some registers are generated during the kernel expansion. We calculate the
-/// live intervals of these registers after the expansion.
-void ModuloScheduleExpander::calculateIntervals() {
-  for (Register Reg : NoIntervalRegs)
-LIS.createAndComputeVirtRegInterval(Reg);
-  NoIntervalRegs.clear();
-}
-
 /// Return true if we can compute the amount the instruction changes
 /// during each iteration. Set Delta to the amount of the change.
 bool Modu
