[llvm-branch-commits] [X86][NewPM] Port lower-amx-intrinsics to NewPM (PR #165113)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/165113 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LVer][profcheck] explicitly set unknown branch weights for the versioned/unversioned selector (PR #164507)
@@ -109,8 +110,13 @@ void LoopVersioning::versionLoop(
   // Insert the conditional branch based on the result of the memchecks.
   Instruction *OrigTerm = RuntimeCheckBB->getTerminator();
   Builder.SetInsertPoint(OrigTerm);
-  Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
-                       VersionedLoop->getLoopPreheader());
+  auto *BI =
+      Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
+                           VersionedLoop->getLoopPreheader());
+  // We don't know what the probability of executing the versioned vs the
+  // unversioned variants is.
+  setExplicitlyUnknownBranchWeightsIfProfiled(
+      *BI, *BI->getParent()->getParent(), DEBUG_TYPE);
fhahn wrote:
Actually, looks like the argument can be removed altogether: https://github.com/llvm/llvm-project/pull/166028
https://github.com/llvm/llvm-project/pull/164507
[llvm-branch-commits] [llvm] [LVer][profcheck] explicitly set unknown branch weights for the versioned/unversioned selector (PR #164507)
@@ -109,8 +110,13 @@ void LoopVersioning::versionLoop(
   // Insert the conditional branch based on the result of the memchecks.
   Instruction *OrigTerm = RuntimeCheckBB->getTerminator();
   Builder.SetInsertPoint(OrigTerm);
-  Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
-                       VersionedLoop->getLoopPreheader());
+  auto *BI =
+      Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
+                           VersionedLoop->getLoopPreheader());
+  // We don't know what the probability of executing the versioned vs the
+  // unversioned variants is.
+  setExplicitlyUnknownBranchWeightsIfProfiled(
+      *BI, *BI->getParent()->getParent(), DEBUG_TYPE);
fhahn wrote:
Or not, looks like InstCombine passes disconnected instructions.
https://github.com/llvm/llvm-project/pull/164507
[llvm-branch-commits] [libcxx] Add Testing Configuration for LLVM libc (PR #165120)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/165120
[llvm-branch-commits] [llvm] [LVer][profcheck] explicitly set unknown branch weights for the versioned/unversioned selector (PR #164507)
@@ -109,8 +110,13 @@ void LoopVersioning::versionLoop(
   // Insert the conditional branch based on the result of the memchecks.
   Instruction *OrigTerm = RuntimeCheckBB->getTerminator();
   Builder.SetInsertPoint(OrigTerm);
-  Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
-                       VersionedLoop->getLoopPreheader());
+  auto *BI =
+      Builder.CreateCondBr(RuntimeCheck, NonVersionedLoop->getLoopPreheader(),
+                           VersionedLoop->getLoopPreheader());
+  // We don't know what the probability of executing the versioned vs the
+  // unversioned variants is.
+  setExplicitlyUnknownBranchWeightsIfProfiled(
+      *BI, *BI->getParent()->getParent(), DEBUG_TYPE);
fhahn wrote:
```suggestion
      *BI, *BI->getFunction(), DEBUG_TYPE);
```
https://github.com/llvm/llvm-project/pull/164507
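The thread above turns on whether the helper needs an explicit `Function &` argument. Below is a toy C++ model of that trade-off; every type and the `setUnknownWeightsIfProfiled` helper are hypothetical stand-ins, not LLVM's classes or API. For an instruction already placed in a block, the owning function can be recovered from the instruction itself, which is what the `getFunction()` suggestion relies on. For an instruction that has not been inserted anywhere yet (fhahn notes InstCombine can pass such instructions), there is no parent chain to walk, so in this model the caller must supply the function.

```cpp
#include <cassert>
#include <string>

// Toy model (hypothetical, not LLVM's classes): a function may carry profile
// data; a block may belong to a function; an instruction may sit in a block.
struct Function {
  std::string Name;
  bool HasProfile = false;
};

struct BasicBlock {
  Function *Parent = nullptr;
};

struct Instruction {
  BasicBlock *Parent = nullptr;
  std::string UnknownWeightsTag; // empty means "no marker attached"

  // Same shape as getParent()->getParent(), but in this toy model a detached
  // instruction yields nullptr instead of dereferencing a null block.
  Function *getFunction() const { return Parent ? Parent->Parent : nullptr; }
};

// Hypothetical stand-in for setExplicitlyUnknownBranchWeightsIfProfiled: the
// function is passed explicitly, so the helper also works for an instruction
// that has not been inserted into a block yet.
void setUnknownWeightsIfProfiled(Instruction &I, const Function &F,
                                 const std::string &PassName) {
  if (F.HasProfile)
    I.UnknownWeightsTag = PassName; // record which pass marked it "unknown"
}
```

In this model, an attached branch can use `*BI.getFunction()` as the second argument, while a detached instruction needs the caller to name the function directly.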
[llvm-branch-commits] [X86][NewPM] Port lower-amx-intrinsics to NewPM (PR #165113)
@@ -179,7 +179,18 @@ FunctionPass *createX86LowerAMXTypeLegacyPass();
/// The pass transforms amx intrinsics to scalar operation if the function has
/// optnone attribute or it is O0.
-FunctionPass *createX86LowerAMXIntrinsicsPass();
+class X86LowerAMXIntrinsicsPass
+: public PassInfoMixin<X86LowerAMXIntrinsicsPass> {
+private:
+ const TargetMachine *TM;
+
+public:
+ X86LowerAMXIntrinsicsPass(const TargetMachine *TM) : TM(TM) {}
+ PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM);
+ static bool isRequired() { return true; }
boomanaiden154 wrote:
I'm not sure we should be using a backend pass enabled at O0 to remove
redundant stack loads/stores if those are supposed to be cleaned up in the
middle-end.
Either way, this patch just intends to port the pass, not to fix any latent
issues or clean up any latent tech debt of this sort.
https://github.com/llvm/llvm-project/pull/165113
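For readers less familiar with the new pass manager, here is a toy model of the shape this port takes: a CRTP mixin, a `run` entry point, a target pointer stored at construction, and a static `isRequired()` that returns true. The model assumes the new pass manager's rule that passes are skipped on `optnone` functions unless they report themselves as required, which matters here because the pass exists specifically for `optnone`/O0 code. `PassInfoMixinModel`, `runPass`, `Target`, and both passes are hypothetical; this is a sketch, not LLVM's `PassInfoMixin` or pass manager (in particular, the default `isRequired()` in the mixin is a simplification).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a function being compiled (hypothetical).
struct Function {
  std::string Name;
  bool OptNone = false;         // models the optnone attribute / O0 code
  std::vector<std::string> Ran; // names of passes that actually ran
};

// Toy CRTP mixin (hypothetical). Passes that do not declare their own
// isRequired() inherit this default of "skippable".
template <typename DerivedT> struct PassInfoMixinModel {
  static bool isRequired() { return false; }
};

// Runs P on F unless F is optnone and the pass is not required.
template <typename PassT> void runPass(PassT &P, Function &F) {
  if (F.OptNone && !PassT::isRequired())
    return;
  P.run(F);
}

// An ordinary optimization pass: skipped on optnone functions.
struct SimplifyPass : PassInfoMixinModel<SimplifyPass> {
  void run(Function &F) { F.Ran.push_back("simplify"); }
};

// Hypothetical analogue of X86LowerAMXIntrinsicsPass: it keeps a target
// pointer and hides the mixin's isRequired() so it still runs on optnone code.
struct Target {};
struct LowerIntrinsicsPass : PassInfoMixinModel<LowerIntrinsicsPass> {
  const Target *TM;
  explicit LowerIntrinsicsPass(const Target *TM) : TM(TM) {}
  static bool isRequired() { return true; }
  void run(Function &F) { F.Ran.push_back("lower-intrinsics"); }
};
```

Running both passes over an `optnone` function leaves only `"lower-intrinsics"` in its log, while a normal function records both.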
[llvm-branch-commits] [llvm] [Instrumentor] Allow printing a runtime stub (PR #138978)
https://github.com/kevinsala updated
https://github.com/llvm/llvm-project/pull/138978
From 371483a750a456459d054a56787b40e946ab2890 Mon Sep 17 00:00:00 2001
From: Kevin Sala
Date: Tue, 6 May 2025 22:48:41 -0700
Subject: [PATCH 1/2] [Instrumentor] Allow printing a runtime stub
---
.../llvm/Transforms/IPO/Instrumentor.h| 16 ++
.../Transforms/IPO/InstrumentorStubPrinter.h | 32 +++
llvm/lib/Transforms/IPO/CMakeLists.txt| 1 +
llvm/lib/Transforms/IPO/Instrumentor.cpp | 3 +
.../IPO/InstrumentorStubPrinter.cpp | 210 ++
.../Instrumentor/bad_rt_config.json | 105 +
.../Instrumentor/default_config.json | 2 +
.../Instrumentation/Instrumentor/default_rt | 37 +++
.../Instrumentor/generate_bad_rt.ll | 3 +
.../Instrumentor/generate_rt.ll | 2 +
.../Instrumentor/load_store_config.json | 2 +-
.../load_store_noreplace_config.json | 2 +-
.../Instrumentor/rt_config.json | 105 +
13 files changed, 518 insertions(+), 2 deletions(-)
create mode 100644 llvm/include/llvm/Transforms/IPO/InstrumentorStubPrinter.h
create mode 100644 llvm/lib/Transforms/IPO/InstrumentorStubPrinter.cpp
create mode 100644 llvm/test/Instrumentation/Instrumentor/bad_rt_config.json
create mode 100644 llvm/test/Instrumentation/Instrumentor/default_rt
create mode 100644 llvm/test/Instrumentation/Instrumentor/generate_bad_rt.ll
create mode 100644 llvm/test/Instrumentation/Instrumentor/generate_rt.ll
create mode 100644 llvm/test/Instrumentation/Instrumentor/rt_config.json
diff --git a/llvm/include/llvm/Transforms/IPO/Instrumentor.h b/llvm/include/llvm/Transforms/IPO/Instrumentor.h
index 26445d221d00f..e6d5f717072a2 100644
--- a/llvm/include/llvm/Transforms/IPO/Instrumentor.h
+++ b/llvm/include/llvm/Transforms/IPO/Instrumentor.h
@@ -116,6 +116,18 @@ struct IRTCallDescription {
InstrumentorIRBuilderTy &IIRB, const DataLayout &DL,
InstrumentationCaches &ICaches);
+ /// Create a string representation of the function declaration in C. Two
+ /// strings are returned: the function definition with direct arguments and
+ /// the function with any indirect argument.
+ std::pair<std::string, std::string>
+ createCSignature(const InstrumentationConfig &IConf) const;
+
+ /// Create a string representation of the function definition in C. The
+ /// function body implements a stub and only prints the passed arguments. Two
+ /// strings are returned: the function definition with direct arguments and
+ /// the function with any indirect argument.
+ std::pair<std::string, std::string> createCBodies() const;
+
/// Return whether the \p IRTA argument can be replaced.
bool isReplacable(IRTArg &IRTA) const {
return (IRTA.Flags & (IRTArg::REPLACABLE | IRTArg::REPLACABLE_CUSTOM));
@@ -334,6 +346,9 @@ struct InstrumentationConfig {
InstrumentationConfig() : SS(StringAllocator) {
RuntimePrefix = BaseConfigurationOption::getStringOption(
*this, "runtime_prefix", "The runtime API prefix.", "__instrumentor_");
+RuntimeStubsFile = BaseConfigurationOption::getStringOption(
+*this, "runtime_stubs_file",
+"The file into which runtime stubs should be written.", "");
TargetRegex = BaseConfigurationOption::getStringOption(
*this, "target_regex",
"Regular expression to be matched against the module target. "
@@ -380,6 +395,7 @@ struct InstrumentationConfig {
/// The base configuration options.
BaseConfigurationOption *RuntimePrefix;
+ BaseConfigurationOption *RuntimeStubsFile;
BaseConfigurationOption *TargetRegex;
BaseConfigurationOption *HostEnabled;
BaseConfigurationOption *GPUEnabled;
diff --git a/llvm/include/llvm/Transforms/IPO/InstrumentorStubPrinter.h b/llvm/include/llvm/Transforms/IPO/InstrumentorStubPrinter.h
new file mode 100644
index 0..6e1e24d5fef9e
--- /dev/null
+++ b/llvm/include/llvm/Transforms/IPO/InstrumentorStubPrinter.h
@@ -0,0 +1,32 @@
+//===- Transforms/IPO/InstrumentorStubPrinter.h ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// A generator of Instrumentor's runtime stubs.
+//
+//===--===//
+
+#ifndef LLVM_TRANSFORMS_IPO_INSTRUMENTOR_STUB_PRINTER_H
+#define LLVM_TRANSFORMS_IPO_INSTRUMENTOR_STUB_PRINTER_H
+
+#include "llvm/ADT/StringRef.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Transforms/IPO/Instrumentor.h"
+
+namespace llvm {
+namespace instrumentor {
+
+/// Print a runtime stub file with the implementation of the instrumentation
+/// runtime functions corresponding to the instrumentation opportunities
+/// enabled.
+void pri
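The new `createCSignature`/`createCBodies` declarations describe generating, for each runtime call, a C declaration and a stub body that only prints the passed arguments. Below is a minimal sketch of that string generation, assuming a flat list of (C type, name, printf format) triples. `StubArg`, `makeCSignature`, and `makeCBody` are hypothetical names, and the direct/indirect split that the real functions return as a pair is not modelled. The `__instrumentor_` prefix used in the example is the `runtime_prefix` default from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one runtime-call argument; the real
// IRTCallDescription/IRTArg types carry more information.
struct StubArg {
  std::string CType;  // e.g. "void *"
  std::string Name;   // e.g. "pointer"
  std::string Format; // printf conversion, e.g. "%p"
};

// Builds "void <Prefix><Name>(<type> <arg>, ...)".
std::string makeCSignature(const std::string &Prefix, const std::string &Name,
                           const std::vector<StubArg> &Args) {
  std::string S = "void " + Prefix + Name + "(";
  if (Args.empty())
    S += "void"; // in C, an empty parameter list would mean "unspecified"
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (I != 0)
      S += ", ";
    S += Args[I].CType;
    if (S.back() != '*')
      S += ' '; // "int x", but "void *p"
    S += Args[I].Name;
  }
  return S + ")";
}

// Builds a stub definition whose body only prints the passed arguments.
std::string makeCBody(const std::string &Prefix, const std::string &Name,
                      const std::vector<StubArg> &Args) {
  std::string Fmt = Name; // printf format string being assembled
  std::string Vals;       // trailing printf arguments
  for (std::size_t I = 0; I < Args.size(); ++I) {
    Fmt += (I == 0 ? " - " : ", ") + Args[I].Name + ": " + Args[I].Format;
    Vals += ", " + Args[I].Name;
  }
  return makeCSignature(Prefix, Name, Args) + " {\n  printf(\"" + Fmt +
         "\\n\"" + Vals + ");\n}\n";
}
```

For example, with arguments `{"void *", "pointer", "%p"}` and `{"int", "value_size", "%d"}`, the generated body is `printf("pre_load - pointer: %p, value_size: %d\n", pointer, value_size);`. A stub file assembled this way would also need `#include <stdio.h>` at the top.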
[llvm-branch-commits] Revert "[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (#165540)" (PR #165979)
https://github.com/vitalybuka edited https://github.com/llvm/llvm-project/pull/165979
[llvm-branch-commits] Revert "[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (#165540)" (PR #165979)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/165979 This reverts commit a55a7207c7e4d98dad32e8d53dd5964ee833edd9.
[llvm-branch-commits] Revert "[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (#165540)" (PR #165979)
llvmbot wrote:
@llvm/pr-subscribers-backend-x86
Author: Vitaly Buka (vitalybuka)
Changes
This reverts commit a55a7207c7e4d98dad32e8d53dd5964ee833edd9.
---
Patch is 338.85 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/165979.diff
2 Files Affected:
- (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+2-112)
- (modified) llvm/test/CodeGen/X86/bittest-big-integer.ll (+6331-994)
``diff
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 6f75a2eb7075a..c5fb5535d0057 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -53344,80 +53344,6 @@ static SDValue combineMaskedStore(SDNode *N, SelectionDAG &DAG,
return SDValue();
}
-// Look for a RMW operation that only touches one bit of a larger than legal
-// type and fold it to a BTC/BTR/BTS pattern acting on a single i32 sub value.
-static SDValue narrowBitOpRMW(StoreSDNode *St, const SDLoc &DL,
- SelectionDAG &DAG,
- const X86Subtarget &Subtarget) {
- using namespace SDPatternMatch;
-
- // Only handle normal stores and its chain was a matching normal load.
- auto *Ld = dyn_cast<LoadSDNode>(St->getChain());
- if (!ISD::isNormalStore(St) || !St->isSimple() || !Ld ||
- !ISD::isNormalLoad(Ld) || !Ld->isSimple() ||
- Ld->getBasePtr() != St->getBasePtr() ||
- Ld->getOffset() != St->getOffset())
-return SDValue();
-
- SDValue LoadVal(Ld, 0);
- SDValue StoredVal = St->getValue();
- EVT VT = StoredVal.getValueType();
-
- // Only narrow larger than legal scalar integers.
- if (!VT.isScalarInteger() ||
- VT.getSizeInBits() <= (Subtarget.is64Bit() ? 64 : 32))
-return SDValue();
-
- // BTR: X & ~(1 << ShAmt)
- // BTS: X | (1 << ShAmt)
- // BTC: X ^ (1 << ShAmt)
- SDValue ShAmt;
- if (!StoredVal.hasOneUse() ||
- !(sd_match(StoredVal, m_And(m_Specific(LoadVal),
- m_Not(m_Shl(m_One(), m_Value(ShAmt))))) ||
-sd_match(StoredVal,
- m_Or(m_Specific(LoadVal), m_Shl(m_One(), m_Value(ShAmt)))) ||
-sd_match(StoredVal,
- m_Xor(m_Specific(LoadVal), m_Shl(m_One(), m_Value(ShAmt))))))
-return SDValue();
-
- // Ensure the shift amount is in bounds.
- KnownBits KnownAmt = DAG.computeKnownBits(ShAmt);
- if (KnownAmt.getMaxValue().uge(VT.getSizeInBits()))
-return SDValue();
-
- // Split the shift into an alignment shift that moves the active i32 block to
- // the bottom bits for truncation and a modulo shift that can act on the i32.
- EVT AmtVT = ShAmt.getValueType();
- SDValue AlignAmt = DAG.getNode(ISD::AND, DL, AmtVT, ShAmt,
- DAG.getSignedConstant(-32LL, DL, AmtVT));
- SDValue ModuloAmt =
- DAG.getNode(ISD::AND, DL, AmtVT, ShAmt, DAG.getConstant(31, DL, AmtVT));
-
- // Compute the byte offset for the i32 block that is changed by the RMW.
- // combineTruncate will adjust the load for us in a similar way.
- EVT PtrVT = St->getBasePtr().getValueType();
- SDValue PtrBitOfs = DAG.getZExtOrTrunc(AlignAmt, DL, PtrVT);
- SDValue PtrByteOfs = DAG.getNode(ISD::SRL, DL, PtrVT, PtrBitOfs,
- DAG.getShiftAmountConstant(3, PtrVT, DL));
- SDValue NewPtr = DAG.getMemBasePlusOffset(St->getBasePtr(), PtrByteOfs, DL,
-SDNodeFlags::NoUnsignedWrap);
-
- // Reconstruct the BTC/BTR/BTS pattern for the i32 block and store.
- SDValue X = DAG.getNode(ISD::SRL, DL, VT, LoadVal, AlignAmt);
- X = DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, X);
-
- SDValue Mask =
- DAG.getNode(ISD::SHL, DL, MVT::i32, DAG.getConstant(1, DL, MVT::i32),
- DAG.getZExtOrTrunc(ModuloAmt, DL, MVT::i8));
- if (StoredVal.getOpcode() == ISD::AND)
-Mask = DAG.getNOT(DL, Mask, MVT::i32);
-
- SDValue Res = DAG.getNode(StoredVal.getOpcode(), DL, MVT::i32, X, Mask);
- return DAG.getStore(St->getChain(), DL, Res, NewPtr, St->getPointerInfo(),
- Align(), St->getMemOperand()->getFlags());
-}
-
static SDValue combineStore(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {
@@ -53644,9 +53570,6 @@ static SDValue combineStore(SDNode *N, SelectionDAG &DAG,
}
}
- if (SDValue R = narrowBitOpRMW(St, dl, DAG, Subtarget))
-return R;
-
// Convert store(cmov(load(p), x, CC), p) to cstore(x, p, CC)
// store(cmov(x, load(p), CC), p) to cstore(x, p, InvertCC)
if ((VT == MVT::i16 || VT == MVT::i32 || VT == MVT::i64) &&
@@ -54579,9 +54502,8 @@ static SDValue combineTruncate(SDNode *N, SelectionDAG &DAG,
// truncation, see if we can convert the shift into a pointer offset instead.
// Limit this to normal (non-ext) scalar integer loads.
if (SrcVT.isScalarInteger() && Src.getOpcode() == ISD::SRL
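The combine being reverted, `narrowBitOpRMW`, rewrote a load/modify/store that changes one bit of a wider-than-legal integer into a 32-bit read-modify-write. The block starts at bit `ShAmt & -32` (byte offset `(ShAmt & -32) >> 3`), and the bit inside that block is `ShAmt & 31`. Below is the same arithmetic on plain memory, as a sketch rather than SelectionDAG code. It assumes the wide integer is stored little-endian (as on x86) and that the shift amount is in bounds (the combine established this with `computeKnownBits`). It additionally requires the width to be a multiple of 32. `BitOp`, `narrowBitRMW`, and `wideBitRMW` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// BTS, BTR and BTC respectively.
enum class BitOp { Set, Reset, Complement };

// Narrowed form, following the removed combine: update only the 32-bit block
// that contains bit ShAmt of a NumBits-wide little-endian integer.
void narrowBitRMW(unsigned char *Bytes, unsigned NumBits, unsigned ShAmt,
                  BitOp Op) {
  assert(NumBits % 32 == 0 && ShAmt < NumBits);
  unsigned AlignAmt = ShAmt & ~31u; // same as ShAmt & -32
  unsigned ModuloAmt = ShAmt & 31u; // bit position inside the block
  unsigned ByteOfs = AlignAmt >> 3; // byte offset of the block

  // Load the i32 block as little-endian, independent of host byte order.
  std::uint32_t Block = 0;
  for (unsigned I = 0; I < 4; ++I)
    Block |= std::uint32_t(Bytes[ByteOfs + I]) << (8 * I);

  std::uint32_t Mask = std::uint32_t(1) << ModuloAmt;
  switch (Op) {
  case BitOp::Set:        Block |= Mask;  break; // X | (1 << ShAmt)
  case BitOp::Reset:      Block &= ~Mask; break; // X & ~(1 << ShAmt)
  case BitOp::Complement: Block ^= Mask;  break; // X ^ (1 << ShAmt)
  }

  // Store the i32 block back.
  for (unsigned I = 0; I < 4; ++I)
    Bytes[ByteOfs + I] = static_cast<unsigned char>(Block >> (8 * I));
}

// Reference semantics: update bit ShAmt of the whole value, byte by byte.
void wideBitRMW(unsigned char *Bytes, unsigned ShAmt, BitOp Op) {
  unsigned char Mask = static_cast<unsigned char>(1u << (ShAmt & 7));
  unsigned char &B = Bytes[ShAmt >> 3];
  switch (Op) {
  case BitOp::Set:        B = static_cast<unsigned char>(B | Mask);  break;
  case BitOp::Reset:      B = static_cast<unsigned char>(B & ~Mask); break;
  case BitOp::Complement: B = static_cast<unsigned char>(B ^ Mask);  break;
  }
}
```

For an i128 and `ShAmt = 77`, the narrowed access touches the block at byte offset 8 and bit 13 within it, which is bit 5 of byte 9, the same bit the byte-wise reference updates.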
[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)
https://github.com/kevinsala updated
https://github.com/llvm/llvm-project/pull/152831
From fa3c7425ae9e5ffea83841f2be61b0f494b99038 Mon Sep 17 00:00:00 2001
From: Kevin Sala
Date: Fri, 8 Aug 2025 11:25:14 -0700
Subject: [PATCH 1/4] [OpenMP][Offload] Add offload runtime support for
dyn_groupprivate clause
---
offload/DeviceRTL/include/DeviceTypes.h | 4 +
offload/DeviceRTL/include/Interface.h | 2 +-
offload/DeviceRTL/include/State.h | 2 +-
offload/DeviceRTL/src/Kernel.cpp | 14 +-
offload/DeviceRTL/src/State.cpp | 48 +-
offload/include/Shared/APITypes.h | 6 +-
offload/include/Shared/Environment.h | 4 +-
offload/include/device.h | 3 +
offload/include/omptarget.h | 7 +-
offload/libomptarget/OpenMP/API.cpp | 14 ++
offload/libomptarget/device.cpp | 6 +
offload/libomptarget/exports | 1 +
.../amdgpu/dynamic_hsa/hsa_ext_amd.h | 1 +
offload/plugins-nextgen/amdgpu/src/rtl.cpp| 34 +++--
.../common/include/PluginInterface.h | 33 +++-
.../common/src/PluginInterface.cpp| 86 ---
.../plugins-nextgen/cuda/dynamic_cuda/cuda.h | 1 +
offload/plugins-nextgen/cuda/src/rtl.cpp | 37 +++--
offload/plugins-nextgen/host/src/rtl.cpp | 4 +-
.../offloading/dyn_groupprivate_strict.cpp| 141 ++
openmp/runtime/src/include/omp.h.var | 10 ++
openmp/runtime/src/kmp_csupport.cpp | 9 ++
openmp/runtime/src/kmp_stub.cpp | 16 ++
23 files changed, 418 insertions(+), 65 deletions(-)
create mode 100644 offload/test/offloading/dyn_groupprivate_strict.cpp
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/offload/DeviceRTL/include/DeviceTypes.h
index 2e5d92380f040..a43b506d6879e 100644
--- a/offload/DeviceRTL/include/DeviceTypes.h
+++ b/offload/DeviceRTL/include/DeviceTypes.h
@@ -163,4 +163,8 @@ typedef enum omp_allocator_handle_t {
///}
+enum omp_access_t {
+ omp_access_cgroup = 0,
+};
+
#endif
diff --git a/offload/DeviceRTL/include/Interface.h b/offload/DeviceRTL/include/Interface.h
index c4bfaaa2404b4..672afea206785 100644
--- a/offload/DeviceRTL/include/Interface.h
+++ b/offload/DeviceRTL/include/Interface.h
@@ -222,7 +222,7 @@ struct KernelEnvironmentTy;
int8_t __kmpc_is_spmd_exec_mode();
int32_t __kmpc_target_init(KernelEnvironmentTy &KernelEnvironment,
- KernelLaunchEnvironmentTy &KernelLaunchEnvironment);
+ KernelLaunchEnvironmentTy *KernelLaunchEnvironment);
void __kmpc_target_deinit();
diff --git a/offload/DeviceRTL/include/State.h b/offload/DeviceRTL/include/State.h
index db396dae6e445..17c3c6f2d3e42 100644
--- a/offload/DeviceRTL/include/State.h
+++ b/offload/DeviceRTL/include/State.h
@@ -116,7 +116,7 @@ extern Local ThreadStates;
/// Initialize the state machinery. Must be called by all threads.
void init(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
- KernelLaunchEnvironmentTy &KernelLaunchEnvironment);
+ KernelLaunchEnvironmentTy *KernelLaunchEnvironment);
/// Return the kernel and kernel launch environment associated with the current
/// kernel. The former is static and contains compile time information that
diff --git a/offload/DeviceRTL/src/Kernel.cpp b/offload/DeviceRTL/src/Kernel.cpp
index 467e44a65276c..58e9a09105a76 100644
--- a/offload/DeviceRTL/src/Kernel.cpp
+++ b/offload/DeviceRTL/src/Kernel.cpp
@@ -34,8 +34,8 @@ enum OMPTgtExecModeFlags : unsigned char {
};
static void
-inititializeRuntime(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
-KernelLaunchEnvironmentTy &KernelLaunchEnvironment) {
+initializeRuntime(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
+ KernelLaunchEnvironmentTy *KernelLaunchEnvironment) {
// Order is important here.
synchronize::init(IsSPMD);
mapping::init(IsSPMD);
@@ -80,17 +80,17 @@ extern "C" {
/// \param Ident Source location identification, can be NULL.
///
int32_t __kmpc_target_init(KernelEnvironmentTy &KernelEnvironment,
- KernelLaunchEnvironmentTy &KernelLaunchEnvironment) {
+ KernelLaunchEnvironmentTy *KernelLaunchEnvironment) {
ConfigurationEnvironmentTy &Configuration = KernelEnvironment.Configuration;
bool IsSPMD = Configuration.ExecMode & OMP_TGT_EXEC_MODE_SPMD;
bool UseGenericStateMachine = Configuration.UseGenericStateMachine;
if (IsSPMD) {
-inititializeRuntime(/*IsSPMD=*/true, KernelEnvironment,
-KernelLaunchEnvironment);
+initializeRuntime(/*IsSPMD=*/true, KernelEnvironment,
+ KernelLaunchEnvironment);
synchronize::threadsAligned(atomic::relaxed);
} else {
-inititializeRuntime(/*IsSPMD=*/false, KernelEnvironment,
-KernelLaunchEnv
