https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/146054
@@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt(
MI.eraseFromParent();
}
+bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const {
+ assert(MI.getOpcode() == TargetOpcode::G_UBFX ||
+ MI.getOpcode() == TargetOpcod
https://github.com/jayfoad commented:
No test changes? Is it possible to test any of this? We have
`regbankcombiner-*` tests for some things.
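For context, `regbankcombiner-*` tests are MIR tests driven roughly like this (a sketch; the exact pass name and CPU are assumptions, not taken from this patch):
```
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -run-pass=amdgpu-regbank-combiner %s -o - | FileCheck %s
# A G_UBFX/G_SBFX with an SGPR-bank result would go here, with CHECK lines
# autogenerated by update_mir_test_checks.py.
```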
https://github.com/llvm/llvm-project/pull/141589
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/141589
jayfoad wrote:
Does this also handle the case where _all_ of the values ORed together are
shifted, like `(setcc ((x >> c0 | x >> c1 | ...) & mask))` ?
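For illustration, that fully-shifted variant would look like this in IR (hypothetical example, not taken from the patch):
```llvm
define i1 @all_inputs_shifted(i32 %x) {
  %s0 = lshr i32 %x, 4
  %s1 = lshr i32 %x, 12
  %or = or i32 %s0, %s1       ; every ORed value is a shift of %x
  %and = and i32 %or, 255     ; the mask
  %cmp = icmp eq i32 %and, 0  ; the setcc
  ret i1 %cmp
}
```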
https://github.com/llvm/llvm-project/pull/146054
@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI,
return false;
}
+
+bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const {
+ const GICmp *Cmp = cast<GICmp>(&MI);
+
+ ICmpInst::Predicate CC = Cmp->getCond();
+ if (CC != CmpI
@@ -28909,13 +28909,99 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
return SDValue();
}
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+ const TargetLowering &TLI) {
+ // Match a pattern suc
@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
return SDValue();
}
+static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG,
+ const TargetLowering &TLI) {
+ // Match a pattern suc
@@ -0,0 +1,553 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9
jayfoad wrote:
```suggestion
; RUN: llc -
@@ -0,0 +1,553 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
jayfoad wrote:
Nit: I think "workitem-intrinsic-opts" sounds better
https://github.com/llvm/llvm-project/pull/146053
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/146053
https://github.com/jayfoad approved this pull request.
LGTM with nits
https://github.com/llvm/llvm-project/pull/146053
jayfoad wrote:
> This is not a simplifying pass, it is making the IR more complicated. We have
> to do hacks like this to prevent later more profitable combines from needing
> to parse out expanded IR:
Fair enough, makes sense. I just want to make sure the justification is
properly understood
jayfoad wrote:
I don't understand the high level motivation here. "Normal"
combining/simplification order is to visit the operands of an instruction
before you visit the instruction itself. That way the "visit" function can
assume that the operands have already been simplified. GlobalISel comb
@@ -380,7 +477,8 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const {
return true;
}
-bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold) const {
+bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold,
+
@@ -25,52 +25,151 @@ using namespace llvm;
namespace {
-struct FoldCandidate {
- MachineInstr *UseMI;
+/// Track a value we may want to fold into downstream users, applying
+/// subregister extracts along the way.
+struct FoldableDef {
union {
-MachineOperand *OpToFol
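The quoted hunk is cut off; as a rough sketch of the idea (field names are assumptions, not the actual patch), the new struct pairs the operand being folded with the subregister extracts accumulated on the way to each use:
```cpp
#include "llvm/CodeGen/MachineOperand.h"

// Sketch only: a def we may want to fold into downstream users, plus the
// subregister index composed while walking through copies to each use.
struct FoldableDef {
  llvm::MachineOperand *OpToFold = nullptr; // immediate, frame index, or reg
  unsigned DefSubReg = 0;                   // subreg applied along the way
};
```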
https://github.com/jayfoad commented:
The idea seems good. I haven't reviewed it all in detail.
https://github.com/llvm/llvm-project/pull/140608
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/140608
@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
SrcTy.getScalarSizeInBits());
break;
}
+ case TargetOpcode::G_SHUFFLE_VECTOR: {
+// Collect the minimum number of sign bits that are shared by ever
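The hunk is truncated; the rule it appears to implement (a sketch under assumed helper names, ignoring the shuffle mask) is that the result cannot have more sign bits than either source:
```cpp
// Sketch, not the patch: every demanded result element comes from one of
// the two source vectors, so take the minimum over both sources.
case TargetOpcode::G_SHUFFLE_VECTOR: {
  unsigned Tmp = computeNumSignBits(MI.getOperand(1).getReg(), Depth + 1);
  Tmp = std::min(Tmp, computeNumSignBits(MI.getOperand(2).getReg(), Depth + 1));
  return Tmp;
}
```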
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/139505
https://github.com/jayfoad requested changes to this pull request.
https://github.com/llvm/llvm-project/pull/139503
@@ -864,6 +864,16 @@ unsigned GISelValueTracking::computeNumSignBits(Register R,
return TyBits - 1; // Every always-zero bit is a sign bit.
break;
}
+ case TargetOpcode::G_ASHR: {
+Register Src1 = MI.getOperand(1).getReg();
+Register Src2 = MI.getOperand(2)
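The hunk is truncated; the standard rule it presumably implements (a sketch under assumed helper names) is that an arithmetic shift right by a known constant C copies the sign bit C more times:
```cpp
// Sketch, not the patch itself.
case TargetOpcode::G_ASHR: {
  Register Src1 = MI.getOperand(1).getReg();
  Register Src2 = MI.getOperand(2).getReg();
  unsigned Tmp = computeNumSignBits(Src1, DemandedElts, Depth + 1);
  // A shift by C known to be less than the bit width adds C sign bits.
  if (auto C = getIConstantVRegVal(Src2, MRI))
    if (C->ult(TyBits))
      return std::min<unsigned>(Tmp + C->getZExtValue(), TyBits);
  return Tmp;
}
```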
@@ -19,7 +19,7 @@ body: |
; GFX12-NEXT: {{ $}}
; GFX12-NEXT: renamable $vgpr0 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, killed $vgpr0, 0, 0, implicit $exec :: (load (s32), addrspace 1)
; GFX12-NEXT: GLOBAL_INV 16, implicit $exec
-; GFX12-NEXT: S_WAIT_L
@@ -2130,13 +2140,14 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst);
}
} else if (TII->isFLAT(Inst)) {
-// TODO: Track this properly.
-if (isCacheInvOrWBInst(Inst))
+if
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/135340
https://github.com/jayfoad approved this pull request.
Patch looks OK to me, unless you are still worried about the global_inv loadcnt
decrement ordering thing.
Removing unnecessary waits at a function call boundary can be done as a
separate optimization.
https://github.com/llvm/llvm-project/
@@ -2294,10 +2294,14 @@ collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L) {
AliasSetTracker AST(BatchAA);
auto IsPotentiallyPromotable = [L](const Instruction *I) {
-if (const auto *SI = dyn_cast<StoreInst>(I))
- return L->isLoopInvariant(SI->getPointe
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/130041
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
[{ return Helper.matchSextTruncSextLoad(*${d}); }]),
(apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
+def sext_trunc_sextinreg : GICombineRule<
+ (defs root:$dst),
+ (match (G_SEXT_INREG $sir, $sr
jayfoad wrote:
> > Then I think the poison-propagating rules for this intrinsic should be
> > documented. They're not obvious and "it just does what the underlying
> > instruction does" is no longer sufficient.
>
> We're currently not propagating poison for these intrinsics, and this patch
>
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
[{ return Helper.matchSextTruncSextLoad(*${d}); }]),
(apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
+def sext_trunc_sextinreg : GICombineRule<
+ (defs root:$dst),
+ (match (G_SEXT_INREG $sir, $sr
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
[{ return Helper.matchSextTruncSextLoad(*${d}); }]),
(apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
+def sext_trunc_sextinreg : GICombineRule<
+ (defs root:$dst),
+ (match (G_SEXT_INREG $sir, $sr
jayfoad wrote:
> > Same kind of objection as #131287: as a general strategy, "replace unused
> > inputs with poison"
>
> Repeating from the other review, but this is not the case. Poison does not
> unconditionally fold through intrinsics. This is specific to an operand for
> an intrinsic. It
jayfoad wrote:
> We do it when the semantics allow it.
My concern is that it is not obvious when the semantics allow it, when you have
a plethora of undocumented target intrinsics. I guess the grown-up solution is
to document them properly.
https://github.com/llvm/llvm-project/pull/131287
jayfoad wrote:
Same kind of objection as #131287: as a general strategy, "replace unused
inputs with poison" seems incompatible with "propagate poison from arguments to
result". @nunoplopes any thoughts on this?
https://github.com/llvm/llvm-project/pull/131288
jayfoad wrote:
I have a conceptual objection: I don't think we can do both of these things:
1. Replace unused inputs of all intrinsics with poison
2. Propagate poison from any argument, for all intrinsics
So how should we handle this in general? Is it better to replace unused inputs
with "free
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/131286
jayfoad wrote:
> We cannot mark these as nocallback or nosync.
I think we can and should mark these as `nocallback`. I don't know how well it
is documented, but I think in practice `nocallback` means that the intrinsic
does not call back into user code **synchronously, in the current thread,
jayfoad wrote:
This needs motivation. What does `norecurse` mean for an intrinsic and how does
it differ from `nocallback`? What sort of intrinsic would be `norecurse` but
not `nocallback`?
https://github.com/llvm/llvm-project/pull/125015
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/124531
@@ -1035,8 +1035,10 @@ bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg,
return false;
// Insert the Def -> Use entry for the recently found source.
- ValueTrackerResult CurSrcRes = RewriteMap.lookup(CurSrcPair);
- if (CurSrcRes.isValid()
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/123943
jayfoad wrote:
> I observed something while porting this pass. The analysis LiveIntervals
> (LIS) uses the SlotIndexes (SI). There is no explicit use of SI in this pass.
> If we have to preserve LIS, it required us to preserve SI as well. When I
> initially failed to preserve SI, the following
@@ -1949,6 +1949,13 @@ bool SITargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
return Index == 0;
}
+bool SITargetLowering::isExtractVecEltCheap(EVT VT, unsigned Index) const {
+ // TODO: This should be more aggressive, particularly for 16-bit element
+ // vect
https://github.com/jayfoad approved this pull request.
LGTM
https://github.com/llvm/llvm-project/pull/122460
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/122460
@@ -308,6 +308,9 @@ def StackProtectStrong : EnumAttr<"sspstrong", IntersectPreserve, [FnAttr]>;
/// Function was called in a scope requiring strict floating point semantics.
def StrictFP : EnumAttr<"strictfp", IntersectPreserve, [FnAttr]>;
+/// Function is a floating point o
jayfoad wrote:
Why doesn't this fall out naturally from splitting the 64-bit add into 32-bit
parts and then simplifying each part? Do we leave it as a 64-bit add all the
way until final instruction selection?
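For reference, the split being asked about looks like this in generic MIR terms (a sketch; whether the backend actually reaches this form before selection is exactly the question):
```
; A 64-bit add legalized into 32-bit halves with a carry chain:
;   %lo:_(s32), %c:_(s1)  = G_UADDO %a_lo:_(s32), %b_lo:_(s32)
;   %hi:_(s32), %c2:_(s1) = G_UADDE %a_hi:_(s32), %b_hi:_(s32), %c:_(s1)
; Once split, each half can be simplified independently.
```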
https://github.com/llvm/llvm-project/pull/122049
jayfoad wrote:
> If we have out of bounds indexing, these will now clamp down to
> a low bit which may CSE with the operations on the low half of the
> wave.
Should mention that in the comment in the code. It was not clear to me why you
would want to clamp constants.
https://github.com/llvm/ll
https://github.com/jayfoad created
https://github.com/llvm/llvm-project/pull/116653
This falls out naturally after inlining finishType into its only remaining use.
>From 4140bc772f5930807cb2ea5b4b2aa945c57b699c Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Mon, 18 Nov 2024 16:36:33 +
Subje
@@ -1494,7 +1494,8 @@ def FeatureISAVersion9_5_Common : FeatureSet<
[FeatureFP8Insts,
FeatureFP8ConversionInsts,
FeatureCvtFP8VOP1Bug,
- FeatureGFX950Insts
+ FeatureGFX950Insts,
+ FeatureAddressableLocalMemorySize163840
jayfoad wrote:
This means
@@ -1110,6 +1110,13 @@ def FeatureRequiresCOV6 : SubtargetFeature<"requires-cov6",
"Target Requires Code Object V6"
>;
+def FeatureXF32Insts : SubtargetFeature<"xf32-insts",
+ "HasXF32Insts",
+ "true",
+ "Has instructions that support xf32 format, such as "
+ "v_mfm
https://github.com/jayfoad approved this pull request.
> Change existing code to match what LLVM-IR version is doing
Yeah, looks reasonable to me.
https://github.com/llvm/llvm-project/pull/112866
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/110470
https://github.com/jayfoad approved this pull request.
LGTM, thanks! Please also backport to release/19.x.
https://github.com/llvm/llvm-project/pull/110256
@@ -1911,7 +1911,7 @@ bool AMDGPUDAGToDAGISel::SelectScratchSAddr(SDNode *Parent, SDValue Addr,
0);
}
- Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i16);
+ Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i32);
jayfo
jayfoad wrote:
> > Is this PR a fix for a regression or a critical issue?
>
> No, I believe it has been broken for about 3 years (since
> [d7e03df](https://github.com/llvm/llvm-project/commit/d7e03df719464354b20a845b7853be57da863924))
> but it was only reported to me recently.
>
> I guess thi
jayfoad wrote:
> Is this PR a fix for a regression or a critical issue?
No, I believe it has been broken for about 3 years (since
d7e03df719464354b20a845b7853be57da863924) but it was only reported to me
recently.
I guess this means it is not appropriate for 19.1.0.
https://github.com/llvm/ll
jayfoad wrote:
> > This sounds sketchy to me. Is it really valid to enter a second call inside
> > another call's CALLSEQ markers, but only if we avoid adding a second nested
> > set of markers? It feels like attacking the symptom of the issue, but not
> > the root cause. (I'm not certain it's
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/106977
jayfoad wrote:
This is a backport of #105831.
https://github.com/llvm/llvm-project/pull/106977
https://github.com/jayfoad created
https://github.com/llvm/llvm-project/pull/106977
SMUL_LOHI and UMUL_LOHI are different operations because the high part of the
result is different, so it is not OK to optimize the signed version to
MUL_U24/MULHI_U24 or the unsigned version to MUL_I24/MULHI_I2
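A small worked example of the difference in the high half (illustrative C++, not from the patch):
```cpp
#include <cstdint>
#include <cstdio>

int main() {
  int32_t a = -2, b = 3;
  int64_t s = int64_t(a) * b;                        // SMUL_LOHI-style result
  uint64_t u = uint64_t(uint32_t(a)) * uint32_t(b);  // UMUL_LOHI-style result
  // Low halves agree (0xFFFFFFFA); high halves differ: -1 vs 2.
  std::printf("high(s)=%d high(u)=%u\n", int32_t(s >> 32), uint32_t(u >> 32));
}
```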
https://github.com/jayfoad milestoned
https://github.com/llvm/llvm-project/pull/106977
jayfoad wrote:
I'm not sure if I should have done three different backport requests for the
three commits. It could be confusing if they get squash-and-merged onto the
release branch.
https://github.com/llvm/llvm-project/pull/105808
jayfoad wrote:
### Merge activity
* **Aug 22, 6:34 AM EDT**: @jayfoad started a stack merge that includes this
pull request via
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/105549).
https://github.com/llvm/llvm-project/pull/105549
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/105549
@@ -4371,8 +4375,10 @@ define amdgpu_kernel void @global_sextload_v64i16_to_v64i32(ptr addrspace(1) %ou
; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:48
; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[4:7], off, s[0:3], 0
; GCN-NOHSA-SI-NEXT:bu
@@ -754,13 +754,21 @@ define amdgpu_kernel void @constant_load_v16i16_align2(ptr addrspace(4) %ptr0) #
; GFX12-NEXT:global_load_u16 v6, v8, s[0:1] offset:8
; GFX12-NEXT:global_load_u16 v5, v8, s[0:1] offset:4
; GFX12-NEXT:global_load_u16 v4, v8, s[0:1]
+; GFX12-NEX
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority",
"Export priority must be explicitly manipulated on GFX11.5"
>;
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",
jayfoad wrot
@@ -1778,11 +1778,12 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
if (IsVGPR) {
// RAW always needs an s_waitcnt. WAW needs an s_waitcnt unless the
// previous write and this write are the same type of VMEM
-
https://github.com/jayfoad ready_for_review
https://github.com/llvm/llvm-project/pull/105550
https://github.com/jayfoad ready_for_review
https://github.com/llvm/llvm-project/pull/105549
jayfoad wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/105550
jayfoad wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/105549
https://github.com/jayfoad created
https://github.com/llvm/llvm-project/pull/105550
When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instruct
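A sketch of the shape being optimized (hypothetical GFX12-style assembly; mnemonics approximate):
```
loop:
  global_load_b32 v1, v0, s[0:1]  ; result only consumed after the loop
  ; ... loop body, no use of v1 ...
  s_cbranch_scc1 loop             ; no wait on loadcnt at the loop head
end:
  s_wait_loadcnt 0x0              ; a single wait after the loop suffices
```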
https://github.com/jayfoad created
https://github.com/llvm/llvm-project/pull/105549
Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
>From 9a2103df4094af38f59e1adce5414b94672e6d6e Mon Sep 17 00:00:00 2001
From: Jay Foad
Date: Wed, 21 Aug 2024 16:23:49 +
https://github.com/jayfoad approved this pull request.
https://github.com/llvm/llvm-project/pull/105472
https://github.com/jayfoad approved this pull request.
LGTM for backporting.
https://github.com/llvm/llvm-project/pull/102446
@@ -54,11 +54,11 @@ define i32 @abs_nonpoison(i32 %arg) {
; FAST-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16I32 = call <16 x i32> @llvm.abs.v16i32(<16 x i32> undef, i1 false)
; FAST-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16
@@ -0,0 +1,366 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes=slp-vectorizer,instcombine %s | FileCheck -check-prefixes=GCN,GFX7 %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=fiji
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/100513
https://github.com/jayfoad approved this pull request.
LGTM.
https://github.com/llvm/llvm-project/pull/100513
jayfoad wrote:
> Looks worse for x86 without the fabs check. Not sure if this is useful for
> any targets.
Seems unlikely that this would ever be profitable in the ordered case, since
you can implement that with pretty simple integer checks on the exponent field.
(Check that it isn't 0 and is
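The integer check being alluded to, for f32 (a sketch; the sentence above is cut off mid-way):
```cpp
#include <cstdint>
#include <cstring>

// x is NaN iff its absolute-value bit pattern exceeds that of +infinity,
// i.e. the exponent field is all ones and the mantissa is nonzero.
static bool isNanF32(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0x7FFFFFFF) > 0x7F800000u;
}
```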
https://github.com/jayfoad approved this pull request.
Makes sense to me.
For the ordered case I think this would only be profitable if fabs is free
_and_ you don't have integer "test"-style instructions.
https://github.com/llvm/llvm-project/pull/100390
https://github.com/jayfoad approved this pull request.
LGTM.
https://github.com/llvm/llvm-project/pull/96163
@@ -34,18 +34,17 @@ entry:
}
define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16_dpp(
-; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
-; SDAG-GFX11: ; %bb.0: ; %entry
-; SDAG-GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24
-; SDAG-GFX11-NEXT:s_wait
https://github.com/jayfoad edited
https://github.com/llvm/llvm-project/pull/96163