from:"Jay Foad via llvm\-branch\-commits"

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-07-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/146054 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

@@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcod

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

@@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcod

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad commented: No test changes? Is it possible to test any of this? We have `regbankcombiner-*` tests for some things. https://github.com/llvm/llvm-project/pull/141589 ___ llvm-branch-commits mailing list llvm-branch-commits@lis

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

@@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcod

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/141589 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Move S_BFE lowering into RegBankCombiner (PR #141589)

2025-07-01 Thread Jay Foad via llvm-branch-commits

@@ -392,6 +394,55 @@ void AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt( MI.eraseFromParent(); } +bool AMDGPURegBankCombinerImpl::lowerUniformBFX(MachineInstr &MI) const { + assert(MI.getOpcode() == TargetOpcode::G_UBFX || + MI.getOpcode() == TargetOpcod

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: Does this also handle the case where _all_ of the values ORed together are shifted, like `(setcc ((x >> c0 | x >> c1 | ...) & mask))` ? https://github.com/llvm/llvm-project/pull/146054 ___ llvm-branch-commits mailing list llvm-branch-co

[llvm-branch-commits] [llvm] [GISel] Combine compare of bitfield extracts or'd together. (PR #146055)

2025-06-30 Thread Jay Foad via llvm-branch-commits

@@ -140,3 +140,92 @@ bool CombinerHelper::matchCanonicalizeFCmp(const MachineInstr &MI, return false; } + +bool CombinerHelper::combineMergedBFXCompare(MachineInstr &MI) const { + const GICmp *Cmp = cast(&MI); + + ICmpInst::Predicate CC = Cmp->getCond(); + if (CC != CmpI

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,99 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-30 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -0,0 +1,553 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -O3 -mtriple=amdgcn -mcpu=fiji %s -o - | FileCheck %s --check-prefixes=GFX8,DAGISEL-GFX9 jayfoad wrote: ```suggestion ; RUN: llc -

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -0,0 +1,553 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 jayfoad wrote: Nit: I think "workitem-intrinsic-opts" sounds better https://github.com/llvm/llvm-project/pull/146053 _

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/146053 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for workgroup/workitem intrinsic optimizations (PR #146053)

2025-06-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM with nits https://github.com/llvm/llvm-project/pull/146053 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-c

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (PR #146054)

2025-06-27 Thread Jay Foad via llvm-branch-commits

@@ -28909,13 +28909,97 @@ SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1, return SDValue(); } +static SDValue matchMergedBFX(SDValue Root, SelectionDAG &DAG, + const TargetLowering &TLI) { + // Match a pattern suc

[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)

2025-06-25 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > This is not a simplifying pass, it is making the IR more complicated. We have > to do hacks like this to prevent later more profitable combines from needing > to parse out expanded IR: Fair enough, makes sense. I just want to make sure the justification is properly understood

[llvm-branch-commits] [llvm] [AMDGPU] Use reverse iteration in CodeGenPrepare (PR #145484)

2025-06-24 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: I don't understand the high level motivation here. "Normal" combining/simplification order is to visit the operands of an instruction before you visit the instruction itself. That way the "visit" function can assume that the operands have already been simplified. GlobalISel comb

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

@@ -380,7 +477,8 @@ bool SIFoldOperandsImpl::canUseImmWithOpSel(FoldCandidate &Fold) const { return true; } -bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold) const { +bool SIFoldOperandsImpl::tryFoldImmWithOpSel(FoldCandidate &Fold, +

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFol

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad commented: The idea seems good. I haven't reviewed it all in detail. https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mail

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFol

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

@@ -25,52 +25,151 @@ using namespace llvm; namespace { -struct FoldCandidate { - MachineInstr *UseMI; +/// Track a value we may want to fold into downstream users, applying +/// subregister extracts along the way. +struct FoldableDef { union { -MachineOperand *OpToFol

[llvm-branch-commits] [llvm] AMDGPU: Fix tracking subreg defs when folding through reg_sequence (PR #140608)

2025-05-28 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/140608 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-12 Thread Jay Foad via llvm-branch-commits

@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R, SrcTy.getScalarSizeInBits()); break; } + case TargetOpcode::G_SHUFFLE_VECTOR: { +// Collect the minimum number of sign bits that are shared by ever

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-12 Thread Jay Foad via llvm-branch-commits

@@ -874,6 +874,30 @@ unsigned GISelValueTracking::computeNumSignBits(Register R, SrcTy.getScalarSizeInBits()); break; } + case TargetOpcode::G_SHUFFLE_VECTOR: { +// Collect the minimum number of sign bits that are shared by ever

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for G_SHUFFLE_VECTOR (PR #139505)

2025-05-12 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/139505 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)

2025-05-12 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad requested changes to this pull request. https://github.com/llvm/llvm-project/pull/139503 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-comm

[llvm-branch-commits] [llvm] [GlobalISel] Add computeNumSignBits for ASHR (PR #139503)

2025-05-12 Thread Jay Foad via llvm-branch-commits

@@ -864,6 +864,16 @@ unsigned GISelValueTracking::computeNumSignBits(Register R, return TyBits - 1; // Every always-zero bit is a sign bit. break; } + case TargetOpcode::G_ASHR: { +Register Src1 = MI.getOperand(1).getReg(); +Register Src2 = MI.getOperand(2)

[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-22 Thread Jay Foad via llvm-branch-commits

@@ -19,7 +19,7 @@ body: | ; GFX12-NEXT: {{ $}} ; GFX12-NEXT: renamable $vgpr0 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, killed $vgpr0, 0, 0, implicit $exec :: (load (s32), addrspace 1) ; GFX12-NEXT: GLOBAL_INV 16, implicit $exec -; GFX12-NEXT: S_WAIT_L

[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-22 Thread Jay Foad via llvm-branch-commits

@@ -2130,13 +2140,14 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst, ScoreBrackets->updateByEvent(TII, TRI, MRI, LDS_ACCESS, Inst); } } else if (TII->isFLAT(Inst)) { -// TODO: Track this properly. -if (isCacheInvOrWBInst(Inst)) +if

[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-22 Thread Jay Foad via llvm-branch-commits

@@ -19,7 +19,7 @@ body: | ; GFX12-NEXT: {{ $}} ; GFX12-NEXT: renamable $vgpr0 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, killed $vgpr0, 0, 0, implicit $exec :: (load (s32), addrspace 1) ; GFX12-NEXT: GLOBAL_INV 16, implicit $exec -; GFX12-NEXT: S_WAIT_L

[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/135340 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][InsertWaitCnts] Track global_wb/inv/wbinv (PR #135340)

2025-04-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. Patch looks OK to me, unless you are still worried about the global_inv loadcnt decrement ordering thing. Removing unnecessary waits at a function call boundary can be done as a separate optimization. https://github.com/llvm/llvm-project/

[llvm-branch-commits] [llvm] LICM: Avoid looking at use list of constant data (PR #134690)

2025-04-07 Thread Jay Foad via llvm-branch-commits

@@ -2294,10 +2294,14 @@ collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L) { AliasSetTracker AST(BatchAA); auto IsPotentiallyPromotable = [L](const Instruction *I) { -if (const auto *SI = dyn_cast(I)) - return L->isLoopInvariant(SI->getPointe

[llvm-branch-commits] [llvm] [AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (PR #130041)

2025-03-18 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/130041 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Jay Foad via llvm-branch-commits

@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $sr

[llvm-branch-commits] [llvm] AMDGPU: Replace unused permlane inputs with poison instead of undef (PR #131288)

2025-03-14 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > > Then I think the poison-propagating rules for this intrinsic should be > > documented. They're not obvious and "it just does what the underlying > > instruction does" is no longer sufficient. > > We're currently not propagating poison for these intrinsics, and this patch >

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Jay Foad via llvm-branch-commits

@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $sr

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-14 Thread Jay Foad via llvm-branch-commits

@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule< [{ return Helper.matchSextTruncSextLoad(*${d}); }]), (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>; +def sext_trunc_sextinreg : GICombineRule< + (defs root:$dst), + (match (G_SEXT_INREG $sir, $sr

[llvm-branch-commits] [llvm] AMDGPU: Replace unused permlane inputs with poison instead of undef (PR #131288)

2025-03-14 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > > Same kind of objection as #131287: as a general strategy, "replace unused > > inputs with poison" > > Repeating from the other review, but this is not the case. Poison does not > unconditionally fold through intrinsics. This is specific to an operand for > an intrinsic. It

[llvm-branch-commits] [llvm] AMDGPU: Replace unused update.dpp inputs with poison instead of undef (PR #131287)

2025-03-14 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > We do it when the semantics allow it. My concern is that it is not obvious when the semantics allow it, when you have a plethora of undocumented target intrinsics. I guess the grown-up solution is to document them properly. https://github.com/llvm/llvm-project/pull/131287 ___

[llvm-branch-commits] [llvm] AMDGPU: Replace unused permlane inputs with poison instead of undef (PR #131288)

2025-03-14 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: Same kind of objection as #131287: as a general strategy, "replace unused inputs with poison" seems incompatible with "propagate poison from arguments to result". @nunoplopes any thoughts on this? https://github.com/llvm/llvm-project/pull/131288 _

[llvm-branch-commits] [llvm] AMDGPU: Replace unused update.dpp inputs with poison instead of undef (PR #131287)

2025-03-14 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: I have a conceptual objection: I don't think we can do both of these things: 1. Replace unused inputs of all intrinsics with poison 2. Propagate poison from any argument, for all intrinsics So how should we handle this in general? Is it better to replace unused inputs with "free

[llvm-branch-commits] [llvm] AMDGPU: Replace unused export inputs with poison instead of undef (PR #131286)

2025-03-14 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/131286 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] AMDGPU: Mark sendmsg intrinsics as norecurse (PR #125016)

2025-01-30 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > We cannot mark these as nocallback or nosync. I think we can and should mark these as `nocallback`. I don't know how well it is documented, but I think in practice `nocallback` means that the intrinsic does not call back into user code **synchronously, in the current thread,

[llvm-branch-commits] [llvm] TableGen: Add intrinsic property for norecurse (PR #125015)

2025-01-30 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: This needs motivation. What does `norecurse` mean for an intrinsic and how does it differ from `nocallback`? What sort of intrinsic would be `norecurse` but not `nocallback`? https://github.com/llvm/llvm-project/pull/125015 ___ llvm-br

[llvm-branch-commits] [llvm] PeepholeOpt: Avoid double map lookup (PR #124531)

2025-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/124531 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] PeepholeOpt: Avoid double map lookup (PR #124531)

2025-01-27 Thread Jay Foad via llvm-branch-commits

@@ -1035,8 +1035,10 @@ bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg, return false; // Insert the Def -> Use entry for the recently found source. - ValueTrackerResult CurSrcRes = RewriteMap.lookup(CurSrcPair); - if (CurSrcRes.isValid()

[llvm-branch-commits] [llvm] PeepholeOpt: Remove check for subreg index on a def operand (PR #123943)

2025-01-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/123943 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][NewPM] Port SILowerControlFlow pass into NPM. (PR #123045)

2025-01-15 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > I observed something while porting this pass. The analysis LiveIntervals > (LIS) uses the SlotIndexes (SI). There is no explicit use of SI in this pass. > If we have to preserve LIS, it required us to preserve SI as well. When I > initially failed to preserve SI, the following

[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)

2025-01-14 Thread Jay Foad via llvm-branch-commits

@@ -1949,6 +1949,13 @@ bool SITargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT, return Index == 0; } +bool SITargetLowering::isExtractVecEltCheap(EVT VT, unsigned Index) const { + // TODO: This should be more aggressive, particular for 16-bit element + // vect

[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)

2025-01-14 Thread Jay Foad via llvm-branch-commits

@@ -1949,6 +1949,13 @@ bool SITargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT, return Index == 0; } +bool SITargetLowering::isExtractVecEltCheap(EVT VT, unsigned Index) const { + // TODO: This should be more aggressive, particular for 16-bit element + // vect

[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)

2025-01-14 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/122460 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)

2025-01-14 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/122460 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [clang] [llvm] [IR] Add FPOperation intrinsic property (PR #122313)

2025-01-09 Thread Jay Foad via llvm-branch-commits

@@ -308,6 +308,9 @@ def StackProtectStrong : EnumAttr<"sspstrong", IntersectPreserve, [FnAttr]>; /// Function was called in a scope requiring strict floating point semantics. def StrictFP : EnumAttr<"strictfp", IntersectPreserve, [FnAttr]>; +/// Function is a floating point o

[llvm-branch-commits] [llvm] AMDGPU: Reduce 64-bit add width if low bits are known 0 (PR #122049)

2025-01-08 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: Why doesn't this fall out naturally from splitting the 64-bit add into 32-bit parts and then simplifying each part? Do we leave it as a 64-bit add all the way until final instruction selection? https://github.com/llvm/llvm-project/pull/122049

[llvm-branch-commits] [llvm] AMDGPU: Simplify demanded bits on readlane/writeline index arguments (PR #117963)

2024-11-28 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > If we have out of bounds indexing, these will now clamp down to > a low bit which may CSE with the operations on the low half of the > wave. Should mention that in the comment in the code. It was not clear to me why you would want to clamp constants. https://github.com/llvm/ll

[llvm-branch-commits] [llvm] [Linker] Remove a use of StructType::setBody. NFC. (PR #116653)

2024-11-18 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/116653 This falls out naturally after inlining finishType into its only remaining use. >From 4140bc772f5930807cb2ea5b4b2aa945c57b699c Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Mon, 18 Nov 2024 16:36:33 + Subje

[llvm-branch-commits] [llvm] AMDGPU: Increase the LDS size to support to 160 KB for gfx950 (PR #116309)

2024-11-15 Thread Jay Foad via llvm-branch-commits

@@ -1494,7 +1494,8 @@ def FeatureISAVersion9_5_Common : FeatureSet< [FeatureFP8Insts, FeatureFP8ConversionInsts, FeatureCvtFP8VOP1Bug, - FeatureGFX950Insts + FeatureGFX950Insts, + FeatureAddressableLocalMemorySize163840 jayfoad wrote: This means

[llvm-branch-commits] [llvm] [AMDGPU] Introduce a "new" target feature `xf32-insts` (PR #115214)

2024-11-07 Thread Jay Foad via llvm-branch-commits

@@ -1110,6 +1110,13 @@ def FeatureRequiresCOV6 : SubtargetFeature<"requires-cov6", "Target Requires Code Object V6" >; +def FeatureXF32Insts : SubtargetFeature<"xf32-insts", + "HasXF32Insts", + "true", + "Has instructions that support xf32 format, such as " + "v_mfm

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2024-10-18 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. > Change existing code to match what LLVM-IR version is doing Yeah, looks reasonable to me. https://github.com/llvm/llvm-project/pull/112866 ___ llvm-branch-commits mailing list llvm-branch-commit

[llvm-branch-commits] [llvm] release/19.x: AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (#110256) (PR #110470)

2024-09-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/110470 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (PR #110256)

2024-09-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM, thanks! Please also backport to release/19.x. https://github.com/llvm/llvm-project/pull/110256 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/c

[llvm-branch-commits] [llvm] AMDGPU: Fix inst-selection of large scratch offsets with sgpr base (PR #110256)

2024-09-27 Thread Jay Foad via llvm-branch-commits

@@ -1911,7 +1911,7 @@ bool AMDGPUDAGToDAGISel::SelectScratchSAddr(SDNode *Parent, SDValue Addr, 0); } - Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i16); + Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i32); jayfo

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > > Is this PR a fix for a regression or a critical issue? > > No, I believe it has been broken for about 3 years (since > [d7e03df](https://github.com/llvm/llvm-project/commit/d7e03df719464354b20a845b7853be57da863924)) > but it was only reported to me recently. > > I guess thi

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > Is this PR a fix for a regression or a critical issue? No, I believe it has been broken for about 3 years (since d7e03df719464354b20a845b7853be57da863924) but it was only reported to me recently. I guess this means it is not appropriate for 19.1.0. https://github.com/llvm/ll

[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)

2024-09-05 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > > This sounds sketchy to me. Is it really valid to enter a second call inside > > another call's CALLSEQ markers, but only if we avoid adding a second nested > > set of markers? It feels like attacking the symptom of the issue, but not > > the root cause. (I'm not certain it's

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/106977 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: This is a backport of #105831. https://github.com/llvm/llvm-project/pull/106977 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/106977 SMUL_LOHI and UMUL_LOHI are different operations because the high part of the result is different, so it is not OK to optimize the signed version to MUL_U24/MULHI_U24 or the unsigned version to MUL_I24/MULHI_I2

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned https://github.com/llvm/llvm-project/pull/106977 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550) (PR #105808)

2024-08-23 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: I'm not sure if I should have done three different backport requests for the three commits. It could be confusing if they get squash-and-merged onto the release branch. https://github.com/llvm/llvm-project/pull/105808 ___ llvm-branch-c

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: ### Merge activity * **Aug 22, 6:34 AM EDT**: @jayfoad started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/105549). https://github.com/llvm/llvm-project/pull/105549 __

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/105549 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

@@ -4371,8 +4375,10 @@ define amdgpu_kernel void @global_sextload_v64i16_to_v64i32(ptr addrspace(1) %ou ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:48 ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[4:7], off, s[0:3], 0 ; GCN-NOHSA-SI-NEXT:bu

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

@@ -754,13 +754,21 @@ define amdgpu_kernel void @constant_load_v16i16_align2(ptr addrspace(4) %ptr0) # ; GFX12-NEXT:global_load_u16 v6, v8, s[0:1] offset:8 ; GFX12-NEXT:global_load_u16 v5, v8, s[0:1] offset:4 ; GFX12-NEXT:global_load_u16 v4, v8, s[0:1] +; GFX12-NEX

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : SubtargetFeature<"required-export-priority", "Export priority must be explicitly manipulated on GFX11.5" >; +def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order", jayfoad wrot

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

@@ -1778,11 +1778,12 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI, if (IsVGPR) { // RAW always needs an s_waitcnt. WAW needs an s_waitcnt unless the // previous write and this write are the same type of VMEM -

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review https://github.com/llvm/llvm-project/pull/105550 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review https://github.com/llvm/llvm-project/pull/105549 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack href="https://app.graphite.dev/github/pr/llvm/llvm-project/105550?utm_source=stack-comment-downstack-mergeability-warning";

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack href="https://app.graphite.dev/github/pr/llvm/llvm-project/105549?utm_source=stack-comment-downstack-mergeability-warning";

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/105550 When a loop contains a VMEM load whose result is only used outside the loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for vmcnt will be required inside the loop anyway, because VMEM instruct

[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created https://github.com/llvm/llvm-project/pull/105549 Fix SIInsertWaitcnts to account for this by adding extra waits to avoid WAW dependencies. >From 9a2103df4094af38f59e1adce5414b94672e6d6e Mon Sep 17 00:00:00 2001 From: Jay Foad Date: Wed, 21 Aug 2024 16:23:49 +

[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395) (PR #105472)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. https://github.com/llvm/llvm-project/pull/105472 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Fix folding clamp into pseudo scalar instructions (#100568) (PR #102446)

2024-08-08 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM for backporting. https://github.com/llvm/llvm-project/pull/102446 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-b

[llvm-branch-commits] [llvm] TTI: Check legalization cost of abs nodes (PR #100523)

2024-07-25 Thread Jay Foad via llvm-branch-commits

@@ -54,11 +54,11 @@ define i32 @abs_nonpoison(i32 %arg) { ; FAST-NEXT: Cost Model: Found an estimated cost of 80 for instruction: %V16I32 = call <16 x i32> @llvm.abs.v16i32(<16 x i32> undef, i1 false) ; FAST-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %I16

[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits

@@ -0,0 +1,366 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py +; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes=slp-vectorizer,instcombine %s | FileCheck -check-prefixes=GCN,GFX7 %s +; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=fiji

[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/100513 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/100513 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] DAG: Lower fcNormal is.fpclass to compare with inf (PR #100389)

2024-07-24 Thread Jay Foad via llvm-branch-commits

jayfoad wrote: > Looks worse for x86 without the fabs check. Not sure if this is useful for > any targets. Seems unlikely that this would ever be profitable in the ordered case, since you can implement that with pretty simple integer checks on the exponent field. (Check that it isn't 0 and is

[llvm-branch-commits] [llvm] DAG: Lower is.fpclass fcSubnormal|fcZero to fabs(x) < smallest_normal (PR #100390)

2024-07-24 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. Makes sense to me. For the ordered case I think this would only be profitable if fabs is free _and_ you don't have integer "test"-style instructions. https://github.com/llvm/llvm-project/pull/100390

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/96163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

@@ -34,18 +34,17 @@ entry: } define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16_dpp( -; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp: -; SDAG-GFX11: ; %bb.0: ; %entry -; SDAG-GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24 -; SDAG-GFX11-NEXT:s_wait

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited https://github.com/llvm/llvm-project/pull/96163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

1 2 >

1 - 100 of 183 matches

Mail list logo