[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-31 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-31 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

Too late to backport - no more 18.x releases are planned.

https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] AMDGPU: Fix buffer intrinsic store of bfloat (PR #95377)

2024-06-13 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/95377


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (PR #95930)

2024-06-19 Thread Jay Foad via llvm-branch-commits


@@ -1735,8 +1737,11 @@ defm : SIBufferAtomicPat<"SIbuffer_atomic_dec", i64, 
"BUFFER_ATOMIC_DEC_X2">;
 let OtherPredicates = [HasAtomicCSubNoRtnInsts] in
 defm : SIBufferAtomicPat<"SIbuffer_atomic_csub", i32, "BUFFER_ATOMIC_CSUB", 
["noret"]>;
 
-let SubtargetPredicate = isGFX12Plus in {
+let SubtargetPredicate = HasAtomicBufferPkAddBF16Inst in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd", v2bf16, 
"BUFFER_ATOMIC_PK_ADD_BF16_VBUFFER">;

jayfoad wrote:

VBUFFER is a new encoding in GFX12 which replaces the old MTBUF and MUBUF 
encodings. We have different pseudos for VBUFFER (which should only be selected 
on GFX12+) and MTBUF/MUBUF (which should only be selected pre-GFX12).

https://github.com/llvm/llvm-project/pull/95930


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

This looks like it is affecting codegen even when xnack is disabled? That 
should not happen.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> > This looks like it is affecting codegen even when xnack is disabled? That 
> > should not happen.
> 
> It shouldn't. I put the xnack replay subtarget check before using *_ec 
> equivalents. See the code here: 
> [65eb443#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1735](https://github.com/llvm/llvm-project/commit/65eb44327cf32a83dbbf13eb70f9d8c03f3efaef#diff-35f4d1b6c4c17815f6989f86abbac2e606ca760f9d93f501ff503449048bf760R1735)

You're checking `STI->hasXnackReplay()`, which is true on all GFX8+ targets. You 
should be checking whether xnack support is enabled, using 
`STI->isXNACKEnabled()`.
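
A minimal sketch of the distinction, assuming the `GCNSubtarget` interface discussed in this thread (`useConstrainedLoads` is a hypothetical helper name):

```cpp
// Hypothetical helper: gate the constrained (_ec) load path on the feature
// actually being enabled, not on the hardware merely supporting it.
static bool useConstrainedLoads(const GCNSubtarget &STI) {
  // STI.hasXnackReplay() is true on every GFX8+ target;
  // STI.isXNACKEnabled() is true only when the xnack feature is on.
  return STI.isXNACKEnabled();
}
```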

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Jay Foad via llvm-branch-commits


@@ -967,6 +967,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasLDSFPAtomicAddF32() const { return GFX8Insts; }
   bool hasLDSFPAtomicAddF64() const { return GFX90AInsts; }
+  bool hasXnackReplay() const { return GFX8Insts; }

jayfoad wrote:

We already have a field SupportsXNACK for this which is hooked up to the 
"xnack-support" target feature.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-21 Thread Jay Foad via llvm-branch-commits


@@ -867,13 +867,104 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Returns true if it is a naturally aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return (Size <= 4) || (Ld->getAlign().value() >= PowerOf2Ceil(Size));

jayfoad wrote:

Right but the PowerOf2Ceil makes no difference. Either you test 16>=12 or 
16>=16; the result is the same. Also you don't need most of the parens on this 
line.
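
A standalone illustration of why: alignment values are always powers of two, so for a 12-byte (3-dword) load, the tests `>= 12` and `>= PowerOf2Ceil(12) == 16` accept exactly the same set of alignments.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // Walk every possible power-of-two alignment; 1, 2, 4, 8 fail both
  // tests and 16, 32, 64 pass both, so the two predicates are equivalent.
  for (uint64_t A = 1; A <= 64; A <<= 1)
    assert((A >= 12) == (A >= 16));
  return 0;
}
```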

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-24 Thread Jay Foad via llvm-branch-commits


@@ -867,13 +867,104 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Returns true if it is a naturally aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return (Size <= 4) || (Ld->getAlign().value() >= PowerOf2Ceil(Size));

jayfoad wrote:

`Ld->getAlign().value()` will never be 12. There's no such thing as a 
non-power-of-two alignment.

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Jay Foad via llvm-branch-commits


@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

jayfoad wrote:

> currently the alignment is picked from the first MMO and that'd definitely be 
> smaller than the natural align requirement for the new load

You don't know that - the alignment in the first MMO will be whatever alignment 
the compiler could deduce, which could be large.
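
For illustration (values hypothetical), the merged load inherits the first MMO, whose alignment can exceed the load's own size:

```cpp
// CI.I is the first instruction of the pair, as in the hunk above.
const MachineMemOperand *MMO = *CI.I->memoperands_begin();
Align KnownAlign = MMO->getAlign(); // e.g. Align(16) for a 4-byte load from
                                    // a 16-byte-aligned kernel-arg block
```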

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)

2024-06-26 Thread Jay Foad via llvm-branch-commits


@@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasAtomicFlatPkAdd16Insts = false;
   bool HasAtomicFaddRtnInsts = false;
   bool HasAtomicFaddNoRtnInsts = false;
+  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;

jayfoad wrote:

What does "AtomicMemoryAtomic" mean?

https://github.com/llvm/llvm-project/pull/96443


[llvm-branch-commits] [llvm] DAG: Call SimplifyDemandedBits on fcopysign sign value (PR #97151)

2024-07-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/97151


[llvm-branch-commits] [llvm] DAG: Call SimplifyDemandedBits on fcopysign sign value (PR #97151)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -17565,6 +17565,12 @@ SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {
   if (CanCombineFCOPYSIGN_EXTEND_ROUND(N))
 return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT, N0, N1.getOperand(0));
 
+  // We only take the sign bit from the sign operand.
+  EVT SignVT = N1.getValueType();
+  if (SimplifyDemandedBits(N1,

jayfoad wrote:

I think this should be able to subsume some of the optimizations above, e.g. 
`copysign(x, abs(y)) -> abs(x)` would fall out if SimplifyDemandedBits knew 
about extracting the sign bit from `abs(x)`.
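
For context, a sketch of the combine under discussion (the quoted hunk is truncated; the sign-bit demanded mask is an assumption consistent with the comment in the diff):

```cpp
// We only take the sign bit from the sign operand.
EVT SignVT = N1.getValueType();
if (SimplifyDemandedBits(N1,
                         APInt::getSignMask(SignVT.getScalarSizeInBits())))
  return SDValue(N, 0); // N1 was simplified in place
```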

https://github.com/llvm/llvm-project/pull/97151


[llvm-branch-commits] [llvm] DAG: Call SimplifyDemandedBits on fcopysign sign value (PR #97151)

2024-07-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/97151


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -313,8 +327,7 @@ void 
AMDGPUAtomicOptimizerImpl::visitIntrinsicInst(IntrinsicInst &I) {
   // value to the atomic calculation. We can only optimize divergent values if
   // we have DPP available on our subtarget, and the atomic operation is 32
   // bits.
-  if (ValDivergent &&
-  (!ST->hasDPP() || DL->getTypeSizeInBits(I.getType()) != 32)) {
+  if (ValDivergent && (!ST->hasDPP() || !isOptimizableAtomic(I.getType()))) {

jayfoad wrote:

Same here.

https://github.com/llvm/llvm-project/pull/96934


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -230,8 +245,7 @@ void 
AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
   // value to the atomic calculation. We can only optimize divergent values if
   // we have DPP available on our subtarget, and the atomic operation is 32
   // bits.
-  if (ValDivergent &&
-  (!ST->hasDPP() || DL->getTypeSizeInBits(I.getType()) != 32)) {
+  if (ValDivergent && (!ST->hasDPP() || !isOptimizableAtomic(I.getType()))) {

jayfoad wrote:

Pre-existing problem: this `hasDPP` check is in the wrong place. It should only 
be tested if we're using the DPP strategy.
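
A sketch of the suggested restructuring; `ScanImpl` and `ScanOptions` name the pass's strategy selector and are assumptions here:

```cpp
if (ValDivergent) {
  // The width check applies to every strategy.
  if (!isOptimizableAtomic(I.getType()))
    return;
  // Only the DPP strategy actually needs DPP support.
  if (ScanImpl == ScanOptions::DPP && !ST->hasDPP())
    return;
}
```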

https://github.com/llvm/llvm-project/pull/96934


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -178,6 +178,21 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) {
   return Changed;
 }
 
+static bool isOptimizableAtomic(Type *Ty) {
+  switch (Ty->getTypeID()) {
+  case Type::FloatTyID:
+  case Type::DoubleTyID:
+return true;
+  case Type::IntegerTyID: {
+unsigned size = Ty->getIntegerBitWidth();

jayfoad wrote:

```suggestion
unsigned Size = Ty->getIntegerBitWidth();
```
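
For reference, a hypothetical completed form of the helper (the quoted hunk cuts off after the bit-width query; the accepted integer widths are an assumption based on the PR title):

```cpp
static bool isOptimizableAtomic(Type *Ty) {
  switch (Ty->getTypeID()) {
  case Type::FloatTyID:
  case Type::DoubleTyID:
    return true;
  case Type::IntegerTyID: {
    unsigned Size = Ty->getIntegerBitWidth(); // capitalized per LLVM style
    return Size == 32 || Size == 64;
  }
  default:
    return false;
  }
}
```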

https://github.com/llvm/llvm-project/pull/96934


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-01 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> [AMDGPU] Enable atomic optimizer for divergent i64 and double values

Needs some i64 tests

https://github.com/llvm/llvm-project/pull/96934


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -1700,19 +1725,30 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+auto NeedsConstrainedOpc = [&MMO, Width](const GCNSubtarget &ST) {
+  return ST.isXNACKEnabled() && MMO->getAlign().value() < Width;

jayfoad wrote:

This doesn't look right since `Width` is in units of dwords here.
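
That is, the comparison needs a byte count; a sketch of the corrected form (matching the fix that appears later in this thread):

```cpp
// Width counts dwords; MachineMemOperand alignment is in bytes.
bool NeedsConstrainedOpc =
    STM->isXNACKEnabled() && MMO->getAlign().value() < Width * 4;
```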

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -1212,8 +1228,17 @@ void SILoadStoreOptimizer::copyToDestRegs(
 
   // Copy to the old destination registers.
   const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);
-  const auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
-  const auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+  auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
+  auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+
+  // The constrained sload instructions in S_LOAD_IMM class will have
+  // `early-clobber` flag in the dst operand. Remove the flag before using the
+  // MOs in copies.
+  if (Dest0->isEarlyClobber())
+Dest0->setIsEarlyClobber(false);
+
+  if (Dest1->isEarlyClobber())
+Dest1->setIsEarlyClobber(false);

jayfoad wrote:

```suggestion
  Dest0->setIsEarlyClobber(false);
  Dest1->setIsEarlyClobber(false);
```

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -1700,19 +1725,30 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+auto NeedsConstrainedOpc = [&MMO, Width](const GCNSubtarget &ST) {

jayfoad wrote:

This doesn't need to be a lambda. It is always called, with identical 
arguments. Just calculate the result as a `bool` here.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -867,13 +867,61 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Ignore the alignment check if XNACK support is disabled.
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  // Returns true if it is a naturally aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return Size <= 4 || Ld->getAlign().value() >= Size;
+}]> {
+  let GISelPredicateCode = [{
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  auto &Ld = cast<GLoad>(MI);
+  TypeSize Size = Ld.getMMO().getSize().getValue();
+  return Size <= 4 || Ld.getMMO().getAlign().value() >= Size;
+  }];
+}
+
+class SMRDUnalignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Do the alignment check if XNACK support is enabled.
+  if (!Subtarget->isXNACKEnabled())
+return false;
+
+  // Returns true if it is an under aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return Size > 4 && (Ld->getAlign().value() < Size);

jayfoad wrote:

Don't need the parens

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -867,13 +867,61 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Ignore the alignment check if XNACK support is disabled.
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  // Returns true if it is a naturally aligned multi-dword load.

jayfoad wrote:

... or if it's a non-multi-dword load.

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-01 Thread Jay Foad via llvm-branch-commits


@@ -867,13 +867,61 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Ignore the alignment check if XNACK support is disabled.
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  // Returns true if it is a naturally aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return Size <= 4 || Ld->getAlign().value() >= Size;
+}]> {
+  let GISelPredicateCode = [{
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  auto &Ld = cast<GLoad>(MI);
+  TypeSize Size = Ld.getMMO().getSize().getValue();
+  return Size <= 4 || Ld.getMMO().getAlign().value() >= Size;
+  }];
+}
+
+class SMRDUnalignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Do the alignment check if XNACK support is enabled.
+  if (!Subtarget->isXNACKEnabled())
+return false;
+
+  // Returns true if it is an under aligned multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return Size > 4 && (Ld->getAlign().value() < Size);
+}]> {
+  let GISelPredicateCode = [{
+  if (!Subtarget->isXNACKEnabled())
+return false;
+
+  auto &Ld = cast<GLoad>(MI);
+  TypeSize Size = Ld.getMMO().getSize().getValue();
+  return Size > 4 && (Ld.getMMO().getAlign().value() < Size);

jayfoad wrote:

Don't need the parens

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Jay Foad via llvm-branch-commits


@@ -1700,19 +1722,29 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
 case 8:
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
-  case S_LOAD_IMM:
+  case S_LOAD_IMM: {
+// If XNACK is enabled, use the constrained opcodes when the first load is
+// under-aligned.
+const MachineMemOperand *MMO = *CI.I->memoperands_begin();
+bool NeedsConstrainedOpc =
+STM->isXNACKEnabled() && MMO->getAlign().value() < (Width << 2);

jayfoad wrote:

```suggestion
STM->isXNACKEnabled() && MMO->getAlign().value() < Width * 4;
```

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Jay Foad via llvm-branch-commits


@@ -658,17 +658,17 @@ define amdgpu_kernel void 
@image_bvh_intersect_ray_nsa_reassign(ptr %p_node_ptr,
 ;
 ; GFX1013-LABEL: image_bvh_intersect_ray_nsa_reassign:
 ; GFX1013:   ; %bb.0:
-; GFX1013-NEXT:s_load_dwordx8 s[0:7], s[0:1], 0x24
+; GFX1013-NEXT:s_load_dwordx8 s[4:11], s[0:1], 0x24

jayfoad wrote:

I guess this code changes because xnack is enabled by default for GFX10.1? Is 
there anything we could do to add known alignment info here, to avoid the code 
pessimization?

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Jay Foad via llvm-branch-commits


@@ -1212,8 +1228,14 @@ void SILoadStoreOptimizer::copyToDestRegs(
 
   // Copy to the old destination registers.
   const MCInstrDesc &CopyDesc = TII->get(TargetOpcode::COPY);
-  const auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
-  const auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+  auto *Dest0 = TII->getNamedOperand(*CI.I, OpName);
+  auto *Dest1 = TII->getNamedOperand(*Paired.I, OpName);
+
+  // The constrained sload instructions in S_LOAD_IMM class will have
+  // `early-clobber` flag in the dst operand. Remove the flag before using the
+  // MOs in copies.
+  Dest0->setIsEarlyClobber(false);
+  Dest1->setIsEarlyClobber(false);

jayfoad wrote:

It's a bit ugly to modify in-place the operands of `CI.I` and `Paired.I`. But I 
guess it is harmless since they will be erased soon, when the merged load 
instruction is created.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-10 Thread Jay Foad via llvm-branch-commits


@@ -866,13 +866,61 @@ def SMRDBufferImm   : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm">;
 def SMRDBufferImm32 : ComplexPattern<iPTR, 1, "SelectSMRDBufferImm32">;
 def SMRDBufferSgprImm : ComplexPattern<iPTR, 2, "SelectSMRDBufferSgprImm">;
 
+class SMRDAlignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{
+  // Ignore the alignment check if XNACK support is disabled.
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  // Returns true if it is a single dword load or naturally aligned 
multi-dword load.
+  LoadSDNode *Ld = cast<LoadSDNode>(N);
+  unsigned Size = Ld->getMemoryVT().getStoreSize();
+  return Size <= 4 || Ld->getAlign().value() >= Size;
+}]> {
+  let GISelPredicateCode = [{
+  if (!Subtarget->isXNACKEnabled())
+return true;
+
+  auto &Ld = cast<GLoad>(MI);
+  TypeSize Size = Ld.getMMO().getSize().getValue();
+  return Size <= 4 || Ld.getMMO().getAlign().value() >= Size;
+  }];
+}
+
+class SMRDUnalignedLoadPat<SDPatternOperator Op> : PatFrag<(ops node:$ptr), (Op node:$ptr), [{

jayfoad wrote:

I don't think you need this class at all, since the _ec forms should work in 
all cases. It's just an optimization to prefer the non-_ec forms when the load 
is suitably aligned, and you can handle that with DAG pattern priority (maybe 
by setting AddedComplexity).

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits


@@ -6,7 +6,7 @@ declare i32 @llvm.amdgcn.global.atomic.csub(ptr addrspace(1), 
i32)
 
 ; GCN-LABEL: {{^}}global_atomic_csub_rtn:
 ; PREGFX12: global_atomic_csub v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9:]+}}, 
s{{\[[0-9]+:[0-9]+\]}} glc
-; GFX12PLUS: global_atomic_sub_clamp_u32 v0, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
+; GFX12PLUS: global_atomic_sub_clamp_u32 v{{[0-9]+}}, v{{[0-9]+}}, 
v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} th:TH_ATOMIC_RETURN

jayfoad wrote:

You shouldn't need any changes in this file.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits


@@ -34,18 +34,17 @@ entry:
 }
 
 define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16_dpp(
-; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
-; SDAG-GFX11:   ; %bb.0: ; %entry
-; SDAG-GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24
-; SDAG-GFX11-NEXT:s_waitcnt lgkmcnt(0)
-; SDAG-GFX11-NEXT:scratch_load_b32 v0, off, s2
-; SDAG-GFX11-NEXT:scratch_load_u16 v1, off, s3
-; SDAG-GFX11-NEXT:scratch_load_b32 v2, off, s1
-; SDAG-GFX11-NEXT:s_waitcnt vmcnt(0)
-; SDAG-GFX11-NEXT:v_dot2_bf16_bf16_e64_dpp v0, v2, v0, v1 
quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
-; SDAG-GFX11-NEXT:scratch_store_b16 off, v0, s0
-; SDAG-GFX11-NEXT:s_endpgm
-;
+; GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
+; GFX11:   ; %bb.0: ; %entry
+; GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24
+; GFX11-NEXT:s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:scratch_load_b32 v0, off, s2
+; GFX11-NEXT:scratch_load_u16 v1, off, s3
+; GFX11-NEXT:scratch_load_b32 v2, off, s1
+; GFX11-NEXT:s_waitcnt vmcnt(0)
+; GFX11-NEXT:v_dot2_bf16_bf16_e64_dpp v0, v2, v0, v1 quad_perm:[1,0,0,0] 
row_mask:0xf bank_mask:0xf bound_ctrl:1
+; GFX11-NEXT:scratch_store_b16 off, v0, s0
+; GFX11-NEXT:s_endpgm
 ; GISEL-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:

jayfoad wrote:

Should probably remove these GISEL-GFX11 checks since the corresponding RUN 
line is disabled.

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] DAG: Lower is.fpclass fcSubnormal|fcZero to fabs(x) < smallest_normal (PR #100390)

2024-07-24 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

Makes sense to me.

For the ordered case I think this would only be profitable if fabs is free 
_and_ you don't have integer "test"-style instructions.
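
A SelectionDAG-style sketch of the lowering, with `DL`, `VT`, `CCVT`, and `X` assumed from the surrounding legalization code:

```cpp
// is.fpclass(x, fcSubnormal|fcZero)  ==>  fabs(x) < smallest_normal
SDValue Abs = DAG.getNode(ISD::FABS, DL, VT, X);
SDValue SmallestNormal = DAG.getConstantFP(
    APFloat::getSmallestNormalized(DAG.EVTToAPFloatSemantics(VT)), DL, VT);
SDValue IsSubnormalOrZero =
    DAG.getSetCC(DL, CCVT, Abs, SmallestNormal, ISD::SETOLT); // ordered <
```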

https://github.com/llvm/llvm-project/pull/100390


[llvm-branch-commits] [llvm] DAG: Lower fcNormal is.fpclass to compare with inf (PR #100389)

2024-07-24 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Looks worse for x86 without the fabs check. Not sure if this is useful for 
> any targets.

Seems unlikely that this would ever be profitable in the ordered case, since 
you can implement that with pretty simple integer checks on the exponent field. 
(Check that it isn't 0 and isn't maximal.)
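
A sketch of such an integer check for IEEE binary32 (bit layout assumed):

```cpp
#include <cstdint>

// "Normal" iff the biased exponent is neither 0 (zero/subnormal) nor
// all-ones (inf/nan).
static bool isNormalF32(uint32_t Bits) {
  uint32_t Exp = (Bits >> 23) & 0xffu;
  return Exp != 0 && Exp != 0xffu;
}
```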

https://github.com/llvm/llvm-project/pull/100389


[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/100513


[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/100513


[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for vectorize of integer min/max (PR #100513)

2024-07-25 Thread Jay Foad via llvm-branch-commits


@@ -0,0 +1,366 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii 
-passes=slp-vectorizer,instcombine %s | FileCheck -check-prefixes=GCN,GFX7 %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=fiji 
-passes=slp-vectorizer,instcombine %s | FileCheck -check-prefixes=GCN,GFX8 %s
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 
-passes=slp-vectorizer,instcombine %s | FileCheck -check-prefixes=GCN,GFX9 %s
+
+define <2 x i16> @uadd_sat_v2i16(<2 x i16> %arg0, <2 x i16> %arg1) {
+; GFX7-LABEL: @uadd_sat_v2i16(
+; GFX7-NEXT:  bb:
+; GFX7-NEXT:[[ARG0_0:%.*]] = extractelement <2 x i16> [[ARG0:%.*]], i64 0
+; GFX7-NEXT:[[ARG0_1:%.*]] = extractelement <2 x i16> [[ARG0]], i64 1
+; GFX7-NEXT:[[ARG1_0:%.*]] = extractelement <2 x i16> [[ARG1:%.*]], i64 0
+; GFX7-NEXT:[[ARG1_1:%.*]] = extractelement <2 x i16> [[ARG1]], i64 1
+; GFX7-NEXT:[[ADD_0:%.*]] = call i16 @llvm.umin.i16(i16 [[ARG0_0]], i16 
[[ARG1_0]])
+; GFX7-NEXT:[[ADD_1:%.*]] = call i16 @llvm.umin.i16(i16 [[ARG0_1]], i16 
[[ARG1_1]])
+; GFX7-NEXT:[[INS_0:%.*]] = insertelement <2 x i16> poison, i16 [[ADD_0]], 
i64 0
+; GFX7-NEXT:[[INS_1:%.*]] = insertelement <2 x i16> [[INS_0]], i16 
[[ADD_1]], i64 1
+; GFX7-NEXT:ret <2 x i16> [[INS_1]]
+;
+; GFX8-LABEL: @uadd_sat_v2i16(
+; GFX8-NEXT:  bb:
+; GFX8-NEXT:[[TMP0:%.*]] = call <2 x i16> @llvm.umin.v2i16(<2 x i16> 
[[ARG0:%.*]], <2 x i16> [[ARG1:%.*]])
+; GFX8-NEXT:ret <2 x i16> [[TMP0]]
+;
+; GFX9-LABEL: @uadd_sat_v2i16(
+; GFX9-NEXT:  bb:
+; GFX9-NEXT:[[TMP0:%.*]] = call <2 x i16> @llvm.umin.v2i16(<2 x i16> 
[[ARG0:%.*]], <2 x i16> [[ARG1:%.*]])
+; GFX9-NEXT:ret <2 x i16> [[TMP0]]
+;
+bb:
+  %arg0.0 = extractelement <2 x i16> %arg0, i64 0
+  %arg0.1 = extractelement <2 x i16> %arg0, i64 1
+  %arg1.0 = extractelement <2 x i16> %arg1, i64 0
+  %arg1.1 = extractelement <2 x i16> %arg1, i64 1
+  %add.0 = call i16 @llvm.umin.i16(i16 %arg0.0, i16 %arg1.0)
+  %add.1 = call i16 @llvm.umin.i16(i16 %arg0.1, i16 %arg1.1)
+  %ins.0 = insertelement <2 x i16> undef, i16 %add.0, i64 0
+  %ins.1 = insertelement <2 x i16> %ins.0, i16 %add.1, i64 1
+  ret <2 x i16> %ins.1
+}
+
+define <2 x i16> @usub_sat_v2i16(<2 x i16> %arg0, <2 x i16> %arg1) {
+; GFX7-LABEL: @usub_sat_v2i16(
+; GFX7-NEXT:  bb:
+; GFX7-NEXT:[[ARG0_0:%.*]] = extractelement <2 x i16> [[ARG0:%.*]], i64 0
+; GFX7-NEXT:[[ARG0_1:%.*]] = extractelement <2 x i16> [[ARG0]], i64 1
+; GFX7-NEXT:[[ARG1_0:%.*]] = extractelement <2 x i16> [[ARG1:%.*]], i64 0
+; GFX7-NEXT:[[ARG1_1:%.*]] = extractelement <2 x i16> [[ARG1]], i64 1
+; GFX7-NEXT:[[ADD_0:%.*]] = call i16 @llvm.umax.i16(i16 [[ARG0_0]], i16 
[[ARG1_0]])
+; GFX7-NEXT:[[ADD_1:%.*]] = call i16 @llvm.umax.i16(i16 [[ARG0_1]], i16 
[[ARG1_1]])
+; GFX7-NEXT:[[INS_0:%.*]] = insertelement <2 x i16> poison, i16 [[ADD_0]], 
i64 0
+; GFX7-NEXT:[[INS_1:%.*]] = insertelement <2 x i16> [[INS_0]], i16 
[[ADD_1]], i64 1
+; GFX7-NEXT:ret <2 x i16> [[INS_1]]
+;
+; GFX8-LABEL: @usub_sat_v2i16(
+; GFX8-NEXT:  bb:
+; GFX8-NEXT:[[TMP0:%.*]] = call <2 x i16> @llvm.umax.v2i16(<2 x i16> 
[[ARG0:%.*]], <2 x i16> [[ARG1:%.*]])
+; GFX8-NEXT:ret <2 x i16> [[TMP0]]
+;
+; GFX9-LABEL: @usub_sat_v2i16(
+; GFX9-NEXT:  bb:
+; GFX9-NEXT:[[TMP0:%.*]] = call <2 x i16> @llvm.umax.v2i16(<2 x i16> 
[[ARG0:%.*]], <2 x i16> [[ARG1:%.*]])
+; GFX9-NEXT:ret <2 x i16> [[TMP0]]
+;
+bb:
+  %arg0.0 = extractelement <2 x i16> %arg0, i64 0
+  %arg0.1 = extractelement <2 x i16> %arg0, i64 1
+  %arg1.0 = extractelement <2 x i16> %arg1, i64 0
+  %arg1.1 = extractelement <2 x i16> %arg1, i64 1
+  %add.0 = call i16 @llvm.umax.i16(i16 %arg0.0, i16 %arg1.0)
+  %add.1 = call i16 @llvm.umax.i16(i16 %arg0.1, i16 %arg1.1)
+  %ins.0 = insertelement <2 x i16> undef, i16 %add.0, i64 0
+  %ins.1 = insertelement <2 x i16> %ins.0, i16 %add.1, i64 1
+  ret <2 x i16> %ins.1
+}
+
+define <2 x i16> @sadd_sat_v2i16(<2 x i16> %arg0, <2 x i16> %arg1) {
+; GFX7-LABEL: @sadd_sat_v2i16(
+; GFX7-NEXT:  bb:
+; GFX7-NEXT:[[ARG0_0:%.*]] = extractelement <2 x i16> [[ARG0:%.*]], i64 0
+; GFX7-NEXT:[[ARG0_1:%.*]] = extractelement <2 x i16> [[ARG0]], i64 1
+; GFX7-NEXT:[[ARG1_0:%.*]] = extractelement <2 x i16> [[ARG1:%.*]], i64 0
+; GFX7-NEXT:[[ARG1_1:%.*]] = extractelement <2 x i16> [[ARG1]], i64 1
+; GFX7-NEXT:[[ADD_0:%.*]] = call i16 @llvm.smin.i16(i16 [[ARG0_0]], i16 
[[ARG1_0]])
+; GFX7-NEXT:[[ADD_1:%.*]] = call i16 @llvm.smin.i16(i16 [[ARG0_1]], i16 
[[ARG1_1]])
+; GFX7-NEXT:[[INS_0:%.*]] = insertelement <2 x i16> poison, i16 [[ADD_0]], 
i64 0
+; GFX7-NEXT:[[INS_1:%.*]] = insertelement <2 x i16> [[INS_0]], i16 
[[ADD_1]], i64 1
+; GFX7-NEXT:ret <2 x i16> [[INS_1]]
+;
+; GFX8-LABEL: @sadd_sat_v2i16(
+; GFX8-NEXT:  bb:
+; GFX8-NEXT:[[TMP0:%.*]] = call <2 x i16> @llvm.smin.v2i16(<2 x i16> 
[[ARG0:%.*]], <2 x i16> [[ARG1:%.*]])
+; GFX8-NEXT:ret <2 x i16> [[T

[llvm-branch-commits] [llvm] TTI: Check legalization cost of abs nodes (PR #100523)

2024-07-25 Thread Jay Foad via llvm-branch-commits


@@ -54,11 +54,11 @@ define i32 @abs_nonpoison(i32 %arg) {
 ; FAST-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: 
%V16I32 = call <16 x i32> @llvm.abs.v16i32(<16 x i32> undef, i1 false)
 ; FAST-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %I16 = 
call i16 @llvm.abs.i16(i16 undef, i1 false)
 ; FAST-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V2I16 
= call <2 x i16> @llvm.abs.v2i16(<2 x i16> undef, i1 false)
-; FAST-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: 
%V4I16 = call <4 x i16> @llvm.abs.v4i16(<4 x i16> undef, i1 false)
-; FAST-NEXT:  Cost Model: Found an estimated cost of 34 for instruction: 
%V8I16 = call <8 x i16> @llvm.abs.v8i16(<8 x i16> undef, i1 false)
-; FAST-NEXT:  Cost Model: Found an estimated cost of 70 for instruction: 
%V16I16 = call <16 x i16> @llvm.abs.v16i16(<16 x i16> undef, i1 false)
-; FAST-NEXT:  Cost Model: Found an estimated cost of 114 for instruction: 
%V17I16 = call <17 x i16> @llvm.abs.v17i16(<17 x i16> undef, i1 false)
-; FAST-NEXT:  Cost Model: Found an estimated cost of 174 for instruction: 
%V32I16 = call <32 x i16> @llvm.abs.v32i16(<32 x i16> undef, i1 false)
+; FAST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I16 
= call <4 x i16> @llvm.abs.v4i16(<4 x i16> undef, i1 false)
+; FAST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 
= call <8 x i16> @llvm.abs.v8i16(<8 x i16> undef, i1 false)
+; FAST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%V16I16 = call <16 x i16> @llvm.abs.v16i16(<16 x i16> undef, i1 false)
+; FAST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%V17I16 = call <17 x i16> @llvm.abs.v17i16(<17 x i16> undef, i1 false)
+; FAST-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: 
%V32I16 = call <32 x i16> @llvm.abs.v32i16(<32 x i16> undef, i1 false)

jayfoad wrote:

What is this demonstrating? 2 does not seem like the right cost for any 
VALU/SALU operation on v32i16.
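
(For scale: v32i16 presumably legalizes to 16 packed v2i16 operations on gfx9-class targets, so a cost nearer 16 would be expected.)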

https://github.com/llvm/llvm-project/pull/100523


[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Fix folding clamp into pseudo scalar instructions (#100568) (PR #102446)

2024-08-08 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM for backporting.

https://github.com/llvm/llvm-project/pull/102446


[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Disable inline constants for pseudo scalar transcendentals (#104395) (PR #105472)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/105472


[llvm-branch-commits] [llvm] release/18.x: Convert many LivePhysRegs uses to LiveRegUnits (PR #84118)

2024-03-07 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad requested changes to this pull request.

> this isn't fixing any known correctness issue

Exactly. I don't think there is any reason to backport this.

https://github.com/llvm/llvm-project/pull/84118


[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-04-26 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90204

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move it before handling volatile.

From b544217fb31ffafb9b072de53a28c71acc169cf8 Mon Sep 17 00:00:00 2001
From: Mirko Brkušanin
Date: Mon, 4 Mar 2024 15:05:31 +0100
Subject: [PATCH] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move it before handling volatile.
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp  |  10 +-
 .../memory-legalizer-flat-nontemporal.ll  | 165 ++
 .../memory-legalizer-global-nontemporal.ll| 158 ++
 .../memory-legalizer-local-nontemporal.ll | 179 +++
 .../memory-legalizer-private-nontemporal.ll   | 203 ++
 5 files changed, 710 insertions(+), 5 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp 
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 84b9330ef9633e..50d8bfa8750818 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2358,6 +2358,11 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
 
   bool Changed = false;
 
+  if (IsNonTemporal) {
+// Set non-temporal hint for all cache levels.
+Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
+  }
+
   if (IsVolatile) {
 Changed |= setScope(MI, AMDGPU::CPol::SCOPE_SYS);
 
@@ -2370,11 +2375,6 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   Position::AFTER);
   }
 
-  if (IsNonTemporal) {
-// Set non-temporal hint for all cache levels.
-Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
-  }
-
   return Changed;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll 
b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
index a59c0394bebe20..ca7486536cf556 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
@@ -582,5 +582,170 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @flat_nontemporal_volatile_load(
+; GFX7-LABEL: flat_nontemporal_volatile_load:
+; GFX7:   ; %bb.0: ; %entry
+; GFX7-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s0
+; GFX7-NEXT:v_mov_b32_e32 v1, s1
+; GFX7-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s2
+; GFX7-NEXT:v_mov_b32_e32 v1, s3
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:flat_store_dword v[0:1], v2
+; GFX7-NEXT:s_endpgm
+;
+; GFX10-WGP-LABEL: flat_nontemporal_volatile_load:
+; GFX10-WGP:   ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-WGP-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-WGP-NEXT:s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT:s_endpgm
+;
+; GFX10-CU-LABEL: flat_nontemporal_volatile_load:
+; GFX10-CU:   ; %bb.0: ; %entry
+; GFX10-CU-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-CU-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-CU-NEXT:s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT:s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: flat_nontemporal_volatile_load:
+; SKIP-CACHE-INV:   ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s0
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT:flat_load_dword v2, v[0:1] glc
+; SKIP-CACHE-INV-NEXT:s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s2
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:flat_store_dword v[0:1], v2
+; SKIP-CACHE-INV-NEXT:s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: flat_nontemporal_volatile_load:
+; GFX90A-NOTTGSPLIT:   ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v0, s0
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v1, s1
+; GFX90A-NOTTGSPLIT-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX90A-NOTTGSPLIT-NEXT:

[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-04-26 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90204


[llvm-branch-commits] [llvm] b544217 - [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

2024-04-26 Thread Jay Foad via llvm-branch-commits

Author: Mirko Brkušanin
Date: 2024-04-26T13:35:58+01:00
New Revision: b544217fb31ffafb9b072de53a28c71acc169cf8

URL: 
https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8
DIFF: 
https://github.com/llvm/llvm-project/commit/b544217fb31ffafb9b072de53a28c71acc169cf8.diff

LOG: [AMDGPU] Fix setting nontemporal in memory legalizer (#83815)

Iterator MI can advance in insertWait(), but we need the original instruction
to set the temporal hint. Just move it before handling volatile.

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-global-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-local-nontemporal.ll
llvm/test/CodeGen/AMDGPU/memory-legalizer-private-nontemporal.ll

Removed: 




diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp 
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 84b9330ef9633e..50d8bfa8750818 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -2358,6 +2358,11 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
 
   bool Changed = false;
 
+  if (IsNonTemporal) {
+// Set non-temporal hint for all cache levels.
+Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
+  }
+
   if (IsVolatile) {
 Changed |= setScope(MI, AMDGPU::CPol::SCOPE_SYS);
 
@@ -2370,11 +2375,6 @@ bool SIGfx12CacheControl::enableVolatileAndOrNonTemporal(
   Position::AFTER);
   }
 
-  if (IsNonTemporal) {
-// Set non-temporal hint for all cache levels.
-Changed |= setTH(MI, AMDGPU::CPol::TH_NT);
-  }
-
   return Changed;
 }
 

diff --git a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll 
b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
index a59c0394bebe20..ca7486536cf556 100644
--- a/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
+++ b/llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll
@@ -582,5 +582,170 @@ entry:
   ret void
 }
 
+define amdgpu_kernel void @flat_nontemporal_volatile_load(
+; GFX7-LABEL: flat_nontemporal_volatile_load:
+; GFX7:   ; %bb.0: ; %entry
+; GFX7-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s0
+; GFX7-NEXT:v_mov_b32_e32 v1, s1
+; GFX7-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX7-NEXT:s_waitcnt vmcnt(0)
+; GFX7-NEXT:v_mov_b32_e32 v0, s2
+; GFX7-NEXT:v_mov_b32_e32 v1, s3
+; GFX7-NEXT:s_waitcnt lgkmcnt(0)
+; GFX7-NEXT:flat_store_dword v[0:1], v2
+; GFX7-NEXT:s_endpgm
+;
+; GFX10-WGP-LABEL: flat_nontemporal_volatile_load:
+; GFX10-WGP:   ; %bb.0: ; %entry
+; GFX10-WGP-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-WGP-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-WGP-NEXT:s_waitcnt vmcnt(0)
+; GFX10-WGP-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-WGP-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-WGP-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-WGP-NEXT:flat_store_dword v[0:1], v2
+; GFX10-WGP-NEXT:s_endpgm
+;
+; GFX10-CU-LABEL: flat_nontemporal_volatile_load:
+; GFX10-CU:   ; %bb.0: ; %entry
+; GFX10-CU-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s0
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s1
+; GFX10-CU-NEXT:flat_load_dword v2, v[0:1] glc dlc
+; GFX10-CU-NEXT:s_waitcnt vmcnt(0)
+; GFX10-CU-NEXT:v_mov_b32_e32 v0, s2
+; GFX10-CU-NEXT:v_mov_b32_e32 v1, s3
+; GFX10-CU-NEXT:s_waitcnt lgkmcnt(0)
+; GFX10-CU-NEXT:flat_store_dword v[0:1], v2
+; GFX10-CU-NEXT:s_endpgm
+;
+; SKIP-CACHE-INV-LABEL: flat_nontemporal_volatile_load:
+; SKIP-CACHE-INV:   ; %bb.0: ; %entry
+; SKIP-CACHE-INV-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s0
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s1
+; SKIP-CACHE-INV-NEXT:flat_load_dword v2, v[0:1] glc
+; SKIP-CACHE-INV-NEXT:s_waitcnt vmcnt(0)
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v0, s2
+; SKIP-CACHE-INV-NEXT:v_mov_b32_e32 v1, s3
+; SKIP-CACHE-INV-NEXT:s_waitcnt lgkmcnt(0)
+; SKIP-CACHE-INV-NEXT:flat_store_dword v[0:1], v2
+; SKIP-CACHE-INV-NEXT:s_endpgm
+;
+; GFX90A-NOTTGSPLIT-LABEL: flat_nontemporal_volatile_load:
+; GFX90A-NOTTGSPLIT:   ; %bb.0: ; %entry
+; GFX90A-NOTTGSPLIT-NEXT:s_load_dwordx4 s[0:3], s[4:5], 0x0
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt lgkmcnt(0)
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v0, s0
+; GFX90A-NOTTGSPLIT-NEXT:v_mov_b32_e32 v1, s1
+; GFX90A-NOTTGSPLIT-NEXT:flat_load_dword v2, v[0:1] glc
+; GFX90A-NOTTGSPLIT-NEXT:s_waitcnt vmcnt(0)
+;

[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90582

image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.


From 17b75a9517891d662e677a357713c920bb79c43c Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Tue, 30 Apr 2024 10:41:51 +0100
Subject: [PATCH] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201)

image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  8 --
 .../AMDGPU/llvm.amdgcn.image.msaa.load.ll | 26 +--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..97c55e4d9e41c2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -187,8 +187,12 @@ VmemType getVmemType(const MachineInstr &Inst) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Inst.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
   AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
-  return BaseInfo->BVH ? VMEM_BVH
-   : BaseInfo->Sampler ? VMEM_SAMPLER : VMEM_NOSAMPLER;
+  // The test for MSAA here is because gfx12+ image_msaa_load is actually
+  // encoded as VSAMPLE and requires the appropriate s_waitcnt variant for 
that.
+  // Pre-gfx12 doesn't care since all vmem types result in the same s_waitcnt.
+  return BaseInfo->BVH ? VMEM_BVH
+ : BaseInfo->Sampler || BaseInfo->MSAA ? VMEM_SAMPLER
+   : VMEM_NOSAMPLER;
 }
 
 unsigned &getCounterRef(AMDGPU::Waitcnt &Wait, InstCounterType T) {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
index 1348315e72e7bc..8da48551855570 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
@@ -12,7 +12,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg 
%rsrc, i32 %s, i32 %t,
 ; GFX12-LABEL: load_2dmsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2], s[0:7] dmask:0x1 
dim:SQ_RSRC_IMG_2D_MSAA unorm ; encoding: 
[0x06,0x20,0x46,0xe4,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x00]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2dmsaa.v4f32.i32(i32 1, 
i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -32,7 +32,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> 
inreg %rsrc, ptr addrsp
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2], s[0:7] dmask:0x2 
dim:SQ_RSRC_IMG_2D_MSAA unorm tfe lwe ; encoding: 
[0x0e,0x20,0x86,0xe4,0x00,0x01,0x00,0x00,0x00,0x01,0x02,0x00]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: 
[0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -53,7 +53,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> 
inreg %rsrc, i32 %s, i3
 ; GFX12-LABEL: load_2darraymsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2, v3], s[0:7] dmask:0x4 
dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm ; encoding: 
[0x07,0x20,0x06,0xe5,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2darraymsaa.v4f32.i32(i32 
4, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -73,7 +73,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> 
inreg %rsrc, ptr ad
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2, v3], s[0:7] dmask:0x8 
dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm tfe ; encoding: 
[0x0f,0x20,0x06,0xe6,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: 
[0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -94,7 +94,7 @@ defin

[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

Let's not backport this yet since @pendingchaos has pointed out a problem with 
#90201.

https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-04-30 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad converted_to_draft 
https://github.com/llvm/llvm-project/pull/90582


[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/90719

Code to determine if a waitcnt is required before a barrier instruction
only considered S_BARRIER. gfx12 adds barrier_signal/wait, so we need to
enhance the existing code to look for a barrier start (which is just an
S_BARRIER for earlier architectures).

From e31113098e4669850f3ff924bead9e0fb9618f20 Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Wed, 1 May 2024 11:37:13 +0100
Subject: [PATCH] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12
 (#90595)

Code to determine if a waitcnt is required before a barrier instruction
only considered S_BARRIER. gfx12 adds barrier_signal/wait, so we need to
enhance the existing code to look for a barrier start (which is just an
S_BARRIER for earlier architectures).
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  2 +-
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  | 11 ++
 .../CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll   |  2 ++
 .../AMDGPU/llvm.amdgcn.s.barrier.wait.ll  | 22 +++
 4 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..7a3198612f86fc 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1832,7 +1832,7 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   // not, we need to ensure the subtarget is capable of backing off barrier
   // instructions in case there are any outstanding memory operations that may
   // cause an exception. Otherwise, insert an explicit S_WAITCNT 0 here.
-  if (MI.getOpcode() == AMDGPU::S_BARRIER &&
+  if (TII->isBarrierStart(MI.getOpcode()) &&
   !ST->hasAutoWaitcntBeforeBarrier() && !ST->supportsBackOffBarrier()) {
 Wait = Wait.combined(
 AMDGPU::Waitcnt::allZero(ST->hasExtendedWaitCounts(), ST->hasVscnt()));
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 1c9dacc09f8154..626d903c0c6958 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -908,6 +908,17 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
 return MI.getDesc().TSFlags & SIInstrFlags::IsNeverUniform;
   }
 
+  // Check to see if opcode is for a barrier start. Pre gfx12 this is just the
+  // S_BARRIER, but after support for S_BARRIER_SIGNAL* / S_BARRIER_WAIT we 
want
+  // to check for the barrier start (S_BARRIER_SIGNAL*)
+  bool isBarrierStart(unsigned Opcode) const {
+return Opcode == AMDGPU::S_BARRIER ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_M0 ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_IMM ||
+   Opcode == AMDGPU::S_BARRIER_SIGNAL_ISFIRST_IMM;
+  }
+
   static bool doesNotReadTiedSource(const MachineInstr &MI) {
 return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
   }
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
index a7d3115af29bff..47c021769aa56f 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
@@ -96,6 +96,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT4-NEXT:s_wait_kmcnt 0x0
 ; VARIANT4-NEXT:v_xad_u32 v1, v0, -1, s2
 ; VARIANT4-NEXT:global_store_b32 v3, v0, s[0:1]
+; VARIANT4-NEXT:s_wait_storecnt 0x0
 ; VARIANT4-NEXT:s_barrier_signal -1
 ; VARIANT4-NEXT:s_barrier_wait -1
 ; VARIANT4-NEXT:v_ashrrev_i32_e32 v2, 31, v1
@@ -142,6 +143,7 @@ define amdgpu_kernel void @test_barrier(ptr addrspace(1) 
%out, i32 %size) #0 {
 ; VARIANT6-NEXT:v_dual_mov_b32 v4, s1 :: v_dual_mov_b32 v3, s0
 ; VARIANT6-NEXT:v_sub_nc_u32_e32 v1, s2, v0
 ; VARIANT6-NEXT:global_store_b32 v5, v0, s[0:1]
+; VARIANT6-NEXT:s_wait_storecnt 0x0
 ; VARIANT6-NEXT:s_barrier_signal -1
 ; VARIANT6-NEXT:s_barrier_wait -1
 ; VARIANT6-NEXT:v_ashrrev_i32_e32 v2, 31, v1
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
index 4ab5e97964a857..38a34ec6daf73c 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
@@ -12,6 +12,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GCN-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GCN-NEXT:s_wait_kmcnt 0x0
 ; GCN-NEXT:global_store_b32 v3, v2, s[0:1]
+; GCN-NEXT:s_wait_storecnt 0x0
 ; GCN-NEXT:s_barrier_signal -1
 ; GCN-NEXT:s_barrier_wait -1
 ; GCN-NEXT:global_store_b32 v3, v0, s[0:1]
@@ -28,6 +29,7 @@ define amdgpu_kernel void @test1_s_barrier_signal(ptr 
addrspace(1) %out) #0 {
 ; GLOBAL-ISEL-NEXT:v_sub_nc_u32_e32 v0, v1, v0
 ; GLOBAL-ISEL-NEXT:s_wait_kmcnt 0x0
 ; GLOBAL-ISEL-N

[llvm-branch-commits] [llvm] [AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595) (PR #90719)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/90719
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/90582

>From 17b75a9517891d662e677a357713c920bb79c43c Mon Sep 17 00:00:00 2001
From: David Stuttard 
Date: Tue, 30 Apr 2024 10:41:51 +0100
Subject: [PATCH 1/2] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load
 (#90201)

image_msaa_load is actually encoded as a VSAMPLE instruction and
requires the appropriate waitcnt variant.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  8 --
 .../AMDGPU/llvm.amdgcn.image.msaa.load.ll | 26 +--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6ecb1c8bf6e1db..97c55e4d9e41c2 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -187,8 +187,12 @@ VmemType getVmemType(const MachineInstr &Inst) {
   const AMDGPU::MIMGInfo *Info = AMDGPU::getMIMGInfo(Inst.getOpcode());
   const AMDGPU::MIMGBaseOpcodeInfo *BaseInfo =
   AMDGPU::getMIMGBaseOpcodeInfo(Info->BaseOpcode);
-  return BaseInfo->BVH ? VMEM_BVH
-   : BaseInfo->Sampler ? VMEM_SAMPLER : VMEM_NOSAMPLER;
+  // The test for MSAA here is because gfx12+ image_msaa_load is actually
+  // encoded as VSAMPLE and requires the appropriate s_waitcnt variant for 
that.
+  // Pre-gfx12 doesn't care since all vmem types result in the same s_waitcnt.
+  return BaseInfo->BVH ? VMEM_BVH
+ : BaseInfo->Sampler || BaseInfo->MSAA ? VMEM_SAMPLER
+   : VMEM_NOSAMPLER;
 }
 
 unsigned &getCounterRef(AMDGPU::Waitcnt &Wait, InstCounterType T) {
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
index 1348315e72e7bc..8da48551855570 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.msaa.load.ll
@@ -12,7 +12,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa(<8 x i32> inreg 
%rsrc, i32 %s, i32 %t,
 ; GFX12-LABEL: load_2dmsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2], s[0:7] dmask:0x1 
dim:SQ_RSRC_IMG_2D_MSAA unorm ; encoding: 
[0x06,0x20,0x46,0xe4,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x00]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2dmsaa.v4f32.i32(i32 1, 
i32 %s, i32 %t, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -32,7 +32,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_both(<8 x i32> 
inreg %rsrc, ptr addrsp
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2], s[0:7] dmask:0x2 
dim:SQ_RSRC_IMG_2D_MSAA unorm tfe lwe ; encoding: 
[0x0e,0x20,0x86,0xe4,0x00,0x01,0x00,0x00,0x00,0x01,0x02,0x00]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: 
[0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -53,7 +53,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa(<8 x i32> 
inreg %rsrc, i32 %s, i3
 ; GFX12-LABEL: load_2darraymsaa:
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:3], [v0, v1, v2, v3], s[0:7] dmask:0x4 
dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm ; encoding: 
[0x07,0x20,0x06,0xe5,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
   %v = call <4 x float> @llvm.amdgcn.image.msaa.load.2darraymsaa.v4f32.i32(i32 
4, i32 %s, i32 %t, i32 %slice, i32 %fragid, <8 x i32> %rsrc, i32 0, i32 0)
@@ -73,7 +73,7 @@ define amdgpu_ps <4 x float> @load_2darraymsaa_tfe(<8 x i32> 
inreg %rsrc, ptr ad
 ; GFX12:   ; %bb.0: ; %main_body
 ; GFX12-NEXT:image_msaa_load v[0:4], [v0, v1, v2, v3], s[0:7] dmask:0x8 
dim:SQ_RSRC_IMG_2D_MSAA_ARRAY unorm tfe ; encoding: 
[0x0f,0x20,0x06,0xe6,0x00,0x00,0x00,0x00,0x00,0x01,0x02,0x03]
 ; GFX12-NEXT:v_mov_b32_e32 v5, 0 ; encoding: [0x80,0x02,0x0a,0x7e]
-; GFX12-NEXT:s_wait_loadcnt 0x0 ; encoding: [0x00,0x00,0xc0,0xbf]
+; GFX12-NEXT:s_wait_samplecnt 0x0 ; encoding: [0x00,0x00,0xc2,0xbf]
 ; GFX12-NEXT:global_store_b32 v5, v4, s[8:9] ; encoding: 
[0x08,0x80,0x06,0xee,0x00,0x00,0x00,0x02,0x05,0x00,0x00,0x00]
 ; GFX12-NEXT:; return to shader part epilog
 main_body:
@@ -94,7 +94,7 @@ define amdgpu_ps <4 x float> @load_2dmsaa_glc(<8 x i32> inreg 
%rsrc, i32 %s, i32
 ; GFX12-LABEL: load_2dmsaa

[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/90582
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix gfx12 waitcnt type for image_msaa_load (#90201) (PR #90582)

2024-05-01 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Let's not backport this yet since @pendingchaos has pointed out a problem 
> with #90201.

Fixed by #90710 which I have added to this PR.

https://github.com/llvm/llvm-project/pull/90582
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix setting nontemporal in memory legalizer (#83815) (PR #90204)

2024-05-02 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Hi @jayfoad (or anyone else). If you would like to add a note about this fix 
> in the release notes (completely optional). Please reply to this comment with 
> a one or two sentence description of the fix. When you are done, please add 
> the release:note label to this PR.

I don't think this fix is particularly noteworthy. Would there already be a 
list of bugs fixed in the release notes?

https://github.com/llvm/llvm-project/pull/90204
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-05 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/91034
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622) (PR #91034)

2024-05-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Fixed encoding of AMDGPU instructions

I don't think the release notes should say that. It makes it sound like all 
encodings were wrong.

https://github.com/llvm/llvm-project/pull/91034
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] PR for llvm/llvm-project#79451 (PR #79457)

2024-01-25 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> @jayfoad What do you think about merging this PR to the release branch?

LGTM, but it was me that requested it.

https://github.com/llvm/llvm-project/pull/79457
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/79689

This is only valid on targets with architected SGPRs.
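
As a quick reference, the bitfield extraction this lowering performs can
be sketched in plain C++ (illustrative only: the function name is made
up, but the field position TTMP8[29:25] matches the buildUbfx/BFE_U32
parameters in the patch below):

#include <cstdint>

uint32_t waveIdInGroup(uint32_t ttmp8) {
  // Unsigned bitfield extract of 5 bits starting at bit 25.
  return (ttmp8 >> 25) & 0x1f;
}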

>From c5949b09b05e7417d0494b2301781b84d22b95ef Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb..c5f43d17d1c148 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
 
//===--===//
 // Deep learning intrinsics.
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 32921bb248caf0..118c8b7c66690f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6848,6 +6848,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr 
&MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+ MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+  getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+   AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
 MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -6970,6 +6987,8 @@ bool 
AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b..ecbe42681c6690 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
   MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d35b76c8ad54eb..9cbcf0012ea878 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7890,6 +7890,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, 
SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+   AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+ DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op,
   unsigned Dim,
   const ArgDescriptor &Arg) const {
@@ -8060,6 +8072,8 @@ SDValue SITargetLowering::Lower

[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/79689
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-27 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/79689
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

@tstellar does this backport PR look OK? I created it with `gh pr create -f -B 
release/18.x` and I wasn't sure if I had to edit anything, apart from adding 
the release milestone.

https://github.com/llvm/llvm-project/pull/79689
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad closed 
https://github.com/llvm/llvm-project/pull/79689
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/79839

This just missed the branch creation and is the last piece of functionality 
required to get AMDGPU GFX12 support working in the 18.x release.



>From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb1..c5f43d17d1c1481 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
 
//===--===//
 // Deep learning intrinsics.
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 615685822f91eeb..e98ede88a7e2db9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr 
&MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+ MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+  getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+   AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
 MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -7005,6 +7022,8 @@ bool 
AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b6..ecbe42681c6690c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
   MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d60f511302613e1..c5ad9da88ec2b31 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, 
SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+   AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+ DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op,
   unsigned Dim,
   

[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/79839
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325) (PR #79689)

2024-01-29 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> jayfoad closed this by deleting the head repository 3 hours ago

Sorry. Recreated as #79839

https://github.com/llvm/llvm-project/pull/79689
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad updated 
https://github.com/llvm/llvm-project/pull/79839

>From c265c8527285075a58b2425198dbd4cca8b69477 Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Thu, 25 Jan 2024 07:48:06 +
Subject: [PATCH 1/2] [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)

This is only valid on targets with architected SGPRs.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |  4 ++
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | 19 ++
 llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h  |  1 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 14 +
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |  1 +
 .../CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll | 61 +++
 6 files changed, 100 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wave.id.ll

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 9eb1ac8e27befb..c5f43d17d1c148 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2777,6 +2777,10 @@ class AMDGPULoadTr:
 
 def int_amdgcn_global_load_tr : AMDGPULoadTr;
 
+// i32 @llvm.amdgcn.wave.id()
+def int_amdgcn_wave_id :
+  DefaultAttrsIntrinsic<[llvm_i32_ty], [], [IntrNoMem, IntrSpeculatable]>;
+
 
//===--===//
 // Deep learning intrinsics.
 
//===--===//
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 615685822f91ee..e98ede88a7e2db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr 
&MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+ MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+  getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,
+   AMDGPU::SReg_32RegClass, B.getDebugLoc(), S32);
+  auto LSB = B.buildConstant(S32, 25);
+  auto Width = B.buildConstant(S32, 5);
+  B.buildUbfx(DstReg, TTMP8, LSB, Width);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
 MachineInstr &MI) const {
   MachineIRBuilder &B = Helper.MIRBuilder;
@@ -7005,6 +7022,8 @@ bool 
AMDGPULegalizerInfo::legalizeIntrinsic(LegalizerHelper &Helper,
   case Intrinsic::amdgcn_workgroup_id_z:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::WORKGROUP_ID_Z);
+  case Intrinsic::amdgcn_wave_id:
+return legalizeWaveID(MI, B);
   case Intrinsic::amdgcn_lds_kernel_id:
 return legalizePreloadedArgIntrin(MI, MRI, B,
   AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
index 56aabd4f6ab71b..ecbe42681c6690 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h
@@ -212,6 +212,7 @@ class AMDGPULegalizerInfo final : public LegalizerInfo {
 
   bool legalizeFPTruncRound(MachineInstr &MI, MachineIRBuilder &B) const;
   bool legalizeStackSave(MachineInstr &MI, MachineIRBuilder &B) const;
+  bool legalizeWaveID(MachineInstr &MI, MachineIRBuilder &B) const;
 
   bool legalizeImageIntrinsic(
   MachineInstr &MI, MachineIRBuilder &B,
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index d60f511302613e..c5ad9da88ec2b3 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -7920,6 +7920,18 @@ SDValue SITargetLowering::lowerSBuffer(EVT VT, SDLoc DL, 
SDValue Rsrc,
   return Loads[0];
 }
 
+SDValue SITargetLowering::lowerWaveID(SelectionDAG &DAG, SDValue Op) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!Subtarget->hasArchitectedSGPRs())
+return {};
+  SDLoc SL(Op);
+  MVT VT = MVT::i32;
+  SDValue TTMP8 = CreateLiveInRegister(DAG, &AMDGPU::SReg_32RegClass,
+   AMDGPU::TTMP8, VT, SL);
+  return DAG.getNode(AMDGPUISD::BFE_U32, SL, VT, TTMP8,
+ DAG.getConstant(25, SL, VT), DAG.getConstant(5, SL, VT));
+}
+
 SDValue SITargetLowering::lowerWorkitemID(SelectionDAG &DAG, SDValue Op,
   unsigned Dim,
   const ArgDescriptor &Arg) const {
@@ -8090,6 +8102,8 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue 
Op,
   case Intrinsic::

[llvm-branch-commits] [llvm] Backport 45d2d7757feb386186f69af6ef57bde7b5adc2db to release/18.x (PR #79839)

2024-01-29 Thread Jay Foad via llvm-branch-commits


@@ -6883,6 +6883,23 @@ bool AMDGPULegalizerInfo::legalizeStackSave(MachineInstr 
&MI,
   return true;
 }
 
+bool AMDGPULegalizerInfo::legalizeWaveID(MachineInstr &MI,
+ MachineIRBuilder &B) const {
+  // With architected SGPRs, waveIDinGroup is in TTMP8[29:25].
+  if (!ST.hasArchitectedSGPRs())
+return false;
+  LLT S32 = LLT::scalar(32);
+  Register DstReg = MI.getOperand(0).getReg();
+  Register TTMP8 =
+  getFunctionLiveInPhysReg(B.getMF(), B.getTII(), AMDGPU::TTMP8,

jayfoad wrote:

True, 66c710ec9dcdbdec6cadd89b972d8945983dc92f improved this to avoid adding 
liveins. I wasn't going to bother backporting that since I didn't think it was 
required for correctness. But I have cherry-picked it into this PR now.

https://github.com/llvm/llvm-project/pull/79839
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/105549

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
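
The rule being implemented, condensed into a standalone C++ sketch (the
parameter names are placeholders for the pass's bookkeeping, not real
identifiers from the patch):

// A WAW wait is needed before overwriting a VGPR that still has an
// outstanding VMEM write, unless both writes are the same type of VMEM
// *and* the target guarantees same-type VMEM writes VGPRs in order.
bool needsWAWWait(bool OutstandingVmemWriteToReg, bool SameVmemType,
                  bool VmemWriteVgprInOrder) {
  return OutstandingVmemWriteToReg &&
         !(SameVmemType && VmemWriteVgprInOrder);
}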

>From 9a2103df4094af38f59e1adce5414b94672e6d6e Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 21 Aug 2024 16:23:49 +0100
Subject: [PATCH] [AMDGPU] GFX12 VMEM instructions can write VGPR results out
 of order

Fix SIInsertWaitcnts to account for this by adding extra waits to avoid
WAW dependencies.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 23 ++-
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  3 +++
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp   |  7 +++---
 .../buffer-fat-pointer-atomicrmw-fadd.ll  |  3 +++
 .../buffer-fat-pointer-atomicrmw-fmax.ll  |  5 
 .../buffer-fat-pointer-atomicrmw-fmin.ll  |  5 
 amdgcn.struct.buffer.load.format.v3f16.ll |  1 +
 llvm/test/CodeGen/AMDGPU/load-constant-i16.ll | 10 +++-
 llvm/test/CodeGen/AMDGPU/load-global-i16.ll   | 10 
 llvm/test/CodeGen/AMDGPU/load-global-i32.ll   |  2 ++
 .../AMDGPU/spill-csr-frame-ptr-reg-copy.ll|  1 +
 .../CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir |  8 +++
 12 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7906e0ee9d7858..9efdbd751d96e3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : 
SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",
+  "HasVmemWriteVgprInOrder",
+  "true",
+  "VMEM instructions of the same type write VGPR results in order"
+>;
+
 //======//
 // Subtarget Features (options and debugging)
 //======//
@@ -1123,7 +1129,8 @@ def FeatureSouthernIslands : 
GCNSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
   FeatureDsSrc2Insts, FeatureLDSBankCount32, FeatureMovrel,
   FeatureTrigReducedRange, FeatureExtendedImageInsts, FeatureImageInsts,
   FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
-  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts
+  FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1136,7 +1143,8 @@ def FeatureSeaIslands : 
GCNSubtargetFeatureGeneration<"SEA_ISLANDS",
   FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureUnalignedBufferAccess,
   FeatureImageInsts, FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+  FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+  FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1152,7 +1160,7 @@ def FeatureVolcanicIslands : 
GCNSubtargetFeatureGeneration<"VOLCANIC_ISLANDS",
FeatureGFX7GFX8GFX9Insts, FeatureSMemTimeInst, FeatureMadMacF32Insts,
FeatureDsSrc2Insts, FeatureExtendedImageInsts, FeatureFastDenormalF32,
FeatureUnalignedBufferAccess, FeatureImageInsts, FeatureGDS, FeatureGWS,
-   FeatureDefaultComponentZero
+   FeatureDefaultComponentZero, FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1170,7 +1178,8 @@ def FeatureGFX9 : GCNSubtargetFeatureGeneration<"GFX9",
FeatureScalarFlatScratchInsts, FeatureScalarAtomics, FeatureR128A16,
FeatureA16, FeatureSMemTimeInst, FeatureFastDenormalF32, 
FeatureSupportsXNACK,
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess,
-   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero
+   FeatureNegativeScratchOffsetBug, FeatureGWS, FeatureDefaultComponentZero,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1193,7 +1202,8 @@ def FeatureGFX10 : GCNSubtargetFeatureGeneration<"GFX10",
FeatureGDS, FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength63,
FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF64GlobalInsts,
-   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts
+   FeatureAtomicFMinFMaxF32FlatInsts, FeatureAtomicFMinFMaxF64FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
@@ -1215,7 +1225,8 @@ def FeatureGFX11 : GCNSubtargetFeatureGeneration<"GFX11",
FeatureUnalignedBufferAccess, FeatureUnalignedDSAccess, FeatureGDS,
FeatureGWS, FeatureDefaultComponentZero,
FeatureMaxHardClauseLength32,
-   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts
+   FeatureAtomicFMinFMaxF32GlobalInsts, FeatureAtomicFMinFMaxF32FlatInsts,
+   FeatureVmemWriteVgprInOrder
   ]
 >;
 
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h 
b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 902f51ae358d59..9386bcf0d74b22 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU

[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/105550

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
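
The updated heuristic, condensed into a C++ sketch (simplified: the real
shouldFlushVmCnt also handles the store-only and no-vscnt cases):

// Pre-flushing vmcnt in the loop header only pays off when same-type
// VMEM loads write their VGPRs in order; on GFX12 they may not, so the
// loop body needs a wait regardless and the header flush is skipped.
bool shouldFlushVmCnt(bool HasVMemLoad, bool UsesVgprLoadedOutside,
                      bool VmemWriteVgprInOrder) {
  return HasVMemLoad && UsesVgprLoadedOutside && VmemWriteVgprInOrder;
}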

>From e53f75835dd0f0fc9d11b17afbe40de9b4a8a35b Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Wed, 21 Aug 2024 16:57:24 +0100
Subject: [PATCH] [AMDGPU] Remove one case of vmcnt loop header flushing for
 GFX12

When a loop contains a VMEM load whose result is only used outside the
loop, do not bother to flush vmcnt in the loop head on GFX12. A wait for
vmcnt will be required inside the loop anyway, because VMEM instructions
can write their VGPR results out of order.
---
 llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp |  2 +-
 llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp 
b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 4262e7b5d9c25..eafe20be17d5b 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -2390,7 +2390,7 @@ bool SIInsertWaitcnts::shouldFlushVmCnt(MachineLoop *ML,
   }
   if (!ST->hasVscnt() && HasVMemStore && !HasVMemLoad && UsesVgprLoadedOutside)
 return true;
-  return HasVMemLoad && UsesVgprLoadedOutside;
+  return HasVMemLoad && UsesVgprLoadedOutside && ST->hasVmemWriteVgprInOrder();
 }
 
 bool SIInsertWaitcnts::runOnMachineFunction(MachineFunction &MF) {
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir 
b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
index bdef55ab956a0..0ddd2aa285b26 100644
--- a/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
@@ -295,7 +295,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -342,7 +342,7 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_store
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
@@ -499,9 +499,9 @@ body: |
 # GFX12-LABEL: waitcnt_vm_loop2_reginterval
 # GFX12-LABEL: bb.0:
 # GFX12: GLOBAL_LOAD_DWORDX4
-# GFX12: S_WAIT_LOADCNT 0
-# GFX12-LABEL: bb.1:
 # GFX12-NOT: S_WAIT_LOADCNT 0
+# GFX12-LABEL: bb.1:
+# GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:
 name:waitcnt_vm_loop2_reginterval
 body: |
@@ -600,7 +600,7 @@ body: |
 # GFX12-LABEL: bb.0:
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
 # GFX12: BUFFER_LOAD_FORMAT_X_IDXEN
-# GFX12: S_WAIT_LOADCNT 0
+# GFX12-NOT: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.1:
 # GFX12: S_WAIT_LOADCNT 0
 # GFX12-LABEL: bb.2:

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550** https://app.graphite.dev/github/pr/llvm/llvm-project/105550
* **#105549** https://app.graphite.dev/github/pr/llvm/llvm-project/105549 👈
* **#105548** https://app.graphite.dev/github/pr/llvm/llvm-project/105548
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite. Learn more: https://graphite.dev/docs/merge-pull-requests

* **#105550** https://app.graphite.dev/github/pr/llvm/llvm-project/105550 👈
* **#105549** https://app.graphite.dev/github/pr/llvm/llvm-project/105549
* **#105548** https://app.graphite.dev/github/pr/llvm/llvm-project/105548
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/105550
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (PR #105550)

2024-08-21 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad ready_for_review 
https://github.com/llvm/llvm-project/pull/105550
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits


@@ -953,6 +953,12 @@ def FeatureRequiredExportPriority : 
SubtargetFeature<"required-export-priority",
   "Export priority must be explicitly manipulated on GFX11.5"
 >;
 
+def FeatureVmemWriteVgprInOrder : SubtargetFeature<"vmem-write-vgpr-in-order",

jayfoad wrote:

"Easier" how? You mean it would make the patch smaller? I prefer to have 
features that state things in a "positive" way, so that not having the feature 
still generates conservatively correct code.

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits


@@ -1778,11 +1778,12 @@ bool 
SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
   if (IsVGPR) {
 // RAW always needs an s_waitcnt. WAW needs an s_waitcnt unless the
 // previous write and this write are the same type of VMEM
-// instruction, in which case they're guaranteed to write their
-// results in order anyway.
+// instruction, in which case they are (in some architectures)
+// guaranteed to write their results in order anyway.

jayfoad wrote:

No, this is nothing to do with storing data to memory. We are only talking
about loads (or atomics with results) and the order in which they write the
loaded data into the result VGPR.

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits


@@ -4371,8 +4375,10 @@ define amdgpu_kernel void 
@global_sextload_v64i16_to_v64i32(ptr addrspace(1) %ou
 ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:48
 ; GCN-NOHSA-SI-NEXT:buffer_store_dwordx4 v[4:7], off, s[0:3], 0
 ; GCN-NOHSA-SI-NEXT:buffer_load_dword v0, off, s[12:15], 0 ; 4-byte Folded 
Reload
+; GCN-NOHSA-SI-NEXT:s_waitcnt vmcnt(0)

jayfoad wrote:

The first RUN line does not specify a CPU so it will get some generic CPU that 
does not have the new feature.

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM instructions can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits


@@ -754,13 +754,21 @@ define amdgpu_kernel void 
@constant_load_v16i16_align2(ptr addrspace(4) %ptr0) #
 ; GFX12-NEXT:global_load_u16 v6, v8, s[0:1] offset:8
 ; GFX12-NEXT:global_load_u16 v5, v8, s[0:1] offset:4
 ; GFX12-NEXT:global_load_u16 v4, v8, s[0:1]
+; GFX12-NEXT:s_wait_loadcnt 0x7

jayfoad wrote:

This wait is required to ensure that the global_load_u16 on line 749 writes to 
v3 before the global_load_d16_hi_b16 on line 758.

https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] GFX12 VMEM loads can write VGPR results out of order (PR #105549)

2024-08-22 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

### Merge activity

* **Aug 22, 6:34 AM EDT**: @jayfoad started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/105549).


https://github.com/llvm/llvm-project/pull/105549
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AMDGPU] Remove one case of vmcnt loop header flushing for GFX12 (#105550) (PR #105808)

2024-08-23 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

I'm not sure if I should have done three different backport requests for the 
three commits. It could be confusing if they get squash-and-merged onto the 
release branch.

https://github.com/llvm/llvm-project/pull/105808
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad milestoned 
https://github.com/llvm/llvm-project/pull/106977
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad created 
https://github.com/llvm/llvm-project/pull/106977

SMUL_LOHI and UMUL_LOHI are different operations because the high part of the 
result is different, so it is not OK to optimize the signed version to 
MUL_U24/MULHI_U24 or the unsigned version to MUL_I24/MULHI_I24.

>From 04226baceb4e2823a7ca3daac236f705b3c6c33e Mon Sep 17 00:00:00 2001
From: Jay Foad 
Date: Tue, 27 Aug 2024 17:09:40 +0100
Subject: [PATCH] [AMDGPU] Fix sign confusion in performMulLoHiCombine
 (#105831)

SMUL_LOHI and UMUL_LOHI are different operations because the high part
of the result is different, so it is not OK to optimize the signed
version to MUL_U24/MULHI_U24 or the unsigned version to
MUL_I24/MULHI_I24.
---
 llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | 30 +++---
 llvm/test/CodeGen/AMDGPU/mul_int24.ll | 98 +++
 2 files changed, 116 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 39ae7c96cf7729..a71c9453d968dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4349,6 +4349,7 @@ AMDGPUTargetLowering::performMulLoHiCombine(SDNode *N,
   SelectionDAG &DAG = DCI.DAG;
   SDLoc DL(N);
 
+  bool Signed = N->getOpcode() == ISD::SMUL_LOHI;
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
 
@@ -4363,20 +4364,25 @@ AMDGPUTargetLowering::performMulLoHiCombine(SDNode *N,
 
   // Try to use two fast 24-bit multiplies (one for each half of the result)
   // instead of one slow extending multiply.
-  unsigned LoOpcode, HiOpcode;
-  if (Subtarget->hasMulU24() && isU24(N0, DAG) && isU24(N1, DAG)) {
-N0 = DAG.getZExtOrTrunc(N0, DL, MVT::i32);
-N1 = DAG.getZExtOrTrunc(N1, DL, MVT::i32);
-LoOpcode = AMDGPUISD::MUL_U24;
-HiOpcode = AMDGPUISD::MULHI_U24;
-  } else if (Subtarget->hasMulI24() && isI24(N0, DAG) && isI24(N1, DAG)) {
-N0 = DAG.getSExtOrTrunc(N0, DL, MVT::i32);
-N1 = DAG.getSExtOrTrunc(N1, DL, MVT::i32);
-LoOpcode = AMDGPUISD::MUL_I24;
-HiOpcode = AMDGPUISD::MULHI_I24;
+  unsigned LoOpcode = 0;
+  unsigned HiOpcode = 0;
+  if (Signed) {
+if (Subtarget->hasMulI24() && isI24(N0, DAG) && isI24(N1, DAG)) {
+  N0 = DAG.getSExtOrTrunc(N0, DL, MVT::i32);
+  N1 = DAG.getSExtOrTrunc(N1, DL, MVT::i32);
+  LoOpcode = AMDGPUISD::MUL_I24;
+  HiOpcode = AMDGPUISD::MULHI_I24;
+}
   } else {
-return SDValue();
+if (Subtarget->hasMulU24() && isU24(N0, DAG) && isU24(N1, DAG)) {
+  N0 = DAG.getZExtOrTrunc(N0, DL, MVT::i32);
+  N1 = DAG.getZExtOrTrunc(N1, DL, MVT::i32);
+  LoOpcode = AMDGPUISD::MUL_U24;
+  HiOpcode = AMDGPUISD::MULHI_U24;
+}
   }
+  if (!LoOpcode)
+return SDValue();
 
   SDValue Lo = DAG.getNode(LoOpcode, DL, MVT::i32, N0, N1);
   SDValue Hi = DAG.getNode(HiOpcode, DL, MVT::i32, N0, N1);
diff --git a/llvm/test/CodeGen/AMDGPU/mul_int24.ll 
b/llvm/test/CodeGen/AMDGPU/mul_int24.ll
index be77a10380c49b..8f4c48fae6fb31 100644
--- a/llvm/test/CodeGen/AMDGPU/mul_int24.ll
+++ b/llvm/test/CodeGen/AMDGPU/mul_int24.ll
@@ -813,4 +813,102 @@ bb7:
   ret void
 
 }
+
+define amdgpu_kernel void @test_umul_i24(ptr addrspace(1) %out, i32 %arg) {
+; SI-LABEL: test_umul_i24:
+; SI:   ; %bb.0:
+; SI-NEXT:s_load_dword s1, s[2:3], 0xb
+; SI-NEXT:v_mov_b32_e32 v0, 0xff803fe1
+; SI-NEXT:s_mov_b32 s0, 0
+; SI-NEXT:s_mov_b32 s3, 0xf000
+; SI-NEXT:s_waitcnt lgkmcnt(0)
+; SI-NEXT:s_lshr_b32 s1, s1, 9
+; SI-NEXT:v_mul_hi_u32 v0, s1, v0
+; SI-NEXT:s_mul_i32 s1, s1, 0xff803fe1
+; SI-NEXT:v_alignbit_b32 v0, v0, s1, 1
+; SI-NEXT:s_mov_b32 s2, -1
+; SI-NEXT:s_mov_b32 s1, s0
+; SI-NEXT:buffer_store_dword v0, off, s[0:3], 0
+; SI-NEXT:s_endpgm
+;
+; VI-LABEL: test_umul_i24:
+; VI:   ; %bb.0:
+; VI-NEXT:s_load_dword s0, s[2:3], 0x2c
+; VI-NEXT:v_mov_b32_e32 v0, 0xff803fe1
+; VI-NEXT:s_mov_b32 s3, 0xf000
+; VI-NEXT:s_mov_b32 s2, -1
+; VI-NEXT:s_waitcnt lgkmcnt(0)
+; VI-NEXT:s_lshr_b32 s0, s0, 9
+; VI-NEXT:v_mad_u64_u32 v[0:1], s[0:1], s0, v0, 0
+; VI-NEXT:s_mov_b32 s0, 0
+; VI-NEXT:s_mov_b32 s1, s0
+; VI-NEXT:v_alignbit_b32 v0, v1, v0, 1
+; VI-NEXT:s_nop 1
+; VI-NEXT:buffer_store_dword v0, off, s[0:3], 0
+; VI-NEXT:s_endpgm
+;
+; GFX9-LABEL: test_umul_i24:
+; GFX9:   ; %bb.0:
+; GFX9-NEXT:s_load_dword s1, s[2:3], 0x2c
+; GFX9-NEXT:s_mov_b32 s0, 0
+; GFX9-NEXT:s_mov_b32 s3, 0xf000
+; GFX9-NEXT:s_mov_b32 s2, -1
+; GFX9-NEXT:s_waitcnt lgkmcnt(0)
+; GFX9-NEXT:s_lshr_b32 s1, s1, 9
+; GFX9-NEXT:s_mul_hi_u32 s4, s1, 0xff803fe1
+; GFX9-NEXT:s_mul_i32 s1, s1, 0xff803fe1
+; GFX9-NEXT:v_mov_b32_e32 v0, s1
+; GFX9-NEXT:v_alignbit_b32 v0, s4, v0, 1
+; GFX9-NEXT:s_mov_b32 s1, s0
+; GFX9-NEXT:buffer_store_dword v0, off, s[0:3], 0
+; GFX9-NEXT:s_endpgm
+;
+; EG-LABEL: test_umul_i24:
+; EG:   ; %bb.0:
+; EG-

[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

This is a backport of #105831.

https://github.com/llvm/llvm-project/pull/106977
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-02 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/106977
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)

2024-09-05 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> > This sounds sketchy to me. Is it really valid to enter a second call inside 
> > another call's CALLSEQ markers, but only if we avoid adding a second nested 
> > set of markers? It feels like attacking the symptom of the issue, but not 
> > the root cause. (I'm not certain it's _not_ valid, but it just seems really 
> > suspicious...)
> 
> From what I've gathered from the source comments and the 
> [patch](https://github.com/llvm/llvm-project/commit/228978c0dcfc9a9793f3dc8a69f42471192223bc)
>  introducing the code that inserts these CALLSEQ markers for TLSADDRs, their 
> only point here is to stop shrink-wrapping from moving the function 
> prologue/epilogue past the call to get the TLS address. This should also be 
> given when the TLSADDR is in another CALLSEQ.
> 
> I am however by no means an expert on this topic; I'd appreciate more 
> insights on which uses of CALLSEQ markers are and are not valid (besides the 
> MachineVerifier checks).

I also wondered about this. Are there other mechanisms that block shrink 
wrapping from moving the prologue? E.g. what if a regular instruction (not a 
call) has to come after the prologue, how would that be marked? Maybe adding an 
implicit use or def of some particular physical register would be enough??

https://github.com/llvm/llvm-project/pull/106965
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> Is this PR a fix for a regression or a critical issue?

No, I believe it has been broken for about 3 years (since 
d7e03df719464354b20a845b7853be57da863924) but it was only reported to me 
recently.

I guess this means it is not appropriate for 19.1.0.

https://github.com/llvm/llvm-project/pull/106977
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-10 Thread Jay Foad via llvm-branch-commits

jayfoad wrote:

> > Is this PR a fix for a regression or a critical issue?
> 
> No, I believe it has been broken for about 3 years (since 
> [d7e03df](https://github.com/llvm/llvm-project/commit/d7e03df719464354b20a845b7853be57da863924))
>  but it was only reported to me recently.
> 
> I guess this means it is not appropriate for 19.1.0.

Cc @marekolsak FYI.

https://github.com/llvm/llvm-project/pull/106977
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] a1cba5b - [SelectionDAG] Make use of KnownBits::commonBits. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: a1cba5b7a1fb09d2d4082967e2466a5a89ed698a

URL: 
https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a
DIFF: 
https://github.com/llvm/llvm-project/commit/a1cba5b7a1fb09d2d4082967e2466a5a89ed698a.diff

LOG: [SelectionDAG] Make use of KnownBits::commonBits. NFC.

Differential Revision: https://reviews.llvm.org/D94587
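
For anyone unfamiliar with the helper: commonBits keeps only the bits
known in both operands, i.e. exactly what the replaced &= pairs computed.
A minimal sketch (simplified; the real class is in
llvm/Support/KnownBits.h and uses APInt):

#include <cstdint>

struct KnownBitsSketch {
  uint64_t Zero = 0, One = 0; // bits known to be 0 / known to be 1
};

// Equivalent to the pattern this commit replaces:
//   Known.Zero &= Known2.Zero; Known.One &= Known2.One;
KnownBitsSketch commonBits(const KnownBitsSketch &A,
                           const KnownBitsSketch &B) {
  return {A.Zero & B.Zero, A.One & B.One};
}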

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed: 




diff  --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp 
b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 669bca966a7d..0b830f462c90 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -509,8 +509,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const 
PHINode *PN) {
   return;
 }
 DestLOI.NumSignBits = std::min(DestLOI.NumSignBits, SrcLOI->NumSignBits);
-DestLOI.Known.Zero &= SrcLOI->Known.Zero;
-DestLOI.Known.One &= SrcLOI->Known.One;
+DestLOI.Known = KnownBits::commonBits(DestLOI.Known, SrcLOI->Known);
   }
 }
 

diff  --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 7ea0b09ef9c9..173e45a4b18e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1016,10 +1016,8 @@ bool TargetLowering::SimplifyDemandedBits(
  Depth + 1))
   return true;
 
-if (!!DemandedVecElts) {
-  Known.One &= KnownVec.One;
-  Known.Zero &= KnownVec.Zero;
-}
+if (!!DemandedVecElts)
+  Known = KnownBits::commonBits(Known, KnownVec);
 
 return false;
   }
@@ -1044,14 +1042,10 @@ bool TargetLowering::SimplifyDemandedBits(
 
 Known.Zero.setAllBits();
 Known.One.setAllBits();
-if (!!DemandedSubElts) {
-  Known.One &= KnownSub.One;
-  Known.Zero &= KnownSub.Zero;
-}
-if (!!DemandedSrcElts) {
-  Known.One &= KnownSrc.One;
-  Known.Zero &= KnownSrc.Zero;
-}
+if (!!DemandedSubElts)
+  Known = KnownBits::commonBits(Known, KnownSub);
+if (!!DemandedSrcElts)
+  Known = KnownBits::commonBits(Known, KnownSrc);
 
 // Attempt to avoid multi-use src if we don't need anything from it.
 if (!DemandedBits.isAllOnesValue() || !DemandedSubElts.isAllOnesValue() ||
@@ -1108,10 +1102,8 @@ bool TargetLowering::SimplifyDemandedBits(
Known2, TLO, Depth + 1))
 return true;
   // Known bits are shared by every demanded subvector element.
-  if (!!DemandedSubElts) {
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
-  }
+  if (!!DemandedSubElts)
+Known = KnownBits::commonBits(Known, Known2);
 }
 break;
   }
@@ -1149,15 +1141,13 @@ bool TargetLowering::SimplifyDemandedBits(
 if (SimplifyDemandedBits(Op0, DemandedBits, DemandedLHS, Known2, TLO,
  Depth + 1))
   return true;
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
   }
   if (!!DemandedRHS) {
 if (SimplifyDemandedBits(Op1, DemandedBits, DemandedRHS, Known2, TLO,
  Depth + 1))
   return true;
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
   }
 
   // Attempt to avoid multi-use ops if we don't need anything from them.
@@ -1384,8 +1374,7 @@ bool TargetLowering::SimplifyDemandedBits(
   return true;
 
 // Only known if known in both the LHS and RHS.
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
 break;
   case ISD::SELECT_CC:
 if (SimplifyDemandedBits(Op.getOperand(3), DemandedBits, Known, TLO,
@@ -1402,8 +1391,7 @@ bool TargetLowering::SimplifyDemandedBits(
   return true;
 
 // Only known if known in both the LHS and RHS.
-Known.One &= Known2.One;
-Known.Zero &= Known2.Zero;
+Known = KnownBits::commonBits(Known, Known2);
 break;
   case ISD::SETCC: {
 SDValue Op0 = Op.getOperand(0);



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 517196e - [Analysis, CodeGen] Make use of KnownBits::makeConstant. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 517196e569129677be32d6ebcfa57bac552268a4

URL: 
https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4
DIFF: 
https://github.com/llvm/llvm-project/commit/517196e569129677be32d6ebcfa57bac552268a4.diff

LOG: [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.

Differential Revision: https://reviews.llvm.org/D94588
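
The helper captures the hand-written pattern replaced in the hunks below - a 
sketch of its semantics (not the in-tree implementation):

  #include "llvm/ADT/APInt.h"
  #include "llvm/Support/KnownBits.h"
  using namespace llvm;

  // For a constant every bit is known: One holds the constant itself and
  // Zero holds its complement.
  static KnownBits makeConstantSketch(const APInt &C) {
    KnownBits R(C.getBitWidth());
    R.One = C;
    R.Zero = ~C;
    return R;
  }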

Added: 


Modified: 
llvm/lib/Analysis/ValueTracking.cpp
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Removed: 




diff  --git a/llvm/lib/Analysis/ValueTracking.cpp 
b/llvm/lib/Analysis/ValueTracking.cpp
index b138caa05610..61c992d0eedf 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1337,8 +1337,8 @@ static void computeKnownBitsFromOperator(const Operator 
*I,
 AccConstIndices += IndexConst.sextOrTrunc(BitWidth);
 continue;
   } else {
-ScalingFactor.Zero = ~TypeSizeInBytes;
-ScalingFactor.One = TypeSizeInBytes;
+ScalingFactor =
+KnownBits::makeConstant(APInt(IndexBitWidth, TypeSizeInBytes));
   }
   IndexBits = KnownBits::computeForMul(IndexBits, ScalingFactor);
 
@@ -1353,9 +1353,7 @@ static void computeKnownBitsFromOperator(const Operator 
*I,
   /*Add=*/true, /*NSW=*/false, Known, IndexBits);
 }
 if (!Known.isUnknown() && !AccConstIndices.isNullValue()) {
-  KnownBits Index(BitWidth);
-  Index.Zero = ~AccConstIndices;
-  Index.One = AccConstIndices;
+  KnownBits Index = KnownBits::makeConstant(AccConstIndices);
   Known = KnownBits::computeForAddSub(
   /*Add=*/true, /*NSW=*/false, Known, Index);
 }
@@ -1818,8 +1816,7 @@ void computeKnownBits(const Value *V, const APInt 
&DemandedElts,
   const APInt *C;
   if (match(V, m_APInt(C))) {
 // We know all of the bits for a scalar constant or a splat vector 
constant!
-Known.One = *C;
-Known.Zero = ~Known.One;
+Known = KnownBits::makeConstant(*C);
 return;
   }
   // Null and aggregate-zero are all-zeros.

diff  --git a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp 
b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
index 64c7fb486493..aac7a73e858f 100644
--- a/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
@@ -217,8 +217,7 @@ void GISelKnownBits::computeKnownBitsImpl(Register R, 
KnownBits &Known,
 auto CstVal = getConstantVRegVal(R, MRI);
 if (!CstVal)
   break;
-Known.One = *CstVal;
-Known.Zero = ~Known.One;
+Known = KnownBits::makeConstant(*CstVal);
 break;
   }
   case TargetOpcode::G_FRAME_INDEX: {

diff  --git a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp 
b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
index 0b830f462c90..32a4f60df097 100644
--- a/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -458,8 +458,7 @@ void FunctionLoweringInfo::ComputePHILiveOutRegInfo(const 
PHINode *PN) {
   if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
 APInt Val = CI->getValue().zextOrTrunc(BitWidth);
 DestLOI.NumSignBits = Val.getNumSignBits();
-DestLOI.Known.Zero = ~Val;
-DestLOI.Known.One = Val;
+DestLOI.Known = KnownBits::makeConstant(Val);
   } else {
 assert(ValueMap.count(V) && "V should have been placed in ValueMap when 
its"
 "CopyToReg node was created.");

diff  --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index e080408bbe42..7084ab68524b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3134,13 +3134,10 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, 
const APInt &DemandedElts,
   }
 } else if (BitWidth == CstTy->getPrimitiveSizeInBits()) {
   if (auto *CInt = dyn_cast<ConstantInt>(Cst)) {
-const APInt &Value = CInt->getValue();
-Known.One = Value;
-Known.Zero = ~Value;
+Known = KnownBits::makeConstant(CInt->getValue());
   } else if (auto *CFP = dyn_cast<ConstantFP>(Cst)) {
-APInt Value = CFP->getValueAPF().bitcastToAPInt();
-Known.One = Value;
-Known.Zero = ~Value;
+Known =
+KnownBits::makeConstant(CFP->getValueAPF().bitcastToAPInt());
   }
 }
   }

diff  --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp 
b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 173e45a4b18e..6ae0a39962b3 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -912,15 +912,14 @@ boo

[llvm-branch-commits] [llvm] 90b310f - [Support] Simplify KnownBits::icmp helpers. NFC.

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T14:02:43Z
New Revision: 90b310f6caf0b356075c70407c338b3c751eebb3

URL: 
https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3
DIFF: 
https://github.com/llvm/llvm-project/commit/90b310f6caf0b356075c70407c338b3c751eebb3.diff

LOG: [Support] Simplify KnownBits::icmp helpers. NFC.

Remove some special cases that aren't really any simpler than the
general case.

Differential Revision: https://reviews.llvm.org/D94595
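
For example, the range test removed from eq is implied by the bit test kept 
there: if umax(LHS) = ~LHS.Zero is less than umin(RHS) = RHS.One, then RHS.One 
cannot be a bit-subset of ~LHS.Zero, so RHS.One intersects LHS.Zero and the 
intersects check already answers false. Likewise the constant-vs-constant 
early-outs removed from ugt and sgt fall out of the general range checks, 
since for a constant getMinValue() and getMaxValue() both equal the constant.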

Added: 


Modified: 
llvm/lib/Support/KnownBits.cpp

Removed: 




diff  --git a/llvm/lib/Support/KnownBits.cpp b/llvm/lib/Support/KnownBits.cpp
index 0147d21d153a..0f36c6a9ef1d 100644
--- a/llvm/lib/Support/KnownBits.cpp
+++ b/llvm/lib/Support/KnownBits.cpp
@@ -271,9 +271,6 @@ KnownBits KnownBits::ashr(const KnownBits &LHS, const 
KnownBits &RHS) {
 Optional<bool> KnownBits::eq(const KnownBits &LHS, const KnownBits &RHS) {
   if (LHS.isConstant() && RHS.isConstant())
 return Optional<bool>(LHS.getConstant() == RHS.getConstant());
-  if (LHS.getMaxValue().ult(RHS.getMinValue()) ||
-  LHS.getMinValue().ugt(RHS.getMaxValue()))
-return Optional<bool>(false);
   if (LHS.One.intersects(RHS.Zero) || RHS.One.intersects(LHS.Zero))
 return Optional<bool>(false);
   return None;
@@ -286,8 +283,6 @@ Optional<bool> KnownBits::ne(const KnownBits &LHS, const 
KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::ugt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-return Optional<bool>(LHS.getConstant().ugt(RHS.getConstant()));
   // LHS >u RHS -> false if umax(LHS) <= umax(RHS)
   if (LHS.getMaxValue().ule(RHS.getMinValue()))
 return Optional<bool>(false);
@@ -312,8 +307,6 @@ Optional<bool> KnownBits::ule(const KnownBits &LHS, const 
KnownBits &RHS) {
 }
 
 Optional<bool> KnownBits::sgt(const KnownBits &LHS, const KnownBits &RHS) {
-  if (LHS.isConstant() && RHS.isConstant())
-return Optional<bool>(LHS.getConstant().sgt(RHS.getConstant()));
   // LHS >s RHS -> false if smax(LHS) <= smax(RHS)
   if (LHS.getSignedMaxValue().sle(RHS.getSignedMinValue()))
 return Optional<bool>(false);



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 868da2e - [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

2021-01-14 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-14T18:15:17Z
New Revision: 868da2ea939baf8c71a6dcb878cf6094ede9486e

URL: 
https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e
DIFF: 
https://github.com/llvm/llvm-project/commit/868da2ea939baf8c71a6dcb878cf6094ede9486e.diff

LOG: [SelectionDAG] Remove an early-out from computeKnownBits for smin/smax

Even if we know nothing about LHS, it can still be useful to know that
smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS.

Differential Revision: https://reviews.llvm.org/D87145
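
The test change below shows the payoff: the input starts out completely 
unknown, but after the pminsd/pmaxsd clamp (with what are presumably 
non-negative clamp constants - they are elided in the test) the smin/smax 
rules still pin the sign bit to zero, so the uitofp lowers to a single 
vcvtdq2ps instead of the blend/shift/sub/add expansion.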

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
llvm/test/CodeGen/X86/known-bits-vector.ll

Removed: 




diff  --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 7084ab68524b5..82da553954d2f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3416,7 +3416,6 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, 
const APInt &DemandedElts,
 }
 
 Known = computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1);
-if (Known.isUnknown()) break; // Early-out
 Known2 = computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1);
 if (IsMax)
   Known = KnownBits::smax(Known, Known2);

diff  --git a/llvm/test/CodeGen/X86/known-bits-vector.ll 
b/llvm/test/CodeGen/X86/known-bits-vector.ll
index 3b6912a9d9461..05bf984101abc 100644
--- a/llvm/test/CodeGen/X86/known-bits-vector.ll
+++ b/llvm/test/CodeGen/X86/known-bits-vector.ll
@@ -435,11 +435,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 
x i32> %a0) {
 ; X32-NEXT:vpminsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:vpmaxsd {{\.LCPI.*}}, %xmm0, %xmm0
 ; X32-NEXT:vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X32-NEXT:vpblendw {{.*#+}} xmm1 = 
xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:vpsrld $16, %xmm0, %xmm0
-; X32-NEXT:vpblendw {{.*#+}} xmm0 = 
xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X32-NEXT:vsubps {{\.LCPI.*}}, %xmm0, %xmm0
-; X32-NEXT:vaddps %xmm0, %xmm1, %xmm0
+; X32-NEXT:vcvtdq2ps %xmm0, %xmm0
 ; X32-NEXT:retl
 ;
 ; X64-LABEL: knownbits_smax_smin_shuffle_uitofp:
@@ -447,11 +443,7 @@ define <4 x float> @knownbits_smax_smin_shuffle_uitofp(<4 
x i32> %a0) {
 ; X64-NEXT:vpminsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:vpmaxsd {{.*}}(%rip), %xmm0, %xmm0
 ; X64-NEXT:vpshufd {{.*#+}} xmm0 = xmm0[0,0,3,3]
-; X64-NEXT:vpblendw {{.*#+}} xmm1 = 
xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:vpsrld $16, %xmm0, %xmm0
-; X64-NEXT:vpblendw {{.*#+}} xmm0 = 
xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
-; X64-NEXT:vsubps {{.*}}(%rip), %xmm0, %xmm0
-; X64-NEXT:vaddps %xmm0, %xmm1, %xmm0
+; X64-NEXT:vcvtdq2ps %xmm0, %xmm0
 ; X64-NEXT:retq
   %1 = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %a0, <4 x i32> )
   %2 = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %1, <4 x i32> )



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 49dce85 - [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T10:39:56Z
New Revision: 49dce85584e34ee7fb973da9ba617169fd0f103c

URL: 
https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c
DIFF: 
https://github.com/llvm/llvm-project/commit/49dce85584e34ee7fb973da9ba617169fd0f103c.diff

LOG: [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.

Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
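
The new arithmetic reproduces the old special cases exactly: with compr set, 
OpNo - N + N/2 leaves OpNo unchanged for N=0, subtracts one for N=1 and N=2, 
and subtracts two for N=3, which is the src0, src0, src1, src1 packing the 
comment describes.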

Added: 


Modified: 
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index 574fba62f5f3..fcca32abdd5a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -958,10 +958,9 @@ void AMDGPUInstPrinter::printSDWADstUnused(const MCInst 
*MI, unsigned OpNo,
   }
 }
 
-template <unsigned N>
 void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, unsigned OpNo,
- const MCSubtargetInfo &STI,
- raw_ostream &O) {
+ const MCSubtargetInfo &STI, raw_ostream 
&O,
+ unsigned N) {
   unsigned Opc = MI->getOpcode();
   int EnIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::en);
   unsigned En = MI->getOperand(EnIdx).getImm();
@@ -969,12 +968,8 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, 
unsigned OpNo,
   int ComprIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::compr);
 
   // If compr is set, print as src0, src0, src1, src1
-  if (MI->getOperand(ComprIdx).getImm()) {
-if (N == 1 || N == 2)
-  --OpNo;
-else if (N == 3)
-  OpNo -= 2;
-  }
+  if (MI->getOperand(ComprIdx).getImm())
+OpNo = OpNo - N + N / 2;
 
   if (En & (1 << N))
 printRegOperand(MI->getOperand(OpNo).getReg(), O, MRI);
@@ -985,25 +980,25 @@ void AMDGPUInstPrinter::printExpSrcN(const MCInst *MI, 
unsigned OpNo,
 void AMDGPUInstPrinter::printExpSrc0(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<0>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 0);
 }
 
 void AMDGPUInstPrinter::printExpSrc1(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<1>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 1);
 }
 
 void AMDGPUInstPrinter::printExpSrc2(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<2>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 2);
 }
 
 void AMDGPUInstPrinter::printExpSrc3(const MCInst *MI, unsigned OpNo,
  const MCSubtargetInfo &STI,
  raw_ostream &O) {
-  printExpSrcN<3>(MI, OpNo, STI, O);
+  printExpSrcN(MI, OpNo, STI, O, 3);
 }
 
 void AMDGPUInstPrinter::printExpTgt(const MCInst *MI, unsigned OpNo,

diff  --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h 
b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
index 64ccb9092ec4..8d13aa682211 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.h
@@ -179,10 +179,8 @@ class AMDGPUInstPrinter : public MCInstPrinter {
   void printDefaultVccOperand(unsigned OpNo, const MCSubtargetInfo &STI,
   raw_ostream &O);
 
-
-  template <unsigned N>
-  void printExpSrcN(const MCInst *MI, unsigned OpNo,
-const MCSubtargetInfo &STI, raw_ostream &O);
+  void printExpSrcN(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo 
&STI,
+raw_ostream &O, unsigned N);
   void printExpSrc0(const MCInst *MI, unsigned OpNo,
 const MCSubtargetInfo &STI, raw_ostream &O);
   void printExpSrc1(const MCInst *MI, unsigned OpNo,



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] de2f942 - [AMDGPU] Simplify test case for D94010

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T16:36:43Z
New Revision: de2f9423995d52a5457752256815dc54d317c8d1

URL: 
https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1
DIFF: 
https://github.com/llvm/llvm-project/commit/de2f9423995d52a5457752256815dc54d317c8d1.diff

LOG: [AMDGPU] Simplify test case for D94010

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 03584312e2af..8df0215a6fe2 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -10,7 +10,6 @@ define float @v_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:v_fmac_legacy_f32_e64 v2, v0, v1
 ; GCN-NEXT:v_mov_b32_e32 v0, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %c)
   ret float %fma
 }
@@ -22,7 +21,6 @@ define float @v_fabs_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, |v0|, v1, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fabs.a = call float @llvm.fabs.f32(float %a)
   %fma = call float @llvm.amdgcn.fma.legacy(float %fabs.a, float %b, float %c)
   ret float %fma
@@ -35,7 +33,6 @@ define float @v_fneg_fabs_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, -|v1|, v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %fabs.b = call float @llvm.fabs.f32(float %b)
   %neg.fabs.b = fneg float %fabs.b
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %neg.fabs.b, float 
%c)
@@ -49,92 +46,21 @@ define float @v_fneg_fma(float %a, float %b, float %c)  {
 ; GCN-NEXT:s_waitcnt_vscnt null, 0x0
 ; GCN-NEXT:v_fma_legacy_f32 v0, v0, v1, -v2
 ; GCN-NEXT:s_setpc_b64 s[30:31]
-;
   %neg.c = fneg float %c
   %fma = call float @llvm.amdgcn.fma.legacy(float %a, float %b, float %neg.c)
   ret float %fma
 }
 
-define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, 
float, float, float, float, float, float, float, float, float, float, float }> 
@main(<4 x i32> addrspace(6)* inreg noalias align 32 
dereferenceable(18446744073709551615) %arg, <8 x i32> addrspace(6)* inreg 
noalias align 32 dereferenceable(18446744073709551615) %arg1, <4 x i32> 
addrspace(6)* inreg noalias align 32 dereferenceable(18446744073709551615) 
%arg2, <8 x i32> addrspace(6)* inreg noalias align 32 
dereferenceable(18446744073709551615) %arg3, i32 inreg %arg4, i32 inreg %arg5, 
<2 x i32> %arg6, <2 x i32> %arg7, <2 x i32> %arg8, <3 x i32> %arg9, <2 x i32> 
%arg10, <2 x i32> %arg11, <2 x i32> %arg12, <3 x float> %arg13, float %arg14, 
float %arg15, float %arg16, float %arg17, i32 %arg18, i32 %arg19, float %arg20, 
i32 %arg21) #0 {
-; SDAG-LABEL: main:
-; SDAG:   ; %bb.0:
-; SDAG-NEXT:s_mov_b32 s16, exec_lo
-; SDAG-NEXT:v_mov_b32_e32 v14, v2
-; SDAG-NEXT:s_mov_b32 s0, s5
-; SDAG-NEXT:s_wqm_b32 exec_lo, exec_lo
-; SDAG-NEXT:s_mov_b32 s1, 0
-; SDAG-NEXT:s_mov_b32 m0, s7
-; SDAG-NEXT:s_clause 0x1
-; SDAG-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x400
-; SDAG-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x430
-; SDAG-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
-; SDAG-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
-; SDAG-NEXT:s_mov_b32 s4, s6
-; SDAG-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
-; SDAG-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
-; SDAG-NEXT:s_and_b32 exec_lo, exec_lo, s16
-; SDAG-NEXT:s_waitcnt lgkmcnt(0)
-; SDAG-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf 
dim:SQ_RSRC_IMG_2D
-; SDAG-NEXT:s_waitcnt vmcnt(0)
-; SDAG-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
-; SDAG-NEXT:v_fma_legacy_f32 v1, v1, 2.0, -1.0
-; SDAG-NEXT:; return to shader part epilog
-;
-; GISEL-LABEL: main:
-; GISEL:   ; %bb.0:
-; GISEL-NEXT:s_mov_b32 s16, exec_lo
-; GISEL-NEXT:s_mov_b32 s4, s6
-; GISEL-NEXT:s_mov_b32 m0, s7
-; GISEL-NEXT:s_wqm_b32 exec_lo, exec_lo
-; GISEL-NEXT:s_add_u32 s0, s5, 0x400
-; GISEL-NEXT:s_mov_b32 s1, 0
-; GISEL-NEXT:v_interp_p1_f32_e32 v3, v0, attr0.y
-; GISEL-NEXT:s_load_dwordx8 s[8:15], s[0:1], 0x0
-; GISEL-NEXT:s_add_u32 s0, s5, 0x430
-; GISEL-NEXT:v_mov_b32_e32 v14, v2
-; GISEL-NEXT:s_load_dwordx4 s[0:3], s[0:1], 0x0
-; GISEL-NEXT:v_interp_p1_f32_e32 v2, v0, attr0.x
-; GISEL-NEXT:v_interp_p2_f32_e32 v3, v1, attr0.y
-; GISEL-NEXT:v_interp_p2_f32_e32 v2, v1, attr0.x
-; GISEL-NEXT:s_and_b32 exec_lo, exec_lo, s16
-; GISEL-NEXT:s_waitcnt lgkmcnt(0)
-; GISEL-NEXT:image_sample v[0:3], v[2:3], s[8:15], s[0:3] dmask:0xf 
dim:SQ_RSRC_IMG_2D
-; GISEL-NEXT:s_waitcnt vmcnt(0)
-; GISEL-NEXT:v_fma_legacy_f32 v0, v0, 2.0, -1.0
-; GISEL-NEXT:v_fma_legacy_f32 v1, 

[llvm-branch-commits] [llvm] 0808c70 - [AMDGPU] Fix test case for D94010

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T16:46:47Z
New Revision: 0808c7009a06773e78772c7b74d254fd3572f0ea

URL: 
https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea
DIFF: 
https://github.com/llvm/llvm-project/commit/0808c7009a06773e78772c7b74d254fd3572f0ea.diff

LOG: [AMDGPU] Fix test case for D94010

Added: 


Modified: 
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll

Removed: 




diff  --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll 
b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
index 8df0215a6fe2..5c333f0ce97d 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fma.legacy.ll
@@ -1,6 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck 
-check-prefixes=GCN,SDAG %s
-; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | 
FileCheck -check-prefixes=GCN,GISEL %s
+; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | FileCheck 
-check-prefix=GCN %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1030 < %s | 
FileCheck -check-prefix=GCN %s
 
 define float @v_fma(float %a, float %b, float %c)  {
 ; GCN-LABEL: v_fma:



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 18cb744 - [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.

2021-01-19 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-19T18:47:14Z
New Revision: 18cb7441b69a22565dcc340bac0e58bc9f301439

URL: 
https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439
DIFF: 
https://github.com/llvm/llvm-project/commit/18cb7441b69a22565dcc340bac0e58bc9f301439.diff

LOG: [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.

Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
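
The numeric operand in each def is the hardware encoding: ttmp0 is encoded as 
112 on VI but as 108 on GFX9 and later, which is why the two sets of defs 
below differ only in the base value passed to !add.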

Added: 


Modified: 
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
llvm/lib/Target/AMDGPU/SIDefines.h
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp 
b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 7f68174e506d..08b340c8fd66 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -997,8 +997,8 @@ unsigned AMDGPUDisassembler::getTtmpClassId(const OpWidthTy 
Width) const {
 int AMDGPUDisassembler::getTTmpIdx(unsigned Val) const {
   using namespace AMDGPU::EncValues;
 
-  unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9_GFX10_MIN : TTMP_VI_MIN;
-  unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9_GFX10_MAX : TTMP_VI_MAX;
+  unsigned TTmpMin = isGFX9Plus() ? TTMP_GFX9PLUS_MIN : TTMP_VI_MIN;
+  unsigned TTmpMax = isGFX9Plus() ? TTMP_GFX9PLUS_MAX : TTMP_VI_MAX;
 
   return (TTmpMin <= Val && Val <= TTmpMax)? Val - TTmpMin : -1;
 }

diff  --git a/llvm/lib/Target/AMDGPU/SIDefines.h 
b/llvm/lib/Target/AMDGPU/SIDefines.h
index b9a2bcf81903..f7555f0453bb 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -247,8 +247,8 @@ enum : unsigned {
   SGPR_MAX_GFX10 = 105,
   TTMP_VI_MIN = 112,
   TTMP_VI_MAX = 123,
-  TTMP_GFX9_GFX10_MIN = 108,
-  TTMP_GFX9_GFX10_MAX = 123,
+  TTMP_GFX9PLUS_MIN = 108,
+  TTMP_GFX9PLUS_MAX = 123,
   INLINE_INTEGER_C_MIN = 128,
   INLINE_INTEGER_C_POSITIVE_MAX = 192, // 64
   INLINE_INTEGER_C_MAX = 208,

diff  --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 378fc5df21e5..92390f1f3297 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -246,9 +246,9 @@ def TMA : RegisterWithSubRegs<"tma", [TMA_LO, TMA_HI]> {
 }
 
 foreach Index = 0...15 in {
-  defm TTMP#Index#_vi : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>;
-  defm TTMP#Index#_gfx9_gfx10 : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>;
-  defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>;
+  defm TTMP#Index#_vi   : SIRegLoHi16<"ttmp"#Index, !add(112, Index)>;
+  defm TTMP#Index#_gfx9plus : SIRegLoHi16<"ttmp"#Index, !add(108, Index)>;
+  defm TTMP#Index   : SIRegLoHi16<"ttmp"#Index, 0>;
 }
 
 multiclass FLAT_SCR_LOHI_m <string n, bits<16> ci_e, bits<16> vi_e> {
@@ -419,8 +419,8 @@ class TmpRegTuples.ret>;
 
 foreach Index = {0, 2, 4, 6, 8, 10, 12, 14} in {
-  def TTMP#Index#_TTMP#!add(Index,1)#_vi : TmpRegTuples<"_vi",   2, 
Index>;
-  def TTMP#Index#_TTMP#!add(Index,1)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 
2, Index>;
+  def TTMP#Index#_TTMP#!add(Index,1)#_vi   : TmpRegTuples<"_vi",   2, 
Index>;
+  def TTMP#Index#_TTMP#!add(Index,1)#_gfx9plus : TmpRegTuples<"_gfx9plus", 2, 
Index>;
 }
 
 foreach Index = {0, 4, 8, 12} in {
@@ -429,7 +429,7 @@ foreach Index = {0, 4, 8, 12} in {
  _TTMP#!add(Index,3)#_vi : TmpRegTuples<"_vi",   4, Index>;
   def TTMP#Index#_TTMP#!add(Index,1)#
  _TTMP#!add(Index,2)#
- _TTMP#!add(Index,3)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 
4, Index>;
+ _TTMP#!add(Index,3)#_gfx9plus : TmpRegTuples<"_gfx9plus", 4, 
Index>;
 }
 
 foreach Index = {0, 4, 8} in {
@@ -446,7 +446,7 @@ foreach Index = {0, 4, 8} in {
  _TTMP#!add(Index,4)#
  _TTMP#!add(Index,5)#
  _TTMP#!add(Index,6)#
- _TTMP#!add(Index,7)#_gfx9_gfx10 : TmpRegTuples<"_gfx9_gfx10", 
8, Index>;
+ _TTMP#!add(Index,7)#_gfx9plus : TmpRegTuples<"_gfx9plus", 8, 
Index>;
 }
 
 def 
TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_vi
 :
@@ -456,12 +456,12 @@ def 
TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TT
 TTMP8_vi, TTMP9_vi, TTMP10_vi, TTMP11_vi,
 TTMP12_vi, TTMP13_vi, TTMP14_vi, TTMP15_vi]>;
 
-def 
TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_gfx9_gfx10
 :
+def 
TTMP0_TTMP1_TTMP2_TTMP3_TTMP4_TTMP5_TTMP6_TTMP7_TTMP8_TTMP9_TTMP10_TTMP11_TTMP12_TTMP13_TTMP14_TTMP15_gfx9plu

[llvm-branch-commits] [llvm] c0b3c5a - [AMDGPU][GlobalISel] Run SIAddImgInit

2021-01-21 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-21T15:54:54Z
New Revision: c0b3c5a06451aad4351e35c74ccf2fe5da917a41

URL: 
https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41
DIFF: 
https://github.com/llvm/llvm-project/commit/c0b3c5a06451aad4351e35c74ccf2fe5da917a41.diff

LOG: [AMDGPU][GlobalISel] Run SIAddImgInit

This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
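
Concretely, with tfe or lwe set the image instruction writes an extra status 
dword, and the destination registers need to be zero-initialized first (which 
is what SIAddImgInit inserts); that is the new v_mov_b32_e32 v1, 0, copied on 
into the second register, appearing before each image_load in the regenerated 
checks below.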

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2d.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.2darraymsaa.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.a16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.3d.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 58c436836d19..7d8e8486602b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1109,6 +1109,10 @@ bool GCNPassConfig::addRegBankSelect() {
 
 bool GCNPassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
+  // TODO: Fix instruction selection to do the right thing for image
+  // instructions with tfe or lwe in the first place, instead of running a
+  // separate pass to fix them up?
+  addPass(createSIAddIMGInitPass());
   return false;
 }
 

diff  --git 
a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
index 36f3e63598ca..99ab3580b91d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll
@@ -655,6 +655,7 @@ define amdgpu_ps <4 x half> @load_1d_v4f16_xyzw(<8 x i32> 
inreg %rsrc, i32 %s) {
 define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> inreg %rsrc, i32 %s) 
{
 ; GFX8-UNPACKED-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX8-UNPACKED:   ; %bb.0:
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v1, 0
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s0, s2
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s1, s3
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s2, s4
@@ -663,13 +664,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> 
inreg %rsrc, i32 %s) {
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s5, s7
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s6, s8
 ; GFX8-UNPACKED-NEXT:s_mov_b32 s7, s9
-; GFX8-UNPACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v2, v1
+; GFX8-UNPACKED-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX8-UNPACKED-NEXT:s_waitcnt vmcnt(0)
-; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v1
+; GFX8-UNPACKED-NEXT:v_mov_b32_e32 v0, v2
 ; GFX8-UNPACKED-NEXT:; return to shader part epilog
 ;
 ; GFX8-PACKED-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX8-PACKED:   ; %bb.0:
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v1, 0
 ; GFX8-PACKED-NEXT:s_mov_b32 s0, s2
 ; GFX8-PACKED-NEXT:s_mov_b32 s1, s3
 ; GFX8-PACKED-NEXT:s_mov_b32 s2, s4
@@ -678,13 +681,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> 
inreg %rsrc, i32 %s) {
 ; GFX8-PACKED-NEXT:s_mov_b32 s5, s7
 ; GFX8-PACKED-NEXT:s_mov_b32 s6, s8
 ; GFX8-PACKED-NEXT:s_mov_b32 s7, s9
-; GFX8-PACKED-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v2, v1
+; GFX8-PACKED-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX8-PACKED-NEXT:s_waitcnt vmcnt(0)
-; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v1
+; GFX8-PACKED-NEXT:v_mov_b32_e32 v0, v2
 ; GFX8-PACKED-NEXT:; return to shader part epilog
 ;
 ; GFX9-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX9:   ; %bb.0:
+; GFX9-NEXT:v_mov_b32_e32 v1, 0
 ; GFX9-NEXT:s_mov_b32 s0, s2
 ; GFX9-NEXT:s_mov_b32 s1, s3
 ; GFX9-NEXT:s_mov_b32 s2, s4
@@ -693,13 +698,15 @@ define amdgpu_ps float @load_1d_f16_tfe_dmask_x(<8 x i32> 
inreg %rsrc, i32 %s) {
 ; GFX9-NEXT:s_mov_b32 s5, s7
 ; GFX9-NEXT:s_mov_b32 s6, s8
 ; GFX9-NEXT:s_mov_b32 s7, s9
-; GFX9-NEXT:image_load v[0:1], v0, s[0:7] dmask:0x1 unorm tfe d16
+; GFX9-NEXT:v_mov_b32_e32 v2, v1
+; GFX9-NEXT:image_load v[1:2], v0, s[0:7] dmask:0x1 unorm tfe d16
 ; GFX9-NEXT:s_waitcnt vmcnt(0)
-; GFX9-NEXT:v_mov_b32_e32 v0, v1
+; GFX9-NEXT:v_mov_b32_e32 v0, v2
 ; GFX9-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: load_1d_f16_tfe_dmask_x:
 ; GFX10:   ; %bb.0:
+; GFX10-NEXT:v_mov_b32_e32 v1, 0
 ; GFX10-NEXT:s

[llvm-branch-commits] [llvm] 14eea6b - [LegacyPM] Update InversedLastUser on the fly. NFC.

2021-01-22 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-22T09:48:54Z
New Revision: 14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8

URL: 
https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8
DIFF: 
https://github.com/llvm/llvm-project/commit/14eea6b0ecddfe7d1c68754a8bfb7c21cde82df8.diff

LOG: [LegacyPM] Update InversedLastUser on the fly. NFC.

This speeds up setLastUser enough to give a 5% to 10% speed up on
trivial invocations of opt and llc, as measured by:

perf stat -r 100 opt -S -o /dev/null -O3 /dev/null
perf stat -r 100 llc -march=amdgcn /dev/null -filetype null

Don't dump last use information unless -debug-pass=Details to avoid
printing lots of spam that will break some existing lit tests. Before
this patch, dumping last use information was broken anyway, because it
used InversedLastUser before it had been populated.

Differential Revision: https://reviews.llvm.org/D92309
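
The saving comes from setLastUser: the "for (auto &LU : LastUser)" scan 
removed below walked the whole map on every call just to find the entries 
whose last user was AP, which is quadratic over a long pass pipeline. With 
the inverse map kept in sync, each update only touches the affected analyses.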

Added: 


Modified: 
llvm/include/llvm/IR/LegacyPassManagers.h
llvm/lib/IR/LegacyPassManager.cpp

Removed: 




diff  --git a/llvm/include/llvm/IR/LegacyPassManagers.h 
b/llvm/include/llvm/IR/LegacyPassManagers.h
index 498e736a0100..f4fae184e428 100644
--- a/llvm/include/llvm/IR/LegacyPassManagers.h
+++ b/llvm/include/llvm/IR/LegacyPassManagers.h
@@ -230,11 +230,11 @@ class PMTopLevelManager {
 
   // Map to keep track of last user of the analysis pass.
   // LastUser->second is the last user of Lastuser->first.
+  // This is kept in sync with InversedLastUser.
   DenseMap<Pass *, Pass *> LastUser;
 
   // Map to keep track of passes that are last used by a pass.
-  // This inverse map is initialized at PM->run() based on
-  // LastUser map.
+  // This is kept in sync with LastUser.
   DenseMap<Pass *, SmallPtrSet<Pass *, 8> > InversedLastUser;
 
   /// Immutable passes are managed by top level manager.

diff  --git a/llvm/lib/IR/LegacyPassManager.cpp 
b/llvm/lib/IR/LegacyPassManager.cpp
index 5575bc469a87..4547c3a01239 100644
--- a/llvm/lib/IR/LegacyPassManager.cpp
+++ b/llvm/lib/IR/LegacyPassManager.cpp
@@ -568,7 +568,12 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass *> 
AnalysisPasses, Pass *P) {
 PDepth = P->getResolver()->getPMDataManager().getDepth();
 
   for (Pass *AP : AnalysisPasses) {
-LastUser[AP] = P;
+// Record P as the new last user of AP.
+auto &LastUserOfAP = LastUser[AP];
+if (LastUserOfAP)
+  InversedLastUser[LastUserOfAP].erase(AP);
+LastUserOfAP = P;
+InversedLastUser[P].insert(AP);
 
 if (P == AP)
   continue;
@@ -598,13 +603,13 @@ PMTopLevelManager::setLastUser(ArrayRef<Pass *> 
AnalysisPasses, Pass *P) {
 if (P->getResolver())
   setLastUser(LastPMUses, 
P->getResolver()->getPMDataManager().getAsPass());
 
-
 // If AP is the last user of other passes then make P last user of
 // such passes.
-for (auto &LU : LastUser) {
-  if (LU.second == AP)
-LU.second = P;
-}
+auto &LastUsedByAP = InversedLastUser[AP];
+for (Pass *L : LastUsedByAP)
+  LastUser[L] = P;
+InversedLastUser[P].insert(LastUsedByAP.begin(), LastUsedByAP.end());
+LastUsedByAP.clear();
   }
 }
 
@@ -850,11 +855,6 @@ void PMTopLevelManager::initializeAllAnalysisInfo() {
   // Initailize other pass managers
   for (PMDataManager *IPM : IndirectPassManagers)
 IPM->initializeAnalysisInfo();
-
-  for (auto LU : LastUser) {
-SmallPtrSet<Pass *, 8> &L = InversedLastUser[LU.second];
-L.insert(LU.first);
-  }
 }
 
 /// Destructor
@@ -1151,6 +1151,8 @@ Pass *PMDataManager::findAnalysisPass(AnalysisID AID, 
bool SearchParent) {
 
 // Print list of passes that are last used by P.
 void PMDataManager::dumpLastUses(Pass *P, unsigned Offset) const{
+  if (PassDebugging < Details)
+return;
 
   SmallVector<Pass *, 12> LUses;
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 4e6054a - [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.

2021-01-05 Thread Jay Foad via llvm-branch-commits

Author: Jay Foad
Date: 2021-01-05T11:54:48Z
New Revision: 4e6054a86c0cb0697913007c99b59f3f65c9d04b

URL: 
https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b
DIFF: 
https://github.com/llvm/llvm-project/commit/4e6054a86c0cb0697913007c99b59f3f65c9d04b.diff

LOG: [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.

Differential Revision: https://reviews.llvm.org/D94009
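
Callers probe the mapping and compare the result against 
AMDGPU::INSTRUCTION_LIST_END as a "no mad form" sentinel, which lets both 
call sites below drop their duplicated IsFMA/IsF32 opcode ladders.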

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index d86527df5c3c..6dc01c3d3c21 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -129,6 +129,21 @@ char SIFoldOperands::ID = 0;
 
 char &llvm::SIFoldOperandsID = SIFoldOperands::ID;
 
+// Map multiply-accumulate opcode to corresponding multiply-add opcode if any.
+static unsigned macToMad(unsigned Opc) {
+  switch (Opc) {
+  case AMDGPU::V_MAC_F32_e64:
+return AMDGPU::V_MAD_F32;
+  case AMDGPU::V_MAC_F16_e64:
+return AMDGPU::V_MAD_F16;
+  case AMDGPU::V_FMAC_F32_e64:
+return AMDGPU::V_FMA_F32;
+  case AMDGPU::V_FMAC_F16_e64:
+return AMDGPU::V_FMA_F16_gfx9;
+  }
+  return AMDGPU::INSTRUCTION_LIST_END;
+}
+
 // Wrapper around isInlineConstant that understands special cases when
 // instruction types are replaced during operand folding.
 static bool isInlineConstantIfFolded(const SIInstrInfo *TII,
@@ -139,31 +154,18 @@ static bool isInlineConstantIfFolded(const SIInstrInfo 
*TII,
 return true;
 
   unsigned Opc = UseMI.getOpcode();
-  switch (Opc) {
-  case AMDGPU::V_MAC_F32_e64:
-  case AMDGPU::V_MAC_F16_e64:
-  case AMDGPU::V_FMAC_F32_e64:
-  case AMDGPU::V_FMAC_F16_e64: {
+  unsigned NewOpc = macToMad(Opc);
+  if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
 // Special case for mac. Since this is replaced with mad when folded into
 // src2, we need to check the legality for the final instruction.
 int Src2Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2);
 if (static_cast(OpNo) == Src2Idx) {
-  bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F16_e64;
-  bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F32_e64;
-
-  unsigned Opc = IsFMA ?
-(IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-(IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-  const MCInstrDesc &MadDesc = TII->get(Opc);
+  const MCInstrDesc &MadDesc = TII->get(NewOpc);
   return TII->isInlineConstant(OpToFold, MadDesc.OpInfo[OpNo].OperandType);
 }
-return false;
-  }
-  default:
-return false;
   }
+
+  return false;
 }
 
 // TODO: Add heuristic that the frame index might not fit in the addressing 
mode
@@ -346,17 +348,8 @@ static bool 
tryAddToFoldList(SmallVectorImpl<FoldCandidate> &FoldList,
   if (!TII->isOperandLegal(*MI, OpNo, OpToFold)) {
 // Special case for v_mac_{f16, f32}_e64 if we are trying to fold into src2
 unsigned Opc = MI->getOpcode();
-if ((Opc == AMDGPU::V_MAC_F32_e64 || Opc == AMDGPU::V_MAC_F16_e64 ||
- Opc == AMDGPU::V_FMAC_F32_e64 || Opc == AMDGPU::V_FMAC_F16_e64) &&
-(int)OpNo == AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2)) {
-  bool IsFMA = Opc == AMDGPU::V_FMAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F16_e64;
-  bool IsF32 = Opc == AMDGPU::V_MAC_F32_e64 ||
-   Opc == AMDGPU::V_FMAC_F32_e64;
-  unsigned NewOpc = IsFMA ?
-(IsF32 ? AMDGPU::V_FMA_F32 : AMDGPU::V_FMA_F16_gfx9) :
-(IsF32 ? AMDGPU::V_MAD_F32 : AMDGPU::V_MAD_F16);
-
+unsigned NewOpc = macToMad(Opc);
+if (NewOpc != AMDGPU::INSTRUCTION_LIST_END) {
   // Check if changing this to a v_mad_{f16, f32} instruction will allow us
   // to fold the operand.
   MI->setDesc(TII->get(NewOpc));



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

