[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1669,13 +1670,16 @@ defm : FlatSignedAtomicPatWithAddrSpace 
<"FLAT_ATOMIC_ADD_F32", "int_amdgcn_flat
 }
 
 let OtherPredicates = [HasAtomicFlatPkAdd16Insts] in {
+// FIXME: These do not have signed offsets

rampitec wrote:

Can you just use FlatAtomicPat?

https://github.com/llvm/llvm-project/pull/95394


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -15931,6 +15931,26 @@ static OptimizationRemark 
emitAtomicRMWLegalRemark(const AtomicRMWInst *RMW) {
  << " operation at memory scope " << MemScope;
 }
 
+static bool isHalf2OrBFloat2(Type *Ty) {

rampitec wrote:

Does the underlying type really matter? Is a 2 x 16-bit type sufficient?

https://github.com/llvm/llvm-project/pull/95394
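
For illustration, a minimal sketch of the generalized check the question points at; the isV2x16 helper name is hypothetical and not part of the patch:

#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Sketch: accept any <2 x 16-bit> vector (half, bfloat, or i16) instead of
// enumerating the half/bfloat element types explicitly.
static bool isV2x16(Type *Ty) {
  auto *VT = dyn_cast<FixedVectorType>(Ty);
  return VT && VT->getNumElements() == 2 &&
         VT->getElementType()->getPrimitiveSizeInBits() == 16;
}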


[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM, contingent on the plan to produce atomicrmw.

https://github.com/llvm/llvm-project/pull/95396


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/flat (PR #95394)

2024-06-14 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/95394


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (PR #95930)

2024-06-18 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1735,8 +1737,11 @@ defm : SIBufferAtomicPat<"SIbuffer_atomic_dec", i64, 
"BUFFER_ATOMIC_DEC_X2">;
 let OtherPredicates = [HasAtomicCSubNoRtnInsts] in
 defm : SIBufferAtomicPat<"SIbuffer_atomic_csub", i32, "BUFFER_ATOMIC_CSUB", 
["noret"]>;
 
-let SubtargetPredicate = isGFX12Plus in {
+let SubtargetPredicate = HasAtomicBufferPkAddBF16Inst in {
   defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_fadd", v2bf16, 
"BUFFER_ATOMIC_PK_ADD_BF16_VBUFFER">;

rampitec wrote:

Should it use OtherPredicates = [HasAtomicBufferPkAddBF16Inst] and 
SubtargetPredicate = isGFX12Plus, since a VBUFFER opcode is used?

https://github.com/llvm/llvm-project/pull/95930


[llvm-branch-commits] [llvm] AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (PR #95930)

2024-06-18 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -743,6 +743,12 @@ def FeatureAtomicGlobalPkAddBF16Inst : 
SubtargetFeature<"atomic-global-pk-add-bf
  [FeatureFlatGlobalInsts]
 >;
 
+def FeatureAtomicBufferPkAddBF16Inst : 
SubtargetFeature<"atomic-buffer-pk-add-bf16-inst",

rampitec wrote:

I believe it is above FeatureAtomicGlobalPkAddBF16Inst in downstream. Can you 
fix the order here or there?

https://github.com/llvm/llvm-project/pull/95930


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-20 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -886,26 +977,17 @@ multiclass SMRD_Pattern <string Instr, ValueType vt> {
  def : GCNPat <
(smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)),
(vt (!cast<SM_Pseudo>(Instr#"_SGPR") $sbase, $soffset, 0))> {
-let OtherPredicates = [isNotGFX9Plus];
-  }
-  def : GCNPat <
-(smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)),
-(vt (!cast<SM_Pseudo>(Instr#"_SGPR_IMM") $sbase, $soffset, 0, 0))> {
-let OtherPredicates = [isGFX9Plus];
+let OtherPredicates = [isGFX6GFX7];
  }
 
-  // 4. SGPR+IMM offset
+  // 4. No offset
  def : GCNPat <
-(smrd_load (SMRDSgprImm i64:$sbase, i32:$soffset, i32:$offset)),
-(vt (!cast<SM_Pseudo>(Instr#"_SGPR_IMM") $sbase, $soffset, $offset, 0))> {
-let OtherPredicates = [isGFX9Plus];
+(vt (smrd_load (i64 SReg_64:$sbase))),
+(vt (!cast<SM_Pseudo>(Instr#"_IMM") i64:$sbase, 0, 0))> {
+let OtherPredicates = [isGFX6GFX7];
  }
 
-  // 5. No offset
-  def : GCNPat <
-(vt (smrd_load (i64 SReg_64:$sbase))),
-(vt (!cast<SM_Pseudo>(Instr#"_IMM") i64:$sbase, 0, 0))
-  >;
+  defm : SMRD_Align_Pattern<Instr, vt>;

rampitec wrote:

You can avoid duplicating patterns for the aligned case; you just need to check 
whether XNACK is on (and it is off before gfx8).
I also do not see XNACK checked anywhere.

https://github.com/llvm/llvm-project/pull/96163
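
For reference, a minimal sketch of the check suggested above, assuming the 
GCNSubtarget::hasXnackReplay() predicate that PR #96162 uses:

#include "GCNSubtarget.h"
using namespace llvm;

// Sketch: constrained (aligned) SMEM load patterns are only needed when the
// subtarget replays instructions on an XNACK fault; XNACK does not exist
// before gfx8, so gfx6/gfx7 can keep a single unconstrained pattern.
static bool needsConstrainedSMemLoad(const GCNSubtarget &ST) {
  return ST.hasXnackReplay();
}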


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

rampitec wrote:

You can check the alignment on the first load if an MMO is available and avoid 
producing the _ec version if it is sufficient.

https://github.com/llvm/llvm-project/pull/96162
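
A sketch of the suggested check, assuming the first merged load's MMO is 
representative and that Width is measured in dwords; the names are illustrative:

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// Sketch: if the first load's memory operand already proves the merged access
// is naturally aligned, the ordinary opcode is safe and the constrained _ec
// variant can be skipped even on XNACK-replay targets.
static bool canUseUnconstrainedSLoad(const MachineInstr &FirstLoad,
                                     unsigned Width /*dwords*/) {
  if (FirstLoad.memoperands_empty())
    return false;
  const MachineMemOperand *MMO = *FirstLoad.memoperands_begin();
  return MMO->getAlign().value() >= PowerOf2Ceil(Width * 4);
}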


[llvm-branch-commits] [llvm] AMDGPU: Add a subtarget feature for fine-grained remote memory support (PR #96442)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> We do statically know for some of the targets (mostly gfx12 and gfx940) that 
> it's supposed to work. This is the "scope downgrade" vs. "nop" cases in the 
> atomic support table

Actually no, we do not know the bus. Moreover, we know the opposite.

https://github.com/llvm/llvm-project/pull/96442


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const 
CombineInfo &CI,
   return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM;
 }
   case S_LOAD_IMM:
-switch (Width) {
-default:
-  return 0;
-case 2:
-  return AMDGPU::S_LOAD_DWORDX2_IMM;
-case 3:
-  return AMDGPU::S_LOAD_DWORDX3_IMM;
-case 4:
-  return AMDGPU::S_LOAD_DWORDX4_IMM;
-case 8:
-  return AMDGPU::S_LOAD_DWORDX8_IMM;
+// For targets that support XNACK replay, use the constrained load opcode.
+if (STI && STI->hasXnackReplay()) {
+  switch (Width) {

rampitec wrote:

> > currently the alignment is picked from the first MMO and that'd definitely 
> > be smaller than the natural align requirement for the new load
> 
> You don't know that - the alignment in the first MMO will be whatever 
> alignment the compiler could deduce, which could be large, e.g. if the 
> pointer used for the first load was known to have a large alignment.

Moreover, it can easily be as large as a page, e.g. in the case of a scalar 
load from a kernarg.

https://github.com/llvm/llvm-project/pull/96162


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

Use it in a predicate when defining pseudos?

https://github.com/llvm/llvm-project/pull/96444


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

It is worse than that. It behaves differently depending on where the atomic is 
executed. There is no single answer as to whether this instruction supports denormals.

https://github.com/llvm/llvm-project/pull/96443


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/96443


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)

2024-06-24 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/96444


[llvm-branch-commits] [llvm] AMDGPU: Remove ds_fmin/ds_fmax intrinsics (PR #96739)

2024-06-26 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/96739


[llvm-branch-commits] [llvm] AMDGPU: Enable vectorization of v2f16 copysign (PR #100799)

2024-07-29 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/100799


[llvm-branch-commits] [llvm] AMDGPU: Correct costs of saturating add/sub intrinsics (PR #100808)

2024-07-29 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/100808


[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)

2024-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -809,7 +826,59 @@ int64_t SIRegisterInfo::getFrameIndexInstrOffset(const 
MachineInstr *MI,
   return getScratchInstrOffset(MI);
 }
 
+static bool isFIPlusImmOrVGPR(const SIRegisterInfo &TRI,
+  const MachineInstr &MI) {
+  const MachineOperand &Src0 = MI.getOperand(1);

rampitec wrote:

Assert that this is an add, or move the function inside needsFrameBaseReg?

https://github.com/llvm/llvm-project/pull/101692
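
One way to encode the suggested precondition; a sketch only, with the opcode 
list taken from the adds this patch handles:

// Sketch: document that only VALU adds are expected before reading operand 1.
static bool isFIPlusImmOrVGPR(const SIRegisterInfo &TRI,
                              const MachineInstr &MI) {
  assert((MI.getOpcode() == AMDGPU::V_ADD_U32_e32 ||
          MI.getOpcode() == AMDGPU::V_ADD_CO_U32_e32) &&
         "caller must pass a VALU add");
  const MachineOperand &Src0 = MI.getOperand(1);
  // ... rest unchanged
}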


[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)

2024-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -797,6 +797,23 @@ int64_t SIRegisterInfo::getScratchInstrOffset(const 
MachineInstr *MI) const {
 
 int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI,
  int Idx) const {
+  switch (MI->getOpcode()) {

rampitec wrote:

Bail if any modifiers are set?

https://github.com/llvm/llvm-project/pull/101692
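
A sketch of the suggested bail-out, assuming SIInstrInfo::hasModifiersSet() and 
the ST member are reachable at this point:

// Sketch: source modifiers (neg/abs) would change the computed address, so
// treat an add with any modifier set as having no foldable offset.
const SIInstrInfo *TII = ST.getInstrInfo();
if (TII->hasModifiersSet(*MI, AMDGPU::OpName::src0_modifiers) ||
    TII->hasModifiersSet(*MI, AMDGPU::OpName::src1_modifiers))
  return 0;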


[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)

2024-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -877,6 +948,86 @@ Register 
SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
 void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg,
int64_t Offset) const {
   const SIInstrInfo *TII = ST.getInstrInfo();
+
+  switch (MI.getOpcode()) {
+  case AMDGPU::V_ADD_U32_e32:
+  case AMDGPU::V_ADD_CO_U32_e32: {
+MachineOperand *FIOp = &MI.getOperand(2);
+MachineOperand *ImmOp = &MI.getOperand(1);
+if (!FIOp->isFI())
+  std::swap(FIOp, ImmOp);
+
+if (!ImmOp->isImm()) {
+  assert(Offset == 0);
+  FIOp->ChangeToRegister(BaseReg, false);
+  TII->legalizeOperandsVOP2(MI.getMF()->getRegInfo(), MI);
+  return;
+}
+
+int64_t TotalOffset = ImmOp->getImm() + Offset;
+if (TotalOffset == 0) {
+  MI.setDesc(TII->get(AMDGPU::COPY));
+  for (unsigned I = MI.getNumOperands() - 1; I != 1; --I)
+MI.removeOperand(I);
+
+  MI.getOperand(1).ChangeToRegister(BaseReg, false);
+  return;
+}
+
+ImmOp->setImm(TotalOffset);
+
+MachineBasicBlock *MBB = MI.getParent();
+MachineFunction *MF = MBB->getParent();
+MachineRegisterInfo &MRI = MF->getRegInfo();
+
+// FIXME: materializeFrameBaseRegister does not know the register class of
+// the uses of the frame index, and assumes SGPR for enableFlatScratch. Emit
+// a copy so we have a legal operand and hope the register coalescer can
+// clean it up.
+if (isSGPRReg(MRI, BaseReg)) {
+  Register BaseRegVGPR =
+  MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+  BuildMI(*MBB, MI, MI.getDebugLoc(), TII->get(AMDGPU::COPY), BaseRegVGPR)
+  .addReg(BaseReg);
+  MI.getOperand(2).ChangeToRegister(BaseRegVGPR, false);
+} else {
+  MI.getOperand(2).ChangeToRegister(BaseReg, false);
+}
+return;
+  }
+  case AMDGPU::V_ADD_U32_e64:
+  case AMDGPU::V_ADD_CO_U32_e64: {
+int Src0Idx = MI.getNumExplicitDefs();

rampitec wrote:

Check that modifiers are clear?

https://github.com/llvm/llvm-project/pull/101692


[llvm-branch-commits] [llvm] AMDGPU: Support VALU add instructions in localstackalloc (PR #101692)

2024-08-02 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -797,6 +797,23 @@ int64_t SIRegisterInfo::getScratchInstrOffset(const 
MachineInstr *MI) const {
 
 int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI,
  int Idx) const {
+  switch (MI->getOpcode()) {

rampitec wrote:

Ack

https://github.com/llvm/llvm-project/pull/101692


[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)

2024-08-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec commented:

Add some tests where the argument is not a pointer?

https://github.com/llvm/llvm-project/pull/102010


[llvm-branch-commits] [llvm] InferAddressSpaces: Handle masked load and store intrinsics (PR #102007)

2024-08-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/102007


[llvm-branch-commits] [llvm] InferAddressSpaces: Handle llvm.is.constant (PR #102010)

2024-08-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM modulo braces comment.

https://github.com/llvm/llvm-project/pull/102010


[llvm-branch-commits] [llvm] AMDGPU: Fold frame indexes into s_or_b32 and s_and_b32 (PR #102345)

2024-08-07 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -190,31 +186,31 @@ body: |
 ; MUBUFW64-LABEL: name: s_and_b32__sgpr__fi_literal_offset
 ; MUBUFW64: liveins: $sgpr8
 ; MUBUFW64-NEXT: {{  $}}
-; MUBUFW64-NEXT: $sgpr4 = S_LSHR_B32 $sgpr32, 6, implicit-def $scc
-; MUBUFW64-NEXT: $sgpr4 = S_ADD_I32 killed $sgpr4, 80, implicit-def $scc
-; MUBUFW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, 
implicit-def $scc
+; MUBUFW64-NEXT: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 6, implicit-def 
dead $scc
+; MUBUFW64-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr4, $sgpr8, implicit-def 
$scc
+; MUBUFW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def $scc
 ; MUBUFW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
 ;
 ; MUBUFW32-LABEL: name: s_and_b32__sgpr__fi_literal_offset
 ; MUBUFW32: liveins: $sgpr8
 ; MUBUFW32-NEXT: {{  $}}
-; MUBUFW32-NEXT: $sgpr4 = S_LSHR_B32 $sgpr32, 5, implicit-def $scc
-; MUBUFW32-NEXT: $sgpr4 = S_ADD_I32 killed $sgpr4, 80, implicit-def $scc
-; MUBUFW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, 
implicit-def $scc
+; MUBUFW32-NEXT: renamable $sgpr4 = S_LSHR_B32 $sgpr32, 5, implicit-def 
dead $scc
+; MUBUFW32-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr4, $sgpr8, implicit-def 
$scc
+; MUBUFW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def $scc
 ; MUBUFW32-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
 ;
 ; FLATSCRW64-LABEL: name: s_and_b32__sgpr__fi_literal_offset
 ; FLATSCRW64: liveins: $sgpr8
 ; FLATSCRW64-NEXT: {{  $}}
-; FLATSCRW64-NEXT: $sgpr4 = S_ADD_I32 $sgpr32, 80, implicit-def $scc
-; FLATSCRW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, 
implicit-def $scc
+; FLATSCRW64-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr32, $sgpr8, 
implicit-def $scc
+; FLATSCRW64-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr7, 80, implicit-def 
$scc
 ; FLATSCRW64-NEXT: SI_RETURN implicit $sgpr7, implicit $scc
 ;
 ; FLATSCRW32-LABEL: name: s_and_b32__sgpr__fi_literal_offset
 ; FLATSCRW32: liveins: $sgpr8
 ; FLATSCRW32-NEXT: {{  $}}
-; FLATSCRW32-NEXT: $sgpr4 = S_ADD_I32 $sgpr32, 80, implicit-def $scc
-; FLATSCRW32-NEXT: renamable $sgpr7 = S_AND_B32 $sgpr8, killed $sgpr4, 
implicit-def $scc
+; FLATSCRW32-NEXT: renamable $sgpr7 = S_ADD_I32 $sgpr32, $sgpr8, 
implicit-def $scc

rampitec wrote:

I do not understand this. The transformation `(s8 & (sp + 80)) -> ((s8 + sp) & 80)` 
does not look immediately obvious.

https://github.com/llvm/llvm-project/pull/102345


[llvm-branch-commits] [llvm] AMDGPU: Preserve atomicrmw name when specializing address space (PR #102470)

2024-08-08 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102470


[llvm-branch-commits] [llvm] AMDGPU: Add noalias.addrspace metadata when autoupgrading atomic intrinsics (PR #102599)

2024-08-09 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102599


[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)

2024-08-09 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -22,6 +22,7 @@ MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
 AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
 MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
MODULE_PASS("amdgpu-perf-hint", AMDGPUPerfHintAnalysisPass(*static_cast<const GCNTargetMachine *>(this)))

rampitec wrote:

Exceeds 80 chars per line.

https://github.com/llvm/llvm-project/pull/102645
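
For example, a wrapping that stays within 80 columns (a sketch; clang-format 
may choose slightly different breaks):

MODULE_PASS("amdgpu-perf-hint",
            AMDGPUPerfHintAnalysisPass(
                *static_cast<const GCNTargetMachine *>(this)))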


[llvm-branch-commits] [llvm] NewPM/AMDGPU: Port AMDGPUPerfHintAnalysis to new pass manager (PR #102645)

2024-08-09 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -413,18 +439,57 @@ bool AMDGPUPerfHintAnalysis::runOnSCC(CallGraphSCC &SCC) {
   return Changed;
 }
 
-bool AMDGPUPerfHintAnalysis::isMemoryBound(const Function *F) const {
-  auto FI = FIM.find(F);
-  if (FI == FIM.end())
-return false;
+bool AMDGPUPerfHintAnalysis::run(const GCNTargetMachine &TM,
+ LazyCallGraph &CG) {
 
-  return AMDGPUPerfHint::isMemBound(FI->second);
+  SmallVector<Function *> Worklist;
+  CG.buildRefSCCs();
+  for (LazyCallGraph::RefSCC &RC : CG.postorder_ref_sccs()) {
+for (LazyCallGraph::SCC &SCC : RC) {
+  if (SCC.size() != 1)
+continue;
+  Function &F = SCC.begin()->getFunction();
+  if (!F.isDeclaration() && !F.doesNotRecurse() && F.hasInternalLinkage())

rampitec wrote:

Why is it limited to internal linkage?

https://github.com/llvm/llvm-project/pull/102645


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPUAnnotateUniformValues to new pass manager (PR #102654)

2024-08-09 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102654


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port SILowerI1Copies to new pass manager (PR #102663)

2024-08-09 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102663


[llvm-branch-commits] [llvm] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes (PR #102815)

2024-08-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102815


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start implementing addCodeGenPrepare (PR #102816)

2024-08-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102816


[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> I don't really like needing to expose these globally like this; maybe it 
> would be better to just move TargetPassConfig and the CodeGenPassBuilder into 
> one common file?

Yep, I also do not like extern cl::opt.

https://github.com/llvm/llvm-project/pull/102865


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102867


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/102884


[llvm-branch-commits] [llvm] [AMDGPU] Fix sign confusion in performMulLoHiCombine (PR #106977)

2024-09-03 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/106977


[llvm-branch-commits] [llvm] 607bec0 - Change materializeFrameBaseRegister() to return register

2021-01-22 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2021-01-22T15:51:06-08:00
New Revision: 607bec0bb9f787acca95f53dabe6a5c227f6b6b2

URL: 
https://github.com/llvm/llvm-project/commit/607bec0bb9f787acca95f53dabe6a5c227f6b6b2
DIFF: 
https://github.com/llvm/llvm-project/commit/607bec0bb9f787acca95f53dabe6a5c227f6b6b2.diff

LOG: Change materializeFrameBaseRegister() to return register

The only caller of this function is in LocalStackSlotAllocation,
and it creates a base register of the class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different register class
here, so let materializeFrameBaseRegister just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268

Added: 


Modified: 
llvm/include/llvm/CodeGen/TargetRegisterInfo.h
llvm/lib/CodeGen/LocalStackSlotAllocation.cpp
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
llvm/lib/Target/AArch64/AArch64RegisterInfo.h
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.h
llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp
llvm/lib/Target/ARM/ARMBaseRegisterInfo.h
llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
llvm/lib/Target/PowerPC/PPCRegisterInfo.h

Removed: 




diff  --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h 
b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 253f71cb5f1a..8790e2f09eb6 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -911,11 +911,11 @@ class TargetRegisterInfo : public MCRegisterInfo {
 return false;
   }
 
-  /// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx
-  /// before insertion point I.
-  virtual void materializeFrameBaseRegister(MachineBasicBlock *MBB,
-Register BaseReg, int FrameIdx,
-int64_t Offset) const {
+  /// Insert defining instruction(s) for a pointer to FrameIdx before
+  /// insertion point I. Return materialized frame pointer.
+  virtual Register materializeFrameBaseRegister(MachineBasicBlock *MBB,
+int FrameIdx,
+int64_t Offset) const {
 llvm_unreachable("materializeFrameBaseRegister does not exist on this "
  "target");
   }

diff  --git a/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp 
b/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp
index ec3cce3fa1f1..ec6e693e8a46 100644
--- a/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp
+++ b/llvm/lib/CodeGen/LocalStackSlotAllocation.cpp
@@ -416,15 +416,16 @@ bool 
LocalStackSlotPass::insertFrameReferenceRegisters(MachineFunction &Fn) {
   const TargetRegisterClass *RC = TRI->getPointerRegClass(*MF);
   BaseReg = Fn.getRegInfo().createVirtualRegister(RC);
 
-  LLVM_DEBUG(dbgs() << "  Materializing base register " << BaseReg
+  LLVM_DEBUG(dbgs() << "  Materializing base register"
 << " at frame local offset "
-<< LocalOffset + InstrOffset << "\n");
+<< LocalOffset + InstrOffset);
 
   // Tell the target to insert the instruction to initialize
   // the base register.
   //MachineBasicBlock::iterator InsertionPt = Entry->begin();
-  TRI->materializeFrameBaseRegister(Entry, BaseReg, FrameIdx,
-InstrOffset);
+  BaseReg = TRI->materializeFrameBaseRegister(Entry, FrameIdx, 
InstrOffset);
+
+  LLVM_DEBUG(dbgs() << " into " << printReg(BaseReg, TRI) << '\n');
 
   // The base register already includes any offset specified
   // by the instruction, so account for that so it doesn't get

diff  --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index 231e8b3089f6..f90856d14b2f 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -531,10 +531,10 @@ bool AArch64RegisterInfo::isFrameOffsetLegal(const 
MachineInstr *MI,
 
 /// Insert defining instruction(s) for BaseReg to be a pointer to FrameIdx
 /// at the beginning of the basic block.
-void AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
-   Register BaseReg,
-   int FrameIdx,
-   int64_t Offset) const {
+Register
+AArch64RegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
+  int FrameIdx,
+  int64_t Offset) const {
   MachineBasicBlock::iterator Ins = MBB->begin();
   DebugLoc DL; // Defaults to "unknown"
   if (Ins != MBB->end())
@@ -544,6 +544,7 @@ void 
AArch64RegisterI

[llvm-branch-commits] [llvm] ca904b8 - [AMDGPU] Fix FP materialization/resolve with flat scratch

2021-01-22 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2021-01-22T16:06:47-08:00
New Revision: ca904b81e6488b45cbfe846dc86f1406b8e9c03d

URL: 
https://github.com/llvm/llvm-project/commit/ca904b81e6488b45cbfe846dc86f1406b8e9c03d
DIFF: 
https://github.com/llvm/llvm-project/commit/ca904b81e6488b45cbfe846dc86f1406b8e9c03d.diff

LOG: [AMDGPU] Fix FP materialization/resolve with flat scratch

Differential Revision: https://reviews.llvm.org/D95266

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/test/CodeGen/AMDGPU/flat-scratch.ll
llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 8911917cffb0..7a45d8c54f9a 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -417,7 +417,7 @@ bool SIRegisterInfo::needsFrameBaseReg(MachineInstr *MI, 
int64_t Offset) const {
 return !SIInstrInfo::isLegalMUBUFImmOffset(FullOffset);
 
   const SIInstrInfo *TII = ST.getInstrInfo();
-  return TII->isLegalFLATOffset(FullOffset, AMDGPUAS::PRIVATE_ADDRESS, true);
+  return !TII->isLegalFLATOffset(FullOffset, AMDGPUAS::PRIVATE_ADDRESS, true);
 }
 
 Register SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
@@ -496,7 +496,6 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, 
Register BaseReg,
   MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);
   int64_t NewOffset = OffsetOp->getImm() + Offset;
 
-#ifndef NDEBUG
   assert(FIOp && FIOp->isFI() && "frame index must be address operand");
   assert(TII->isMUBUF(MI) || TII->isFLATScratch(MI));
 
@@ -508,6 +507,7 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, 
Register BaseReg,
 return;
   }
 
+#ifndef NDEBUG
   MachineOperand *SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset);
   assert(SOffset->isImm() && SOffset->getImm() == 0);
 #endif
@@ -522,7 +522,7 @@ void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, 
Register BaseReg,
 bool SIRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,
 Register BaseReg,
 int64_t Offset) const {
-  if (!SIInstrInfo::isMUBUF(*MI) && !!SIInstrInfo::isFLATScratch(*MI))
+  if (!SIInstrInfo::isMUBUF(*MI) && !SIInstrInfo::isFLATScratch(*MI))
 return false;
 
   int64_t NewOffset = Offset + getScratchInstrOffset(MI);

diff  --git a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll 
b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
index 916c2d43a4c0..4244d8f4deb5 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-scratch.ll
@@ -1185,7 +1185,7 @@ define amdgpu_kernel void 
@zero_init_large_offset_kernel() {
 ; GFX9-NEXT:s_add_u32 flat_scratch_lo, s0, s3
 ; GFX9-NEXT:s_addc_u32 flat_scratch_hi, s1, 0
 ; GFX9-NEXT:s_mov_b32 vcc_hi, 0
-; GFX9-NEXT:scratch_load_dword v0, off, vcc_hi offset:4 glc
+; GFX9-NEXT:scratch_load_dword v0, off, vcc_hi offset:16 glc
 ; GFX9-NEXT:s_waitcnt vmcnt(0)
 ; GFX9-NEXT:s_mov_b32 s0, 0
 ; GFX9-NEXT:s_mov_b32 s1, s0
@@ -1211,7 +1211,7 @@ define amdgpu_kernel void 
@zero_init_large_offset_kernel() {
 ; GFX10-NEXT:s_addc_u32 s1, s1, 0
 ; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
 ; GFX10-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
-; GFX10-NEXT:scratch_load_dword v0, off, off offset:4 glc dlc
+; GFX10-NEXT:scratch_load_dword v0, off, off offset:16 glc dlc
 ; GFX10-NEXT:s_waitcnt vmcnt(0)
 ; GFX10-NEXT:s_mov_b32 s0, 0
 ; GFX10-NEXT:s_movk_i32 vcc_lo, 0x4010
@@ -1242,7 +1242,7 @@ define amdgpu_kernel void 
@zero_init_large_offset_kernel() {
 ; GFX9-PAL-NEXT:s_and_b32 s3, s3, 0x
 ; GFX9-PAL-NEXT:s_add_u32 flat_scratch_lo, s2, s1
 ; GFX9-PAL-NEXT:s_addc_u32 flat_scratch_hi, s3, 0
-; GFX9-PAL-NEXT:scratch_load_dword v0, off, vcc_hi offset:4 glc
+; GFX9-PAL-NEXT:scratch_load_dword v0, off, vcc_hi offset:16 glc
 ; GFX9-PAL-NEXT:s_waitcnt vmcnt(0)
 ; GFX9-PAL-NEXT:s_mov_b32 s1, s0
 ; GFX9-PAL-NEXT:s_mov_b32 s2, s0
@@ -1272,7 +1272,7 @@ define amdgpu_kernel void 
@zero_init_large_offset_kernel() {
 ; GFX10-PAL-NEXT:s_addc_u32 s3, s3, 0
 ; GFX10-PAL-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
 ; GFX10-PAL-NEXT:s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
-; GFX10-PAL-NEXT:scratch_load_dword v0, off, off offset:4 glc dlc
+; GFX10-PAL-NEXT:scratch_load_dword v0, off, off offset:16 glc dlc
 ; GFX10-PAL-NEXT:s_waitcnt vmcnt(0)
 ; GFX10-PAL-NEXT:s_mov_b32 s0, 0
 ; GFX10-PAL-NEXT:s_movk_i32 vcc_lo, 0x4010

diff  --git a/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll 
b/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
index 5e4b5f70de0b..4385500e71b1 100644
--- a/llvm/test/CodeGen/AMDGPU/local-st

[llvm-branch-commits] [llvm] eb66bf0 - [AMDGPU] Print SCRATCH_EN field after the kernel

2020-12-15 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-15T22:44:30-08:00
New Revision: eb66bf0802f96458b24a9c6eb9bd6451d8f90110

URL: 
https://github.com/llvm/llvm-project/commit/eb66bf0802f96458b24a9c6eb9bd6451d8f90110
DIFF: 
https://github.com/llvm/llvm-project/commit/eb66bf0802f96458b24a9c6eb9bd6451d8f90110.diff

LOG: [AMDGPU] Print SCRATCH_EN field after the kernel

Differential Revision: https://reviews.llvm.org/D93353

Added: 


Modified: 
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index a14f846b76d1..7ca049280744 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -538,6 +538,9 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction 
&MF) {
 OutStreamer->emitRawComment(
   " WaveLimiterHint : " + Twine(MFI->needsWaveLimiter()), false);
 
+OutStreamer->emitRawComment(
+  " COMPUTE_PGM_RSRC2:SCRATCH_EN: " +
+  Twine(G_00B84C_SCRATCH_EN(CurrentProgramInfo.ComputePGMRSrc2)), false);
 OutStreamer->emitRawComment(
   " COMPUTE_PGM_RSRC2:USER_SGPR: " +
   Twine(G_00B84C_USER_SGPR(CurrentProgramInfo.ComputePGMRSrc2)), false);

diff  --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll
index 39029e359889..455c19fcdfc2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch-init.ll
@@ -3,7 +3,14 @@
 ; Make sure flat_scratch_init is set
 
 ; GCN-LABEL: {{^}}stack_object_addrspacecast_in_kernel_no_calls:
-; GCN: .amdhsa_user_sgpr_flat_scratch_init 1
+; GCN: s_add_u32 flat_scratch_lo, s4, s7
+; GCN: s_addc_u32 flat_scratch_hi, s5, 0
+; GCN: flat_store_dword
+; GCN: .amdhsa_user_sgpr_flat_scratch_init 1
+; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset
+; GCN-NOT: .amdhsa_reserve_flat_scratch
+; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 1
+; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 6
 define amdgpu_kernel void @stack_object_addrspacecast_in_kernel_no_calls() {
   %alloca = alloca i32, addrspace(5)
   %cast = addrspacecast i32 addrspace(5)* %alloca to i32*
@@ -13,7 +20,15 @@ define amdgpu_kernel void 
@stack_object_addrspacecast_in_kernel_no_calls() {
 
 ; TODO: Could optimize out in this case
 ; GCN-LABEL: {{^}}stack_object_in_kernel_no_calls:
-; GCN: .amdhsa_user_sgpr_flat_scratch_init 1
+; GCN: s_add_u32 flat_scratch_lo, s4, s7
+; GCN: s_addc_u32 flat_scratch_hi, s5, 0
+; GCN: buffer_store_dword
+; GCN: .amdhsa_user_sgpr_private_segment_buffer 1
+; GCN: .amdhsa_user_sgpr_flat_scratch_init 1
+; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset 1
+; GCN-NOT: .amdhsa_reserve_flat_scratch
+; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 1
+; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 6
 define amdgpu_kernel void @stack_object_in_kernel_no_calls() {
   %alloca = alloca i32, addrspace(5)
   store volatile i32 0, i32 addrspace(5)* %alloca
@@ -21,7 +36,13 @@ define amdgpu_kernel void @stack_object_in_kernel_no_calls() 
{
 }
 
 ; GCN-LABEL: {{^}}kernel_no_calls_no_stack:
-; GCN: .amdhsa_user_sgpr_flat_scratch_init 0
+; GCN-NOT: flat_scratch
+; GCN: .amdhsa_user_sgpr_private_segment_buffer 1
+; GCN: .amdhsa_user_sgpr_flat_scratch_init 0
+; GCN: .amdhsa_system_sgpr_private_segment_wavefront_offset 0
+; GCN: .amdhsa_reserve_flat_scratch 0
+; GCN: COMPUTE_PGM_RSRC2:SCRATCH_EN: 0
+; GCN: COMPUTE_PGM_RSRC2:USER_SGPR: 4
 define amdgpu_kernel void @kernel_no_calls_no_stack() {
   ret void
 }





[llvm-branch-commits] [llvm] ae8f4b2 - [AMDGPU] Folding of FI operand with flat scratch

2020-12-22 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-22T10:48:04-08:00
New Revision: ae8f4b2178c46da1f10eb9279c9b44fab8b85417

URL: 
https://github.com/llvm/llvm-project/commit/ae8f4b2178c46da1f10eb9279c9b44fab8b85417
DIFF: 
https://github.com/llvm/llvm-project/commit/ae8f4b2178c46da1f10eb9279c9b44fab8b85417.diff

LOG: [AMDGPU] Folding of FI operand with flat scratch

Differential Revision: https://reviews.llvm.org/D93501

Added: 
llvm/test/CodeGen/AMDGPU/flat-scratch-fold-fi.mir

Modified: 
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
llvm/lib/Target/AMDGPU/SIInstrInfo.h
llvm/lib/Target/AMDGPU/SIInstrInfo.td
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp 
b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index bfba432848d4..06cce54e540c 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -172,9 +172,23 @@ static bool frameIndexMayFold(const SIInstrInfo *TII,
   const MachineInstr &UseMI,
   int OpNo,
   const MachineOperand &OpToFold) {
-  return OpToFold.isFI() &&
-TII->isMUBUF(UseMI) &&
-OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), 
AMDGPU::OpName::vaddr);
+  if (!OpToFold.isFI())
+return false;
+
+  if (TII->isMUBUF(UseMI))
+return OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(),
+  AMDGPU::OpName::vaddr);
+  if (!TII->isFLATScratch(UseMI))
+return false;
+
+  int SIdx = AMDGPU::getNamedOperandIdx(UseMI.getOpcode(),
+AMDGPU::OpName::saddr);
+  if (OpNo == SIdx)
+return true;
+
+  int VIdx = AMDGPU::getNamedOperandIdx(UseMI.getOpcode(),
+AMDGPU::OpName::vaddr);
+  return OpNo == VIdx && SIdx == -1;
 }
 
 FunctionPass *llvm::createSIFoldOperandsPass() {
@@ -631,25 +645,36 @@ void SIFoldOperands::foldOperand(
 // Sanity check that this is a stack access.
 // FIXME: Should probably use stack pseudos before frame lowering.
 
-if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=
-MFI->getScratchRSrcReg())
-  return;
+if (TII->isMUBUF(*UseMI)) {
+  if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=
+  MFI->getScratchRSrcReg())
+return;
 
-// Ensure this is either relative to the current frame or the current wave.
-MachineOperand &SOff =
-*TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
-if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) &&
-(!SOff.isImm() || SOff.getImm() != 0))
-  return;
+  // Ensure this is either relative to the current frame or the current
+  // wave.
+  MachineOperand &SOff =
+  *TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
+  if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) &&
+  (!SOff.isImm() || SOff.getImm() != 0))
+return;
+
+  // If this is relative to the current wave, update it to be relative to
+  // the current frame.
+  if (SOff.isImm())
+SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false);
+}
 
// A frame index will resolve to a positive constant, so it should always be
 // safe to fold the addressing mode, even pre-GFX9.
 UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());
 
-// If this is relative to the current wave, update it to be relative to the
-// current frame.
-if (SOff.isImm())
-  SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false);
+if (TII->isFLATScratch(*UseMI) &&
+AMDGPU::getNamedOperandIdx(UseMI->getOpcode(),
+   AMDGPU::OpName::vaddr) != -1) {
+  unsigned NewOpc = AMDGPU::getFlatScratchInstSSfromSV(UseMI->getOpcode());
+  UseMI->setDesc(TII->get(NewOpc));
+}
+
 return;
   }
 

diff  --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 4625cefa1e3e..75aedee1ec6b 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1184,6 +1184,9 @@ namespace AMDGPU {
   LLVM_READONLY
   int getFlatScratchInstSTfromSS(uint16_t Opcode);
 
+  LLVM_READONLY
+  int getFlatScratchInstSSfromSV(uint16_t Opcode);
+
   const uint64_t RSRC_DATA_FORMAT = 0xf000LL;
   const uint64_t RSRC_ELEMENT_SIZE_SHIFT = (32 + 19);
   const uint64_t RSRC_INDEX_STRIDE_SHIFT = (32 + 21);

diff  --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td 
b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index 746d08b8ce0e..e48138e56d71 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -2524,6 +2524,13 @@ def getFlatScratchInstSTfromSS : InstrMapping

[llvm-branch-commits] [llvm] ca4bf58 - [AMDGPU] Support unaligned flat scratch in TLI

2020-12-22 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-22T16:12:31-08:00
New Revision: ca4bf58e4ee5951473a861716193063c5ef83e9a

URL: 
https://github.com/llvm/llvm-project/commit/ca4bf58e4ee5951473a861716193063c5ef83e9a
DIFF: 
https://github.com/llvm/llvm-project/commit/ca4bf58e4ee5951473a861716193063c5ef83e9a.diff

LOG: [AMDGPU] Support unaligned flat scratch in TLI

Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.

Differential Revision: https://reviews.llvm.org/D93669

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll
llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5fb1924bdd9f..81fdfa0343b3 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -1470,12 +1470,21 @@ bool 
SITargetLowering::allowsMisalignedMemoryAccessesImpl(
 }
   }
 
+  if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS) {
+bool AlignedBy4 = Alignment >= Align(4);
+if (IsFast)
+  *IsFast = AlignedBy4;
+
+return AlignedBy4 ||
+   Subtarget->enableFlatScratch() ||
+   Subtarget->hasUnalignedScratchAccess();
+  }
+
   // FIXME: We have to be conservative here and assume that flat operations
   // will access scratch.  If we had access to the IR function, then we
   // could determine if any private memory was used in the function.
-  if (!Subtarget->hasUnalignedScratchAccess() &&
-  (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS ||
-   AddrSpace == AMDGPUAS::FLAT_ADDRESS)) {
+  if (AddrSpace == AMDGPUAS::FLAT_ADDRESS &&
+  !Subtarget->hasUnalignedScratchAccess()) {
 bool AlignedBy4 = Alignment >= Align(4);
 if (IsFast)
   *IsFast = AlignedBy4;

diff  --git a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll 
b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
index 271f6c703980..8e37b413ddf5 100644
--- a/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
+++ b/llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll
@@ -271,16 +271,9 @@ define amdgpu_kernel void @vload2_private(i16 
addrspace(1)* nocapture readonly %
 ; FLATSCR-NEXT:s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT:scratch_store_short off, v0, vcc_hi offset:8
 ; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0
-; FLATSCR-NEXT:scratch_load_ushort v0, off, vcc_hi offset:4
+; FLATSCR-NEXT:scratch_load_dword v0, off, vcc_hi offset:4
 ; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0
-; FLATSCR-NEXT:scratch_load_ushort v3, off, vcc_hi offset:6
-; FLATSCR-NEXT:s_mov_b32 vcc_hi, 0
-; FLATSCR-NEXT:s_waitcnt vmcnt(1)
-; FLATSCR-NEXT:v_and_b32_e32 v0, 0x, v0
-; FLATSCR-NEXT:s_waitcnt vmcnt(0)
-; FLATSCR-NEXT:v_mov_b32_e32 v1, v3
-; FLATSCR-NEXT:scratch_load_short_d16_hi v1, off, vcc_hi offset:8
-; FLATSCR-NEXT:v_lshl_or_b32 v0, v3, 16, v0
+; FLATSCR-NEXT:scratch_load_dword v1, off, vcc_hi offset:6
 ; FLATSCR-NEXT:s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT:global_store_dwordx2 v2, v[0:1], s[2:3]
 ; FLATSCR-NEXT:s_endpgm

diff  --git a/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll 
b/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll
index 5d5cfd318edf..645eead8c297 100644
--- a/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll
+++ b/llvm/test/CodeGen/AMDGPU/unaligned-load-store.ll
@@ -1,6 +1,7 @@
-; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck 
-check-prefix=SI -check-prefix=ALIGNED %s
-; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=+unaligned-access-mode 
-verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=UNALIGNED 
%s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global 
-verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=ALIGNED %s
+; RUN: llc -march=amdgcn -verify-machineinstrs< %s | FileCheck 
-check-prefixes=SI,MUBUF,ALIGNED %s
+; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=+unaligned-access-mode 
-verify-machineinstrs< %s | FileCheck -check-prefixes=SI,MUBUF,UNALIGNED %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global 
-verify-machineinstrs< %s | FileCheck -check-prefixes=SI,MUBUF,ALIGNED %s
+; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global 
-amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck 
-check-prefixes=SI,FLATSCR,ALIGNED %s
 
 ; SI-LABEL: {{^}}local_unaligned_load_store_i16:
 ; SI: ds_read_u8
@@ -602,64 +603,70 @@ define amdgpu_kernel void @local_store_align1_v16i8(<16 x 
i8> addrspace(3)* %out
 }
 
 ; SI-LABEL: {{^}}private_load_align1_f64:
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
-; SI: buffer_load_ubyte
+; MUBUF: buffer_load_ubyte
+; MUB

[llvm-branch-commits] [llvm] d15119a - [AMDGPU][GlobalISel] GlobalISel for flat scratch

2020-12-22 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-22T16:33:06-08:00
New Revision: d15119a02d92274cd7f779f4bb8485b1020110e0

URL: 
https://github.com/llvm/llvm-project/commit/d15119a02d92274cd7f779f4bb8485b1020110e0
DIFF: 
https://github.com/llvm/llvm-project/commit/d15119a02d92274cd7f779f4bb8485b1020110e0.diff

LOG: [AMDGPU][GlobalISel] GlobalISel for flat scratch

It does not seem to fold offsets, but this is not specific
to flat scratch: getPtrBaseWithConstantOffset() does not
return the split for these tests, unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670

Added: 
llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll

Modified: 
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td 
b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
index 661b96a6a98e..bba03736d01a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
def gi_mubuf_scratch_offen :
GIComplexOperandMatcher<s32, "selectMUBUFScratchOffen">,
GIComplexPatternEquiv<MUBUFScratchOffen>;

+def gi_flat_scratch_offset :
+GIComplexOperandMatcher<s32, "selectScratchOffset">,
+GIComplexPatternEquiv<ScratchOffset>;
+
+def gi_flat_scratch_saddr :
+GIComplexOperandMatcher<s32, "selectScratchSAddr">,
+GIComplexPatternEquiv<ScratchSAddr>;
+
 def gi_ds_1addr_1offset :
 GIComplexOperandMatcher<s32, "selectDS1Addr1Offset">,
 GIComplexPatternEquiv<DS1Addr1Offset>;

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index b157c03672d1..6c2ff0972ae5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -3589,6 +3589,67 @@ 
AMDGPUInstructionSelector::selectGlobalSAddr(MachineOperand &Root) const {
}}};
 }
 
+InstructionSelector::ComplexRendererFns
+AMDGPUInstructionSelector::selectScratchSAddr(MachineOperand &Root) const {
+  Register Addr = Root.getReg();
+  Register PtrBase;
+  int64_t ConstOffset;
+  int64_t ImmOffset = 0;
+
+  // Match the immediate offset first, which canonically is moved as low as
+  // possible.
+  std::tie(PtrBase, ConstOffset) = getPtrBaseWithConstantOffset(Addr, *MRI);
+
+  if (ConstOffset != 0 &&
+  TII.isLegalFLATOffset(ConstOffset, AMDGPUAS::PRIVATE_ADDRESS, true)) {
+Addr = PtrBase;
+ImmOffset = ConstOffset;
+  }
+
+  auto AddrDef = getDefSrcRegIgnoringCopies(Addr, *MRI);
+  if (!AddrDef)
+return None;
+
+  if (AddrDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX) {
+int FI = AddrDef->MI->getOperand(1).getIndex();
+return {{
+[=](MachineInstrBuilder &MIB) { MIB.addFrameIndex(FI); }, // saddr
+[=](MachineInstrBuilder &MIB) { MIB.addImm(ImmOffset); } // offset
+}};
+  }
+
+  Register SAddr = AddrDef->Reg;
+
+  if (AddrDef->MI->getOpcode() == AMDGPU::G_PTR_ADD) {
+Register LHS = AddrDef->MI->getOperand(1).getReg();
+Register RHS = AddrDef->MI->getOperand(2).getReg();
+auto LHSDef = getDefSrcRegIgnoringCopies(LHS, *MRI);
+auto RHSDef = getDefSrcRegIgnoringCopies(RHS, *MRI);
+
+if (LHSDef && RHSDef &&
+LHSDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX &&
+isSGPR(RHSDef->Reg)) {
+  int FI = LHSDef->MI->getOperand(1).getIndex();
+  MachineInstr &I = *Root.getParent();
+  MachineBasicBlock *BB = I.getParent();
+  const DebugLoc &DL = I.getDebugLoc();
+  SAddr = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass);
+
+  BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_ADD_U32), SAddr)
+.addFrameIndex(FI)
+.addReg(RHSDef->Reg);
+}
+  }
+
+  if (!isSGPR(SAddr))
+return None;
+
+  return {{
+  [=](MachineInstrBuilder &MIB) { MIB.addReg(SAddr); }, // saddr
+  [=](MachineInstrBuilder &MIB) { MIB.addImm(ImmOffset); } // offset
+  }};
+}
+
 static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) {
  auto PSV = PtrInfo.V.dyn_cast<const PseudoSourceValue *>();
   return PSV && PSV->isStack();

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
index c575e7e9c8a5..c6b26ea70659 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h
@@ -200,6 +200,9 @@ class AMDGPUInstructionSelector final : public 
InstructionSelector {
   InstructionSelector::ComplexRendererFns
   selectGlobalSAddr(MachineOperand &Root) const;
 
+  InstructionSelector::ComplexRendererFns
+  selectScratchSAddr(MachineOperand &Root) const;
+
   InstructionSelector::ComplexRendererFns
   selectMUBUFScratchOffen(MachineOperand &Root) const;
   InstructionSelector::ComplexRendererFns

diff  --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index 9b39b86ae28f..28cd867d40be 100644
--- a/llvm/lib/Target/AMDGP

[llvm-branch-commits] [llvm] 747f67e - [AMDGPU] Fix adjustWritemask subreg handling

2020-12-23 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-23T14:43:31-08:00
New Revision: 747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe

URL: 
https://github.com/llvm/llvm-project/commit/747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe
DIFF: 
https://github.com/llvm/llvm-project/commit/747f67e034a924cf308f4c0f1bb6b1fa46bd9fbe.diff

LOG: [AMDGPU] Fix adjustWritemask subreg handling

If we happen to extract a non-dword subreg, that breaks the
logic of the function and may shrink the dmask, because it
does not recognize the use of a lane (or lanes).

This bug is next to impossible to trigger with the current
lowering in the BE, but it breaks in one of my future patches.

Differential Revision: https://reviews.llvm.org/D93782

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 81fdfa0343b3..c7abc585d0d1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -10862,7 +10862,7 @@ SDValue SITargetLowering::PerformDAGCombine(SDNode *N,
 /// Helper function for adjustWritemask
 static unsigned SubIdx2Lane(unsigned Idx) {
   switch (Idx) {
-  default: return 0;
+  default: return ~0u;
   case AMDGPU::sub0: return 0;
   case AMDGPU::sub1: return 1;
   case AMDGPU::sub2: return 2;
@@ -10922,6 +10922,8 @@ SDNode *SITargetLowering::adjustWritemask(MachineSDNode 
*&Node,
 // in OldDmask, so it can be any of X,Y,Z,W; Lane==1 is the second bit
 // set, etc.
 Lane = SubIdx2Lane(I->getConstantOperandVal(1));
+if (Lane == ~0u)
+  return Node;
 
 // Check if the use is for the TFE/LWE generated result at VGPRn+1.
 if (UsesTFC && Lane == TFCLane) {





[llvm-branch-commits] [llvm] dd89249 - [AMDGPU] Annotate vgpr<->agpr spills in asm

2020-12-07 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-07T11:25:25-08:00
New Revision: dd892494983a2e64d1e1eb3d05ce9577357336d2

URL: 
https://github.com/llvm/llvm-project/commit/dd892494983a2e64d1e1eb3d05ce9577357336d2
DIFF: 
https://github.com/llvm/llvm-project/commit/dd892494983a2e64d1e1eb3d05ce9577357336d2.diff

LOG: [AMDGPU] Annotate vgpr<->agpr spills in asm

Differential Revision: https://reviews.llvm.org/D92125

Added: 


Modified: 
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/test/CodeGen/AMDGPU/spill-agpr.ll
llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr.ll

Removed: 




diff  --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 9d7a041390ca..18be7c23c94e 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -697,8 +697,10 @@ static MachineInstrBuilder spillVGPRtoAGPR(const 
GCNSubtarget &ST,
   unsigned Opc = (IsStore ^ TRI->isVGPR(MRI, Reg)) ? 
AMDGPU::V_ACCVGPR_WRITE_B32
: 
AMDGPU::V_ACCVGPR_READ_B32;
 
-  return BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(Opc), Dst)
-   .addReg(Src, getKillRegState(IsKill));
+  auto MIB = BuildMI(*MBB, MI, MI->getDebugLoc(), TII->get(Opc), Dst)
+   .addReg(Src, getKillRegState(IsKill));
+  MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse);
+  return MIB;
 }
 
// This differs from buildSpillLoadStore by only scavenging a VGPR. It does not
@@ -871,10 +873,12 @@ void 
SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI,
   RS->setRegUsed(TmpReg);
 }
 if (IsStore) {
-  auto AccRead = BuildMI(*MBB, MI, DL, 
TII->get(AMDGPU::V_ACCVGPR_READ_B32), TmpReg)
+  auto AccRead = BuildMI(*MBB, MI, DL,
+ TII->get(AMDGPU::V_ACCVGPR_READ_B32), TmpReg)
 .addReg(SubReg, getKillRegState(IsKill));
   if (NeedSuperRegDef)
 AccRead.addReg(ValueReg, RegState::ImplicitDefine);
+  AccRead->setAsmPrinterFlag(MachineInstr::ReloadReuse);
 }
 SubReg = TmpReg;
   }
@@ -908,10 +912,12 @@ void 
SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI,
   if (!IsAGPR && NeedSuperRegDef)
 MIB.addReg(ValueReg, RegState::ImplicitDefine);
 
-  if (!IsStore && TmpReg != AMDGPU::NoRegister)
+  if (!IsStore && TmpReg != AMDGPU::NoRegister) {
 MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),
   FinalReg)
   .addReg(TmpReg, RegState::Kill);
+MIB->setAsmPrinterFlag(MachineInstr::ReloadReuse);
+  }
 } else {
   if (NeedSuperRegDef)
 MIB.addReg(ValueReg, RegState::ImplicitDefine);

diff  --git a/llvm/test/CodeGen/AMDGPU/spill-agpr.ll 
b/llvm/test/CodeGen/AMDGPU/spill-agpr.ll
index 3e7b381a45fe..511d02a104b3 100644
--- a/llvm/test/CodeGen/AMDGPU/spill-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/spill-agpr.ll
@@ -5,10 +5,10 @@
 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
 ; A2V-NOT:SCRATCH_RSRC
-; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
+; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0 ; Reload Reuse
 ; A2M:buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 
offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
 ; A2M:buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 
offset:[[FI]] ; 4-byte Folded Reload
-; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
+; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] ; Reload Reuse
 ; A2V:ScratchSize: 0
 define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* 
%arg, float addrspace(1)* %out) #0 {
 bb:
@@ -34,10 +34,10 @@ bb:
 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
 ; A2M-DAG:s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
 ; A2V-NOT:SCRATCH_RSRC
-; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a{{[0-9]+}}
+; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a{{[0-9]+}} ; Reload Reuse
 ; A2M:buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 
offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
 ; A2M:buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 
offset:[[FI]] ; 4-byte Folded Reload
-; A2V:v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
+; A2V:v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]] ; Reload Reuse
 ; A2V:ScratchSize: 0
 define amdgpu_kernel void @max_12regs_13a_used(i32 %cond, <4 x float> 
addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 {
 bb:
@@ -55,8 +55,7 @@ use:
 st:
   %gep1 = getelementptr <4 x float>, <4 x float> addrspace(1)* %out, i64 16
   %gep2 = getelementptr <4 x float>, <4 x float> addrspace(1)* %out, i64 32
-  store <4 x float> %mai.1, <4 x float> addrspace(1)* %gep1
-  store <4 x float> %mai.2, <4 x flo

[llvm-branch-commits] [llvm] 87d7757 - [SLP] Control maximum vectorization factor from TTI

2020-12-14 Thread Stanislav Mekhanoshin via llvm-branch-commits

Author: Stanislav Mekhanoshin
Date: 2020-12-14T08:49:40-08:00
New Revision: 87d7757bbe14fed420092071ded3430072053316

URL: 
https://github.com/llvm/llvm-project/commit/87d7757bbe14fed420092071ded3430072053316
DIFF: 
https://github.com/llvm/llvm-project/commit/87d7757bbe14fed420092071ded3430072053316.diff

LOG: [SLP] Control maximum vectorization factor from TTI

D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList(), it allows vectorization to be
restarted if the initial chunk fails.

However, this function is more general and handles not only PHIs
but everything which SLP handles. If the vectorization factor were
limited to the maximum vector register size, it would limit much
more vectorization than before, leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added, with the
default 0 to preserve the current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like the similar getMinimumVF()
function, plus the main opcode of the chain. The latter is to avoid
regressions, at least on AMDGPU. We can have loads and stores up
to 128 bits wide, and <2 x 16>-bit vector math on some subtargets,
where the rest shall not be vectorized; i.e. we need to
differentiate based on the element size and the operation itself.

Differential Revision: https://reviews.llvm.org/D92059
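
As a rough illustration of the intent, a target override might look like
the sketch below. This is a hypothetical example assuming the AMDGPU-like
constraints described above; the widths, the load/store split and the
simplified signature are illustrative, not the actual implementation.

  // Hypothetical target override; returning 0 means "no maximum".
  unsigned getMaximumVF(unsigned ElemWidth, bool IsLoadOrStore) {
    if (IsLoadOrStore)
      return 128 / ElemWidth; // loads and stores up to 128 bits wide
    if (ElemWidth == 16)
      return 2;               // <2 x 16-bit> packed math only
    return 0;                 // everything else: keep the default
  }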

Added: 


Modified: 
llvm/include/llvm/Analysis/TargetTransformInfo.h
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
llvm/lib/Analysis/TargetTransformInfo.cpp
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AMDGPU/add_sub_sat.ll
llvm/test/Transforms/SLPVectorizer/AMDGPU/round.ll
llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll

Removed: 




diff  --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h 
b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 3ba77c9a8dc9..b9b9df35cdb0 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -941,6 +941,11 @@ class TargetTransformInfo {
   /// applies when shouldMaximizeVectorBandwidth returns true.
   unsigned getMinimumVF(unsigned ElemWidth) const;
 
+  /// \return The maximum vectorization factor for types of given element
+  /// bit width and opcode, or 0 if there is no maximum VF.
+  /// Currently only used by the SLP vectorizer.
+  unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const;
+
   /// \return True if it should be considered for address type promotion.
   /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
   /// profitable without finding other extensions fed by the same input.
@@ -1498,6 +1503,7 @@ class TargetTransformInfo::Concept {
   virtual unsigned getMinVectorRegisterBitWidth() = 0;
   virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0;
   virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;
+  virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;
   virtual bool shouldConsiderAddressTypePromotion(
   const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
   virtual unsigned getCacheLineSize() const = 0;
@@ -1917,6 +1923,9 @@ class TargetTransformInfo::Model final : public 
TargetTransformInfo::Concept {
   unsigned getMinimumVF(unsigned ElemWidth) const override {
 return Impl.getMinimumVF(ElemWidth);
   }
+  unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const override {
+return Impl.getMaximumVF(ElemWidth, Opcode);
+  }
   bool shouldConsiderAddressTypePromotion(
   const Instruction &I, bool &AllowPromotionWithoutCommonHeader) override {
 return Impl.shouldConsiderAddressTypePromotion(

diff  --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h 
b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index b4847844cd0e..2c206094ac4a 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -356,6 +356,8 @@ class TargetTransformInfoImplBase {
 
   unsigned getMinimumVF(unsigned ElemWidth) const { return 0; }
 
+  unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const { return 0; 
}
+
   bool
   shouldConsiderAddressTypePromotion(const Instruction &I,
  bool &AllowPromotionWithoutCommonHeader) {

diff  --git a/llvm/lib/Analysis/TargetTransformInfo.cpp 
b/llvm/lib/Analysis/TargetTransformInfo.cpp
index f327d0cad426..086a212ee65b 100644
--- a/llvm/lib/Analysi

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Handle atomic sextload and zextload (PR #111721)

2024-10-09 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> > Missing test for buffer loads?
> 
> Those are the gfx7 global cases. There aren't any atomic buffer load 
> intrinsics

But the patch adds several MUBUF_Pseudo_Load_Pats which are not covered by 
tests?

https://github.com/llvm/llvm-project/pull/111721
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Fold more scalar operations on frame index to VALU (PR #115059)

2024-11-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/115059
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Default to selecting frame indexes to SGPRs (PR #115060)

2024-11-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/115060
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2024-11-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/115090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)

2024-09-23 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> > Is it legal and defined behavior to target private memory with an atomic?
> 
> In the IR it would have to be, and this is the expected behavior in OpenMP 
> and C++. It's UB in OpenCL, and UB in CUDA/HIP for old style atomics, but 
> defined for new std::atomic style cases

Is there a plan that OpenCL and HIP FE will produce noalias metadata to avoid 
the expansion?
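
For illustration, a minimal C++ sketch of the "defined for std::atomic"
case mentioned above (assuming C++20, where std::atomic<float>::fetch_add
exists). The atomic lives in automatic storage, i.e. private memory on a
GPU, which is why a flat atomic that may alias private memory cannot
simply be treated as global:

  #include <atomic>

  float stack_atomic_add(float v) {
    std::atomic<float> local{0.0f}; // automatic storage: private memory
    local.fetch_add(v);             // must be defined behavior in C++
    return local.load();
  }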

https://github.com/llvm/llvm-project/pull/109407
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)

2024-09-23 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

Thanks. Can this be landed after 
https://github.com/llvm/llvm-project/pull/102462?

https://github.com/llvm/llvm-project/pull/109407
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Expand flat atomics that may access private memory (PR #109407)

2024-09-20 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

Is it legal and defined behavior to target private memory with an atomic?

https://github.com/llvm/llvm-project/pull/109407
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for cmpxchg custom expansion (PR #109408)

2024-09-20 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/109408
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for flat-may-alias private atomic expansions (PR #109406)

2024-09-20 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -0,0 +1,6911 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN1 %s

rampitec wrote:

Why GCN1 and GCN2? GFX7 and GFX8 are easier to understand.

https://github.com/llvm/llvm-project/pull/109406
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2024-11-06 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/115090

>From f3d99e4ae92e407ebc2ef3f6b8e4017b397d34eb Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Nov 2024 12:28:07 -0800
Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling

DPP intrinsics can handle any type now, so no need to cast to
integer.

The caveat is that the intrinsics only handle backend-legal types,
so it does not work with i8, for example.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 23 ++-
 .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 --
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  | 60 +++
 3 files changed, 38 insertions(+), 75 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5c3df5124517d6..8c0e76c9e8c3d7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -19211,37 +19211,24 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
 assert(Error == ASTContext::GE_None && "Should not codegen an error");
 llvm::Type *DataTy = ConvertType(E->getArg(0)->getType());
-unsigned Size = DataTy->getPrimitiveSizeInBits();
-llvm::Type *IntTy =
-llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u));
 Function *F =
 CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8
  ? Intrinsic::amdgcn_mov_dpp8
  : Intrinsic::amdgcn_update_dpp,
- IntTy);
+ DataTy);
 assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 ||
E->getNumArgs() == 2);
 bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp;
 if (InsertOld)
-  Args.push_back(llvm::PoisonValue::get(IntTy));
-for (unsigned I = 0; I != E->getNumArgs(); ++I) {
+  Args.push_back(llvm::PoisonValue::get(DataTy));
+Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E));
+for (unsigned I = 1; I != E->getNumArgs(); ++I) {
   llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E);
-  if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 2u : 1u) &&
-  Size < 32) {
-if (!DataTy->isIntegerTy())
-  V = Builder.CreateBitCast(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-V = Builder.CreateZExtOrBitCast(V, IntTy);
-  }
   llvm::Type *ExpTy =
   F->getFunctionType()->getFunctionParamType(I + InsertOld);
   Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy));
 }
-Value *V = Builder.CreateCall(F, Args);
-if (Size < 32 && !DataTy->isIntegerTy())
-  V = Builder.CreateTrunc(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-return Builder.CreateTruncOrBitCast(V, DataTy);
+return Builder.CreateCall(F, Args);
   }
   case AMDGPU::BI__builtin_amdgcn_permlane16:
   case AMDGPU::BI__builtin_amdgcn_permlanex16:
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
index a4054cba236dd2..7e4ee6f4a942db 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
@@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) {
 }
 
 // CHECK-LABEL: @test_mov_dpp8_float(
-// CHECK:  %0 = bitcast float %a to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: store i32 %1,
+// CHECK:  %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, 
i32 1)
+// CHECK-NEXT: store float %0,
 void test_mov_dpp8_float(global float* out, float a) {
   *out = __builtin_amdgcn_mov_dpp8(a, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_double
-// CHECK:  %0 = bitcast double %x to i64
-// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 
1)
-// CHECK-NEXT: store i64 %1,
+// CHECK:  %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double 
%x, i32 1)
+// CHECK-NEXT: store double %0,
 void test_mov_dpp8_double(double x, global double *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_short
-// CHECK:  %0 = zext i16 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i16
-// CHECK-NEXT: store i16 %2,
+// CHECK:  %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 
1)
+// CHECK-NEXT: store i16 %0,
 void test_mov_dpp8_short(short x, global short *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_char
-// CHECK:  %0 = zext i8 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i8
-// CHECK-NEXT: store i8 %2,
+// CHECK:  %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, 

[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2024-11-06 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/115090

>From 084e347f5fb6e9068313ad4dbc53b44c2d4cee69 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Nov 2024 12:28:07 -0800
Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling

DPP intrinsics can handle any type now, so no need to cast to
integer.

The caveat is that the intrinsics only handle backend-legal types,
so it does not work with i8, for example.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 23 ++-
 .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 --
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  | 60 +++
 3 files changed, 38 insertions(+), 75 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 82770a75af23e4..7e3e6463799fb6 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -19193,37 +19193,24 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
 assert(Error == ASTContext::GE_None && "Should not codegen an error");
 llvm::Type *DataTy = ConvertType(E->getArg(0)->getType());
-unsigned Size = DataTy->getPrimitiveSizeInBits();
-llvm::Type *IntTy =
-llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u));
 Function *F =
 CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8
  ? Intrinsic::amdgcn_mov_dpp8
  : Intrinsic::amdgcn_update_dpp,
- IntTy);
+ DataTy);
 assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 ||
E->getNumArgs() == 2);
 bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp;
 if (InsertOld)
-  Args.push_back(llvm::PoisonValue::get(IntTy));
-for (unsigned I = 0; I != E->getNumArgs(); ++I) {
+  Args.push_back(llvm::PoisonValue::get(DataTy));
+Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E));
+for (unsigned I = 1; I != E->getNumArgs(); ++I) {
   llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E);
-  if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 2u : 1u) &&
-  Size < 32) {
-if (!DataTy->isIntegerTy())
-  V = Builder.CreateBitCast(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-V = Builder.CreateZExtOrBitCast(V, IntTy);
-  }
   llvm::Type *ExpTy =
   F->getFunctionType()->getFunctionParamType(I + InsertOld);
   Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy));
 }
-Value *V = Builder.CreateCall(F, Args);
-if (Size < 32 && !DataTy->isIntegerTy())
-  V = Builder.CreateTrunc(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-return Builder.CreateTruncOrBitCast(V, DataTy);
+return Builder.CreateCall(F, Args);
   }
   case AMDGPU::BI__builtin_amdgcn_permlane16:
   case AMDGPU::BI__builtin_amdgcn_permlanex16:
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
index a4054cba236dd2..7e4ee6f4a942db 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
@@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) {
 }
 
 // CHECK-LABEL: @test_mov_dpp8_float(
-// CHECK:  %0 = bitcast float %a to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: store i32 %1,
+// CHECK:  %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, 
i32 1)
+// CHECK-NEXT: store float %0,
 void test_mov_dpp8_float(global float* out, float a) {
   *out = __builtin_amdgcn_mov_dpp8(a, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_double
-// CHECK:  %0 = bitcast double %x to i64
-// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 
1)
-// CHECK-NEXT: store i64 %1,
+// CHECK:  %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double 
%x, i32 1)
+// CHECK-NEXT: store double %0,
 void test_mov_dpp8_double(double x, global double *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_short
-// CHECK:  %0 = zext i16 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i16
-// CHECK-NEXT: store i16 %2,
+// CHECK:  %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 
1)
+// CHECK-NEXT: store i16 %0,
 void test_mov_dpp8_short(short x, global short *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_char
-// CHECK:  %0 = zext i8 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i8
-// CHECK-NEXT: store i8 %2,
+// CHECK:  %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, 

[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2024-11-06 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/115090

>From 7ccac58706b2d7e54c8498818b560af490a70eac Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Nov 2024 12:28:07 -0800
Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling

DPP intrinsics can handle any type now, so no need to cast to
integer.

The caveat is that the intrinsics only handle backend-legal types,
so it does not work with i8, for example.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 23 ++-
 .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 --
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  | 60 +++
 3 files changed, 38 insertions(+), 75 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5c3df5124517d6..8c0e76c9e8c3d7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -19211,37 +19211,24 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
 assert(Error == ASTContext::GE_None && "Should not codegen an error");
 llvm::Type *DataTy = ConvertType(E->getArg(0)->getType());
-unsigned Size = DataTy->getPrimitiveSizeInBits();
-llvm::Type *IntTy =
-llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u));
 Function *F =
 CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8
  ? Intrinsic::amdgcn_mov_dpp8
  : Intrinsic::amdgcn_update_dpp,
- IntTy);
+ DataTy);
 assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 ||
E->getNumArgs() == 2);
 bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp;
 if (InsertOld)
-  Args.push_back(llvm::PoisonValue::get(IntTy));
-for (unsigned I = 0; I != E->getNumArgs(); ++I) {
+  Args.push_back(llvm::PoisonValue::get(DataTy));
+Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E));
+for (unsigned I = 1; I != E->getNumArgs(); ++I) {
   llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E);
-  if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 2u : 1u) &&
-  Size < 32) {
-if (!DataTy->isIntegerTy())
-  V = Builder.CreateBitCast(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-V = Builder.CreateZExtOrBitCast(V, IntTy);
-  }
   llvm::Type *ExpTy =
   F->getFunctionType()->getFunctionParamType(I + InsertOld);
   Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy));
 }
-Value *V = Builder.CreateCall(F, Args);
-if (Size < 32 && !DataTy->isIntegerTy())
-  V = Builder.CreateTrunc(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-return Builder.CreateTruncOrBitCast(V, DataTy);
+return Builder.CreateCall(F, Args);
   }
   case AMDGPU::BI__builtin_amdgcn_permlane16:
   case AMDGPU::BI__builtin_amdgcn_permlanex16:
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl 
b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
index a4054cba236dd2..7e4ee6f4a942db 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
@@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) {
 }
 
 // CHECK-LABEL: @test_mov_dpp8_float(
-// CHECK:  %0 = bitcast float %a to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: store i32 %1,
+// CHECK:  %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, 
i32 1)
+// CHECK-NEXT: store float %0,
 void test_mov_dpp8_float(global float* out, float a) {
   *out = __builtin_amdgcn_mov_dpp8(a, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_double
-// CHECK:  %0 = bitcast double %x to i64
-// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 
1)
-// CHECK-NEXT: store i64 %1,
+// CHECK:  %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double 
%x, i32 1)
+// CHECK-NEXT: store double %0,
 void test_mov_dpp8_double(double x, global double *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_short
-// CHECK:  %0 = zext i16 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i16
-// CHECK-NEXT: store i16 %2,
+// CHECK:  %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 
1)
+// CHECK-NEXT: store i16 %0,
 void test_mov_dpp8_short(short x, global short *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_char
-// CHECK:  %0 = zext i8 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 
1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i8
-// CHECK-NEXT: store i8 %2,
+// CHECK:  %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, 

[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2024-11-06 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> Should also teach instcombine to fold bitcast + app

It still needs a downstack change to handle i8: 
https://github.com/llvm/llvm-project/pull/114887

https://github.com/llvm/llvm-project/pull/115090
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add baseline test for treating v_pk_mov_b32 like reg_sequence (PR #125656)

2025-02-04 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/125656
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

Is there any way at all to test it?

https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Custom lower 32-bit element shuffles (PR #123711)

2025-01-21 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> > Is there any way at all to test it?
> 
> Many shuffle tests were added in 
> [7786266](https://github.com/llvm/llvm-project/commit/7786266dc7b4e89feadcb01ff21f9e3cf2022a6b),
>  this shows they are a no-op. The expected test changes from this are in 
> #123711

OK, I see. LGTM.

https://github.com/llvm/llvm-project/pull/123711
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisting from WWM region. NFC. (PR #123234)

2025-01-17 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -0,0 +1,43 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 
-run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s
+

rampitec wrote:

Done

https://github.com/llvm/llvm-project/pull/123234
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisting from WWM region. NFC. (PR #123234)

2025-01-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/123234

>From 7501423b29230f37273094e1b15e8bca0fcc90bd Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 16 Jan 2025 10:49:05 -0800
Subject: [PATCH] [AMDGPU] Add test for VALU hoisting from WWM region. NFC.

The test demonstrates a suboptimal VALU hoisting from a WWM
region. As a result we have 2 WWM regions instead of one.
---
 llvm/test/CodeGen/AMDGPU/licm-wwm.mir | 46 +++
 1 file changed, 46 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/licm-wwm.mir

diff --git a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir 
b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
new file mode 100644
index 00..fc20674971a716
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
@@ -0,0 +1,46 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 
-run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s
+
+# Machine LICM may hoist an instruction from a WWM region, which will force 
the SI-WQM pass
+# to create a second WWM region. This is an unwanted hoisting.
+
+---
+name: licm_move_wwm
+tracksRegLiveness: true
+body: |
+  ; GCN-LABEL: name: licm_move_wwm
+  ; GCN: bb.0:
+  ; GCN-NEXT:   successors: %bb.1(0x8000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, 
implicit-def $exec, implicit-def $scc, implicit $exec
+  ; GCN-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit 
$exec
+  ; GCN-NEXT:   $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
+  ; GCN-NEXT:   S_BRANCH %bb.1
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.1:
+  ; GCN-NEXT:   successors: %bb.1(0x4000), %bb.2(0x4000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, 
implicit-def $exec, implicit-def $scc, implicit $exec
+  ; GCN-NEXT:   [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 
[[V_MOV_B32_e32_]], implicit $exec
+  ; GCN-NEXT:   $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]]
+  ; GCN-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_READFIRSTLANE_B32_]]
+  ; GCN-NEXT:   $exec_lo = S_OR_B32 $exec_lo, [[COPY]], implicit-def $scc
+  ; GCN-NEXT:   S_CBRANCH_EXECNZ %bb.1, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.2
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.2:
+  ; GCN-NEXT:   S_ENDPGM 0
+  bb.0:
+S_BRANCH %bb.1
+
+  bb.1:
+%0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
+%1:sreg_32 = V_READFIRSTLANE_B32 killed %0:vgpr_32, implicit $exec
+early-clobber %2:sreg_32 = STRICT_WWM killed %1:sreg_32, implicit $exec
+$exec_lo = S_OR_B32 $exec_lo, %2, implicit-def $scc
+S_CBRANCH_EXECNZ %bb.1, implicit $exec
+S_BRANCH %bb.2
+
+  bb.2:
+S_ENDPGM 0
+...

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)

2025-01-17 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -2773,6 +2773,9 @@ void AMDGPUDAGToDAGISel::SelectINTRINSIC_WO_CHAIN(SDNode 
*N) {
   case Intrinsic::amdgcn_wwm:
   case Intrinsic::amdgcn_strict_wwm:
 Opcode = AMDGPU::STRICT_WWM;
+CurDAG->getMachineFunction()
+.getInfo()
+->setInitWholeWave();

rampitec wrote:

Ack. I can create a separate property HasWWM, but I really want to hear if we 
even want to go that way.

https://github.com/llvm/llvm-project/pull/123124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)

2025-01-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> I guess my concern is performance regressions if any use of WWM (e.g. atomic 
> optimizer) essentially turns off Machine LICM.

I agree. But when moving the code LLVM thinks it is something cheap, and it is 
not, which is also a performance problem. Things would be much easier if we 
could tell whether an instruction belongs to a WWM region.
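
For the record, a sketch of what such a query could look like. The helper
is hypothetical and only scans a single block; a real query would also
have to reason across basic blocks. The opcode names match the MIR in the
tests:

  // Hypothetical helper: is MI between ENTER_STRICT_WWM and
  // EXIT_STRICT_WWM within its own basic block?
  bool isInsideWWMRegion(const MachineInstr &MI) {
    bool InWWM = false;
    for (const MachineInstr &I : *MI.getParent()) {
      if (&I == &MI)
        return InWWM;
      if (I.getOpcode() == AMDGPU::ENTER_STRICT_WWM)
        InWWM = true;
      else if (I.getOpcode() == AMDGPU::EXIT_STRICT_WWM)
        InWWM = false;
    }
    return InWWM;
  }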

https://github.com/llvm/llvm-project/pull/123124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)

2025-01-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/123124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisting from WWM region. NFC. (PR #123234)

2025-01-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/123234
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#123234** 👈 (this PR; view in Graphite)
* **#123232**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev).
Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/123234
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)

2025-01-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/123124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisting from WWM region. NFC. (PR #123234)

2025-01-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/123234

The test demonstrates a suboptimal VALU hoisting from a WWM
region. As a result we have 2 WWM regions instead of one.

>From 263a43571303c16c3295cb0a88261504c4aef322 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 16 Jan 2025 10:49:05 -0800
Subject: [PATCH] [AMDGPU] Add test for VALU hoisting from WWM region. NFC.

The test demonstrates a suboptimal VALU hoisting from a WWM
region. As a result we have 2 WWM regions instead of one.
---
 llvm/test/CodeGen/AMDGPU/licm-wwm.mir | 43 +++
 1 file changed, 43 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/licm-wwm.mir

diff --git a/llvm/test/CodeGen/AMDGPU/licm-wwm.mir 
b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
new file mode 100644
index 00..96659fcb716450
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/licm-wwm.mir
@@ -0,0 +1,43 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 
-run-pass=early-machinelicm,si-wqm -o - %s | FileCheck -check-prefix=GCN %s
+
+---
+name: licm_move_wwm
+tracksRegLiveness: true
+body: |
+  ; GCN-LABEL: name: licm_move_wwm
+  ; GCN: bb.0:
+  ; GCN-NEXT:   successors: %bb.1(0x8000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[ENTER_STRICT_WWM:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, 
implicit-def $exec, implicit-def $scc, implicit $exec
+  ; GCN-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit 
$exec
+  ; GCN-NEXT:   $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM]]
+  ; GCN-NEXT:   S_BRANCH %bb.1
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.1:
+  ; GCN-NEXT:   successors: %bb.1(0x4000), %bb.2(0x4000)
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[ENTER_STRICT_WWM1:%[0-9]+]]:sreg_32 = ENTER_STRICT_WWM -1, 
implicit-def $exec, implicit-def $scc, implicit $exec
+  ; GCN-NEXT:   [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 
[[V_MOV_B32_e32_]], implicit $exec
+  ; GCN-NEXT:   $exec_lo = EXIT_STRICT_WWM [[ENTER_STRICT_WWM1]]
+  ; GCN-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_READFIRSTLANE_B32_]]
+  ; GCN-NEXT:   $exec_lo = S_OR_B32 $exec_lo, [[COPY]], implicit-def $scc
+  ; GCN-NEXT:   S_CBRANCH_EXECNZ %bb.1, implicit $exec
+  ; GCN-NEXT:   S_BRANCH %bb.2
+  ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT: bb.2:
+  ; GCN-NEXT:   S_ENDPGM 0
+  bb.0:
+S_BRANCH %bb.1
+
+  bb.1:
+%0:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
+%1:sreg_32 = V_READFIRSTLANE_B32 killed %0:vgpr_32, implicit $exec
+early-clobber %2:sreg_32 = STRICT_WWM killed %1:sreg_32, implicit $exec
+$exec_lo = S_OR_B32 $exec_lo, %2, implicit-def $scc
+S_CBRANCH_EXECNZ %bb.1, implicit $exec
+S_BRANCH %bb.2
+
+  bb.2:
+S_ENDPGM 0
+...

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Add test for VALU hoisting from WWM region. NFC. (PR #123234)

2025-01-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/123234
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Disable VALU sinking and hoisting with WWM (PR #123124)

2025-01-16 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> Missing new test?

Tests added.

https://github.com/llvm/llvm-project/pull/123124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID 
CC,
 
   return MCConstantExpr::create(0, Ctx);
 }
+
+uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {

rampitec wrote:

I wanted to look at this separately. Right now the problem is that AsmPrinter 
emits the end-of-function label in an incorrect place, actually into the kernel 
descriptor in .rodata, which is even a wrong section. Fixing that will take 
more work and is really a separate thing, but once it is fixed I could replace 
this with an MCExpr. I.e., I can emit a separate end label, but that is also a 
hack.

https://github.com/llvm/llvm-project/pull/126981
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID 
CC,
 
   return MCConstantExpr::create(0, Ctx);
 }
+
+uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
+  if (!CodeSizeInBytes.has_value()) {
+const GCNSubtarget &STM = MF.getSubtarget();
+const SIInstrInfo *TII = STM.getInstrInfo();
+
+uint64_t CodeSize = 0;
+
+for (const MachineBasicBlock &MBB : MF) {
+  for (const MachineInstr &MI : MBB) {
+// TODO: CodeSize should account for multiple functions.
+
+// TODO: Should we count size of debug info?
+if (MI.isDebugInstr())

rampitec wrote:

That said, the function was simply moved as-is; the only added functionality is 
caching. And yes, it is incorrect and always was, at least because it does not 
correctly handle inline asm.
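
To illustrate the inline asm point: a generic estimator can only count
asm statements and multiply by a worst-case instruction size, roughly as
in the sketch below (a simplification of what getInlineAsmLength()-style
heuristics do; the function name and parameters here are made up):

  #include <algorithm>
  #include <string>

  unsigned estimateInlineAsmSize(const std::string &AsmText,
                                 unsigned MaxInstSize) {
    // Treat each line as one instruction; the real encoding may differ.
    unsigned Lines = 1 + std::count(AsmText.begin(), AsmText.end(), '\n');
    return Lines * MaxInstSize; // an upper bound, not the encoded size
  }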

https://github.com/llvm/llvm-project/pull/126981
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID 
CC,
 
   return MCConstantExpr::create(0, Ctx);
 }
+
+uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
+  if (!CodeSizeInBytes.has_value()) {
+const GCNSubtarget &STM = MF.getSubtarget();
+const SIInstrInfo *TII = STM.getInstrInfo();
+
+uint64_t CodeSize = 0;
+
+for (const MachineBasicBlock &MBB : MF) {
+  for (const MachineInstr &MI : MBB) {
+// TODO: CodeSize should account for multiple functions.
+
+// TODO: Should we count size of debug info?
+if (MI.isDebugInstr())

rampitec wrote:

Since this is really somewhat unrelated changes, I have split it into a 
separate https://github.com/llvm/llvm-project/pull/127111, which is just move 
of the code, and will create yet another PR to address the functional comments.

https://github.com/llvm/llvm-project/pull/126981
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec edited 
https://github.com/llvm/llvm-project/pull/126981
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (PR #126762)

2025-02-11 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/126762
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (PR #126763)

2025-02-11 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> Should just leave the subtarget feature name alone. It's not worth the 
> trouble, and this will now start spewing warnings on old IR (due to 
> unnecessary target-features spam clang should stop emitting). It really 
> should have been named 94-insts, but I think it's best to leave it alone

I agree we can keep the feature name and all these 'gfx940' checks, and just 
remove the targets.

https://github.com/llvm/llvm-project/pull/126763
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [llvm] [AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (PR #126763)

2025-02-11 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -1619,28 +1613,6 @@ def FeatureISAVersion9_5_Common : FeatureSet<
FeatureAtomicBufferPkAddBF16Inst
])>;
 
-def FeatureISAVersion9_4_0 : FeatureSet<
-  !listconcat(FeatureISAVersion9_4_Common.Features,
-[
-  FeatureAddressableLocalMemorySize65536,
-  FeatureForceStoreSC0SC1,

rampitec wrote:

FeatureForceStoreSC0SC1 can also be removed along with all the code handling it 
in a separate change.

https://github.com/llvm/llvm-project/pull/126763
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)

2025-04-28 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -11,11 +10,9 @@ define amdgpu_kernel void @ptr_nest_3(ptr addrspace(1) 
nocapture readonly %Arg)
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds ptr, ptr addrspace(1) 
[[ARG:%.*]], i32 [[I]]
-; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8, 
!amdgpu.noclobber [[META0:![0-9]+]]
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast ptr [[P2]] to ptr 
addrspace(1)
-; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr addrspace(1) [[P2_GLOBAL]], align 
8, !amdgpu.noclobber [[META0]]
-; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast ptr [[P3]] to ptr 
addrspace(1)
-; CHECK-NEXT:store float 0.00e+00, ptr addrspace(1) [[P3_GLOBAL]], 
align 4
+; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8
+; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr [[P2]], align 8

rampitec wrote:

I think you can have an invalid pointer anywhere, but it is up to the program 
not to dereference an invalid pointer. In practice it cannot be anything but 
global, since it is passed from the host. Even if another kernel places any 
other pointer there, it is illegal to use it, and it is up to the developer 
not to do so. It should not prevent the optimization.

https://github.com/llvm/llvm-project/pull/137655
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Remove the pass `AMDGPUPromoteKernelArguments` (PR #137655)

2025-04-28 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -11,11 +10,9 @@ define amdgpu_kernel void @ptr_nest_3(ptr addrspace(1) 
nocapture readonly %Arg)
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:[[I:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
 ; CHECK-NEXT:[[P1:%.*]] = getelementptr inbounds ptr, ptr addrspace(1) 
[[ARG:%.*]], i32 [[I]]
-; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8, 
!amdgpu.noclobber [[META0:![0-9]+]]
-; CHECK-NEXT:[[P2_GLOBAL:%.*]] = addrspacecast ptr [[P2]] to ptr 
addrspace(1)
-; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr addrspace(1) [[P2_GLOBAL]], align 
8, !amdgpu.noclobber [[META0]]
-; CHECK-NEXT:[[P3_GLOBAL:%.*]] = addrspacecast ptr [[P3]] to ptr 
addrspace(1)
-; CHECK-NEXT:store float 0.00e+00, ptr addrspace(1) [[P3_GLOBAL]], 
align 4
+; CHECK-NEXT:[[P2:%.*]] = load ptr, ptr addrspace(1) [[P1]], align 8
+; CHECK-NEXT:[[P3:%.*]] = load ptr, ptr [[P2]], align 8

rampitec wrote:

The pass is important for performance, especially for HIP. A pointer passed 
from the host cannot be anything but global, and must be valid. So this is a 
surprising change.

https://github.com/llvm/llvm-project/pull/137655
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

Which one do you prefer, this or 
https://github.com/llvm/llvm-project/pull/127246? They are mutually exclusive.

https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Handle subregister uses in SIFoldOperands constant folding (PR #127485)

2025-02-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/127485
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Handle brev and not cases in getConstValDefinedInReg (PR #127483)

2025-02-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/127483
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/127142

>From b574a4b4afbf4cd0a6e128ea5d1e1579698124bc Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 13 Feb 2025 14:46:37 -0800
Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize()

---
 llvm/lib/Target/AMDGPU/SIProgramInfo.cpp  |  6 ++
 .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++
 2 files changed, 95 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 1123696509818..b4d740422b94a 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -212,6 +212,12 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const 
MachineFunction &MF) {
   uint64_t CodeSize = 0;
 
   for (const MachineBasicBlock &MBB : MF) {
+// The amount of padding to align code can be both underestimated and
+// overestimated. When inline asm is used, getInstSizeInBytes() will
+// return the maximum size of a single instruction, while the real size
+// may differ. At this point CodeSize may already be off.
+CodeSize = alignTo(CodeSize, MBB.getAlignment());
+
 for (const MachineInstr &MI : MBB) {
   // TODO: CodeSize should account for multiple functions.
 
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir 
b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 76eaf350301e4..9ae536af6f0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -31,3 +31,92 @@ body: |
 
   WAVE_BARRIER
 ...
+
+# CHECK: align4: ; @align4
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: 
[0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: 
[0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align2
+# CHECK: s_endpgm; encoding: 
[0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 16
+
+---
+name:align4
+tracksRegLiveness: true
+body: |
+  bb.0:
+$scc = IMPLICIT_DEF
+S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+S_BARRIER
+
+  bb.2 (align 4):
+S_ENDPGM 0
+...
+
+# CHECK: align8: ; @align8
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: 
[0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: 
[0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align3
+# CHECK: s_endpgm; encoding: 
[0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name:align8
+tracksRegLiveness: true
+body: |
+  bb.0:
+$scc = IMPLICIT_DEF
+S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+S_BARRIER
+
+  bb.2 (align 8):
+S_ENDPGM 0
+...
+
+# CHECK: align16:; @align16
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: 
[0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: 
[0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align4
+# CHECK: s_endpgm; encoding: 
[0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name:align16
+tracksRegLiveness: true
+body: |
+  bb.0:
+$scc = IMPLICIT_DEF
+S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+S_BARRIER
+
+  bb.2 (align 16):
+S_ENDPGM 0
+...
+
+# CHECK: align32:; @align32
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: 
[0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}  ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier   ; encoding: 
[0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align5
+# CHECK: s_endpgm; encoding: 
[0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 36
+---
+name:align32
+tracksRegLiveness: true
+body: |
+  bb.0:
+$scc = IMPLICIT_DEF
+S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+S_BARRIER
+
+  bb.2 (align 32):
+S_ENDPGM 0
+...

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-18 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/127142

>From b574a4b4afbf4cd0a6e128ea5d1e1579698124bc Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 13 Feb 2025 14:46:37 -0800
Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize()

---
 llvm/lib/Target/AMDGPU/SIProgramInfo.cpp  |  6 ++
 .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++
 2 files changed, 95 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 1123696509818..b4d740422b94a 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -212,6 +212,12 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const 
MachineFunction &MF) {
   uint64_t CodeSize = 0;
 
   for (const MachineBasicBlock &MBB : MF) {
+// The amount of padding to align code can be both underestimated and
+// overestimated. In case of inline asm used getInstSizeInBytes() will
+// return a maximum size of a single instruction, where the real size may
+// differ. At this point CodeSize may be already off.
+CodeSize = alignTo(CodeSize, MBB.getAlignment());
+
 for (const MachineInstr &MI : MBB) {
   // TODO: CodeSize should account for multiple functions.
 
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir 
b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 76eaf350301e4..9ae536af6f0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -31,3 +31,92 @@ body: |
 
   WAVE_BARRIER
 ...
+
+# CHECK: align4:                                 ; @align4
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 2
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 16
+
+---
+name: align4
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 4):
+    S_ENDPGM 0
+...
+
+# CHECK: align8:                                 ; @align8
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 3
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align8
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 8):
+    S_ENDPGM 0
+...
+
+# CHECK: align16:                                ; @align16
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 4
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align16
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 16):
+    S_ENDPGM 0
+...
+
+# CHECK: align32:                                ; @align32
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 5
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 36
+---
+name: align32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 32):
+    S_ENDPGM 0
+...
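
To sanity-check the new codeLenInByte values, here is a minimal standalone sketch (not LLVM code) of how the per-block padding accumulates; alignTo() here mirrors llvm::alignTo from llvm/Support/MathExtras.h, and the block sizes are modeled on the align32 test above (8 bytes in bb.0, 4 in bb.1, then a 4-byte block at 32-byte alignment):

#include <cstdint>
#include <cstdio>

// Standalone stand-in for llvm::alignTo: round Value up to a multiple of Align.
static uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

int main() {
  // Assumed sizes: bb.0 = s_waitcnt + s_cbranch = 8 bytes; bb.1 = s_barrier
  // = 4 bytes; bb.2 = s_endpgm = 4 bytes at 32-byte alignment.
  struct Block { uint64_t Size, Align; } Blocks[] = {{8, 1}, {4, 1}, {4, 32}};
  uint64_t CodeSize = 0;
  for (const Block &B : Blocks) {
    CodeSize = alignTo(CodeSize, B.Align); // padding inserted before the block
    CodeSize += B.Size;                    // instruction bytes in the block
  }
  // 8 + 4 = 12, padded up to 32, plus 4 => 36, matching codeLenInByte = 36.
  std::printf("codeLenInByte = %llu\n", (unsigned long long)CodeSize);
  return 0;
}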

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> > Which one do you prefer, this or #127246? They are mutually exclusive.
> 
> They're not really. This one is the incremental step which adds the test, 
> #127246 is the final form

The test is meaningless if we overestimate.

https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

And in any case it is moot until the baseline change is accepted.

https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Stop introducing v_accvgpr_write_b32 for reg-to-reg copy (PR #129059)

2025-02-27 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/129059
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [AMDGPU] Simplify dpp builtin handling (PR #115090)

2025-03-01 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec updated 
https://github.com/llvm/llvm-project/pull/115090

>From f7e10b1e26159442945c2682ca1ed463bd152605 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Mon, 4 Nov 2024 12:28:07 -0800
Subject: [PATCH] [AMDGPU] Simplify dpp builtin handling

DPP intrinsics can handle any type now, so no need to cast to
integer.

The caveat is that the intrinsics only handle backend-legal types, so
this does not work with i8, for example.
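
As a hedged illustration of that caveat (this snippet is not part of the patch, and the helper name is hypothetical): a frontend wanting dpp on an i8 value would still have to widen it by hand to a legal type, roughly like the old path did, using the public IRBuilder API:

#include "llvm/IR/IRBuilder.h"

// Hypothetical helper, not from this patch: manual widening is still needed
// for i8, since only backend-legal types map directly onto the intrinsic.
// F32 is assumed to be a declaration of llvm.amdgcn.mov.dpp8.i32.
llvm::Value *emitMovDpp8ForI8(llvm::IRBuilderBase &B, llvm::Function *F32,
                              llvm::Value *V, llvm::Value *Sel) {
  llvm::Value *Wide = B.CreateZExt(V, B.getInt32Ty()); // i8 -> i32
  llvm::Value *R = B.CreateCall(F32, {Wide, Sel});     // dpp on the legal type
  return B.CreateTrunc(R, V->getType());               // i32 -> i8
}
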
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 23 ++-
 .../CodeGenOpenCL/builtins-amdgcn-gfx10.cl| 30 --
 .../test/CodeGenOpenCL/builtins-amdgcn-vi.cl  | 60 +++
 3 files changed, 38 insertions(+), 75 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 03b8d16b76e0d..bff48f2e16524 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20003,37 +20003,24 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 getContext().GetBuiltinType(BuiltinID, Error, &ICEArguments);
 assert(Error == ASTContext::GE_None && "Should not codegen an error");
 llvm::Type *DataTy = ConvertType(E->getArg(0)->getType());
-unsigned Size = DataTy->getPrimitiveSizeInBits();
-llvm::Type *IntTy =
-llvm::IntegerType::get(Builder.getContext(), std::max(Size, 32u));
 Function *F =
 CGM.getIntrinsic(BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp8
  ? Intrinsic::amdgcn_mov_dpp8
  : Intrinsic::amdgcn_update_dpp,
- IntTy);
+ DataTy);
 assert(E->getNumArgs() == 5 || E->getNumArgs() == 6 ||
E->getNumArgs() == 2);
 bool InsertOld = BuiltinID == AMDGPU::BI__builtin_amdgcn_mov_dpp;
 if (InsertOld)
-  Args.push_back(llvm::PoisonValue::get(IntTy));
-for (unsigned I = 0; I != E->getNumArgs(); ++I) {
+  Args.push_back(llvm::PoisonValue::get(DataTy));
+Args.push_back(EmitScalarOrConstFoldImmArg(ICEArguments, 0, E));
+for (unsigned I = 1; I != E->getNumArgs(); ++I) {
   llvm::Value *V = EmitScalarOrConstFoldImmArg(ICEArguments, I, E);
-  if (I < (BuiltinID == AMDGPU::BI__builtin_amdgcn_update_dpp ? 2u : 1u) &&
-  Size < 32) {
-if (!DataTy->isIntegerTy())
-  V = Builder.CreateBitCast(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-V = Builder.CreateZExtOrBitCast(V, IntTy);
-  }
   llvm::Type *ExpTy =
   F->getFunctionType()->getFunctionParamType(I + InsertOld);
   Args.push_back(Builder.CreateTruncOrBitCast(V, ExpTy));
 }
-Value *V = Builder.CreateCall(F, Args);
-if (Size < 32 && !DataTy->isIntegerTy())
-  V = Builder.CreateTrunc(
-  V, llvm::IntegerType::get(Builder.getContext(), Size));
-return Builder.CreateTruncOrBitCast(V, DataTy);
+return Builder.CreateCall(F, Args);
   }
   case AMDGPU::BI__builtin_amdgcn_permlane16:
   case AMDGPU::BI__builtin_amdgcn_permlanex16:
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
index a4054cba236dd..7e4ee6f4a942d 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl
@@ -36,45 +36,37 @@ void test_mov_dpp8_long(global long* out, long a) {
 }
 
 // CHECK-LABEL: @test_mov_dpp8_float(
-// CHECK:      %0 = bitcast float %a to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1)
-// CHECK-NEXT: store i32 %1,
+// CHECK:      %0 = tail call{{.*}} float @llvm.amdgcn.mov.dpp8.f32(float %a, i32 1)
+// CHECK-NEXT: store float %0,
 void test_mov_dpp8_float(global float* out, float a) {
   *out = __builtin_amdgcn_mov_dpp8(a, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_double
-// CHECK:      %0 = bitcast double %x to i64
-// CHECK-NEXT: %1 = tail call{{.*}} i64 @llvm.amdgcn.mov.dpp8.i64(i64 %0, i32 1)
-// CHECK-NEXT: store i64 %1,
+// CHECK:      %0 = tail call{{.*}} double @llvm.amdgcn.mov.dpp8.f64(double %x, i32 1)
+// CHECK-NEXT: store double %0,
 void test_mov_dpp8_double(double x, global double *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_short
-// CHECK:      %0 = zext i16 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i16
-// CHECK-NEXT: store i16 %2,
+// CHECK:      %0 = tail call{{.*}} i16 @llvm.amdgcn.mov.dpp8.i16(i16 %x, i32 1)
+// CHECK-NEXT: store i16 %0,
 void test_mov_dpp8_short(short x, global short *p) {
   *p = __builtin_amdgcn_mov_dpp8(x, 1);
 }
 
 // CHECK-LABEL: @test_mov_dpp8_char
-// CHECK:      %0 = zext i8 %x to i32
-// CHECK-NEXT: %1 = tail call{{.*}} i32 @llvm.amdgcn.mov.dpp8.i32(i32 %0, i32 1)
-// CHECK-NEXT: %2 = trunc i32 %1 to i8
-// CHECK-NEXT: store i8 %2,
+// CHECK:      %0 = tail call{{.*}} i8 @llvm.amdgcn.mov.dpp8.i8(i8 %x, i32 


[llvm-branch-commits] [llvm] AMDGPU: Replace amdgpu-no-agpr with amdgpu-num-agpr (PR #129893)

2025-03-05 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.


https://github.com/llvm/llvm-project/pull/129893
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (PR #127129)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#127129** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/127129)
* **#127111**: 1 other dependent PR ([#126981](https://github.com/llvm/llvm-project/pull/126981))
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/


https://github.com/llvm/llvm-project/pull/127129
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Early bail in getFunctionCodeSize for meta inst. NFC. (PR #127129)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/127129

It does not change the estimate, because getInstSizeInBytes() already
returns 0 for meta instructions, but it adds a test and an early bail.

>From c0489545755c98dc2f87ffcd83af929816643074 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 13 Feb 2025 13:19:26 -0800
Subject: [PATCH] [AMDGPU] Early bail in getFunctionCodeSize for meta inst.
 NFC.

It does not change the estimate, because getInstSizeInBytes() already
returns 0 for meta instructions, but it adds a test and an early bail.
---
 llvm/lib/Target/AMDGPU/SIProgramInfo.cpp|  2 +-
 llvm/test/CodeGen/AMDGPU/code-size-estimate.mir | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index 5179288084010..b995687e71780 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -216,7 +216,7 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
   // TODO: CodeSize should account for multiple functions.
 
   // TODO: Should we count size of debug info?
-      if (MI.isDebugInstr())
-        continue;
+      if (MI.isDebugInstr() || MI.isMetaInstruction())
+        continue;
   CodeSize += TII->getInstSizeInBytes(MI);
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 9e46c58b6b5a9..76eaf350301e4 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -18,3 +18,16 @@ body: |
   $vgpr16 = V_MOV_B32_indirect_read undef $vgpr1, implicit $exec, implicit $m0, implicit $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
   V_MOV_B32_indirect_write undef $vgpr0, undef $vgpr3, implicit $exec, implicit $m0, implicit-def $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr0_vgpr1_vgpr2_vgpr3(tied-def 4)
 ...
+
+# CHECK: meta:                                   ; @meta
+# CHECK: ; wave barrier
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: ; codeLenInByte = 4
+---
+name: meta
+tracksRegLiveness: true
+body: |
+  bb.0:
+
+    WAVE_BARRIER
+...
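
To see why the bail is behavior-preserving, here is a simplified standalone sketch (assumed sizes, not the real TII query):

#include <cstdint>

struct Inst { bool IsMeta; uint64_t Size; };

// Stand-in for getInstSizeInBytes(): meta instructions already report 0.
static uint64_t instSizeInBytes(const Inst &I) { return I.IsMeta ? 0 : I.Size; }

uint64_t codeSize(const Inst *Insts, unsigned N) {
  uint64_t CodeSize = 0;
  for (unsigned I = 0; I != N; ++I) {
    if (Insts[I].IsMeta) // the new early bail; skipping adds the same 0 bytes
      continue;
    CodeSize += instSizeInBytes(Insts[I]);
  }
  return CodeSize; // identical with or without the early bail
}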

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -212,6 +212,8 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
   uint64_t CodeSize = 0;
 
   for (const MachineBasicBlock &MBB : MF) {
+    CodeSize = alignTo(CodeSize, MBB.getAlignment());

rampitec wrote:

A pessimistic overestimate is actually worse for some applications of this
function. For what I am doing now, it may result in prefetching memory far
beyond the program. I believe our estimates should be correct except for
inline asm...

https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec created 
https://github.com/llvm/llvm-project/pull/127142

None

>From d01d16815ade61a599b94bb18bc292e326767f15 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin 
Date: Thu, 13 Feb 2025 14:46:37 -0800
Subject: [PATCH] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize()

---
 llvm/lib/Target/AMDGPU/SIProgramInfo.cpp  |  2 +
 .../CodeGen/AMDGPU/code-size-estimate.mir | 89 +++
 2 files changed, 91 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
index b995687e71780..9d9b4c83ac388 100644
--- a/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIProgramInfo.cpp
@@ -212,6 +212,8 @@ uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
   uint64_t CodeSize = 0;
 
   for (const MachineBasicBlock &MBB : MF) {
+    CodeSize = alignTo(CodeSize, MBB.getAlignment());
+
     for (const MachineInstr &MI : MBB) {
   // TODO: CodeSize should account for multiple functions.
 
diff --git a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
index 76eaf350301e4..9ae536af6f0e9 100644
--- a/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
+++ b/llvm/test/CodeGen/AMDGPU/code-size-estimate.mir
@@ -31,3 +31,92 @@ body: |
 
   WAVE_BARRIER
 ...
+
+# CHECK: align4:                                 ; @align4
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 2
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 16
+
+---
+name: align4
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 4):
+    S_ENDPGM 0
+...
+
+# CHECK: align8:                                 ; @align8
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 3
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align8
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 8):
+    S_ENDPGM 0
+...
+
+# CHECK: align16:                                ; @align16
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 4
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 20
+---
+name: align16
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 16):
+    S_ENDPGM 0
+...
+
+# CHECK: align32:                                ; @align32
+# CHECK: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) ; encoding: [0x00,0x00,0x8c,0xbf]
+# CHECK: s_cbranch_scc1 .LBB{{[0-9_]+}}          ; encoding: [A,A,0x85,0xbf]
+# CHECK: s_barrier                               ; encoding: [0x00,0x00,0x8a,0xbf]
+# CHECK: .p2align 5
+# CHECK: s_endpgm                                ; encoding: [0x00,0x00,0x81,0xbf]
+# CHECK: ; codeLenInByte = 36
+---
+name: align32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    $scc = IMPLICIT_DEF
+    S_CBRANCH_SCC1 %bb.2, implicit $scc
+
+  bb.1:
+    S_BARRIER
+
+  bb.2 (align 32):
+    S_ENDPGM 0
+...

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

rampitec wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#127142** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/127142)
* **#127129**
* **#127111**: 1 other dependent PR ([#126981](https://github.com/llvm/llvm-project/pull/126981))
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Set inst_pref_size to maximum (PR #126981)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits


@@ -199,3 +201,28 @@ const MCExpr *SIProgramInfo::getPGMRSrc2(CallingConv::ID CC,
 
   return MCConstantExpr::create(0, Ctx);
 }
+
+uint64_t SIProgramInfo::getFunctionCodeSize(const MachineFunction &MF) {
+  if (!CodeSizeInBytes.has_value()) {
+    const GCNSubtarget &STM = MF.getSubtarget<GCNSubtarget>();
+    const SIInstrInfo *TII = STM.getInstrInfo();
+
+    uint64_t CodeSize = 0;
+
+    for (const MachineBasicBlock &MBB : MF) {
+      for (const MachineInstr &MI : MBB) {

rampitec wrote:

https://github.com/llvm/llvm-project/pull/127142

https://github.com/llvm/llvm-project/pull/126981
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Respect MBB alignment in the getFunctionCodeSize() (PR #127142)

2025-02-13 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec ready_for_review 
https://github.com/llvm/llvm-project/pull/127142
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

