[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)
https://github.com/Sisyph edited https://github.com/llvm/llvm-project/pull/95591 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)
https://github.com/Sisyph approved this pull request. https://github.com/llvm/llvm-project/pull/95591
[llvm-branch-commits] [llvm] AMDGPU: Create pseudo to real mapping for flat/buffer atomic fmin/fmax (PR #95591)
@@ -1608,14 +1598,14 @@ defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX", "int_amdgcn_flat_atomic_fmax
 }

 let OtherPredicates = [isGFX10Only] in {
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMIN_X2", "atomic_load_fmin_global", f64>;
-defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_FMAX_X2", "atomic_load_fmax_global", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMIN_X2", "int_amdgcn_global_atomic_fmin", f64>;
-defm : GlobalFLATAtomicIntrPats <"GLOBAL_ATOMIC_FMAX_X2", "int_amdgcn_global_atomic_fmax", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMIN_X2", "atomic_load_fmin_flat", f64>;
-defm : FlatSignedAtomicPat <"FLAT_ATOMIC_FMAX_X2", "atomic_load_fmax_flat", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMIN_X2", "int_amdgcn_flat_atomic_fmin", f64>;
-defm : FlatSignedAtomicIntrPat <"FLAT_ATOMIC_FMAX_X2", "int_amdgcn_flat_atomic_fmax", f64>;
+defm : GlobalFLATAtomicPats <"GLOBAL_ATOMIC_MIN_F64", "atomic_load_fmin_global", f64>;

Sisyph wrote:

Can you deduplicate these somehow with the patterns at L1641? They look essentially the same, just with a different predicate. Otherwise LGTM

https://github.com/llvm/llvm-project/pull/95591
[llvm-branch-commits] [llvm] AMDGPU: Cleanup immediate selection patterns (PR #100787)
https://github.com/Sisyph approved this pull request. https://github.com/llvm/llvm-project/pull/100787
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Select all constants in tablegen (PR #100788)
https://github.com/Sisyph approved this pull request. https://github.com/llvm/llvm-project/pull/100788
[llvm-branch-commits] [llvm] 60466fa - [AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10
Author: Joe Nash
Date: 2021-01-05T11:59:57-05:00
New Revision: 60466fad2dc155329cc870ea733d4f41561bd46d

URL: https://github.com/llvm/llvm-project/commit/60466fad2dc155329cc870ea733d4f41561bd46d
DIFF: https://github.com/llvm/llvm-project/commit/60466fad2dc155329cc870ea733d4f41561bd46d.diff

LOG: [AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10

It was removed in GFX10 GPUs, but LLVM could generate it.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D94020
Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c

Added:

Modified:
    llvm/lib/Target/AMDGPU/AMDGPU.td
    llvm/lib/Target/AMDGPU/VOP3Instructions.td
    llvm/test/MC/AMDGPU/gfx1030_unsupported.s
    llvm/test/MC/AMDGPU/gfx10_asm_vop3.s

Removed:

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 42d134de9229..0a212a41ab6a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1131,6 +1131,11 @@ def isGFX10Plus :
   Predicate<"Subtarget->getGeneration() >= AMDGPUSubtarget::GFX10">,
   AssemblerPredicate<(all_of FeatureGFX10Insts)>;
 
+def isGFX10Before1030 :
+  Predicate<"Subtarget->getGeneration() == AMDGPUSubtarget::GFX10 &&"
+            "!Subtarget->hasGFX10_3Insts()">,
+  AssemblerPredicate<(all_of FeatureGFX10Insts,(not FeatureGFX10_3Insts))>;
+
 def HasFlatAddressSpace :
   Predicate<"Subtarget->hasFlatAddressSpace()">,
   AssemblerPredicate<(all_of FeatureFlatAddressSpace)>;

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 28e4a09069a8..f349a0f54fa7 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -867,6 +867,10 @@ let InOperandList = (ins SSrcOrLds_b32:$src0, SCSrc_b32:$src1, VGPR_32:$vdst_in)
   defm V_WRITELANE_B32 : VOP3_Real_gfx10<0x361>;
 } // End InOperandList = (ins SSrcOrLds_b32:$src0, SCSrc_b32:$src1, VGPR_32:$vdst_in)
 
+let SubtargetPredicate = isGFX10Before1030 in {
+  defm V_MUL_LO_I32 : VOP3_Real_gfx10<0x16b>;
+}
+
 defm V_XOR3_B32 : VOP3_Real_gfx10<0x178>;
 defm V_LSHLREV_B64: VOP3_Real_gfx10<0x2ff>;
 defm V_LSHRREV_B64: VOP3_Real_gfx10<0x300>;
@@ -992,6 +996,7 @@ multiclass VOP3be_Real_gfx6_gfx7_gfx10 op> :
 defm V_LSHL_B64: VOP3_Real_gfx6_gfx7<0x161>;
 defm V_LSHR_B64: VOP3_Real_gfx6_gfx7<0x162>;
 defm V_ASHR_I64: VOP3_Real_gfx6_gfx7<0x163>;
+defm V_MUL_LO_I32 : VOP3_Real_gfx6_gfx7<0x16b>;
 
 defm V_MAD_LEGACY_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x140>;
 defm V_MAD_F32 : VOP3_Real_gfx6_gfx7_gfx10<0x141>;
@@ -1033,7 +1038,6 @@ defm V_MAX_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x167>;
 defm V_LDEXP_F64 : VOP3_Real_gfx6_gfx7_gfx10<0x168>;
 defm V_MUL_LO_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x169>;
 defm V_MUL_HI_U32 : VOP3_Real_gfx6_gfx7_gfx10<0x16a>;
-defm V_MUL_LO_I32 : VOP3_Real_gfx6_gfx7_gfx10<0x16b>;
 defm V_MUL_HI_I32 : VOP3_Real_gfx6_gfx7_gfx10<0x16c>;
 defm V_DIV_FMAS_F32: VOP3_Real_gfx6_gfx7_gfx10<0x16f>;
 defm V_DIV_FMAS_F64: VOP3_Real_gfx6_gfx7_gfx10<0x170>;

diff --git a/llvm/test/MC/AMDGPU/gfx1030_unsupported.s b/llvm/test/MC/AMDGPU/gfx1030_unsupported.s
index b3660d66f21d..57cfb2f2514c 100644
--- a/llvm/test/MC/AMDGPU/gfx1030_unsupported.s
+++ b/llvm/test/MC/AMDGPU/gfx1030_unsupported.s
@@ -1,6 +1,9 @@
 // RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize32,-wavefrontsize64 %s 2>&1 | FileCheck --implicit-check-not=error: %s
 // RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1030 -mattr=-wavefrontsize32,+wavefrontsize64 %s 2>&1 | FileCheck --implicit-check-not=error: %s
 
+v_mul_lo_i32 v0, v1, v2
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: instruction not supported on this GPU
+
 //===--===//
 // Unsupported dpp variants.
 //===--===//

diff --git a/llvm/test/MC/AMDGPU/gfx10_asm_vop3.s b/llvm/test/MC/AMDGPU/gfx10_asm_vop3.s
index a4f77a4bbaad..be5b3d4a7cf3 100644
--- a/llvm/test/MC/AMDGPU/gfx10_asm_vop3.s
+++ b/llvm/test/MC/AMDGPU/gfx10_asm_vop3.s
@@ -6685,6 +6685,30 @@ v_mul_hi_u32 v5, v1, 0.5
 v_mul_hi_u32 v5, v1, -4.0
 // GFX10: encoding: [0x05,0x00,0x6a,0xd5,0x01,0xef,0x01,0x00]
 
+v_mul_lo_i32 v5, v1, v2
+// GFX10: encoding: [0x05,0x00,0x6b,0xd5,0x01,0x05,0x02,0x00]
+
+v_mul_lo_i32 v255, v1, v2
+// GFX10: encoding: [0xff,0x00,0x6b,0xd5,0x01,0x05,0x02,0x00]
+
+v_mul_lo_i32 v5, v255, v2
+// GFX10: encoding: [0x05,0x00,0x6b,0xd5,0xff,0x05,0x02,0x00]
+
+v_mul_lo_i32 v5, s1, v2
+// GFX10: encoding: [0x05,0x00,0x6b,0xd5,0x01,0x04,0x02,0x00]
+
+v_mul_lo_i32 v5, s103, v2
+// GFX10: encoding: [0x05,0x00,0x6b,0xd5,0x67,0x04,0x02,0x00]
+
+v_mul_lo_i32 v5, vcc_lo, v2
+// GFX10: encoding: [0x05,0x00,0x6b,0xd5,0x6a,0x04,0x02,0x00]
+
+v_mul_lo_i32 v5, vcc_hi,
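
The `isGFX10Before1030` predicate in the commit above combines a generation check with the absence of the GFX10.3 instruction set. As a rough illustration only, the same condition can be modelled in standalone C++. The `Generation` enum, the `Subtarget` struct, and the two sample targets are hypothetical stand-ins, not LLVM's real `GCNSubtarget` API.

```cpp
// Hypothetical, simplified model of the subtarget queries used by the
// predicate string; this is not LLVM's GCNSubtarget class.
enum class Generation { GFX9, GFX10, GFX11 };

struct Subtarget {
  Generation Gen;
  bool HasGFX10_3Insts;
};

// Mirrors the condition in isGFX10Before1030: the generation is exactly
// GFX10 and the GFX10.3 instructions are not available.
constexpr bool isGFX10Before1030(Subtarget ST) {
  return ST.Gen == Generation::GFX10 && !ST.HasGFX10_3Insts;
}

// Sample targets (assumed for illustration): a GFX10.1 part and a GFX10.3 part.
constexpr Subtarget SampleGFX1010{Generation::GFX10, false};
constexpr Subtarget SampleGFX1030{Generation::GFX10, true};

static_assert(isGFX10Before1030(SampleGFX1010),
              "V_MUL_LO_I32 stays available before GFX10.3");
static_assert(!isGFX10Before1030(SampleGFX1030),
              "V_MUL_LO_I32 is rejected on GFX10.3");
```

This matches the new test in `gfx1030_unsupported.s`, where `-mcpu=gfx1030` is expected to report the instruction as unsupported.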
[llvm-branch-commits] [llvm] bcec0f2 - [AMDGPU] Deduplicate VOP tablegen asm & ins
Author: Joe Nash
Date: 2021-01-11T13:49:26-05:00
New Revision: bcec0f27a2c37b64d5e8b84bbbfa563edae6affe

URL: https://github.com/llvm/llvm-project/commit/bcec0f27a2c37b64d5e8b84bbbfa563edae6affe
DIFF: https://github.com/llvm/llvm-project/commit/bcec0f27a2c37b64d5e8b84bbbfa563edae6affe.diff

LOG: [AMDGPU] Deduplicate VOP tablegen asm & ins

VOP3 and VOP DPP subroutines to generate input operands and asm strings
were essentially copy pasted several times. They are deduplicated to
reduce the maintenance burden and allow faster development.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D94102
Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea

Added:

Modified:
    llvm/lib/Target/AMDGPU/SIInstrInfo.td
    llvm/lib/Target/AMDGPU/VOP3Instructions.td

Removed:

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
index e48138e56d71..78600bebdad2 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td
@@ -1587,7 +1587,7 @@ class getIns32 {
 // Returns the input arguments for VOP3 instructions for the given SrcVT.
 class getIns64 {
   dag ret =
@@ -1602,7 +1602,7 @@ class getIns64 {
+  // getInst64 handles clamp and omod. implicit mutex between vop3p and omod
+  dag base = getIns64 .ret;
+  dag opsel = (ins op_sel0:$op_sel);
+  dag vop3pFields = (ins op_sel_hi0:$op_sel_hi, neg_lo0:$neg_lo, neg_hi0:$neg_hi);
+  dag ret = !con(base,
+                 !if(HasOpSel, opsel,(ins)),
+                 !if(IsVOP3P, vop3pFields,(ins)));
+}
 
-// The modifiers (except clamp) are dummy operands for the benefit of
-// printing and parsing. They defer their values to looking at the
-// srcN_modifiers for what to print.
 class getInsVOP3P {
-  dag ret = !if (!eq(NumSrcArgs, 2),
-    !if (HasClamp,
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           clampmod0:$clamp,
-           op_sel0:$op_sel, op_sel_hi0:$op_sel_hi,
-           neg_lo0:$neg_lo, neg_hi0:$neg_hi),
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           op_sel0:$op_sel, op_sel_hi0:$op_sel_hi,
-           neg_lo0:$neg_lo, neg_hi0:$neg_hi)),
-    // else NumSrcArgs == 3
-    !if (HasClamp,
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           Src2Mod:$src2_modifiers, Src2RC:$src2,
-           clampmod0:$clamp,
-           op_sel0:$op_sel, op_sel_hi0:$op_sel_hi,
-           neg_lo0:$neg_lo, neg_hi0:$neg_hi),
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           Src2Mod:$src2_modifiers, Src2RC:$src2,
-           op_sel0:$op_sel, op_sel_hi0:$op_sel_hi,
-           neg_lo0:$neg_lo, neg_hi0:$neg_hi))
-  );
+  dag ret = getInsVOP3Base.ret;
 }
 
-class getInsVOP3OpSel {
-  dag ret = !if (!eq(NumSrcArgs, 2),
-    !if (HasClamp,
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           clampmod0:$clamp,
-           op_sel0:$op_sel),
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           op_sel0:$op_sel)),
-    // else NumSrcArgs == 3
-    !if (HasClamp,
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           Src2Mod:$src2_modifiers, Src2RC:$src2,
-           clampmod0:$clamp,
-           op_sel0:$op_sel),
-      (ins Src0Mod:$src0_modifiers, Src0RC:$src0,
-           Src1Mod:$src1_modifiers, Src1RC:$src1,
-           Src2Mod:$src2_modifiers, Src2RC:$src2,
-           op_sel0:$op_sel))
-  );
+class getInsVOP3OpSel {
+  dag ret = getInsVOP3Base.ret;
 }
 
-class getInsDPP {
   dag ret = !if (!eq(NumSrcArgs, 0),
                 // VOP1 without input operands (V_NOP)
-                (ins dpp_ctrl:$dpp_ctrl, row_mask:$row_mask,
-                     bank_mask:$bank_mask, bound_ctrl:$bound_ctrl),
+                (ins ),
             !if (!eq(NumSrcArgs, 1),
               !if (HasModifiers,
                 // VOP1_DPP with modifiers
                 (ins DstRC:$old, Src0Mod:$src0_modifiers,
-                     Src0RC:$src0, dpp_ctrl:$dpp_ctrl, row_mask:$row_mask,
-                     bank_mask:$bank_mask, bound_ctrl:$bound_ctrl)
+                     Src0RC:$src0)
               /* else */,
                 // VOP1_DPP without modifiers
-                (ins DstRC:$old, Src0RC:$src0,
-                     dpp_ctrl:$dpp_ctrl, row_mask:$row_mask,
-                     bank_mask:$bank_mask, bound_ctrl:$bound_ctrl)
-              /* endif */)
-              /* NumSrcArgs == 2 */,
+                (ins DstRC:$old, Src0RC:$src0)
+              /* endif */),
               !if (HasModifiers,
                 // VOP2_DPP with modifiers
                 (ins DstRC:$old,
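
The deduplication in this commit replaces the hand-enumerated operand lists with a base list plus optional suffixes, joined with `!con` and selected with `!if`. A loose C++ analogy of that composition is sketched below. `OperandList`, `concat`, `insVOP3Base`, and `TwoSrcBase` are hypothetical names for illustration; the real logic lives in TableGen, and the flag values used for the VOP3P and op_sel variants are inferred from the removed operand lists rather than read from the patch.

```cpp
// Toy stand-in for a TableGen (ins ...) dag: a fixed-capacity list of
// operand names. Purely illustrative.
struct OperandList {
  const char *Names[16];
  int Size;
};

// Analogue of !con: append every operand of B to A.
constexpr OperandList concat(OperandList A, OperandList B) {
  for (int I = 0; I < B.Size; ++I)
    A.Names[A.Size++] = B.Names[I];
  return A;
}

// Analogue of the new base class: start from the shared operand list and
// append op_sel and the VOP3P-only fields only when requested.
constexpr OperandList insVOP3Base(OperandList Base, bool HasOpSel,
                                  bool IsVOP3P) {
  constexpr OperandList Empty{{}, 0};
  constexpr OperandList OpSel{{"op_sel"}, 1};
  constexpr OperandList VOP3PFields{{"op_sel_hi", "neg_lo", "neg_hi"}, 3};
  return concat(concat(Base, HasOpSel ? OpSel : Empty),
                IsVOP3P ? VOP3PFields : Empty);
}

// Helper for checking operand names at compile time.
constexpr bool sameName(const char *A, const char *B) {
  while (*A != '\0' && *A == *B) {
    ++A;
    ++B;
  }
  return *A == *B;
}

// Assumed base list for a two-source instruction with clamp.
constexpr OperandList TwoSrcBase{
    {"src0_modifiers", "src0", "src1_modifiers", "src1", "clamp"}, 5};

// The op_sel variant adds one operand; the VOP3P variant adds four.
static_assert(insVOP3Base(TwoSrcBase, true, false).Size == 6, "op_sel form");
static_assert(insVOP3Base(TwoSrcBase, true, true).Size == 9, "VOP3P form");
static_assert(sameName(insVOP3Base(TwoSrcBase, true, true).Names[8], "neg_hi"),
              "VOP3P fields come last");
```

The point of the design is that each variant states only what differs from the base, so adding a new operand to the base touches one place instead of every copy.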
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add first gfx950 mfma instructions (PR #116312)
https://github.com/Sisyph approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/116312
[llvm-branch-commits] [llvm] AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (PR #123684)
@@ -489,6 +489,90 @@ void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
   CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
 }
 
+void AMDGPUDAGToDAGISel::SelectVectorShuffle(SDNode *N) {
+  EVT VT = N->getValueType(0);
+  EVT EltVT = VT.getVectorElementType();
+
+  // TODO: Handle 16-bit element vectors with even aligned masks.
+  if (!Subtarget->hasPkMovB32() || !EltVT.bitsEq(MVT::i32) ||
+      VT.getVectorNumElements() != 2) {
+    SelectCode(N);
+    return;
+  }
+
+  auto *SVN = cast<ShuffleVectorSDNode>(N);
+
+  SDValue Src0 = SVN->getOperand(0);
+  SDValue Src1 = SVN->getOperand(1);
+  ArrayRef<int> Mask = SVN->getMask();
+  SDLoc DL(N);
+
+  assert(Src0.getValueType().getVectorNumElements() == 2 && Mask.size() == 2 &&
+         Mask[0] < 4 && Mask[1] < 4);
+
+  SDValue VSrc0 = Mask[0] < 2 ? Src0 : Src1;
+  SDValue VSrc1 = Mask[1] < 2 ? Src0 : Src1;
+  unsigned Src0SubReg = Mask[0] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+  unsigned Src1SubReg = Mask[1] & 1 ? AMDGPU::sub1 : AMDGPU::sub0;
+
+  if (Mask[0] < 0) {
+    Src0SubReg = Src1SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc0 = SDValue(ImpDef, 0);
+  }
+
+  if (Mask[1] < 0) {
+    Src1SubReg = Src0SubReg;
+    MachineSDNode *ImpDef =
+        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, VT);
+    VSrc1 = SDValue(ImpDef, 0);
+  }
+
+  // SGPR case needs to lower to copies.
+  //
+  // Also use subregister extract when we can directly blend the registers with
+  // a simple subregister copy.
+  //
+  // TODO: Maybe we should fold this out earlier
+  if (N->isDivergent() && Src0SubReg == AMDGPU::sub1 &&
+      Src1SubReg == AMDGPU::sub0) {
+    // The low element of the result always comes from src0.
+    // The high element of the result always comes from src1.
+    // op_sel selects the high half of src0.
+    // op_sel_hi selects the high half of src1.
+
+    unsigned Src0OpSel =
+        Src0SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;
+    unsigned Src1OpSel =
+        Src1SubReg == AMDGPU::sub1 ? SISrcMods::OP_SEL_0 : SISrcMods::NONE;

Sisyph wrote:

It is written in a very confusing way in the docs, but I think you have it correct in the code. Out of the 6 bits (op_sel[0-2] and op_sel_hi[0-2]) only op_sel[0] and op_sel[1] do anything iiuc.

https://github.com/llvm/llvm-project/pull/123684
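
The mask decoding discussed in this review can be checked on plain integers. The sketch below models only what the patch comments state: the low result element is taken from src0, the high result element from src1, and one select bit per source chooses that source's high half. `V2`, `pkMov`, and `shuffleV2` are hypothetical names, undef (negative) mask elements are deliberately left out, and this is not a statement of the hardware specification.

```cpp
// Two 32-bit lanes of a v2i32 value. Illustrative only.
struct V2 {
  int Lo;
  int Hi;
};

// Model of the packed move as described in the patch comments: result.Lo
// comes from Src0 and result.Hi from Src1, each optionally reading the high
// half of its source.
constexpr V2 pkMov(V2 Src0, V2 Src1, bool Src0High, bool Src1High) {
  return V2{Src0High ? Src0.Hi : Src0.Lo, Src1High ? Src1.Hi : Src1.Lo};
}

// Decode a two-element shuffle mask the way the patch does: mask values 0-1
// pick from A, 2-3 pick from B, and the low bit picks the sub-element.
// Assumes both mask elements are in [0, 3].
constexpr V2 shuffleV2(V2 A, V2 B, int Mask0, int Mask1) {
  return pkMov(Mask0 < 2 ? A : B, Mask1 < 2 ? A : B, (Mask0 & 1) != 0,
               (Mask1 & 1) != 0);
}

constexpr V2 A{10, 11};
constexpr V2 B{20, 21};

// Mask <1, 2>: high half of A, then low half of B.
static_assert(shuffleV2(A, B, 1, 2).Lo == 11, "lane 0 reads A.Hi");
static_assert(shuffleV2(A, B, 1, 2).Hi == 20, "lane 1 reads B.Lo");
```

Under this model only two select bits matter, one per source, which is consistent with the reviewer's reading that only op_sel[0] and op_sel[1] have an effect.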
[llvm-branch-commits] [llvm] AMDGPU: Implement isExtractVecEltCheap (PR #122460)
@@ -1949,6 +1949,13 @@ bool SITargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
   return Index == 0;
 }
 
+bool SITargetLowering::isExtractVecEltCheap(EVT VT, unsigned Index) const {
+  // TODO: This should be more aggressive, particular for 16-bit element
+  // vectors. However there are some mixed improvements and regressions.
+  EVT EltTy = VT.getVectorElementType();
+  return EltTy.getSizeInBits() % 32 == 0;

Sisyph wrote:

Yes I would think EltTy.getSizeInBits() * Index % 16 == 0 for True16 would be the way to go.

https://github.com/llvm/llvm-project/pull/122460
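
To compare the check in the patch with the one suggested in the review, both can be written as plain functions over the element size in bits and the element index. This is only a sketch: `isCheap32` and `isCheapTrue16` are hypothetical names, and the second function transcribes the reviewer's expression literally rather than reproducing any code that landed.

```cpp
// Check from the patch: an extract is cheap when the element size is a
// multiple of 32 bits.
constexpr bool isCheap32(unsigned EltBits) { return EltBits % 32 == 0; }

// Reviewer's suggested True16 variant: an extract is cheap when the element's
// bit offset (size times index) falls on a 16-bit boundary.
constexpr bool isCheapTrue16(unsigned EltBits, unsigned Index) {
  return EltBits * Index % 16 == 0;
}

// 32-bit elements pass both checks at any index.
static_assert(isCheap32(32) && isCheapTrue16(32, 3), "32-bit elements");
// 16-bit elements fail the 32-bit check but pass the True16 one.
static_assert(!isCheap32(16) && isCheapTrue16(16, 1), "16-bit elements");
// 8-bit elements pass the True16 check only at even indices.
static_assert(isCheapTrue16(8, 2) && !isCheapTrue16(8, 1), "8-bit elements");
```

The suggested form depends on the index as well as the element type, so it accepts sub-32-bit elements only when they start on a 16-bit boundary.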
[llvm-branch-commits] [llvm] AMDGPU: Remove redundant operand folding checks (PR #140587)
https://github.com/Sisyph approved this pull request. Your logic makes sense to me. Handling cases uniformly is good. https://github.com/llvm/llvm-project/pull/140587
[llvm-branch-commits] [llvm] [AMDGPU] Negative gfx1250 v_dual_cndmask_b32 tests. NFC. (PR #148057)
https://github.com/Sisyph approved this pull request. https://github.com/llvm/llvm-project/pull/148057