[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
@@ -71,18 +73,21 @@ vsetvli a2, a0, e32, m8, ta, ma vsetvli a2, a0, e32, mf2, ta, ma # CHECK-INST: vsetvli a2, a0, e32, mf2, ta, ma +# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 16 may not be compatible with all RVV implementations{{$}} # CHECK-ENCODING: [0x57,0x76,0x75,0x0d] # CHECK-ERROR: instruction requires the following: 'V' (Vector Extension for Application Processors), 'Zve32x' (Vector Extensions for Embedded Processors){{$}} # CHECK-UNKNOWN: 0d757657 vsetvli a2, a0, e32, mf4, ta, ma # CHECK-INST: vsetvli a2, a0, e32, mf4, ta, ma +# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 8 may not be compatible with all RVV implementations{{$}} lukel97 wrote: I see that the spec recommends that we warn when LMUL < SEWMIN/ELEN, but do we need to warn for SEW > LMUL * ELEN? IIUC this would cause a warning on zve64x too since 32 > 1/4 * 64 https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
@@ -1,5 +1,7 @@ # RUN: llvm-mc -triple=riscv64 -show-encoding --mattr=+v %s \ # RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST +# RUN: llvm-mc -triple=riscv64 -show-encoding --mattr=+zve32x %s 2>&1 \ +# RUN:| FileCheck %s --check-prefix=CHECK-WARNING lukel97 wrote: Nit, can we name the prefix something like CHECK-ZVE32X https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
@@ -71,18 +73,21 @@ vsetvli a2, a0, e32, m8, ta, ma vsetvli a2, a0, e32, mf2, ta, ma # CHECK-INST: vsetvli a2, a0, e32, mf2, ta, ma +# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 16 may not be compatible with all RVV implementations{{$}} # CHECK-ENCODING: [0x57,0x76,0x75,0x0d] # CHECK-ERROR: instruction requires the following: 'V' (Vector Extension for Application Processors), 'Zve32x' (Vector Extensions for Embedded Processors){{$}} # CHECK-UNKNOWN: 0d757657 vsetvli a2, a0, e32, mf4, ta, ma # CHECK-INST: vsetvli a2, a0, e32, mf4, ta, ma +# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 8 may not be compatible with all RVV implementations{{$}} lukel97 wrote: Ok, that seems reasonable. Should we maybe then reword the LMUL < SEWMIN/ELEN case to mention that the encoding is actually reserved, whereas for SEW > LMUL * ELEN it may just not be compatible https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
https://github.com/lukel97 approved this pull request. https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)
@@ -2211,6 +,16 @@ ParseStatus RISCVAsmParser::parseVTypeI(OperandVector &Operands) { if (getLexer().is(AsmToken::EndOfStatement) && State == VTypeState_Done) { RISCVII::VLMUL VLMUL = RISCVVType::encodeLMUL(Lmul, Fractional); +if (Fractional) { + unsigned ELEN = STI->hasFeature(RISCV::FeatureStdExtZve64x) ? 64 : 32; + unsigned MaxSEW = ELEN / Lmul; + // If MaxSEW < 8, we should have printed warning about reserved LMUL. + if (MaxSEW >= 8 && Sew > MaxSEW) +Warning(SEWLoc, +"use of vtype encodings with SEW > " + Twine(MaxSEW) + +" and LMUL == " + (Fractional ? "mf" : "m") + Twine(Lmul) + lukel97 wrote: Fractional is always true here right? https://github.com/llvm/llvm-project/pull/94313 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)
lukel97 wrote: It's a miscompile, but it wasn't a regression since it looks like we've had it since LLVM 16 https://github.com/llvm/llvm-project/pull/101124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)
https://github.com/lukel97 milestoned https://github.com/llvm/llvm-project/pull/101464 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)
https://github.com/lukel97 created https://github.com/llvm/llvm-project/pull/101464 This is a backport of #101152 which fixes a miscompile on RISC-V, albeit not a regression. >From 6b7c614ad8a69dfb610ed02da541fb8d3bf009e3 Mon Sep 17 00:00:00 2001 From: Luke Lau Date: Wed, 31 Jul 2024 00:28:52 +0800 Subject: [PATCH] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (#101152) As noted in https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we currently fold in vmerge.vvms and vmv.v.vs into their ops even if the EEW is different which leads to an incorrect transform. This checks the op's EEW via its simple value type for now since there doesn't seem to be any existing information about the EEW size of instructions. We'll probably need to encode this at some point if we want to be able to access it at the MachineInstr level in #100367 --- llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp | 4 llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll| 14 + .../RISCV/rvv/rvv-peephole-vmerge-vops.ll | 21 +++ 3 files changed, 39 insertions(+) diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp index eef6ae677ac85..db949f3476e2b 100644 --- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp @@ -3721,6 +3721,10 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) { assert(!Mask || cast(Mask)->getReg() == RISCV::V0); assert(!Glue || Glue.getValueType() == MVT::Glue); + // If the EEW of True is different from vmerge's SEW, then we can't fold. + if (True.getSimpleValueType() != N->getSimpleValueType(0)) +return false; + // We require that either merge and false are the same, or that merge // is undefined. if (Merge != False && !isImplicitDef(Merge)) diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll b/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll index ec03f773c7108..dfc2b2bdda026 100644 --- a/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll +++ b/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll @@ -168,3 +168,17 @@ define @unfoldable_vredsum( %passthru, @llvm.riscv.vmv.v.v.nxv2i32( %passthru, %a, iXLen 1) ret %b } + +define @unfoldable_mismatched_sew( %passthru, %x, %y, iXLen %avl) { +; CHECK-LABEL: unfoldable_mismatched_sew: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetvli zero, a0, e64, m1, ta, ma +; CHECK-NEXT:vadd.vv v9, v9, v10 +; CHECK-NEXT:vsetvli zero, a0, e32, m1, tu, ma +; CHECK-NEXT:vmv.v.v v8, v9 +; CHECK-NEXT:ret + %a = call @llvm.riscv.vadd.nxv1i64.nxv1i64( poison, %x, %y, iXLen %avl) + %a.bitcast = bitcast %a to + %b = call @llvm.riscv.vmv.v.v.nxv2i32( %passthru, %a.bitcast, iXLen %avl) + ret %b +} diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll index a08bcae074b9b..259515f160048 100644 --- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll +++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll @@ -1196,3 +1196,24 @@ define @true_mask_vmerge_implicit_passthru( ) ret %b } + + +define @unfoldable_mismatched_sew( %passthru, %x, %y, %mask, i64 %avl) { +; CHECK-LABEL: unfoldable_mismatched_sew: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetvli zero, a0, e64, m1, ta, ma +; CHECK-NEXT:vadd.vv v9, v9, v10 +; CHECK-NEXT:vsetvli zero, a0, e32, m1, tu, ma +; CHECK-NEXT:vmv.v.v v8, v9 +; CHECK-NEXT:ret + %a = call @llvm.riscv.vadd.nxv1i64.nxv1i64( poison, %x, %y, i64 %avl) + %a.bitcast = bitcast %a to + %b = call @llvm.riscv.vmerge.nxv2i32.nxv2i32( + %passthru, + %passthru, + %a.bitcast, + splat (i1 true), 
+i64 %avl + ) + ret %b +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/101464 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)
https://github.com/lukel97 closed https://github.com/llvm/llvm-project/pull/101124 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)
https://github.com/lukel97 milestoned https://github.com/llvm/llvm-project/pull/80238 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)
lukel97 wrote:

> I think the "Requested by" comes from the git committer.

There's a PR open to fix this: #82680

> @lukel97 i'm not sure if you have already or not, but it might be good to include the recent test you added too.

Sure thing, I can't see a way of editing/pushing more commits to this PR's branch though. I'll close this and create another PR.

https://github.com/llvm/llvm-project/pull/83848 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)
https://github.com/lukel97 closed https://github.com/llvm/llvm-project/pull/83848 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)
lukel97 wrote: Superseded by #83856 https://github.com/llvm/llvm-project/pull/83848 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Pass LMUL to copyPhysRegVector (PR #84448)
https://github.com/lukel97 commented: Is this NFC? https://github.com/llvm/llvm-project/pull/84448 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,81 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + + unsigned I = 0; + auto GetCopyInfo = [&](MCRegister SrcReg, MCRegister DstReg) + -> std::tuple { +unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); +unsigned DstEncoding = TRI->getEncodingValue(DstReg); +if (!(SrcEncoding & 0b111) && !(DstEncoding & 0b111) && I + 8 <= NumRegs) lukel97 wrote: Is this the same as `SrcEncoding % 8 == 0 && DstEncoding % 8 == 0`? https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV][NFC] Pass LMUL to copyPhysRegVector (PR #84448)
https://github.com/lukel97 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/84448 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Store VLMul/NF into RegisterClass's TSFlags (PR #84894)
@@ -127,8 +127,21 @@ def XLenRI : RegInfoByHwMode< [RV32, RV64], [RegInfo<32,32,32>, RegInfo<64,64,64>]>; +class RISCVRegisterClass regTypes, int align, dag regList> +: RegisterClass<"RISCV", regTypes, align, regList> { + bit IsVRegClass = 0; + int VLMul = 1; + int NF = 1; lukel97 wrote: Should these default to 0 since 0 is an invalid LMUL/NF? https://github.com/llvm/llvm-project/pull/84894 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,81 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + + unsigned I = 0; + auto GetCopyInfo = [&](MCRegister SrcReg, MCRegister DstReg) + -> std::tuple { +unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); +unsigned DstEncoding = TRI->getEncodingValue(DstReg); +if (!(SrcEncoding & 0b111) && !(DstEncoding & 0b111) && I + 8 <= NumRegs) lukel97 wrote: Ah ok, just wanted to check. I found it a bit hard to read but I'm not strongly opinionated https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -146,16 +127,12 @@ body: | ; CHECK-NEXT: $v7 = VMV1R_V $v12 ; CHECK-NEXT: $v8 = VMV1R_V $v13 ; CHECK-NEXT: $v9 = VMV1R_V $v14 -; CHECK-NEXT: $v6 = VMV1R_V $v10 -; CHECK-NEXT: $v7 = VMV1R_V $v11 -; CHECK-NEXT: $v8 = VMV1R_V $v12 -; CHECK-NEXT: $v9 = VMV1R_V $v13 -; CHECK-NEXT: $v10 = VMV1R_V $v14 -; CHECK-NEXT: $v18 = VMV1R_V $v14 -; CHECK-NEXT: $v17 = VMV1R_V $v13 -; CHECK-NEXT: $v16 = VMV1R_V $v12 -; CHECK-NEXT: $v15 = VMV1R_V $v11 -; CHECK-NEXT: $v14 = VMV1R_V $v10 +; CHECK-NEXT: $v6m2 = VMV2R_V $v10m2 +; CHECK-NEXT: $v8m2 = VMV2R_V $v12m2 +; CHECK-NEXT: $v8 = VMV1R_V $v14 lukel97 wrote: Shouldn't this be `$v10 = VMV1R_V $v14`? https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -146,16 +127,12 @@ body: | ; CHECK-NEXT: $v7 = VMV1R_V $v12 ; CHECK-NEXT: $v8 = VMV1R_V $v13 ; CHECK-NEXT: $v9 = VMV1R_V $v14 -; CHECK-NEXT: $v6 = VMV1R_V $v10 -; CHECK-NEXT: $v7 = VMV1R_V $v11 -; CHECK-NEXT: $v8 = VMV1R_V $v12 -; CHECK-NEXT: $v9 = VMV1R_V $v13 -; CHECK-NEXT: $v10 = VMV1R_V $v14 -; CHECK-NEXT: $v18 = VMV1R_V $v14 -; CHECK-NEXT: $v17 = VMV1R_V $v13 -; CHECK-NEXT: $v16 = VMV1R_V $v12 -; CHECK-NEXT: $v15 = VMV1R_V $v11 -; CHECK-NEXT: $v14 = VMV1R_V $v10 +; CHECK-NEXT: $v6m2 = VMV2R_V $v10m2 +; CHECK-NEXT: $v8m2 = VMV2R_V $v12m2 +; CHECK-NEXT: $v8 = VMV1R_V $v14 +; CHECK-NEXT: $v14m2 = VMV2R_V $v10m2 +; CHECK-NEXT: $v12m2 = VMV2R_V $v8m2 +; CHECK-NEXT: $v8 = VMV1R_V $v4 lukel97 wrote: And this should be like? ``` $v18 = VMV1R_V $v14 $v16 = VMV2R_V $v12m2 $v14 = VMV2R_V $v10m2 ``` https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + if (ReversedCopy) { +// If there exists overlapping, we should copy the registers reversely. +SrcEncoding += NumRegs - LMulVal; +DstEncoding += NumRegs - LMulVal; + } + + unsigned I = 0; + auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding) + -> std::tuple { +// If source register encoding and destination register encoding are aligned +// to 8, we can do a LMUL8 copying. +if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs) + return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V, + RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8}; +// If source register encoding and destination register encoding are aligned +// to 4, we can do a LMUL4 copying. +if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs) + return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V, + RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4}; +// If source register encoding and destination register encoding are aligned +// to 2, we can do a LMUL2 copying. 
+if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs) + return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V, + RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2}; +// Or we should do LMUL1 copying. +return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V, +RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1}; + }; + auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass, +uint16_t Encoding) { +ArrayRef Regs = RegClass.getRegisters(); +const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) { + return TRI->getEncodingValue(Reg) == Encoding; +}); +// We should be always able to find one valid register. +assert(FoundReg != Regs.end()); +return *FoundReg; + }; lukel97 wrote: Would it be easier to get the register via `TRI->getSubReg`? I think you should be able to compute the subreg index based off the RegClass and `I`. I don't think you'll need to compose any subreg indices like in `RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs` ht
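To make the chunking above easier to follow, here is a toy, LLVM-free model of the GetCopyInfo selection for the forward (non-overlapping) direction, fed with the $v10..$v14 -> $v6..$v10 copy from the test under review. It reproduces the VMV2R_V/VMV2R_V/VMV1R_V split expected in the earlier comments; it is only a model of the quoted logic, not the PR's code.

```cpp
#include <cstdio>

int main() {
  // NF=5 worth of LMUL1 registers: copy v10..v14 into v6..v10.
  unsigned SrcEncoding = 10, DstEncoding = 6, NumRegs = 5;
  unsigned I = 0;
  while (I < NumRegs) {
    // Pick the widest whole-register move allowed by the alignment of both
    // encodings and by the number of registers still to copy.
    unsigned Width = 1;
    if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs)
      Width = 8;
    else if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs)
      Width = 4;
    else if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs)
      Width = 2;
    std::printf("copy v%u..v%u <- v%u..v%u (VMV%uR_V)\n", DstEncoding,
                DstEncoding + Width - 1, SrcEncoding, SrcEncoding + Width - 1,
                Width);
    SrcEncoding += Width;
    DstEncoding += Width;
    I += Width;
  }
  // Prints:
  //   copy v6..v7 <- v10..v11 (VMV2R_V)
  //   copy v8..v9 <- v12..v13 (VMV2R_V)
  //   copy v10..v10 <- v14..v14 (VMV1R_V)
}
```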
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + if (ReversedCopy) { +// If there exists overlapping, we should copy the registers reversely. lukel97 wrote: Nit, maybe clarify this happens when copying tuples? ```suggestion // If the src and dest overlap when copying a tuple, we need to copy the registers in reverse. ``` https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -212,19 +185,13 @@ body: | ; CHECK-NEXT: $v7 = VMV1R_V $v14 ; CHECK-NEXT: $v8 = VMV1R_V $v15 ; CHECK-NEXT: $v9 = VMV1R_V $v16 -; CHECK-NEXT: $v4 = VMV1R_V $v10 -; CHECK-NEXT: $v5 = VMV1R_V $v11 -; CHECK-NEXT: $v6 = VMV1R_V $v12 -; CHECK-NEXT: $v7 = VMV1R_V $v13 -; CHECK-NEXT: $v8 = VMV1R_V $v14 -; CHECK-NEXT: $v9 = VMV1R_V $v15 +; CHECK-NEXT: $v4m2 = VMV2R_V $v10m2 +; CHECK-NEXT: $v6m2 = VMV2R_V $v12m2 +; CHECK-NEXT: $v8m2 = VMV2R_V $v14m2 ; CHECK-NEXT: $v10 = VMV1R_V $v16 -; CHECK-NEXT: $v22 = VMV1R_V $v16 -; CHECK-NEXT: $v21 = VMV1R_V $v15 -; CHECK-NEXT: $v20 = VMV1R_V $v14 -; CHECK-NEXT: $v19 = VMV1R_V $v13 -; CHECK-NEXT: $v18 = VMV1R_V $v12 -; CHECK-NEXT: $v17 = VMV1R_V $v11 +; CHECK-NEXT: $v22m2 = VMV2R_V $v16m2 +; CHECK-NEXT: $v20m2 = VMV2R_V $v14m2 +; CHECK-NEXT: $v18m2 = VMV2R_V $v12m2 ; CHECK-NEXT: $v16 = VMV1R_V $v10 lukel97 wrote: Do we have a test for a copy like: ``` $v16_v17_v18_v19_v20_v21_v22 = COPY $v15_v16_v17_v18_v19_v20_v21 ``` Because I think this will need to be all VMV1R_Vs. Does it already do this? https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + if (ReversedCopy) { +// If there exists overlapping, we should copy the registers reversely. +SrcEncoding += NumRegs - LMulVal; +DstEncoding += NumRegs - LMulVal; + } + + unsigned I = 0; + auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding) + -> std::tuple { +// If source register encoding and destination register encoding are aligned +// to 8, we can do a LMUL8 copying. +if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs) + return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V, + RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8}; +// If source register encoding and destination register encoding are aligned +// to 4, we can do a LMUL4 copying. +if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs) + return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V, + RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4}; +// If source register encoding and destination register encoding are aligned +// to 2, we can do a LMUL2 copying. 
+if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs) + return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V, + RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2}; +// Or we should do LMUL1 copying. +return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V, +RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1}; + }; + auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass, +uint16_t Encoding) { +ArrayRef Regs = RegClass.getRegisters(); +const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) { + return TRI->getEncodingValue(Reg) == Encoding; +}); +// We should be always able to find one valid register. +assert(FoundReg != Regs.end()); +return *FoundReg; + }; lukel97 wrote: I presume you don't need to use a subreg index if the register is a VRN8M1 and you're trying to do a VRM8 copy? Since the VRM8 reg class should be a subclass of VRN8M1 right? (Hope I'm getting the subreg/subclass terminology right btw) https://github.com/llvm/llvm-project/pull/8
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB, RISCVII::VLMUL LMul, unsigned NF) const { const TargetRegisterInfo *TRI = STI.getRegisterInfo(); - unsigned Opc; - unsigned SubRegIdx; - unsigned VVOpc, VIOpc; - switch (LMul) { - default: -llvm_unreachable("Impossible LMUL for vector register copy."); - case RISCVII::LMUL_1: -Opc = RISCV::VMV1R_V; -SubRegIdx = RISCV::sub_vrm1_0; -VVOpc = RISCV::PseudoVMV_V_V_M1; -VIOpc = RISCV::PseudoVMV_V_I_M1; -break; - case RISCVII::LMUL_2: -Opc = RISCV::VMV2R_V; -SubRegIdx = RISCV::sub_vrm2_0; -VVOpc = RISCV::PseudoVMV_V_V_M2; -VIOpc = RISCV::PseudoVMV_V_I_M2; -break; - case RISCVII::LMUL_4: -Opc = RISCV::VMV4R_V; -SubRegIdx = RISCV::sub_vrm4_0; -VVOpc = RISCV::PseudoVMV_V_V_M4; -VIOpc = RISCV::PseudoVMV_V_I_M4; -break; - case RISCVII::LMUL_8: -assert(NF == 1); -Opc = RISCV::VMV8R_V; -SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0. -VVOpc = RISCV::PseudoVMV_V_V_M8; -VIOpc = RISCV::PseudoVMV_V_I_M8; -break; - } - - bool UseVMV_V_V = false; - bool UseVMV_V_I = false; - MachineBasicBlock::const_iterator DefMBBI; - if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) { -UseVMV_V_V = true; -Opc = VVOpc; - -if (DefMBBI->getOpcode() == VIOpc) { - UseVMV_V_I = true; - Opc = VIOpc; -} - } - - if (NF == 1) { -auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg); -if (UseVMV_V_V) - MIB.addReg(DstReg, RegState::Undef); -if (UseVMV_V_I) - MIB = MIB.add(DefMBBI->getOperand(2)); -else - MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc)); -if (UseVMV_V_V) { - const MCInstrDesc &Desc = DefMBBI->getDesc(); - MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc))); // AVL - MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW - MIB.addImm(0);// tu, mu - MIB.addReg(RISCV::VL, RegState::Implicit); - MIB.addReg(RISCV::VTYPE, RegState::Implicit); -} -return; - } - - int I = 0, End = NF, Incr = 1; unsigned SrcEncoding = TRI->getEncodingValue(SrcReg); unsigned DstEncoding = TRI->getEncodingValue(DstReg); unsigned LMulVal; bool Fractional; std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul); assert(!Fractional && "It is impossible be fractional lmul here."); - if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) { -I = NF - 1; -End = -1; -Incr = -1; - } + unsigned NumRegs = NF * LMulVal; + bool ReversedCopy = + forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs); + if (ReversedCopy) { +// If there exists overlapping, we should copy the registers reversely. +SrcEncoding += NumRegs - LMulVal; +DstEncoding += NumRegs - LMulVal; + } + + unsigned I = 0; + auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding) + -> std::tuple { +// If source register encoding and destination register encoding are aligned +// to 8, we can do a LMUL8 copying. +if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs) + return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V, + RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8}; +// If source register encoding and destination register encoding are aligned +// to 4, we can do a LMUL4 copying. +if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs) + return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V, + RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4}; +// If source register encoding and destination register encoding are aligned +// to 2, we can do a LMUL2 copying. 
+if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs) + return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V, + RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2}; +// Or we should do LMUL1 copying. +return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V, +RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1}; + }; + auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass, +uint16_t Encoding) { +ArrayRef Regs = RegClass.getRegisters(); +const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) { + return TRI->getEncodingValue(Reg) == Encoding; +}); +// We should be always able to find one valid register. +assert(FoundReg != Regs.end()); +return *FoundReg; + }; lukel97 wrote: Yeah, although I thought that `GetCopyInfo` was already checking that SrcReg/DstReg was aligned to the VRM8 reg class. But I just checked and it looks like there's only subregisters on tuples for the same LMUL, e.g. V0_V1_V2_V3_V4_V5_V6_V7 from VRN8M1 only has the LMUL1 subregi
[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)
https://github.com/lukel97 approved this pull request. https://github.com/llvm/llvm-project/pull/84455 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)
https://github.com/lukel97 commented: Removing it from vleNff makes sense to me. As long as we have the implicit-def $vl on the pseudo to prevent it being moved between vsetvlis I think it should be ok. https://github.com/llvm/llvm-project/pull/90049 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
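For context on why the implicit-def of $vl matters here: a fault-only-first load writes the number of elements actually loaded back into vl, so it must not be reordered across vsetvlis. A minimal usage sketch follows; the intrinsic spelling is what I believe to be the current __riscv_-prefixed RVV intrinsics naming, so treat the exact signature as an assumption rather than a reference.

```cpp
// Compile with e.g. -march=rv64gcv.
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

size_t countLoaded(const int32_t *p, size_t avl) {
  size_t new_vl;
  // Loads up to avl elements, stopping early (without trapping) if a fault
  // would occur past the first element; new_vl reports how many were loaded,
  // which is the value the hardware writes into vl.
  vint32m1_t v = __riscv_vle32ff_v_i32m1(p, &new_vl, avl);
  (void)v;
  return new_vl;
}
```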
[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/90049 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/90049 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)
https://github.com/lukel97 approved this pull request. https://github.com/llvm/llvm-project/pull/90049 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)
@@ -194,15 +194,12 @@ define void @vpmerge_vpload_store( %passthru, ptr %p, , i64 } @llvm.riscv.vleff.nxv2i32(, ptr, i64) define @vpmerge_vleff( %passthru, ptr %p, %m, i32 zeroext %vl) { ; CHECK-LABEL: vpmerge_vleff: ; CHECK: # %bb.0: -; CHECK-NEXT:vsetvli zero, a1, e32, m1, ta, ma -; CHECK-NEXT:vle32ff.v v9, (a0) -; CHECK-NEXT:vsetvli zero, a1, e32, m1, tu, ma -; CHECK-NEXT:vmerge.vvm v8, v8, v9, v0 +; CHECK-NEXT:vsetvli zero, a1, e32, m1, tu, mu +; CHECK-NEXT:vle32ff.v v8, (a0), v0.t lukel97 wrote: Looks correct to me. https://github.com/llvm/llvm-project/pull/90049 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)
https://github.com/lukel97 approved this pull request. Chiming in that this seems reasonable to me, given the performance impact of not having unaligned scalar accesses. And hopefully we can remove this once we're settled on a proper interface. https://github.com/llvm/llvm-project/pull/92143 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] c0b9269 - [RISCV] Add helper to copy the AVL of another VSETVLIInfo. NFC
Author: Luke Lau Date: 2023-11-30T15:19:46+08:00 New Revision: c0b926939829d9d4bb6ac5825e62f30960b6ed22 URL: https://github.com/llvm/llvm-project/commit/c0b926939829d9d4bb6ac5825e62f30960b6ed22 DIFF: https://github.com/llvm/llvm-project/commit/c0b926939829d9d4bb6ac5825e62f30960b6ed22.diff LOG: [RISCV] Add helper to copy the AVL of another VSETVLIInfo. NFC Added: Modified: llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp index 3bbc85d836c3f4a..3bb648359e39dd6 100644 --- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp +++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp @@ -477,6 +477,18 @@ class VSETVLIInfo { return AVLImm; } + void setAVL(VSETVLIInfo Info) { +assert(Info.isValid()); +if (Info.isUnknown()) + setUnknown(); +else if (Info.hasAVLReg()) + setAVLReg(Info.getAVLReg()); +else { + assert(Info.hasAVLImm()); + setAVLImm(Info.getAVLImm()); +} + } + unsigned getSEW() const { return SEW; } RISCVII::VLMUL getVLMUL() const { return VLMul; } @@ -1054,10 +1066,7 @@ void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info, // TODO: We can probably relax this for immediates. if (Demanded.VLZeroness && !Demanded.VLAny && PrevInfo.isValid() && PrevInfo.hasEquallyZeroAVL(Info, *MRI) && Info.hasSameVLMAX(PrevInfo)) { -if (PrevInfo.hasAVLImm()) - Info.setAVLImm(PrevInfo.getAVLImm()); -else - Info.setAVLReg(PrevInfo.getAVLReg()); +Info.setAVL(PrevInfo); return; } @@ -1074,10 +1083,7 @@ void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info, VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI); if (DefInfo.hasSameVLMAX(Info) && (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) { -if (DefInfo.hasAVLImm()) - Info.setAVLImm(DefInfo.getAVLImm()); -else - Info.setAVLReg(DefInfo.getAVLReg()); +Info.setAVL(DefInfo); return; } } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Backport 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112 to release/18.x (PR #79931)
https://github.com/lukel97 milestoned https://github.com/llvm/llvm-project/pull/79931 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Backport 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112 to release/18.x (PR #79931)
https://github.com/lukel97 created https://github.com/llvm/llvm-project/pull/79931 This cherry picks a fix 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112 for a miscompile (only with the -mrvv-vector-bits=zvl configuration or similar) introduced in bb8a8770e203ba027d141cd1200e93809ea66c8f, which is present in the 18.x release branch. It also includes a commit that adds a test d407e6ca61a422f25841674d8f0b5ea0dbec85f8 >From 5b3331f29489446d7d723a33310b7fec37153976 Mon Sep 17 00:00:00 2001 From: Luke Lau Date: Fri, 26 Jan 2024 20:16:21 +0700 Subject: [PATCH 1/2] [RISCV] Add test to showcase miscompile from #79072 --- .../rvv/fixed-vectors-shuffle-exact-vlen.ll| 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll index f53b51e05c572..c0b02f62444ef 100644 --- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll +++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll @@ -138,8 +138,8 @@ define <4 x i64> @m2_splat_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range ret <4 x i64> %res } -define <4 x i64> @m2_splat_into_identity_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) { -; CHECK-LABEL: m2_splat_into_identity_two_source: +define <4 x i64> @m2_splat_into_identity_two_source_v2_hi(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) { +; CHECK-LABEL: m2_splat_into_identity_two_source_v2_hi: ; CHECK: # %bb.0: ; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma ; CHECK-NEXT:vrgather.vi v10, v8, 0 @@ -149,6 +149,20 @@ define <4 x i64> @m2_splat_into_identity_two_source(<4 x i64> %v1, <4 x i64> %v2 ret <4 x i64> %res } +; FIXME: This is a miscompile, we're clobbering the lower reg group of %v2 +; (v10), and the vmv1r.v is moving from the wrong reg group (should be v10) +define <4 x i64> @m2_splat_into_slide_two_source_v2_lo(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) { +; CHECK-LABEL: m2_splat_into_slide_two_source_v2_lo: +; CHECK: # %bb.0: +; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma +; CHECK-NEXT:vrgather.vi v10, v8, 0 +; CHECK-NEXT:vmv1r.v v11, v8 +; CHECK-NEXT:vmv2r.v v8, v10 +; CHECK-NEXT:ret + %res = shufflevector <4 x i64> %v1, <4 x i64> %v2, <4 x i32> + ret <4 x i64> %res +} + define <4 x i64> @m2_splat_into_slide_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) { ; CHECK-LABEL: m2_splat_into_slide_two_source: ; CHECK: # %bb.0: >From 60341586c8bd46b1094663749ac6467058b7efe8 Mon Sep 17 00:00:00 2001 From: Luke Lau Date: Fri, 26 Jan 2024 20:18:08 +0700 Subject: [PATCH 2/2] [RISCV] Fix M1 shuffle on wrong SrcVec in lowerShuffleViaVRegSplitting This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended up taking it from V1 instead of V2. --- llvm/lib/Target/RISCV/RISCVISelLowering.cpp | 2 +- .../CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll | 8 +++- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index 47c6cd6e5487b..7895d74f06d12 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -4718,7 +4718,7 @@ static SDValue lowerShuffleViaVRegSplitting(ShuffleVectorSDNode *SVN, if (SrcVecIdx == -1) continue; unsigned ExtractIdx = (SrcVecIdx % VRegsPerSrc) * NumOpElts; -SDValue SrcVec = (unsigned)SrcVecIdx > VRegsPerSrc ? 
V2 : V1; +SDValue SrcVec = (unsigned)SrcVecIdx >= VRegsPerSrc ? V2 : V1; SDValue SubVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, SrcVec, DAG.getVectorIdxConstant(ExtractIdx, DL)); SubVec = convertFromScalableVector(OneRegVT, SubVec, DAG, Subtarget); diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll index c0b02f62444ef..3f0bdb9d5e316 100644 --- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll +++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll @@ -149,15 +149,13 @@ define <4 x i64> @m2_splat_into_identity_two_source_v2_hi(<4 x i64> %v1, <4 x i6 ret <4 x i64> %res } -; FIXME: This is a miscompile, we're clobbering the lower reg group of %v2 -; (v10), and the vmv1r.v is moving from the wrong reg group (should be v10) define <4 x i64> @m2_splat_into_slide_two_source_v2_lo(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) { ; CHECK-LABEL: m2_splat_into_slide_two_source_v2_lo: ; CHECK: # %bb.0: ; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma -; CHECK-NEXT:vrgather.vi v10, v8, 0 -; CHECK-NEXT:vmv1r.v v11, v8 -; CHECK-NEXT:vmv2r.v v8, v10 +; CHECK-NEXT:vrgather.vi v12, v8, 0 +; CHECK-NEXT:vm
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
lukel97 wrote: I collected the stats on the number of memcmps that were inlined, it looks like we're able to expand a good chunk of them: ``` Program expand-memcmp.NumMemCmpCalls expand-memcmp.NumMemCmpInlined lhs rhs diff lhsrhsdiff FP2017rate/510.parest_r/510.parest_r 410.00 468.00 14.1%104.00 inf% INT2017speed/602.gcc_s/602.gcc_s 83.00 92.00 10.8% 36.00 inf% INT2017rate/502.gcc_r/502.gcc_r83.00 92.00 10.8% 36.00 inf% INT2017spe...00.perlbench_s/600.perlbench_s 207.00 220.00 6.3%120.00 inf% INT2017rat...00.perlbench_r/500.perlbench_r 207.00 220.00 6.3%120.00 inf% INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 304.00 306.00 0.7% 13.00 inf% INT2017rate/520.omnetpp_r/520.omnetpp_r 304.00 306.00 0.7% 13.00 inf% FP2017rate/508.namd_r/508.namd_r 13.00 13.00 0.0% 13.00 inf% INT2017rate/541.leela_r/541.leela_r40.00 40.00 0.0% 3.00 inf% INT2017speed/641.leela_s/641.leela_s 40.00 40.00 0.0% 3.00 inf% INT2017speed/625.x264_s/625.x264_s 8.00 8.00 0.0% 6.00 inf% INT2017spe...23.xalancbmk_s/623.xalancbmk_s 8.00 8.00 0.0% 6.00 inf% INT2017rate/557.xz_r/557.xz_r 6.00 6.00 0.0% 4.00 inf% INT2017rat...23.xalancbmk_r/523.xalancbmk_r 8.00 8.00 0.0% 6.00 inf% INT2017rate/525.x264_r/525.x264_r 8.00 8.00 0.0% 6.00 inf% FP2017speed/644.nab_s/644.nab_s77.00 77.00 0.0% 71.00 inf% FP2017speed/638.imagick_s/638.imagick_s 3.00 3.00 0.0% FP2017rate/544.nab_r/544.nab_r 77.00 77.00 0.0% 71.00 inf% FP2017rate/538.imagick_r/538.imagick_r 3.00 3.00 0.0% FP2017rate/526.blender_r/526.blender_r 41.00 41.00 0.0% 27.00 inf% FP2017rate/511.povray_r/511.povray_r5.00 5.00 0.0% 5.00 inf% INT2017speed/657.xz_s/657.xz_s 6.00 6.00 0.0% 4.00 inf% ``` There's a small difference in the number of original memcmp calls, there's some merge commits in this branch which might have changed the codegen slightly in the meantime. I'm working on getting some runtime numbers now, sorry for the delay https://github.com/llvm/llvm-project/pull/107548 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
lukel97 wrote: The run just finished, I'm seeing a 0.75% improvement on 500.perlbench_r, no regressions or improvements on the other benchmarks as far as I can see. Seems to check out with the number of memcmps inlined reported for perlbench! https://github.com/llvm/llvm-project/pull/107548 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)
lukel97 wrote:

> > > The run just finished, I'm seeing a 0.75% improvement on 500.perlbench_r on the BPI F3 (-O3 -mcpu=spacemit-x60), no regressions or improvements on the other benchmarks as far as I can see. Seems to check out with the number of memcmps inlined reported for perlbench!
> >
> > Does spacemit-x60 support unaligned scalar memory and was your test with or without that enabled?
>
> It supports unaligned scalar but not unaligned vector. And it seems we don't add these features to `-mcpu=spacemit-x60`. So I think @lukel97 ran the SPEC without unaligned scalar.

Yeah, -mno-strict-align gave a bus error. I ultimately built it without unaligned scalar since I wasn't sure if unaligned scalar was performant or not.

https://github.com/llvm/llvm-project/pull/107548 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D return true; } +/// Try to map an integer comparison with size > XLEN to vector instructions +/// before type legalization splits it up into chunks. +static SDValue +combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, +const SDLoc &DL, SelectionDAG &DAG, +const RISCVSubtarget &Subtarget) { + assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate"); + + if (!Subtarget.hasVInstructions()) +return SDValue(); + + MVT XLenVT = Subtarget.getXLenVT(); + EVT OpVT = X.getValueType(); + // We're looking for an oversized integer equality comparison. + if (OpVT.isScalableVT() || !OpVT.isScalarInteger()) +return SDValue(); + + unsigned OpSize = OpVT.getSizeInBits(); + // TODO: Support non-power-of-2 types. + if (!isPowerOf2_32(OpSize)) +return SDValue(); + + // The size should be larger than XLen and smaller than the maximum vector + // size. + if (OpSize <= Subtarget.getXLen() || + OpSize > Subtarget.getRealMinVLen() * + Subtarget.getMaxLMULForFixedLengthVectors()) +return SDValue(); + + // Don't perform this combine if constructing the vector will be expensive. + auto IsVectorBitCastCheap = [](SDValue X) { +X = peekThroughBitcasts(X); +return isa(X) || X.getValueType().isVector() || + X.getOpcode() == ISD::LOAD; + }; + if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) +return SDValue(); + + if (DAG.getMachineFunction().getFunction().hasFnAttribute( + Attribute::NoImplicitFloat)) +return SDValue(); lukel97 wrote: Do we need to check for this on RISC-V? We're not introducing any FP code here https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -2525,5 +2527,21 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const { Options.LoadSizes = {8, 4, 2, 1}; else Options.LoadSizes = {4, 2, 1}; + if (IsZeroCmp && ST->hasVInstructions()) { lukel97 wrote: Doesn't this mean that processors with only +unaligned-scalar-mem will now expand vector-sized compares? https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
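One way the concern above could be addressed is to also gate the extra widths on unaligned vector support, so a core with only +unaligned-scalar-mem keeps the purely scalar list. The snippet below is only a toy illustration of that gating with placeholder sizes; it is not the actual RISCVTTIImpl change and does not claim to be how the PR resolves this.

```cpp
#include <cstdio>
#include <vector>

// Toy illustration: the vector-sized compare widths are only appended when
// the subtarget can also do the unaligned vector loads the expansion may
// emit. Concrete sizes are placeholders, not what the PR computes.
std::vector<unsigned> memcmpLoadSizes(bool Is64Bit, bool UnalignedScalar,
                                      bool HasVector, bool UnalignedVector,
                                      bool IsZeroCmp) {
  std::vector<unsigned> Sizes;
  if (IsZeroCmp && HasVector && UnalignedVector)
    Sizes = {32, 16}; // vector-register-sized chunks, largest first
  if (UnalignedScalar) {
    if (Is64Bit)
      Sizes.insert(Sizes.end(), {8, 4, 2, 1});
    else
      Sizes.insert(Sizes.end(), {4, 2, 1});
  }
  return Sizes;
}

int main() {
  // +unaligned-scalar-mem only: just {8, 4, 2, 1}; enabling unaligned vector
  // memory as well would prepend the vector widths.
  for (unsigned S : memcmpLoadSizes(true, true, true, false, true))
    std::printf("%u ", S);
  std::printf("\n");
}
```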
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D return true; } +/// Try to map an integer comparison with size > XLEN to vector instructions +/// before type legalization splits it up into chunks. +static SDValue +combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, +const SDLoc &DL, SelectionDAG &DAG, +const RISCVSubtarget &Subtarget) { + assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate"); + + if (!Subtarget.hasVInstructions()) +return SDValue(); + + MVT XLenVT = Subtarget.getXLenVT(); + EVT OpVT = X.getValueType(); + // We're looking for an oversized integer equality comparison. + if (OpVT.isScalableVT() || !OpVT.isScalarInteger()) +return SDValue(); + + unsigned OpSize = OpVT.getSizeInBits(); + // TODO: Support non-power-of-2 types. + if (!isPowerOf2_32(OpSize)) +return SDValue(); + + // The size should be larger than XLen and smaller than the maximum vector + // size. + if (OpSize <= Subtarget.getXLen() || + OpSize > Subtarget.getRealMinVLen() * + Subtarget.getMaxLMULForFixedLengthVectors()) +return SDValue(); + + // Don't perform this combine if constructing the vector will be expensive. + auto IsVectorBitCastCheap = [](SDValue X) { +X = peekThroughBitcasts(X); +return isa(X) || X.getValueType().isVector() || + X.getOpcode() == ISD::LOAD; + }; + if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y)) +return SDValue(); + + if (DAG.getMachineFunction().getFunction().hasFnAttribute( + Attribute::NoImplicitFloat)) +return SDValue(); lukel97 wrote: Oh that's right noimplicitfloat also disables SIMD, I forgot about that. https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
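For readers skimming the thread, this is the kind of source pattern the zero-equality memcmp expansion plus the combine above are aimed at (illustrative only; the exact lowering depends on VLEN and on the expansion options discussed above):

```cpp
#include <cstdint>
#include <cstring>

// A pure equality test of a 32-byte blob. With vectors enabled this can
// become a few vector loads and a vector compare instead of a memcmp libcall
// or a chain of XLEN-sized scalar loads and compares.
struct Key {
  uint8_t bytes[32];
};

bool sameKey(const Key &a, const Key &b) {
  return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}
```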
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/115162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)
@@ -1150,6 +1150,7 @@ class VPseudoUnaryNoMaskGPROut : class VPseudoUnaryMaskGPROut : Pseudo<(outs GPR:$rd), (ins VR:$rs1, VMaskOp:$vm, AVL:$vl, sew:$sew), []>, + RISCVMaskedPseudo, lukel97 wrote: Nit: instead of adding it in the class, could you move it to the two `def`s so it's consistent with other uses of RISCVMaskedPseudo? https://github.com/llvm/llvm-project/pull/115162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)
https://github.com/lukel97 approved this pull request. Good catch. I double-checked, and we're setting ElementsDependOnVL and ElementsDependOnMask for VCPOP_M and VFIRST_M, so adding RISCVMaskedPseudo should be safe. https://github.com/llvm/llvm-project/pull/115162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: [VPlan] Only use SCEV for live-ins in tryToWiden. (#125436) (PR #125659)
https://github.com/lukel97 approved this pull request. https://github.com/llvm/llvm-project/pull/125659 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/20.x: Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962) (PR #126487)
https://github.com/lukel97 approved this pull request. Thanks for fixing the cherry-pick. Re: #124499, I couldn't think of a simple fix we could apply on top of e3fbf19eb4428cac03c0e7301512f11f8947d743 for the 20.x release branch. I think it's best if we cherry-pick the revert so that performance isn't impacted on 20.x, and just continue to fix the cost model stuff in-tree for 21.x. https://github.com/llvm/llvm-project/pull/126487 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [RISCV] Add hasPostISelHook to sf.vfnrclip pseudo instructions. (#114274) (PR #117948)
https://github.com/lukel97 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/117948 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
lukel97 wrote: I collected some more data for RISC-V on SPEC CPU 2017. This improves code size by up to 7% on some benchmarks, and no regressions were found: https://lnt.lukelau.me/db_default/v4/nts/399?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=401&submit=Update

There's also a significant decrease in vector spilling and reloading. It removes spilling entirely on one benchmark, so the geomean result is stuck at 100%:

```
Program                                       riscv-instr-info.NumVRegReloaded riscv-instr-info.NumVRegSpilled
                                              lhs        rhs        diff       lhs        rhs        diff
FP2017rate/508.namd_r/508.namd_r              6.00       6.00       0.0%       1.00       1.00       0.0%
INT2017rat...00.perlbench_r/500.perlbench_r   8.00       8.00       0.0%       4.00       4.00       0.0%
INT2017speed/625.x264_s/625.x264_s            35.00      35.00      0.0%       39.00      39.00      0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s   6.00       6.00       0.0%       6.00       6.00       0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s   5.00       5.00       0.0%       4.00       4.00       0.0%
INT2017speed/602.gcc_s/602.gcc_s              70.00      70.00      0.0%       64.00      64.00      0.0%
INT2017spe...00.perlbench_s/600.perlbench_s   8.00       8.00       0.0%       4.00       4.00       0.0%
INT2017rate/525.x264_r/525.x264_r             35.00      35.00      0.0%       39.00      39.00      0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r   6.00       6.00       0.0%       6.00       6.00       0.0%
INT2017rate/520.omnetpp_r/520.omnetpp_r       5.00       5.00       0.0%       4.00       4.00       0.0%
INT2017rate/502.gcc_r/502.gcc_r               70.00      70.00      0.0%       64.00      64.00      0.0%
FP2017speed/644.nab_s/644.nab_s               24.00      24.00      0.0%       24.00      24.00      0.0%
FP2017rate/544.nab_r/544.nab_r                24.00      24.00      0.0%       24.00      24.00      0.0%
FP2017rate/511.povray_r/511.povray_r          131.00     131.00     0.0%       74.00      74.00      0.0%
FP2017rate/510.parest_r/510.parest_r          1490.00    1484.00    -0.4%      1231.00    1225.00    -0.5%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r   248.00     218.00     -12.1%     134.00     102.00     -23.9%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s   248.00     218.00     -12.1%     134.00     102.00     -23.9%
FP2017rate/526.blender_r/526.blender_r        1210.00    703.00     -41.9%     1033.00    654.00     -36.7%
FP2017speed/638.imagick_s/638.imagick_s       7524.00    1486.00    -80.2%     4813.00    925.00     -80.8%
FP2017rate/538.imagick_r/538.imagick_r        7524.00    1486.00    -80.2%     4813.00    925.00     -80.8%
FP2017speed/619.lbm_s/619.lbm_s               42.00      0.00       -100.0%    42.00      0.00       -100.0%
FP2017rate/519.lbm_r/519.lbm_r                42.00      0.00       -100.0%    42.00      0.00       -100.0%
FP2017rate...97.specrand_fr/997.specrand_fr   0.00       0.00
FP2017spee...96.specrand_fs/996.specrand_fs   0.00       0.00
INT2017rate/505.mcf_r/505.mcf_r               0.00       0.00
INT2017rate/541.leela_r/541.leela_r           0.00       0.00
INT2017rate/557.xz_r/557.xz_r                 0.00       0.00
INT2017rat...99.specrand_ir/999.specrand_ir   0.00       0.00
INT2017speed/605.mcf_s/605.mcf_s
```
[llvm-branch-commits] [llvm] release/20.x: [RISCV] Handle scalarized reductions in getArithmeticReductionCost (PR #136688)
https://github.com/lukel97 approved this pull request. https://github.com/llvm/llvm-project/pull/136688 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)
https://github.com/lukel97 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/114971 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)
@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const { } if (IsZeroCmp && ST->hasVInstructions()) { -unsigned RealMinVLen = ST->getRealMinVLen(); -// Support Fractional LMULs if the lengths are larger than XLen. -// TODO: Support non-power-of-2 types. -for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) { - unsigned Len = RealMinVLen / FLMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} -for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors(); - LMUL *= 2) { - unsigned Len = RealMinVLen * LMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} +unsigned VLenB = ST->getRealMinVLen() / 8; +// The minimum size should be the maximum bytes between `VLen * LMUL_MF8` +// and `XLen * 2`. +unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8); lukel97 wrote: If that's the case, do we even need the LMUL check? I.e. can we just do ``` unsigned MinSize = ST->getXLen() + 1; ``` And presumably for sizes < MF8, lowering will use the correct container anyway? https://github.com/llvm/llvm-project/pull/114971 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)
@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const { } if (IsZeroCmp && ST->hasVInstructions()) { -unsigned RealMinVLen = ST->getRealMinVLen(); -// Support Fractional LMULs if the lengths are larger than XLen. -// TODO: Support non-power-of-2 types. -for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) { - unsigned Len = RealMinVLen / FLMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} -for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors(); - LMUL *= 2) { - unsigned Len = RealMinVLen * LMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} +unsigned VLenB = ST->getRealMinVLen() / 8; +// The minimum size should be the maximum bytes between `VLen * LMUL_MF8` +// and `XLen * 2`. +unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8); lukel97 wrote: Just checking, if MF8 isn't supported for the ELEN, e.g. MF8 on zve32x, `getContainerForFixedLengthVector` in RISCVISelLowering will still lower it into the next largest LMUL so this should be fine right? https://github.com/llvm/llvm-project/pull/114971 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)
@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const { } if (IsZeroCmp && ST->hasVInstructions()) { -unsigned RealMinVLen = ST->getRealMinVLen(); -// Support Fractional LMULs if the lengths are larger than XLen. -// TODO: Support non-power-of-2 types. -for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) { - unsigned Len = RealMinVLen / FLMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} -for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors(); - LMUL *= 2) { - unsigned Len = RealMinVLen * LMUL; - if (Len > ST->getXLen()) -Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8); -} +unsigned VLenB = ST->getRealMinVLen() / 8; +// The minimum size should be the maximum bytes between `VLen * LMUL_MF8` +// and `XLen * 2`. +unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8); lukel97 wrote: How come we need to limit the minimum size to XLen * 2? Can we not use vectors for the `bcmp_size_15` test case on RV64 too? https://github.com/llvm/llvm-project/pull/114971 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
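To put rough numbers on the bounds being discussed in this thread, here is a small standalone sketch. The parameters are assumed for illustration only (rv64, VLEN=128, MaxLMUL=8), and the "reviewed" bound is just one reading of the `getXLen() + 1` suggestion above, taken in bytes; none of this is the patch's actual code.

```cpp
// Illustrative only: compare the lower bound in the posted patch against a
// "wider than one GPR" bound, for assumed rv64 / VLEN=128 / MaxLMUL=8.
#include <algorithm>
#include <cstdio>

int main() {
  unsigned XLen = 64;                 // GPR width in bits (rv64)
  unsigned VLenB = 128 / 8;           // bytes per vector register (Zvl128b)
  unsigned MaxLMUL = 8;
  unsigned MaxSize = VLenB * MaxLMUL; // largest single register group, in bytes

  // Bound as posted: max(VLenB / 8, XLen * 2 / 8) bytes.
  unsigned MinSizePatch = std::max(VLenB / 8, XLen * 2 / 8);
  // Bound floated in the review, read in bytes: anything wider than one GPR.
  unsigned MinSizeReview = XLen / 8 + 1;

  std::printf("posted bound:   expand zero memcmp of %u..%u bytes\n",
              MinSizePatch, MaxSize);
  std::printf("reviewed bound: expand zero memcmp of %u..%u bytes\n",
              MinSizeReview, MaxSize);
  return 0;
}
```

With these assumptions the posted bound starts at 16 bytes while the GPR-width bound starts at 9, which is why the 15-byte `bcmp_size_15` case above only gets vector expansion under the latter.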
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D return true; } +/// Try to map an integer comparison with size > XLEN to vector instructions +/// before type legalization splits it up into chunks. +static SDValue +combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, +const SDLoc &DL, SelectionDAG &DAG, +const RISCVSubtarget &Subtarget) { + assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate"); + + if (!Subtarget.hasVInstructions()) +return SDValue(); + + MVT XLenVT = Subtarget.getXLenVT(); + EVT OpVT = X.getValueType(); + // We're looking for an oversized integer equality comparison. + if (OpVT.isScalableVT() || !OpVT.isScalarInteger()) lukel97 wrote: I believe OpVT.isScalableVT() implies !OpVT.isScalarInteger()? Can this be simplified to ```suggestion if (!OpVT.isScalarInteger()) ``` https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D return true; } +/// Try to map an integer comparison with size > XLEN to vector instructions +/// before type legalization splits it up into chunks. +static SDValue +combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC, +const SDLoc &DL, SelectionDAG &DAG, +const RISCVSubtarget &Subtarget) { + assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate"); + + if (!Subtarget.hasVInstructions()) +return SDValue(); + + MVT XLenVT = Subtarget.getXLenVT(); + EVT OpVT = X.getValueType(); + // We're looking for an oversized integer equality comparison. + if (OpVT.isScalableVT() || !OpVT.isScalarInteger()) +return SDValue(); + + unsigned OpSize = OpVT.getSizeInBits(); + // TODO: Support non-power-of-2 types. + if (!isPowerOf2_32(OpSize)) +return SDValue(); lukel97 wrote: I think as long as it's byte sized it should be ok right? E.g. ```suggestion if (OpSize % 8) return SDValue(); ``` But happy if you want to leave this as is and do it as a follow on with the TODO in RISCVTargetTransformInfo.cpp https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
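For a concrete sense of what the two guards in this exchange admit, here is a tiny standalone sketch. The widths are arbitrary examples, and `isPowerOf2_32` is a local stand-in for LLVM's helper of the same name.

```cpp
// Illustrative only: which compare widths pass the power-of-2 guard from the
// patch versus the relaxed byte-sized guard suggested in the review.
#include <cstdio>

static bool isPowerOf2_32(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

int main() {
  const unsigned WidthsInBytes[] = {8, 15, 16, 24, 32};
  for (unsigned Bytes : WidthsInBytes) {
    unsigned OpSize = Bytes * 8; // the combine reasons in bits
    std::printf("%2u bytes: isPowerOf2_32 %s, OpSize %% 8 %s\n", Bytes,
                isPowerOf2_32(OpSize) ? "admits" : "bails",
                (OpSize % 8) == 0 ? "admits" : "bails");
  }
  return 0;
}
```

Every whole-byte width trivially passes the relaxed guard, so the practical difference is that 15- and 24-byte compares (e.g. from the non-power-of-2 memcmp expansion discussed above) are no longer rejected up front.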
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
https://github.com/lukel97 edited https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -2952,5 +2952,22 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const { Options.LoadSizes = {4, 2, 1}; Options.AllowedTailExpansions = {3}; } + + if (IsZeroCmp && ST->hasVInstructions() && ST->enableUnalignedVectorMem()) { lukel97 wrote: Do we still need the enableUnalignedVectorMem check? If I'm understanding this right MemcmpExpand will generate a scalar which should be ok because we check for enableUnalignedScalarMem. And then in the new combine we're not actually changing the load at all. There must be some other existing combine which is converting the scalar load to a vector load, which should be respecting alignment? https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)
@@ -16172,8 +16233,6 @@ static SDValue performSETCCCombine(SDNode *N, SelectionDAG &DAG, N0.getConstantOperandVal(1) != UINT64_C(0x)) return SDValue(); - // Looking for an equality compare. - ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get(); if (!isIntEqualitySetCC(Cond)) return SDValue(); lukel97 wrote: Nit: you could pull this early exit up to line 16217, since both combineVectorSizedSetCCEquality and the existing combine need it. https://github.com/llvm/llvm-project/pull/114517 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits