[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-04 Thread Luke Lau via llvm-branch-commits


@@ -71,18 +73,21 @@ vsetvli a2, a0, e32, m8, ta, ma
 
 vsetvli a2, a0, e32, mf2, ta, ma
 # CHECK-INST: vsetvli a2, a0, e32, mf2, ta, ma
+# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 16 may not be compatible with all RVV implementations{{$}}
 # CHECK-ENCODING: [0x57,0x76,0x75,0x0d]
 # CHECK-ERROR: instruction requires the following: 'V' (Vector Extension for Application Processors), 'Zve32x' (Vector Extensions for Embedded Processors){{$}}
 # CHECK-UNKNOWN: 0d757657 
 
 vsetvli a2, a0, e32, mf4, ta, ma
 # CHECK-INST: vsetvli a2, a0, e32, mf4, ta, ma
+# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 8 may not be compatible with all RVV implementations{{$}}

lukel97 wrote:

I see that the spec recommends that we warn when LMUL < SEWMIN/ELEN, but do we need to warn for SEW > LMUL * ELEN? IIUC this would cause a warning on zve64x too, since 32 > 1/4 * 64.
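
To make the arithmetic concrete, here's a tiny standalone illustration of that check (my own sketch, not code from the PR):

```cpp
// Illustration only: the largest SEW compatible with a fractional LMUL is
// LMUL * ELEN. For Zve64x (ELEN = 64) and LMUL = mf4 that is 64 / 4 = 16,
// so e32 would trip a SEW > LMUL * ELEN warning there too.
#include <cassert>
static unsigned maxCompatibleSEW(unsigned ELEN, unsigned FractionalLMulDenom) {
  return ELEN / FractionalLMulDenom;
}
int main() {
  assert(maxCompatibleSEW(64, 4) == 16); // e32 + mf4 would warn even on zve64x
  assert(maxCompatibleSEW(32, 2) == 16); // e32 + mf2 warns on zve32x
}
```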

https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-04 Thread Luke Lau via llvm-branch-commits


@@ -1,5 +1,7 @@
 # RUN: llvm-mc -triple=riscv64 -show-encoding --mattr=+v %s \
 # RUN:| FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+# RUN: llvm-mc -triple=riscv64 -show-encoding --mattr=+zve32x %s 2>&1 \
+# RUN:| FileCheck %s --check-prefix=CHECK-WARNING

lukel97 wrote:

Nit, can we name the prefix something like CHECK-ZVE32X

https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-04 Thread Luke Lau via llvm-branch-commits


@@ -71,18 +73,21 @@ vsetvli a2, a0, e32, m8, ta, ma
 
 vsetvli a2, a0, e32, mf2, ta, ma
 # CHECK-INST: vsetvli a2, a0, e32, mf2, ta, ma
+# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 16 may not be compatible with all RVV implementations{{$}}
 # CHECK-ENCODING: [0x57,0x76,0x75,0x0d]
 # CHECK-ERROR: instruction requires the following: 'V' (Vector Extension for Application Processors), 'Zve32x' (Vector Extensions for Embedded Processors){{$}}
 # CHECK-UNKNOWN: 0d757657 
 
 vsetvli a2, a0, e32, mf4, ta, ma
 # CHECK-INST: vsetvli a2, a0, e32, mf4, ta, ma
+# CHECK-WARNING: :[[#@LINE-2]]:17: warning: SEW > 8 may not be compatible with all RVV implementations{{$}}

lukel97 wrote:

Ok, that seems reasonable. Should we maybe then reword the LMUL < SEWMIN/ELEN case to mention that the encoding is actually reserved, whereas for SEW > LMUL * ELEN it may just not be compatible?

https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-06 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-06 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.


https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV][MC] Warn if SEW/LMUL may not be compatible (PR #94313)

2024-06-06 Thread Luke Lau via llvm-branch-commits


@@ -2211,6 +,16 @@ ParseStatus RISCVAsmParser::parseVTypeI(OperandVector &Operands) {
 
   if (getLexer().is(AsmToken::EndOfStatement) && State == VTypeState_Done) {
     RISCVII::VLMUL VLMUL = RISCVVType::encodeLMUL(Lmul, Fractional);
+    if (Fractional) {
+      unsigned ELEN = STI->hasFeature(RISCV::FeatureStdExtZve64x) ? 64 : 32;
+      unsigned MaxSEW = ELEN / Lmul;
+      // If MaxSEW < 8, we should have printed warning about reserved LMUL.
+      if (MaxSEW >= 8 && Sew > MaxSEW)
+        Warning(SEWLoc,
+                "use of vtype encodings with SEW > " + Twine(MaxSEW) +
+                " and LMUL == " + (Fractional ? "mf" : "m") + Twine(Lmul) +

lukel97 wrote:

Fractional is always true here right?
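
For illustration, here's roughly what the call could collapse to once the dead select is dropped (a hedged sketch on my part; the exact diagnostic wording is assumed, not taken from the committed patch):

```cpp
// Hypothetical simplification: inside `if (Fractional)` the prefix is always "mf".
Warning(SEWLoc, "use of vtype encodings with SEW > " + Twine(MaxSEW) +
                    " and LMUL == mf" + Twine(Lmul) +
                    " may not be compatible with all RVV implementations");
```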

https://github.com/llvm/llvm-project/pull/94313
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)

2024-08-01 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

It's a miscompile, but it wasn't a regression since it looks like we've had it 
since LLVM 16

https://github.com/llvm/llvm-project/pull/101124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)

2024-08-01 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 milestoned 
https://github.com/llvm/llvm-project/pull/101464
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)

2024-08-01 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 created 
https://github.com/llvm/llvm-project/pull/101464

This is a backport of #101152 which fixes a miscompile on RISC-V, albeit not a 
regression.

>From 6b7c614ad8a69dfb610ed02da541fb8d3bf009e3 Mon Sep 17 00:00:00 2001
From: Luke Lau 
Date: Wed, 31 Jul 2024 00:28:52 +0800
Subject: [PATCH] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with
 mismatching EEW (#101152)

As noted in
https://github.com/llvm/llvm-project/pull/100367/files#r1695448771, we
currently fold vmerge.vvms and vmv.v.vs into their ops even if the
EEW is different, which leads to an incorrect transform.

This checks the op's EEW via its simple value type for now since there
doesn't seem to be any existing information about the EEW size of
instructions. We'll probably need to encode this at some point if we
want to be able to access it at the MachineInstr level in #100367
---
 llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp   |  4 
 llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll| 14 +
 .../RISCV/rvv/rvv-peephole-vmerge-vops.ll | 21 +++
 3 files changed, 39 insertions(+)

diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp 
b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index eef6ae677ac85..db949f3476e2b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3721,6 +3721,10 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
   assert(!Mask || cast<RegisterSDNode>(Mask)->getReg() == RISCV::V0);
   assert(!Glue || Glue.getValueType() == MVT::Glue);
 
+  // If the EEW of True is different from vmerge's SEW, then we can't fold.
+  if (True.getSimpleValueType() != N->getSimpleValueType(0))
+return false;
+
   // We require that either merge and false are the same, or that merge
   // is undefined.
   if (Merge != False && !isImplicitDef(Merge))
diff --git a/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll 
b/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll
index ec03f773c7108..dfc2b2bdda026 100644
--- a/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/combine-vmv.ll
@@ -168,3 +168,17 @@ define <vscale x 2 x i32> @unfoldable_vredsum(<vscale x 2 x i32> %passthru, <vs
   %b = call <vscale x 2 x i32> @llvm.riscv.vmv.v.v.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a, iXLen 1)
   ret <vscale x 2 x i32> %b
 }
+
+define <vscale x 2 x i32> @unfoldable_mismatched_sew(<vscale x 2 x i32> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl) {
+; CHECK-LABEL: unfoldable_mismatched_sew:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vadd.vv v9, v9, v10
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, tu, ma
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> poison, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, iXLen %avl)
+  %a.bitcast = bitcast <vscale x 1 x i64> %a to <vscale x 2 x i32>
+  %b = call <vscale x 2 x i32> @llvm.riscv.vmv.v.v.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a.bitcast, iXLen %avl)
+  ret <vscale x 2 x i32> %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll 
b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index a08bcae074b9b..259515f160048 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -1196,3 +1196,24 @@ define <vscale x 2 x i32> @true_mask_vmerge_implicit_passthru(<vscale x 2 x i32
   )
   ret <vscale x 2 x i32> %b
 }
+
+
+define <vscale x 2 x i32> @unfoldable_mismatched_sew(<vscale x 2 x i32> %passthru, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, <vscale x 2 x i1> %mask, i64 %avl) {
+; CHECK-LABEL: unfoldable_mismatched_sew:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vadd.vv v9, v9, v10
+; CHECK-NEXT:    vsetvli zero, a0, e32, m1, tu, ma
+; CHECK-NEXT:    vmv.v.v v8, v9
+; CHECK-NEXT:    ret
+  %a = call <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(<vscale x 1 x i64> poison, <vscale x 1 x i64> %x, <vscale x 1 x i64> %y, i64 %avl)
+  %a.bitcast = bitcast <vscale x 1 x i64> %a to <vscale x 2 x i32>
+  %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+    <vscale x 2 x i32> %passthru,
+    <vscale x 2 x i32> %passthru,
+    <vscale x 2 x i32> %a.bitcast,
+    <vscale x 2 x i1> splat (i1 true),
+    i64 %avl
+  )
+  ret <vscale x 2 x i32> %b
+}

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Fix vmerge.vvm/vmv.v.v getting folded into ops with mismatching EEW (PR #101464)

2024-08-01 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/101464
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [RISCV] Use APInt in isSimpleVIDSequence to account for index overflow (#100072) (PR #101124)

2024-08-01 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 closed 
https://github.com/llvm/llvm-project/pull/101124
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] Revert "[RISCV] Recurse on first operand of two operand shuffles (#79180)" (PR #80238)

2024-02-13 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 milestoned 
https://github.com/llvm/llvm-project/pull/80238
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)

2024-03-04 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

> I think the "Requested by" comes from the git committer.

There's a PR open to fix this: #82680 

> @lukel97 i'm not sure if you have already or not, but it might be good to 
> include the recent test you added too.

Sure thing. I can't see a way of editing or pushing more commits to this PR's branch though, so I'll close this and create another PR.

https://github.com/llvm/llvm-project/pull/83848
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)

2024-03-04 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 closed 
https://github.com/llvm/llvm-project/pull/83848
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [SelectionDAG] Change computeAliasing signature from optional to LocationSize. (#83017) (PR #83848)

2024-03-04 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

Superseded by #83856 

https://github.com/llvm/llvm-project/pull/83848
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Pass LMUL to copyPhysRegVector (PR #84448)

2024-03-08 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 commented:

Is this NFC?

https://github.com/llvm/llvm-project/pull/84448
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-24 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,81 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+
+  unsigned I = 0;
+  auto GetCopyInfo = [&](MCRegister SrcReg, MCRegister DstReg)
+      -> std::tuple<RISCVII::VLMUL, const TargetRegisterClass &, unsigned,
+                    unsigned, unsigned> {
+    unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
+    unsigned DstEncoding = TRI->getEncodingValue(DstReg);
+    if (!(SrcEncoding & 0b111) && !(DstEncoding & 0b111) && I + 8 <= NumRegs)

lukel97 wrote:

Is this the same as `SrcEncoding % 8 == 0 && DstEncoding % 8 == 0`?
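
(For the record, the two spellings agree for unsigned values; a tiny standalone check, unrelated to the patch itself:)

```cpp
// (x & 0b111) == 0 is exactly x % 8 == 0 for unsigned x; the modulo form just
// reads more obviously as "the encoding is aligned to 8".
#include <cassert>
int main() {
  for (unsigned x = 0; x < 256; ++x)
    assert(((x & 0b111) == 0) == (x % 8 == 0));
}
```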

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV][NFC] Pass LMUL to copyPhysRegVector (PR #84448)

2024-03-24 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/84448
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Store VLMul/NF into RegisterClass's TSFlags (PR #84894)

2024-03-24 Thread Luke Lau via llvm-branch-commits


@@ -127,8 +127,21 @@ def XLenRI : RegInfoByHwMode<
   [RV32,  RV64],
   [RegInfo<32,32,32>, RegInfo<64,64,64>]>;
 
+class RISCVRegisterClass regTypes, int align, dag regList>
+: RegisterClass<"RISCV", regTypes, align, regList> {
+  bit IsVRegClass = 0;
+  int VLMul = 1;
+  int NF = 1;

lukel97 wrote:

Should these default to 0 since 0 is an invalid LMUL/NF?

https://github.com/llvm/llvm-project/pull/84894
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-24 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,81 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+
+  unsigned I = 0;
+  auto GetCopyInfo = [&](MCRegister SrcReg, MCRegister DstReg)
+      -> std::tuple<RISCVII::VLMUL, const TargetRegisterClass &, unsigned,
+                    unsigned, unsigned> {
+    unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
+    unsigned DstEncoding = TRI->getEncodingValue(DstReg);
+    if (!(SrcEncoding & 0b111) && !(DstEncoding & 0b111) && I + 8 <= NumRegs)

lukel97 wrote:

Ah ok, just wanted to check. I found it a bit hard to read but I'm not strongly 
opinionated

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-24 Thread Luke Lau via llvm-branch-commits


@@ -146,16 +127,12 @@ body: |
 ; CHECK-NEXT: $v7 = VMV1R_V $v12
 ; CHECK-NEXT: $v8 = VMV1R_V $v13
 ; CHECK-NEXT: $v9 = VMV1R_V $v14
-; CHECK-NEXT: $v6 = VMV1R_V $v10
-; CHECK-NEXT: $v7 = VMV1R_V $v11
-; CHECK-NEXT: $v8 = VMV1R_V $v12
-; CHECK-NEXT: $v9 = VMV1R_V $v13
-; CHECK-NEXT: $v10 = VMV1R_V $v14
-; CHECK-NEXT: $v18 = VMV1R_V $v14
-; CHECK-NEXT: $v17 = VMV1R_V $v13
-; CHECK-NEXT: $v16 = VMV1R_V $v12
-; CHECK-NEXT: $v15 = VMV1R_V $v11
-; CHECK-NEXT: $v14 = VMV1R_V $v10
+; CHECK-NEXT: $v6m2 = VMV2R_V $v10m2
+; CHECK-NEXT: $v8m2 = VMV2R_V $v12m2
+; CHECK-NEXT: $v8 = VMV1R_V $v14

lukel97 wrote:

Shouldn't this be `$v10 = VMV1R_V $v14`?

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-24 Thread Luke Lau via llvm-branch-commits


@@ -146,16 +127,12 @@ body: |
 ; CHECK-NEXT: $v7 = VMV1R_V $v12
 ; CHECK-NEXT: $v8 = VMV1R_V $v13
 ; CHECK-NEXT: $v9 = VMV1R_V $v14
-; CHECK-NEXT: $v6 = VMV1R_V $v10
-; CHECK-NEXT: $v7 = VMV1R_V $v11
-; CHECK-NEXT: $v8 = VMV1R_V $v12
-; CHECK-NEXT: $v9 = VMV1R_V $v13
-; CHECK-NEXT: $v10 = VMV1R_V $v14
-; CHECK-NEXT: $v18 = VMV1R_V $v14
-; CHECK-NEXT: $v17 = VMV1R_V $v13
-; CHECK-NEXT: $v16 = VMV1R_V $v12
-; CHECK-NEXT: $v15 = VMV1R_V $v11
-; CHECK-NEXT: $v14 = VMV1R_V $v10
+; CHECK-NEXT: $v6m2 = VMV2R_V $v10m2
+; CHECK-NEXT: $v8m2 = VMV2R_V $v12m2
+; CHECK-NEXT: $v8 = VMV1R_V $v14
+; CHECK-NEXT: $v14m2 = VMV2R_V $v10m2
+; CHECK-NEXT: $v12m2 = VMV2R_V $v8m2
+; CHECK-NEXT: $v8 = VMV1R_V $v4

lukel97 wrote:

And should this be something like the following?
```
$v18 = VMV1R_V $v14
$v16 = VMV2R_V $v12m2
$v14 = VMV2R_V $v10m2
```

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-26 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+  if (ReversedCopy) {
+// If there exists overlapping, we should copy the registers reversely.
+SrcEncoding += NumRegs - LMulVal;
+DstEncoding += NumRegs - LMulVal;
+  }
+
+  unsigned I = 0;
+  auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding)
+      -> std::tuple<RISCVII::VLMUL, const TargetRegisterClass &, unsigned,
+                    unsigned, unsigned> {
+    // If source register encoding and destination register encoding are
+    // aligned to 8, we can do a LMUL8 copying.
+    if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs)
+      return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V,
+              RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8};
+    // If source register encoding and destination register encoding are
+    // aligned to 4, we can do a LMUL4 copying.
+    if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs)
+      return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V,
+              RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4};
+    // If source register encoding and destination register encoding are
+    // aligned to 2, we can do a LMUL2 copying.
+    if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs)
+      return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V,
+              RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2};
+    // Or we should do LMUL1 copying.
+    return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V,
+            RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1};
+  };
+  auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass,
+                                    uint16_t Encoding) {
+    ArrayRef<MCPhysReg> Regs = RegClass.getRegisters();
+    const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) {
+      return TRI->getEncodingValue(Reg) == Encoding;
+    });
+    // We should be always able to find one valid register.
+    assert(FoundReg != Regs.end());
+    return *FoundReg;
+  };

lukel97 wrote:

Would it be easier to get the register via `TRI->getSubReg`? I think you should 
be able to compute the subreg index based off the RegClass and `I`.

I don't think you'll need to compose any subreg indices like in 
`RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs`
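
For what it's worth, a rough sketch of that idea (the subreg-index computation is a hypothetical placeholder here, not an existing helper):

```cpp
// Hedged sketch: take the I'th piece of a tuple register through the
// sub-register machinery instead of scanning the class for a matching encoding.
// `SubRegIdxForOffset` stands in for whatever maps the copy offset I to e.g.
// RISCV::sub_vrm1_0, RISCV::sub_vrm1_1, ...
static MCRegister getPieceAtOffset(const TargetRegisterInfo &TRI,
                                   MCRegister TupleReg,
                                   unsigned SubRegIdxForOffset) {
  return TRI.getSubReg(TupleReg, SubRegIdxForOffset);
}
```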

ht

[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-26 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+  if (ReversedCopy) {
+// If there exists overlapping, we should copy the registers reversely.

lukel97 wrote:

Nit, maybe clarify this happens when copying tuples?

```suggestion
    // If the src and dest overlap when copying a tuple, we need to copy the registers in reverse.
```

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-26 Thread Luke Lau via llvm-branch-commits


@@ -212,19 +185,13 @@ body: |
 ; CHECK-NEXT: $v7 = VMV1R_V $v14
 ; CHECK-NEXT: $v8 = VMV1R_V $v15
 ; CHECK-NEXT: $v9 = VMV1R_V $v16
-; CHECK-NEXT: $v4 = VMV1R_V $v10
-; CHECK-NEXT: $v5 = VMV1R_V $v11
-; CHECK-NEXT: $v6 = VMV1R_V $v12
-; CHECK-NEXT: $v7 = VMV1R_V $v13
-; CHECK-NEXT: $v8 = VMV1R_V $v14
-; CHECK-NEXT: $v9 = VMV1R_V $v15
+; CHECK-NEXT: $v4m2 = VMV2R_V $v10m2
+; CHECK-NEXT: $v6m2 = VMV2R_V $v12m2
+; CHECK-NEXT: $v8m2 = VMV2R_V $v14m2
 ; CHECK-NEXT: $v10 = VMV1R_V $v16
-; CHECK-NEXT: $v22 = VMV1R_V $v16
-; CHECK-NEXT: $v21 = VMV1R_V $v15
-; CHECK-NEXT: $v20 = VMV1R_V $v14
-; CHECK-NEXT: $v19 = VMV1R_V $v13
-; CHECK-NEXT: $v18 = VMV1R_V $v12
-; CHECK-NEXT: $v17 = VMV1R_V $v11
+; CHECK-NEXT: $v22m2 = VMV2R_V $v16m2
+; CHECK-NEXT: $v20m2 = VMV2R_V $v14m2
+; CHECK-NEXT: $v18m2 = VMV2R_V $v12m2
 ; CHECK-NEXT: $v16 = VMV1R_V $v10

lukel97 wrote:

Do we have a test for a copy like:

```
$v16_v17_v18_v19_v20_v21_v22 = COPY $v15_v16_v17_v18_v19_v20_v21
```

Because I think this will need to be all VMV1R_Vs. Does it already do this?

https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-29 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+  if (ReversedCopy) {
+// If there exists overlapping, we should copy the registers reversely.
+SrcEncoding += NumRegs - LMulVal;
+DstEncoding += NumRegs - LMulVal;
+  }
+
+  unsigned I = 0;
+  auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding)
+      -> std::tuple<RISCVII::VLMUL, const TargetRegisterClass &, unsigned,
+                    unsigned, unsigned> {
+    // If source register encoding and destination register encoding are
+    // aligned to 8, we can do a LMUL8 copying.
+    if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs)
+      return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V,
+              RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8};
+    // If source register encoding and destination register encoding are
+    // aligned to 4, we can do a LMUL4 copying.
+    if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs)
+      return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V,
+              RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4};
+    // If source register encoding and destination register encoding are
+    // aligned to 2, we can do a LMUL2 copying.
+    if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs)
+      return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V,
+              RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2};
+    // Or we should do LMUL1 copying.
+    return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V,
+            RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1};
+  };
+  auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass,
+                                    uint16_t Encoding) {
+    ArrayRef<MCPhysReg> Regs = RegClass.getRegisters();
+    const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) {
+      return TRI->getEncodingValue(Reg) == Encoding;
+    });
+    // We should be always able to find one valid register.
+    assert(FoundReg != Regs.end());
+    return *FoundReg;
+  };

lukel97 wrote:

I presume you don't need to use a subreg index if the register is a VRN8M1 and 
you're trying to do a VRM8 copy? Since the VRM8 reg class should be a subclass 
of VRN8M1 right? (Hope I'm getting the subreg/subclass terminology right btw)

https://github.com/llvm/llvm-project/pull/8

[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-03-31 Thread Luke Lau via llvm-branch-commits


@@ -302,102 +302,98 @@ void RISCVInstrInfo::copyPhysRegVector(MachineBasicBlock &MBB,
                                        RISCVII::VLMUL LMul, unsigned NF) const {
   const TargetRegisterInfo *TRI = STI.getRegisterInfo();
 
-  unsigned Opc;
-  unsigned SubRegIdx;
-  unsigned VVOpc, VIOpc;
-  switch (LMul) {
-  default:
-llvm_unreachable("Impossible LMUL for vector register copy.");
-  case RISCVII::LMUL_1:
-Opc = RISCV::VMV1R_V;
-SubRegIdx = RISCV::sub_vrm1_0;
-VVOpc = RISCV::PseudoVMV_V_V_M1;
-VIOpc = RISCV::PseudoVMV_V_I_M1;
-break;
-  case RISCVII::LMUL_2:
-Opc = RISCV::VMV2R_V;
-SubRegIdx = RISCV::sub_vrm2_0;
-VVOpc = RISCV::PseudoVMV_V_V_M2;
-VIOpc = RISCV::PseudoVMV_V_I_M2;
-break;
-  case RISCVII::LMUL_4:
-Opc = RISCV::VMV4R_V;
-SubRegIdx = RISCV::sub_vrm4_0;
-VVOpc = RISCV::PseudoVMV_V_V_M4;
-VIOpc = RISCV::PseudoVMV_V_I_M4;
-break;
-  case RISCVII::LMUL_8:
-assert(NF == 1);
-Opc = RISCV::VMV8R_V;
-SubRegIdx = RISCV::sub_vrm1_0; // There is no sub_vrm8_0.
-VVOpc = RISCV::PseudoVMV_V_V_M8;
-VIOpc = RISCV::PseudoVMV_V_I_M8;
-break;
-  }
-
-  bool UseVMV_V_V = false;
-  bool UseVMV_V_I = false;
-  MachineBasicBlock::const_iterator DefMBBI;
-  if (isConvertibleToVMV_V_V(STI, MBB, MBBI, DefMBBI, LMul)) {
-UseVMV_V_V = true;
-Opc = VVOpc;
-
-if (DefMBBI->getOpcode() == VIOpc) {
-  UseVMV_V_I = true;
-  Opc = VIOpc;
-}
-  }
-
-  if (NF == 1) {
-auto MIB = BuildMI(MBB, MBBI, DL, get(Opc), DstReg);
-if (UseVMV_V_V)
-  MIB.addReg(DstReg, RegState::Undef);
-if (UseVMV_V_I)
-  MIB = MIB.add(DefMBBI->getOperand(2));
-else
-  MIB = MIB.addReg(SrcReg, getKillRegState(KillSrc));
-if (UseVMV_V_V) {
-  const MCInstrDesc &Desc = DefMBBI->getDesc();
-  MIB.add(DefMBBI->getOperand(RISCVII::getVLOpNum(Desc)));  // AVL
-  MIB.add(DefMBBI->getOperand(RISCVII::getSEWOpNum(Desc))); // SEW
-  MIB.addImm(0);// tu, mu
-  MIB.addReg(RISCV::VL, RegState::Implicit);
-  MIB.addReg(RISCV::VTYPE, RegState::Implicit);
-}
-return;
-  }
-
-  int I = 0, End = NF, Incr = 1;
   unsigned SrcEncoding = TRI->getEncodingValue(SrcReg);
   unsigned DstEncoding = TRI->getEncodingValue(DstReg);
   unsigned LMulVal;
   bool Fractional;
   std::tie(LMulVal, Fractional) = RISCVVType::decodeVLMUL(LMul);
   assert(!Fractional && "It is impossible be fractional lmul here.");
-  if (forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NF * LMulVal)) {
-I = NF - 1;
-End = -1;
-Incr = -1;
-  }
+  unsigned NumRegs = NF * LMulVal;
+  bool ReversedCopy =
+  forwardCopyWillClobberTuple(DstEncoding, SrcEncoding, NumRegs);
+  if (ReversedCopy) {
+// If there exists overlapping, we should copy the registers reversely.
+SrcEncoding += NumRegs - LMulVal;
+DstEncoding += NumRegs - LMulVal;
+  }
+
+  unsigned I = 0;
+  auto GetCopyInfo = [&](uint16_t SrcEncoding, uint16_t DstEncoding)
+      -> std::tuple<RISCVII::VLMUL, const TargetRegisterClass &, unsigned,
+                    unsigned, unsigned> {
+    // If source register encoding and destination register encoding are
+    // aligned to 8, we can do a LMUL8 copying.
+    if (SrcEncoding % 8 == 0 && DstEncoding % 8 == 0 && I + 8 <= NumRegs)
+      return {RISCVII::LMUL_8, RISCV::VRM8RegClass, RISCV::VMV8R_V,
+              RISCV::PseudoVMV_V_V_M8, RISCV::PseudoVMV_V_I_M8};
+    // If source register encoding and destination register encoding are
+    // aligned to 4, we can do a LMUL4 copying.
+    if (SrcEncoding % 4 == 0 && DstEncoding % 4 == 0 && I + 4 <= NumRegs)
+      return {RISCVII::LMUL_4, RISCV::VRM4RegClass, RISCV::VMV4R_V,
+              RISCV::PseudoVMV_V_V_M4, RISCV::PseudoVMV_V_I_M4};
+    // If source register encoding and destination register encoding are
+    // aligned to 2, we can do a LMUL2 copying.
+    if (SrcEncoding % 2 == 0 && DstEncoding % 2 == 0 && I + 2 <= NumRegs)
+      return {RISCVII::LMUL_2, RISCV::VRM2RegClass, RISCV::VMV2R_V,
+              RISCV::PseudoVMV_V_V_M2, RISCV::PseudoVMV_V_I_M2};
+    // Or we should do LMUL1 copying.
+    return {RISCVII::LMUL_1, RISCV::VRRegClass, RISCV::VMV1R_V,
+            RISCV::PseudoVMV_V_V_M1, RISCV::PseudoVMV_V_I_M1};
+  };
+  auto FindRegWithEncoding = [&TRI](const TargetRegisterClass &RegClass,
+                                    uint16_t Encoding) {
+    ArrayRef<MCPhysReg> Regs = RegClass.getRegisters();
+    const auto *FoundReg = llvm::find_if(Regs, [&](MCPhysReg Reg) {
+      return TRI->getEncodingValue(Reg) == Encoding;
+    });
+    // We should be always able to find one valid register.
+    assert(FoundReg != Regs.end());
+    return *FoundReg;
+  };

lukel97 wrote:

Yeah, although I thought that `GetCopyInfo` was already checking that 
SrcReg/DstReg was aligned to the VRM8 reg class.

But I just checked and it looks like there's only subregisters on tuples for 
the same LMUL, e.g. V0_V1_V2_V3_V4_V5_V6_V7 from VRN8M1 only has the LMUL1 
subregi

[llvm-branch-commits] [llvm] [RISCV] Use larger copies when register tuples are aligned (PR #84455)

2024-04-05 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.


https://github.com/llvm/llvm-project/pull/84455
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)

2024-04-25 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 commented:

Removing it from vleNff makes sense to me. As long as we have the implicit-def $vl on the pseudo to prevent it being moved between vsetvlis, I think it should be ok.
https://github.com/llvm/llvm-project/pull/90049
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)

2024-04-25 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/90049
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)

2024-04-29 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/90049
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)

2024-04-29 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.


https://github.com/llvm/llvm-project/pull/90049
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Remove hasSideEffects=1 for saturating/fault-only-first instructions (PR #90049)

2024-04-29 Thread Luke Lau via llvm-branch-commits


@@ -194,15 +194,12 @@ define void @vpmerge_vpload_store(<vscale x 2 x i32> %passthru, ptr %p, <vscale
 declare { <vscale x 2 x i32>, i64 } @llvm.riscv.vleff.nxv2i32(<vscale x 2 x i32>, ptr, i64)
 define <vscale x 2 x i32> @vpmerge_vleff(<vscale x 2 x i32> %passthru, ptr %p, <vscale x 2 x i1> %m, i32 zeroext %vl) {
 ; CHECK-LABEL: vpmerge_vleff:
 ; CHECK:   # %bb.0:
-; CHECK-NEXT:vsetvli zero, a1, e32, m1, ta, ma
-; CHECK-NEXT:vle32ff.v v9, (a0)
-; CHECK-NEXT:vsetvli zero, a1, e32, m1, tu, ma
-; CHECK-NEXT:vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT:vsetvli zero, a1, e32, m1, tu, mu
+; CHECK-NEXT:vle32ff.v v8, (a0), v0.t

lukel97 wrote:

Looks correct to me.

https://github.com/llvm/llvm-project/pull/90049
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/18.x: [RISCV] Re-separate unaligned scalar and vector memory features in the backend. (PR #92143)

2024-05-17 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

Chiming in that this seems reasonable to me, given the performance impact of not having unaligned scalar accesses. And hopefully we can remove this once we're settled on a proper interface.

https://github.com/llvm/llvm-project/pull/92143
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] c0b9269 - [RISCV] Add helper to copy the AVL of another VSETVLIInfo. NFC

2023-11-30 Thread Luke Lau via llvm-branch-commits

Author: Luke Lau
Date: 2023-11-30T15:19:46+08:00
New Revision: c0b926939829d9d4bb6ac5825e62f30960b6ed22

URL: 
https://github.com/llvm/llvm-project/commit/c0b926939829d9d4bb6ac5825e62f30960b6ed22
DIFF: 
https://github.com/llvm/llvm-project/commit/c0b926939829d9d4bb6ac5825e62f30960b6ed22.diff

LOG: [RISCV] Add helper to copy the AVL of another VSETVLIInfo. NFC

Added: 


Modified: 
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp

Removed: 




diff  --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp 
b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 3bbc85d836c3f4a..3bb648359e39dd6 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -477,6 +477,18 @@ class VSETVLIInfo {
 return AVLImm;
   }
 
+  void setAVL(VSETVLIInfo Info) {
+    assert(Info.isValid());
+    if (Info.isUnknown())
+      setUnknown();
+    else if (Info.hasAVLReg())
+      setAVLReg(Info.getAVLReg());
+    else {
+      assert(Info.hasAVLImm());
+      setAVLImm(Info.getAVLImm());
+    }
+  }
+
   unsigned getSEW() const { return SEW; }
   RISCVII::VLMUL getVLMUL() const { return VLMul; }
 
@@ -1054,10 +1066,7 @@ void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo 
&Info,
   // TODO: We can probably relax this for immediates.
   if (Demanded.VLZeroness && !Demanded.VLAny && PrevInfo.isValid() &&
   PrevInfo.hasEquallyZeroAVL(Info, *MRI) && Info.hasSameVLMAX(PrevInfo)) {
-if (PrevInfo.hasAVLImm())
-  Info.setAVLImm(PrevInfo.getAVLImm());
-else
-  Info.setAVLReg(PrevInfo.getAVLReg());
+Info.setAVL(PrevInfo);
 return;
   }
 
@@ -1074,10 +1083,7 @@ void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo 
&Info,
   VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
   if (DefInfo.hasSameVLMAX(Info) &&
   (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) {
-if (DefInfo.hasAVLImm())
-  Info.setAVLImm(DefInfo.getAVLImm());
-else
-  Info.setAVLReg(DefInfo.getAVLReg());
+Info.setAVL(DefInfo);
 return;
   }
 }



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Backport 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112 to release/18.x (PR #79931)

2024-01-29 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 milestoned 
https://github.com/llvm/llvm-project/pull/79931
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Backport 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112 to release/18.x (PR #79931)

2024-01-29 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 created 
https://github.com/llvm/llvm-project/pull/79931

This cherry-picks 5cf9f2cd9888feea23a624c1de3cc37ce8ce8112, a fix for a miscompile (only with the -mrvv-vector-bits=zvl configuration or similar) introduced in bb8a8770e203ba027d141cd1200e93809ea66c8f, which is present in the 18.x release branch. It also includes d407e6ca61a422f25841674d8f0b5ea0dbec85f8, a commit that adds a test.

>From 5b3331f29489446d7d723a33310b7fec37153976 Mon Sep 17 00:00:00 2001
From: Luke Lau 
Date: Fri, 26 Jan 2024 20:16:21 +0700
Subject: [PATCH 1/2] [RISCV] Add test to showcase miscompile from #79072

---
 .../rvv/fixed-vectors-shuffle-exact-vlen.ll| 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
index f53b51e05c572..c0b02f62444ef 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
@@ -138,8 +138,8 @@ define <4 x i64> @m2_splat_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range
   ret <4 x i64> %res
 }
 
-define <4 x i64> @m2_splat_into_identity_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) {
-; CHECK-LABEL: m2_splat_into_identity_two_source:
+define <4 x i64> @m2_splat_into_identity_two_source_v2_hi(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) {
+; CHECK-LABEL: m2_splat_into_identity_two_source_v2_hi:
 ; CHECK:   # %bb.0:
 ; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma
 ; CHECK-NEXT:vrgather.vi v10, v8, 0
@@ -149,6 +149,20 @@ define <4 x i64> @m2_splat_into_identity_two_source(<4 x i64> %v1, <4 x i64> %v2
   ret <4 x i64> %res
 }
 
+; FIXME: This is a miscompile, we're clobbering the lower reg group of %v2
+; (v10), and the vmv1r.v is moving from the wrong reg group (should be v10)
+define <4 x i64> @m2_splat_into_slide_two_source_v2_lo(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) {
+; CHECK-LABEL: m2_splat_into_slide_two_source_v2_lo:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma
+; CHECK-NEXT:vrgather.vi v10, v8, 0
+; CHECK-NEXT:vmv1r.v v11, v8
+; CHECK-NEXT:vmv2r.v v8, v10
+; CHECK-NEXT:ret
+  %res = shufflevector <4 x i64> %v1, <4 x i64> %v2, <4 x i32> 
+  ret <4 x i64> %res
+}
+
 define <4 x i64> @m2_splat_into_slide_two_source(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) {
 ; CHECK-LABEL: m2_splat_into_slide_two_source:
 ; CHECK:   # %bb.0:

>From 60341586c8bd46b1094663749ac6467058b7efe8 Mon Sep 17 00:00:00 2001
From: Luke Lau 
Date: Fri, 26 Jan 2024 20:18:08 +0700
Subject: [PATCH 2/2] [RISCV] Fix M1 shuffle on wrong SrcVec in
 lowerShuffleViaVRegSplitting

This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
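
A worked illustration of the off-by-one being fixed (my own toy version, not the LLVM code):

```cpp
// With VRegsPerSrc == 2, M1 sub-vectors 0 and 1 come from %v1 and 2 and 3 come
// from %v2, so the selection must use >=.  The old `>` test sent SrcVecIdx == 2
// to %v1, which is exactly the miscompiled case.
#include <cassert>
static int pickSource(unsigned SrcVecIdx, unsigned VRegsPerSrc) {
  return SrcVecIdx >= VRegsPerSrc ? 2 : 1; // 1 == %v1, 2 == %v2
}
int main() {
  assert(pickSource(1, 2) == 1);
  assert(pickSource(2, 2) == 2); // previously picked %v1
}
```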
---
 llvm/lib/Target/RISCV/RISCVISelLowering.cpp   | 2 +-
 .../CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll | 8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp 
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 47c6cd6e5487b..7895d74f06d12 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4718,7 +4718,7 @@ static SDValue lowerShuffleViaVRegSplitting(ShuffleVectorSDNode *SVN,
 if (SrcVecIdx == -1)
   continue;
 unsigned ExtractIdx = (SrcVecIdx % VRegsPerSrc) * NumOpElts;
-SDValue SrcVec = (unsigned)SrcVecIdx > VRegsPerSrc ? V2 : V1;
+SDValue SrcVec = (unsigned)SrcVecIdx >= VRegsPerSrc ? V2 : V1;
 SDValue SubVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, SrcVec,
  DAG.getVectorIdxConstant(ExtractIdx, DL));
 SubVec = convertFromScalableVector(OneRegVT, SubVec, DAG, Subtarget);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll 
b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
index c0b02f62444ef..3f0bdb9d5e316 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-shuffle-exact-vlen.ll
@@ -149,15 +149,13 @@ define <4 x i64> @m2_splat_into_identity_two_source_v2_hi(<4 x i64> %v1, <4 x i6
   ret <4 x i64> %res
 }
 
-; FIXME: This is a miscompile, we're clobbering the lower reg group of %v2
-; (v10), and the vmv1r.v is moving from the wrong reg group (should be v10)
 define <4 x i64> @m2_splat_into_slide_two_source_v2_lo(<4 x i64> %v1, <4 x i64> %v2) vscale_range(2,2) {
 ; CHECK-LABEL: m2_splat_into_slide_two_source_v2_lo:
 ; CHECK:   # %bb.0:
 ; CHECK-NEXT:vsetivli zero, 2, e64, m1, ta, ma
-; CHECK-NEXT:vrgather.vi v10, v8, 0
-; CHECK-NEXT:vmv1r.v v11, v8
-; CHECK-NEXT:vmv2r.v v8, v10
+; CHECK-NEXT:vrgather.vi v12, v8, 0
+; CHECK-NEXT:vm

[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-09-11 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

I collected the stats on the number of memcmps that were inlined; it looks like we're able to expand a good chunk of them:
```
Program                                        expand-memcmp.NumMemCmpCalls   expand-memcmp.NumMemCmpInlined
                                               lhs      rhs      diff          lhs   rhs      diff
FP2017rate/510.parest_r/510.parest_r           410.00   468.00   14.1%               104.00   inf%
INT2017speed/602.gcc_s/602.gcc_s               83.00    92.00    10.8%               36.00    inf%
INT2017rate/502.gcc_r/502.gcc_r                83.00    92.00    10.8%               36.00    inf%
INT2017spe...00.perlbench_s/600.perlbench_s    207.00   220.00   6.3%                120.00   inf%
INT2017rat...00.perlbench_r/500.perlbench_r    207.00   220.00   6.3%                120.00   inf%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s    304.00   306.00   0.7%                13.00    inf%
INT2017rate/520.omnetpp_r/520.omnetpp_r        304.00   306.00   0.7%                13.00    inf%
FP2017rate/508.namd_r/508.namd_r               13.00    13.00    0.0%                13.00    inf%
INT2017rate/541.leela_r/541.leela_r            40.00    40.00    0.0%                3.00     inf%
INT2017speed/641.leela_s/641.leela_s           40.00    40.00    0.0%                3.00     inf%
INT2017speed/625.x264_s/625.x264_s             8.00     8.00     0.0%                6.00     inf%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s    8.00     8.00     0.0%                6.00     inf%
INT2017rate/557.xz_r/557.xz_r                  6.00     6.00     0.0%                4.00     inf%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r    8.00     8.00     0.0%                6.00     inf%
INT2017rate/525.x264_r/525.x264_r              8.00     8.00     0.0%                6.00     inf%
FP2017speed/644.nab_s/644.nab_s                77.00    77.00    0.0%                71.00    inf%
FP2017speed/638.imagick_s/638.imagick_s        3.00     3.00     0.0%
FP2017rate/544.nab_r/544.nab_r                 77.00    77.00    0.0%                71.00    inf%
FP2017rate/538.imagick_r/538.imagick_r         3.00     3.00     0.0%
FP2017rate/526.blender_r/526.blender_r         41.00    41.00    0.0%                27.00    inf%
FP2017rate/511.povray_r/511.povray_r           5.00     5.00     0.0%                5.00     inf%
INT2017speed/657.xz_s/657.xz_s                 6.00     6.00     0.0%                4.00     inf%
```

There's a small difference in the number of original memcmp calls; there are some merge commits in this branch which might have changed the codegen slightly in the meantime.

I'm working on getting some runtime numbers now, sorry for the delay.

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-09-12 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

The run just finished. I'm seeing a 0.75% improvement on 500.perlbench_r, and no regressions or improvements on the other benchmarks as far as I can see. That seems to check out with the number of memcmps inlined reported for perlbench!

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Add initial support of memcmp expansion (PR #107548)

2024-09-12 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

> > > The run just finished, I'm seeing a 0.75% improvement on 500.perlbench_r 
> > > on the BPI F3 (-O3 -mcpu=spacemit-x60), no regressions or improvements on 
> > > the other benchmarks as far as I can see. Seems to check out with the 
> > > number of memcmps inlined reported for perlbench!
> 
> > 
> 
> > Does spacemit-x60 support unaligned scalar memory and was your test with or 
> > without that enabled?
> 
> 
> 
> It supports unaligned scalar but not unaligned vector. And it seems we don't 
> add these features to `-mcpu=spacemit-x60`. So I think @lukel97 ran the SPEC 
> without unaligned scalar.

Yeah, -mno-strict-align gave a bus error. I ultimately built it without 
unaligned scalar since I wasn't sure if unaligned scalar was performant or not. 

https://github.com/llvm/llvm-project/pull/107548
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-06 Thread Luke Lau via llvm-branch-commits


@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType IndexType, SelectionDAG &D
   return true;
 }
 
+/// Try to map an integer comparison with size > XLEN to vector instructions
+/// before type legalization splits it up into chunks.
+static SDValue
+combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
+const SDLoc &DL, SelectionDAG &DAG,
+const RISCVSubtarget &Subtarget) {
+  assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate");
+
+  if (!Subtarget.hasVInstructions())
+return SDValue();
+
+  MVT XLenVT = Subtarget.getXLenVT();
+  EVT OpVT = X.getValueType();
+  // We're looking for an oversized integer equality comparison.
+  if (OpVT.isScalableVT() || !OpVT.isScalarInteger())
+return SDValue();
+
+  unsigned OpSize = OpVT.getSizeInBits();
+  // TODO: Support non-power-of-2 types.
+  if (!isPowerOf2_32(OpSize))
+return SDValue();
+
+  // The size should be larger than XLen and smaller than the maximum vector
+  // size.
+  if (OpSize <= Subtarget.getXLen() ||
+  OpSize > Subtarget.getRealMinVLen() *
+   Subtarget.getMaxLMULForFixedLengthVectors())
+return SDValue();
+
+  // Don't perform this combine if constructing the vector will be expensive.
+  auto IsVectorBitCastCheap = [](SDValue X) {
+X = peekThroughBitcasts(X);
+return isa<ConstantSDNode>(X) || X.getValueType().isVector() ||
+   X.getOpcode() == ISD::LOAD;
+  };
+  if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y))
+return SDValue();
+
+  if (DAG.getMachineFunction().getFunction().hasFnAttribute(
+  Attribute::NoImplicitFloat))
+return SDValue();

lukel97 wrote:

Do we need to check for this on RISC-V? We're not introducing any FP code here.

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-06 Thread Luke Lau via llvm-branch-commits


@@ -2525,5 +2527,21 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
 Options.LoadSizes = {8, 4, 2, 1};
   else
 Options.LoadSizes = {4, 2, 1};
+  if (IsZeroCmp && ST->hasVInstructions()) {

lukel97 wrote:

Doesn't this mean that processors with only +unaligned-scalar-mem will now 
expand vector-sized compares?
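
If so, a minimal sketch of the tighter gate I'd expect (assuming 
enableUnalignedVectorMem() is the right predicate here, since the wide 
zero-compares end up as vector accesses):

```
  if (IsZeroCmp && ST->hasVInstructions() && ST->enableUnalignedVectorMem()) {
    // ... only then add the vector-sized entries to Options.LoadSizes ...
  }
```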

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-06 Thread Luke Lau via llvm-branch-commits


@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType 
IndexType, SelectionDAG &D
   return true;
 }
 
+/// Try to map an integer comparison with size > XLEN to vector instructions
+/// before type legalization splits it up into chunks.
+static SDValue
+combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
+const SDLoc &DL, SelectionDAG &DAG,
+const RISCVSubtarget &Subtarget) {
+  assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate");
+
+  if (!Subtarget.hasVInstructions())
+return SDValue();
+
+  MVT XLenVT = Subtarget.getXLenVT();
+  EVT OpVT = X.getValueType();
+  // We're looking for an oversized integer equality comparison.
+  if (OpVT.isScalableVT() || !OpVT.isScalarInteger())
+return SDValue();
+
+  unsigned OpSize = OpVT.getSizeInBits();
+  // TODO: Support non-power-of-2 types.
+  if (!isPowerOf2_32(OpSize))
+return SDValue();
+
+  // The size should be larger than XLen and smaller than the maximum vector
+  // size.
+  if (OpSize <= Subtarget.getXLen() ||
+  OpSize > Subtarget.getRealMinVLen() *
+   Subtarget.getMaxLMULForFixedLengthVectors())
+return SDValue();
+
+  // Don't perform this combine if constructing the vector will be expensive.
+  auto IsVectorBitCastCheap = [](SDValue X) {
+X = peekThroughBitcasts(X);
+return isa<ConstantSDNode>(X) || X.getValueType().isVector() ||
+   X.getOpcode() == ISD::LOAD;
+  };
+  if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y))
+return SDValue();
+
+  if (DAG.getMachineFunction().getFunction().hasFnAttribute(
+  Attribute::NoImplicitFloat))
+return SDValue();

lukel97 wrote:

Oh, that's right, noimplicitfloat also disables SIMD; I forgot about that.
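
For context, a hedged illustration of why that matters (my own example, not 
from the patch): code built with -mno-implicit-float must not have an 
integer-only compare expanded into vector instructions behind its back.

```
// Hypothetical example: when the caller carries the noimplicitfloat
// attribute (e.g. the whole TU is built with -mno-implicit-float), this
// 16-byte compare has to stay scalar, which is exactly what the early
// return on NoImplicitFloat guarantees.
#include <cstring>

bool same16(const char *a, const char *b) {
  return std::memcmp(a, b, 16) == 0;
}
```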

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2024-11-06 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)

2024-11-06 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/115162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)

2024-11-06 Thread Luke Lau via llvm-branch-commits


@@ -1150,6 +1150,7 @@ class VPseudoUnaryNoMaskGPROut :
 class VPseudoUnaryMaskGPROut :
   Pseudo<(outs GPR:$rd),
  (ins VR:$rs1, VMaskOp:$vm, AVL:$vl, sew:$sew), []>,
+  RISCVMaskedPseudo,

lukel97 wrote:

Nit, instead of adding it in the class, move it to the two `def`s so it's 
consistent with other uses of RISCVMaskedPseudo?

https://github.com/llvm/llvm-project/pull/115162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV] Add vcpop.m/vfirst.m to RISCVMaskedPseudosTable (PR #115162)

2024-11-06 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

Good catch. I double-checked and we're setting ElementsDependOnVL and 
ElementsDependOnMask for VCPOP_M and VFIRST_M, so adding RISCVMaskedPseudo 
should be safe.

https://github.com/llvm/llvm-project/pull/115162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: [VPlan] Only use SCEV for live-ins in tryToWiden. (#125436) (PR #125659)

2025-02-04 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.


https://github.com/llvm/llvm-project/pull/125659
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/20.x: Revert "[SLP] getSpillCost - fully populate IntrinsicCostAttributes to improve cost analysis." (#124962) (PR #126487)

2025-02-10 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

Thanks for fixing the cherry-pick. Re: #124499, I couldn't think of a simple 
fix we could apply on top of e3fbf19eb4428cac03c0e7301512f11f8947d743 for the 
20.x release branch.

I think it's best if we cherry-pick the revert so that performance isn't 
impacted on 20.x, and just continue to fix the cost model stuff in-tree for 
21.x.

https://github.com/llvm/llvm-project/pull/126487
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [RISCV] Add hasPostISelHook to sf.vfnrclip pseudo instructions. (#114274) (PR #117948)

2024-12-17 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/117948
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)

2025-04-08 Thread Luke Lau via llvm-branch-commits

lukel97 wrote:

I collected some more data on RISC-V on SPEC CPU 2017: this improves code size 
by up to 7% on some benchmarks, and no regressions were found: 
https://lnt.lukelau.me/db_default/v4/nts/399?show_delta=yes&show_previous=yes&show_stddev=yes&show_mad=yes&show_all=yes&show_all_samples=yes&show_sample_counts=yes&show_small_diff=yes&num_comparison_runs=0&test_filter=&test_min_value_filter=&aggregation_fn=min&MW_confidence_lv=0.05&compare_to=401&submit=Update

There's also a significant decrease in vector spilling and reloading. It 
removes the spilling entirely on one benchmark, so the geomean result is stuck 
at 100%:

```
Program                                      riscv-instr-info.NumVRegReloaded        riscv-instr-info.NumVRegSpilled
                                                lhs        rhs      diff               lhs       rhs       diff
FP2017rate/508.namd_r/508.namd_r                6.00       6.00      0.0%               1.00      1.00      0.0%
INT2017rat...00.perlbench_r/500.perlbench_r     8.00       8.00      0.0%               4.00      4.00      0.0%
INT2017speed/625.x264_s/625.x264_s             35.00      35.00      0.0%              39.00     39.00      0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s     6.00       6.00      0.0%               6.00      6.00      0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s     5.00       5.00      0.0%               4.00      4.00      0.0%
INT2017speed/602.gcc_s/602.gcc_s               70.00      70.00      0.0%              64.00     64.00      0.0%
INT2017spe...00.perlbench_s/600.perlbench_s     8.00       8.00      0.0%               4.00      4.00      0.0%
INT2017rate/525.x264_r/525.x264_r              35.00      35.00      0.0%              39.00     39.00      0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r     6.00       6.00      0.0%               6.00      6.00      0.0%
INT2017rate/520.omnetpp_r/520.omnetpp_r         5.00       5.00      0.0%               4.00      4.00      0.0%
INT2017rate/502.gcc_r/502.gcc_r                70.00      70.00      0.0%              64.00     64.00      0.0%
FP2017speed/644.nab_s/644.nab_s                24.00      24.00      0.0%              24.00     24.00      0.0%
FP2017rate/544.nab_r/544.nab_r                 24.00      24.00      0.0%              24.00     24.00      0.0%
FP2017rate/511.povray_r/511.povray_r          131.00     131.00      0.0%              74.00     74.00      0.0%
FP2017rate/510.parest_r/510.parest_r         1490.00    1484.00     -0.4%            1231.00   1225.00     -0.5%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r   248.00     218.00    -12.1%             134.00    102.00    -23.9%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s   248.00     218.00    -12.1%             134.00    102.00    -23.9%
FP2017rate/526.blender_r/526.blender_r       1210.00     703.00    -41.9%            1033.00    654.00    -36.7%
FP2017speed/638.imagick_s/638.imagick_s      7524.00    1486.00    -80.2%            4813.00    925.00    -80.8%
FP2017rate/538.imagick_r/538.imagick_r       7524.00    1486.00    -80.2%            4813.00    925.00    -80.8%
FP2017speed/619.lbm_s/619.lbm_s                42.00       0.00   -100.0%              42.00             -100.0%
FP2017rate/519.lbm_r/519.lbm_r                 42.00       0.00   -100.0%              42.00             -100.0%
FP2017rate...97.specrand_fr/997.specrand_fr     0.00       0.00
FP2017spee...96.specrand_fs/996.specrand_fs     0.00       0.00
INT2017rate/505.mcf_r/505.mcf_r                 0.00       0.00
INT2017rate/541.leela_r/541.leela_r             0.00       0.00
INT2017rate/557.xz_r/557.xz_r                   0.00       0.00
INT2017rat...99.specrand_ir/999.specrand_ir     0.00       0.00
INT2017speed/605.mcf_s/605.mcf_s
```

[llvm-branch-commits] [llvm] release/20.x: [RISCV] Handle scalarized reductions in getArithmeticReductionCost (PR #136688)

2025-04-22 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.


https://github.com/llvm/llvm-project/pull/136688
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-16 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-13 Thread Luke Lau via llvm-branch-commits


@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   }
 
   if (IsZeroCmp && ST->hasVInstructions()) {
-unsigned RealMinVLen = ST->getRealMinVLen();
-// Support Fractional LMULs if the lengths are larger than XLen.
-// TODO: Support non-power-of-2 types.
-for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) {
-  unsigned Len = RealMinVLen / FLMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
-for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors();
- LMUL *= 2) {
-  unsigned Len = RealMinVLen * LMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
+unsigned VLenB = ST->getRealMinVLen() / 8;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen * 2`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);

lukel97 wrote:

If that's the case, do we even need the LMUL check? I.e. can we just do 

```
unsigned MinSize = ST->getXLen() + 1;
```

And presumably for sizes < MF8, lowering will use the correct container anyway?
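
To make the arithmetic concrete, a worked example with assumed numbers 
(Zvl128b, RV64; these values are mine, not the patch's):

```
#include <algorithm>

// Assume getRealMinVLen() = 128 and getXLen() = 64.
constexpr unsigned VLenB   = 128 / 8;                          // 16 bytes per vector register
constexpr unsigned MinSize = std::max(VLenB / 8, 64u * 2 / 8); // max(2, 16) == 16 bytes
// The XLen * 2 term dominates here; the VLen * LMUL_MF8 term only takes
// over once VLEN exceeds 8 * XLen * 2 = 1024 bits.
```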

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-13 Thread Luke Lau via llvm-branch-commits


@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   }
 
   if (IsZeroCmp && ST->hasVInstructions()) {
-unsigned RealMinVLen = ST->getRealMinVLen();
-// Support Fractional LMULs if the lengths are larger than XLen.
-// TODO: Support non-power-of-2 types.
-for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) {
-  unsigned Len = RealMinVLen / FLMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
-for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors();
- LMUL *= 2) {
-  unsigned Len = RealMinVLen * LMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
+unsigned VLenB = ST->getRealMinVLen() / 8;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen * 2`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);

lukel97 wrote:

Just checking: if MF8 isn't supported for the ELEN, e.g. MF8 on zve32x, 
`getContainerForFixedLengthVector` in RISCVISelLowering will still lower it to 
the next largest LMUL, so this should be fine, right?
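
Concretely, the case I have in mind (my reading, assuming zve32x so ELEN = 32): 
e8 at MF8 implies SEW/LMUL = 8 / (1/8) = 64 > ELEN, so that container is 
reserved, while MF4 gives 8 / (1/4) = 32 = ELEN, so presumably 
getContainerForFixedLengthVector hands back the MF4 container instead and the 
compare still lowers correctly.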


https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support non-power-of-2 types when expanding memcmp (PR #114971)

2025-06-13 Thread Luke Lau via llvm-branch-commits


@@ -2954,20 +2954,13 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
   }
 
   if (IsZeroCmp && ST->hasVInstructions()) {
-unsigned RealMinVLen = ST->getRealMinVLen();
-// Support Fractional LMULs if the lengths are larger than XLen.
-// TODO: Support non-power-of-2 types.
-for (unsigned FLMUL = 8; FLMUL >= 2; FLMUL /= 2) {
-  unsigned Len = RealMinVLen / FLMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
-for (unsigned LMUL = 1; LMUL <= ST->getMaxLMULForFixedLengthVectors();
- LMUL *= 2) {
-  unsigned Len = RealMinVLen * LMUL;
-  if (Len > ST->getXLen())
-Options.LoadSizes.insert(Options.LoadSizes.begin(), Len / 8);
-}
+unsigned VLenB = ST->getRealMinVLen() / 8;
+// The minimum size should be the maximum bytes between `VLen * LMUL_MF8`
+// and `XLen * 2`.
+unsigned MinSize = std::max(VLenB / 8, ST->getXLen() * 2 / 8);

lukel97 wrote:

How come we need to limit the minimum size to XLen * 2? Can we not use vectors 
for the `bcmp_size_15` test case on RV64 too?
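
For reference, the boundary I'm thinking of, with assumed RV64 numbers (mine, 
not from the patch):

```
// XLen = 64, so the current floor is 64 * 2 / 8 = 16 bytes and a 15-byte
// bcmp still takes the scalar expansion, whereas lowering the floor to just
// above XLen (9 bytes on RV64) would let a single VL=15, e8 vector compare
// handle it.
constexpr unsigned CurrentFloorBytes = 64 * 2 / 8; // 16
constexpr unsigned LowerFloorBytes   = 64 / 8 + 1; // 9
```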

https://github.com/llvm/llvm-project/pull/114971
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2025-06-11 Thread Luke Lau via llvm-branch-commits


@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType 
IndexType, SelectionDAG &D
   return true;
 }
 
+/// Try to map an integer comparison with size > XLEN to vector instructions
+/// before type legalization splits it up into chunks.
+static SDValue
+combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
+const SDLoc &DL, SelectionDAG &DAG,
+const RISCVSubtarget &Subtarget) {
+  assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate");
+
+  if (!Subtarget.hasVInstructions())
+return SDValue();
+
+  MVT XLenVT = Subtarget.getXLenVT();
+  EVT OpVT = X.getValueType();
+  // We're looking for an oversized integer equality comparison.
+  if (OpVT.isScalableVT() || !OpVT.isScalarInteger())

lukel97 wrote:

I believe OpVT.isScalableVT() implies !OpVT.isScalarInteger()? Can this be 
simplified to 
```suggestion
  if (!OpVT.isScalarInteger())
```

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2025-06-11 Thread Luke Lau via llvm-branch-commits


@@ -14520,17 +14520,78 @@ static bool narrowIndex(SDValue &N, ISD::MemIndexType 
IndexType, SelectionDAG &D
   return true;
 }
 
+/// Try to map an integer comparison with size > XLEN to vector instructions
+/// before type legalization splits it up into chunks.
+static SDValue
+combineVectorSizedSetCCEquality(EVT VT, SDValue X, SDValue Y, ISD::CondCode CC,
+const SDLoc &DL, SelectionDAG &DAG,
+const RISCVSubtarget &Subtarget) {
+  assert(ISD::isIntEqualitySetCC(CC) && "Bad comparison predicate");
+
+  if (!Subtarget.hasVInstructions())
+return SDValue();
+
+  MVT XLenVT = Subtarget.getXLenVT();
+  EVT OpVT = X.getValueType();
+  // We're looking for an oversized integer equality comparison.
+  if (OpVT.isScalableVT() || !OpVT.isScalarInteger())
+return SDValue();
+
+  unsigned OpSize = OpVT.getSizeInBits();
+  // TODO: Support non-power-of-2 types.
+  if (!isPowerOf2_32(OpSize))
+return SDValue();

lukel97 wrote:

I think as long as it's byte-sized it should be ok, right? E.g.
```suggestion
  if (OpSize % 8)
return SDValue();
```

But I'm happy if you want to leave this as-is and do it as a follow-on with the 
TODO in RISCVTargetTransformInfo.cpp.

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2025-06-12 Thread Luke Lau via llvm-branch-commits

https://github.com/lukel97 edited 
https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2025-06-12 Thread Luke Lau via llvm-branch-commits


@@ -2952,5 +2952,22 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool 
IsZeroCmp) const {
 Options.LoadSizes = {4, 2, 1};
 Options.AllowedTailExpansions = {3};
   }
+
+  if (IsZeroCmp && ST->hasVInstructions() && ST->enableUnalignedVectorMem()) {

lukel97 wrote:

Do we still need the enableUnalignedVectorMem check? If I'm understanding this 
right, MemcmpExpand will generate scalar loads, which should be ok because we 
check for enableUnalignedScalarMem.

And then in the new combine we're not actually changing the load at all. There 
must be some other existing combine which is converting the scalar load to a 
vector load, and that should be respecting alignment?

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [RISCV] Support memcmp expansion for vectors (PR #114517)

2025-06-12 Thread Luke Lau via llvm-branch-commits


@@ -16172,8 +16233,6 @@ static SDValue performSETCCCombine(SDNode *N, 
SelectionDAG &DAG,
   N0.getConstantOperandVal(1) != UINT64_C(0x))
 return SDValue();
 
-  // Looking for an equality compare.
-  ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get();
   if (!isIntEqualitySetCC(Cond))
 return SDValue();

lukel97 wrote:

Nit, you could pull this early exit up to line 16217, since both 
combineVectorSizedSetCCEquality and the existing combine need it.

https://github.com/llvm/llvm-project/pull/114517
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits