>> Ah, nice! How configurable are the bit ranges?
I think Lehua's patch is configurable for bit ranges, since it lets the
target flexibly track subreg liveness according to
REGMODE_NATURAL_SIZE:
+/* Return true if REGNO is a pseudo and MODE is a multi-register size. */
+bool
+need_track_subreg (int regno, machine_mode reg_mode)
+{
+ poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+ poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+ return maybe_gt (total_size, natural_size)
+ && multiple_p (total_size, natural_size)
+ && regno >= FIRST_PSEUDO_REGISTER;
+}
It depends on how the target defines REGMODE_NATURAL_SIZE.
If it returns the QImode size, his patch enables tracking at the granularity of 8-bit (QImode-sized) subregs.
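As a rough illustration of the check above (not part of the patch), here is a simplified model that uses plain byte counts instead of poly_int64. The names need_track_subreg_model, tracked_pieces and kFirstPseudoRegister are hypothetical, and the register-number threshold is an arbitrary stand-in for the target-specific FIRST_PSEUDO_REGISTER.

```cpp
#include <cstdint>

// Hypothetical stand-in for GCC's FIRST_PSEUDO_REGISTER; the real value is
// target-specific.
constexpr int kFirstPseudoRegister = 128;

// Simplified model of need_track_subreg. total_size models
// GET_MODE_SIZE (reg_mode) and natural_size models
// REGMODE_NATURAL_SIZE (reg_mode), both as constant byte counts.
// Assumes natural_size > 0.
bool need_track_subreg_model(int regno, std::int64_t total_size,
                             std::int64_t natural_size)
{
  return total_size > natural_size          // spans more than one natural unit
         && total_size % natural_size == 0  // and is a whole number of units
         && regno >= kFirstPseudoRegister;  // and is a pseudo register
}

// Number of pieces whose liveness would be tracked separately in this
// model: one per natural-size unit, or just 1 if the register is not split.
std::int64_t tracked_pieces(int regno, std::int64_t total_size,
                            std::int64_t natural_size)
{
  return need_track_subreg_model(regno, total_size, natural_size)
             ? total_size / natural_size
             : 1;
}
```

In this model a 32-byte pseudo with a 16-byte natural size is split into 2 pieces, while an 8-byte pseudo with a 1-byte (QImode-sized) natural size is split into 8 pieces. With real poly_int64 sizes, maybe_gt and multiple_p are more subtle than the integer comparisons used here.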
[email protected]
From: Richard Sandiford
Date: 2023-11-12 19:53
To: 钟居哲
CC: Jeff Law; 丁乐华; gcc-patches; vmakarov
Subject: Re: [PATCH 0/7] ira/lra: Support subreg coalesce
钟居哲 <[email protected]> writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We were trying to address this issue at the GIMPLE level at the beginning.
> Tracking subreg lanes of tuple types may be enough for aarch64, since aarch64
> only has tuple types.
> However, for RVV, that's not enough to address all issues.
> Consider this following situation:
> https://godbolt.org/z/fhTvEjvr8
>
> You can see that, compared with LLVM, GCC has many redundant move
> instructions ("vmv1r.v"),
> since GCC is not able to track subreg liveness, whereas LLVM can.
>
> The reason why tracking sub-lanes in GIMPLE can not address these redundant
> move issues for RVV:
>
> 1. RVV has tuple types like "vint8m1x2_t", which is exactly the same as
> aarch64 "svint8x2_t".
> It is used by segment load/store, which is similar to the "ld2r"
> instruction in ARM SVE (vec_load_lanes/vec_store_lanes).
> Supporting sub-lane tracking in GIMPLE can fix this situation for both RVV
> and ARM SVE.
>
> 2. However, we do not only have "vint8m1x2_t"; we also have "vint8m2_t" (LMUL
> = 2), which also occupies 2 registers
> but is not a tuple type; instead, it is a simple vector type. Such a type is
> used by all simple operations.
> For example, "vadd" with vint8m1_t does a PLUS operation on a single
> vector register, whereas the same
> instruction "vadd" with vint8m2_t does a PLUS operation on 2 vector
> registers. We can't
> define such types as tuple types for the following reasons:
> 1). We also have tuple types for LMUL > 1; for example, we also have
> "vint8m2x2_t" as a tuple type.
> If we define "vint8m2_t" as a tuple type, what about "vint8m2x2_t"?
> A tuple type of tuples, or
> an array of arrays? It makes the type very strange.
> 2). The RVV intrinsic doc defines vint8m2x2_t as a tuple type, but vint8m2_t
> as not a tuple type. We are not able
> to change the documents.
> 3). Clang has supported RVV intrinsics for 3 years; vint8m2_t has not been a
> tuple type for those 3 years and is widely
> used, so changing the type definition would destroy the ecosystem. So for
> compatibility, we are not able to define
> LMUL > 1 as tuple types.
>
> For these reasons, we should be able to access the highpart and
> lowpart of vint8m2_t; we provide
> vget to generate a subreg access of the vector mode.
>
> So, at the discussion stage, we decided to address subpart access of vector
> modes in a more generic way,
> which is to support subreg liveness tracking at the RTL level, so that it can
> not only address the issues that happen on ARM SVE,
> but also address the issues for LMUL > 1.
>
> 3. After we decided to support subreg liveness tracking in RTL, we studied LLVM.
> Actually, LLVM has a standalone pass right before their linear scan RA
> (greedy), called the register coalescer.
> So, the first draft of our solution was to support register coalescing
> before RA, which is open source:
> riscv-gcc/gcc/ira-coalesce.cc at riscv-gcc-rvv-next ·
> riscv-collab/riscv-gcc (github.com)
> by simulating the LLVM solution. However, we didn't think such a solution was
> elegant, and we consulted
> Vlad. Vlad suggested we should enhance IRA/LRA with subreg liveness
> tracking, which turned out to be
> a more reasonable and elegant approach.
>
> So, after several experiments and investigations, Lehua dedicated himself
> to producing this series of patches.
> And we think Lehua's approach should be a generic and optimal solution to fix
> this generic subreg problem.
Ah, sorry, I caused a misunderstanding. In the message quoted above,
I'd moved on from talking about tracking liveness of vectors in a tuple.
I was instead talking about tracking the liveness of individual lanes
in a single vector.
I was responding to Jeff's description of the bit-level liveness tracking
pass. That pass solves a generic issue: redundant sign and zero extensions.
But it sounded like it could also be reused for tracking lanes of a vector
(by using different bit ranges from the ones that Jeff listed).
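To make the bit-range idea concrete, here is a minimal sketch of group-based liveness, not the actual pass. The four groups (bits 0-7, 8-15, 16-31, 32-63) and the names BitGroup, groups_below and extension_is_redundant are assumptions chosen for illustration; tracking vector lanes would use lane-sized groups instead.

```cpp
#include <cstdint>

// Hypothetical bit groups of a 64-bit value; one liveness flag per group.
enum BitGroup : unsigned {
  BITS_0_7   = 1u << 0,
  BITS_8_15  = 1u << 1,
  BITS_16_31 = 1u << 2,
  BITS_32_63 = 1u << 3,
};

// Mask of the groups that cover bits [0, width).
// Assumes width is one of 8, 16, 32 or 64 (a group boundary).
unsigned groups_below(unsigned width)
{
  unsigned mask = 0;
  if (width > 0)  mask |= BITS_0_7;
  if (width > 8)  mask |= BITS_8_15;
  if (width > 16) mask |= BITS_16_31;
  if (width > 32) mask |= BITS_32_63;
  return mask;
}

// A sign or zero extension from from_width bits only changes the bits at or
// above from_width. In this model it is redundant if none of those upper
// groups is live after the extension.
bool extension_is_redundant(unsigned live_after, unsigned from_width)
{
  unsigned upper = groups_below(64) & ~groups_below(from_width);
  return (live_after & upper) == 0;
}
```

For example, if only bits 0-7 are live after an extension from 8 bits, the extension can be dropped in this model; if bits 16-31 are also live, it cannot.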
The thing that I was saying might be better done on gimple was tracking
lanes of an individual vector. In other words, I was arguing against
my own question.
I should have changed the subject line when responding, sorry.
I wasn't suggesting that we should avoid subreg tracking in the RA.
That's definitely needed for AArch64, and in general.
Thanks,
Richard