This patch introduces combining vec_dup + vdiv.vv into vdiv.vx based on
the GR2VR cost value. The late-combine takes place if the cost of GR2VR
is zero, and rejects the combine if it is non-zero (e.g. 1 or 15 in the
tests). There will be two cases for the combine:
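For reference, the shape of loop this combine targets looks like the following (a hypothetical example, not taken from the patch; whether it actually ends up as vdiv.vx depends on -march and the gpr2vr cost):

```c
#include <stdint.h>

/* Every element is divided by the same scalar X.  With a GR2VR cost of
   zero the expected codegen is a single vdiv.vx instead of the pair
   vmv.v.x (broadcast) + vdiv.vv.  */
void
div_vx (int32_t *restrict out, const int32_t *restrict in, int32_t x, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = in[i] / x;
}
```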
The series is OK, thanks.
This series is OK now, thanks.
--
Regards
Robin
1. riscv64-linux-gcc -march=rv64gc -march=foo-cpu -mtune=foo-cpu
2. riscv64-linux-gcc -march=rv64gc -march=foo-cpu
3. riscv64-linux-gcc -march=rv64gc -march=unset -mtune=unset -mcpu=foo-cpu
Preference to me:
- Prefer option 1.
- Less prefer option 3 (acceptable, but I don't like it).
- Strongly dislike option 2.
I don't quite follow this part. IIUC the rules before this patch were
-march=ISA: Generate code that requires the given ISA, without
changing the tuning model.
-mcpu=CPU: Generate code for the given CPU, targeting all the
extensions that CPU supports and using the best known tu
This rule clearly applies to directly related options like -ffoo and
-fno-foo, but it's less obvious for unrelated pairs like -ffoo and
-fbar, especially when one option traditionally carries stronger specifics.
In many cases, the principle of "the most specific option wins"
governs the behavior.
Here
I stumbled across this change from
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/88
and I want to express my strong disagreement with this change.
Perhaps I'm accustomed to Arm's behavior, but I believe using -march= to
target a specific CPU isn't ideal.
* -march=X: (exe
Inspired by the avg_ceil patches, I noticed there were even more
overlong lines in autovec.md. So fix those formatting issues.
OK.
--
Regards
Robin
Hi Paul-Antoine,
overall the patch looks reasonable to me now, provided the fr2vr followup.
BTW it's the late-combine pass that performs the optimization, not the combine
pass. You might still want to fix this in the commit message.
Please CC patchworks...@rivosinc.com for the next version
Looks like the CI cannot recognize patch series? There are 3 patches and the CI
will run for each one.
Of course, the first one will have scan failures due to the expand change, but
the second one reconciles them.
Finally the third one will have all tests passed as below, I think it
indicates all test
Similar to avg_floor, avg_ceil has the rounding mode
towards +inf, while vaadd.vv has rnu, which totally matches
the semantics. From the RVV spec, the fixed-point vaadd.vv with rnu,
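The rnu/avg_ceil match can be checked with plain scalar arithmetic (a sketch of the semantics, not the vector code; arithmetic right shift of negative values is assumed, as on all mainstream compilers):

```c
#include <stdint.h>

/* avg_ceil rounds the average toward +inf: (a + b + 1) >> 1.
   vaadd under RNU (round-to-nearest-up) computes (s + r) >> 1 where
   r is the shifted-out bit, i.e. s & 1.  The two agree for every sum:
   for odd s both add 1 before shifting, for even s neither changes
   the result.  */
static int32_t
avg_ceil (int32_t a, int32_t b)
{
  int64_t s = (int64_t) a + b;       /* widen to avoid overflow */
  return (int32_t) ((s + 1) >> 1);
}

static int32_t
vaadd_rnu (int32_t a, int32_t b)
{
  int64_t s = (int64_t) a + b;
  return (int32_t) ((s + (s & 1)) >> 1);   /* RNU: add shifted-out bit */
}
```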
The CI shows some scan failures in vls/avg-[456].c and widen/vec-avg-rv32gcv.c.
Also, the lint check complains
This patch fixes the typo in the test case `param-autovec-mode.c` in the
RISC-V autovec testsuite.
The option `autovec-mode` is changed to `riscv-autovec-mode` to match the
expected parameter name.
OK of course :)
--
Regards
Robin
This patch introduces combining vec_dup + vmul.vv into vmul.vx based on
the GR2VR cost value. The late-combine takes place if the cost of GR2VR
is zero, and rejects the combine if it is non-zero (e.g. 1 or 15 in the
tests). There will be two cases for the combine:
OK.
--
Regards
Robin
LGTM, thanks.
--
Regards
Robin
The first patch makes SLP paths unreachable and the second one removes those
entirely. The third patch does the actual strided-load work.
Bootstrapped and regtested on x86 and aarch64.
Regtested on rv64gcv_zvl512b.
Robin Dapp (3):
vect: Make non-SLP paths unreachable in strided slp
From: Robin Dapp
This patch enables strided loads for VMAT_STRIDED_SLP. Instead of
building vectors from scalars or other vectors we can use strided loads
directly when applicable.
The current implementation limits strided loads to cases where we can
load entire groups and not subsets of them
This removes the non-SLP paths that were made unreachable in the
previous patch.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Remove non-SLP paths.
---
gcc/tree-vect-stmts.cc | 49 --
1 file changed, 18 insertions(+), 31 deletions(-)
From: Robin Dapp
This replaces if (slp) with if (1) and if (!slp) with if (0).
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Make non-SLP paths
unreachable.
---
gcc/tree-vect-stmts.cc | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/gcc
This mangles in the non-SLP path removal, can you please separate that
out?
So should patch 1/2 do more than it does, i.e. fully remove the non-slp
paths rather than just if (0) them?
--
Regards
Robin
That would be appreciated (but is of course a larger task - I was fine with
the partial thing you did).
Ok. Then to move things forward I'll do a 2/3 for this one first. Once we're
through the review cycle for the series I can work on the non-slp removal for
the full function.
--
Regards
R
On Tue, May 27, 2025 at 2:44 PM Robin Dapp wrote:
> This mangles in the non-SLP path removal, can you please separate that
> out?
So should patch 1/2 do more than it does, i.e. fully remove the non-slp
paths rather than just if (0) them?
There should be a separate 2/3 that does thi
Hi,
in check_builtin_call we eventually perform a division by zero when no
vector modes are present. This patch just avoids the division in that
case.
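The guard pattern is simple but worth spelling out (the function name and shape below are hypothetical illustrations, not the actual check_builtin_call code):

```c
/* Sketch of the fix pattern: bail out before dividing when the number
   of vector modes is zero, instead of performing the division.  */
static int
units_per_mode (int total_units, int n_vector_modes)
{
  if (n_vector_modes == 0)
    return 0;                   /* No vector modes present: nothing to split.  */
  return total_units / n_vector_modes;
}
```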
Regtested on rv64gcv_zvl512b. I guess this is obvious enough that it can be
pushed after the CI approves.
Regards
Robin
PR target/1
-(define_expand "avg<v_double_trunc>3_floor"
- [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
- (truncate:<V_DOUBLE_TRUNC>
-(ashiftrt:VWEXTI
- (plus:VWEXTI
- (sign_extend:VWEXTI
- (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
- (sign_extend:VWEXTI
- (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))]
+(define_expan
2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant
reads/writes is punted to later passes, which obviously needs more work.
3. NOK: We lose the ability to instrument local RM writes - especially in the
testsuite.
e.g.
a. intrinsic setting a static RM
b. get_frm
OK, thanks.
--
Regards
Robin
This patch introduces combining vec_dup + vor.vv into vor.vx based on
the GR2VR cost value. The late-combine takes place if the cost of GR2VR
is zero, and rejects the combine if it is non-zero (e.g. 1 or 15 in the
tests). There will be two cases for the combine:
OK, thanks.
--
Regards
Robin
AFAICT the main difference to standard mode switching is that we (ab)use it
to set the rounding mode to the value it had initially, either at function
entry or after a call. That's different to regular mode switching which
assumes "static" rounding modes for different instructions.
Standard c
Hi Paul-Antoine,
Please find attached a revised version of the patch.
Compared to the previous iteration, I have:
* Rebased on top of Pan's work;
* Updated the cost model;
* Added a second pattern to handle the case where PLUS_MINUS operands
are swapped;
* Added compile and run tests.
I boot
Could you make a simple testcase that could vectorize two loops in
different modes (e.g. one SI and one SF) and check that with this param we only
auto-vectorize one loop?
I added a test now in the attached v2 that checks that we vectorize with the
requested mode. Right now the patch only takes away "additiona
I could imagine that this is a simpler way to set the march since the march
string becomes terribly long - we have arch strings of more than 300
chars... so I support this, although I think this should be discussed with the
LLVM community, but I think it's fine to accept as a GCC extension.
So LGTM, go ahead t
Hi,
This patch allows an -march string like
-march=sifive-p670
in order to allow overriding a previous -march in a simple way.
Suppose we have a Makefile that specifies -march=rv64gc by default.
A user-specified -mcpu=sifive-p670 would be after the -march in the
options string and thus only s
Hi,
this patch initializes saved_vxrm_mode to VXRM_MODE_NONE. The
uninitialized value causes a warning (but no error) when building the
compiler, so better to fix it.
Regtested on rv64gcv_zvl512b. Going to commit as obvious if the CI
is happy.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv.cc (singleton_vx
Hi,
This patch adds a --param=autovec-mode=. When the param is
specified we make autovectorize_vector_modes return exactly this mode if
it is available. This helps when testing different vectorizer settings.
Regtested on rv64gcv_zvl512b.
Regards
Robin
gcc/ChangeLog:
* config/riscv/r
This patch introduces combining vec_dup + vand.vv into vand.vx based on
the GR2VR cost value. The late-combine takes place if the cost of GR2VR
is zero, and rejects the combine if it is non-zero (e.g. 1 or 15 in the
tests). There will be two cases for the combine:
OK, thanks.
--
Regards
Rob
Maybe I'm missing something there. Particularly whether or not you can know
anything about frm's value after a call has returned. Normally the answer to
this kind of question is a hard no.
AFAICT the main difference to standard mode switching is that we (ab)use it to
set the rounding mode to
This patch enables strided loads for VMAT_STRIDED_SLP. Instead of
building vectors from scalars or other vectors we can use strided loads
directly when applicable.
The current implementation limits strided loads to cases where we can
load entire groups and not subsets of them. A future improveme
The second patch adds strided-load support for strided-slp memory
access. The first patch makes the respective non-slp paths unreachable.
Robin Dapp (2):
vect: Remove non-SLP paths in strided slp and elementwise.
vect: Use strided loads for VMAT_STRIDED_SLP.
gcc/internal-fn.cc
This replaces if (slp) with if (1) and if (!slp) with if (0).
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Make non-slp paths
unreachable.
---
gcc/tree-vect-stmts.cc | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/gcc/tree-vect-stmts
The series LGTM. I didn't check all the tests in detail to be honest :)
--
Regards
Robin
I was thinking of adding a vectorization_mode class that would
encapsulate the mode and whether to allow masking or alternatively
to make the vector_modes array (and the m_suggested_epilogue_mode)
a std::pair of mode and mask flag?
Without having a very strong opinion (or the full background) on
Excuse the delay, I was attending the RISC-V Summit Europe.
The series LGTM.
--
Regards
Robin
I think we need the run tests for each op combine up to a point. But for the
asm checks,
it seems we can put them together? I mean something like below:
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=gpr2vr-cost=0" } */
+
+#include "vx_binary.h"
+
+DEF_VX_BINARY_CASE_0(int3
This patch introduces combining vec_dup + vsub.vv into vsub.vx based on
the GR2VR cost value. The late-combine takes place if the cost of GR2VR
is zero, and rejects the combine if it is non-zero (e.g. 1 or 15 in the
tests). There will be two cases for the combine:
The changes to add are very
it's just a vector cost model issue and some loops are not profitable
to vectorize?
Yes. For example, when gpr2vr is 1, int8_t cannot vectorize while uint8_t can.
OK, understood. I think that's expected given the fine granularity of the
tests. IMHO nothing that should block progress.
--
R
This patch series would like to add the testcases for this. However,
some test results are not that tidy, and we need more tuning for
the vector cost model.
The test adjustments LGTM but what do you mean by not tidy? I see you're
scanning just for the presence of "vx" instead of an exact numbe
Thanks Jeff. I will rebase and update my patch. One question though, I
noticed that Pan's patch introduced a command-line parameter to tweak the
GR2VR cost; do we need something equivalent for FR2VR?
Yes, we need it in order to be able to test both paths, i.e. combining and not
combining. Als
+/*
+ * Return the cost of operation that move from gpr to vr.
+ *
+ * It will take the value of --param=gpr2vr_cost if it is provided.
+ * Or the default regmove->GR2VR will be returned.
+ */
Please still remove the leading '*' of the comment. The series is OK with that
fixed. Thanks for you
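The helper under discussion can be sketched roughly like this (the names, the sentinel value, and the fallback constant are assumptions for illustration, not the exact GCC code):

```c
/* Sketch: return the --param=gpr2vr-cost value when the user provided
   one, otherwise fall back to the tuning structure's GR2VR constant.  */
#define GPR2VR_COST_UNSET (-1)

static const int default_gr2vr = 2;   /* stand-in for regmove->GR2VR */

static int
get_gr2vr_cost (int param_gpr2vr_cost)
{
  if (param_gpr2vr_cost != GPR2VR_COST_UNSET)
    return param_gpr2vr_cost;         /* user-provided --param wins */
  return default_gr2vr;               /* otherwise the tuning default */
}
```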
1. Those static const vars are initialized before option parsing, so they can
hardly be initialized correctly.
2. The --param is somehow experimental, thus I prefer to keep the const GR2VR
in the static structure as is.
I will append a new patch, i.e. let the references go through the new helper,
if that is OK.
Yes that should
Hi Pan,
While investigating the combine of vec_dup and vop.vv into
vop.vx, we need to depend on the cost of the insn that operates
from the gr to the vr, for example vadd.vx. Thus, for better
control and testing, we introduce a new option, aka below:
--param=rvv-gr2vr-cost=
+static inline int
+get_vec
Although we already try to set the mode needed to FRM_DYN after a function call,
there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may appear
on incoming edges.
Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, FRM_DYN_CALL,
and FRM_DYN_EXIT modes are compatib
I see, let the vec_dup enter rtx_cost again to append the GR2VR cost
to the vmv total; I gave it a try. For example with the below change:
+ switch (rcode)
+ {
+ case VEC_DUPLICATE:
+ *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS
(1);
+ break;
+
But this is not good enough here if my understanding is correct,
as vmv.v.x is somehow equivalent to vec_dup but doesn't reference GR2VR.
But it should. Can't we do something like:
if (riscv_v_ext_mode_p (mode))
{
switch (GET_CODE (x))
{
case VEC_DUPLICATE:
Makes sense to me; it looks like the combine will always take place if GR2VR
is 0, 1 or 2 for now.
I tried to customize the cost here to make the combine fail, but it didn't
work with the below change.
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) {
+cost_val = 1;
+
Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4
while vadd.vx costs 1 + 2, making it cheaper?
Yes, it looks like we just need to assign the GR2VR cost for vec_dup. I also
tried different costs here to see
the impact on late-combine.
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (
These patches LGTM from my side. But please wait for other folks to comment.
The series LGTM as well. But please wait with merging until GCC 15.1 is
released (as requested by the release maintainers).
--
Regards
Robin
The only thing I think we want for the patch (as Pan also raised last time) is
the param to set those .vx costs to zero in order to ensure the tests test the
right thing (--param=vx_preferred/gr2vr_cost or something).
I see, shall we start a new series for this? AFAIK, we may need some more
al
/* TODO: We set RVV instruction cost as 1 by default.
Cost Model need to be well analyzed and supported in the future. */
+ int cost_val = 1;
+ enum rtx_code rcode = GET_CODE (x);
+
+ /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ])) */
+ if (rcode == VEC_DUPLICATE && SCALAR_INT_MO
Hi Pan,
I am not sure if we have some option, in addition to the below, like
-march=generic,
to ensure that the late-combine will take action as expected in the testcases.
+/* { dg-options "-march=rv64gcv -mabi=lp64d" } */
I haven't gone through the rest yet (will take some more days) but yes, I
agr
The solution is to filter out abnormal edges from getting into LCM at
all. Existing invalid_opt_bb_p () has such checks for BB predecessors
but not for successors which is what the patch adds.
OK.
--
Regards
Robin
Hi,
in categorize_ctor_elements_1 we do
VECTOR_CST_NELTS (value).to_constant ()
but VALUE's type can be a VLA vector (since r15-5780-g17b520a10cdaab).
This patch uses constant_lower_bound instead.
Bootstrapped and regtested on x86, aarch64, and power 10.
Regtested on rv64gcv_zvl512b.
Regards
Tested with compilation of x86_64-linux -> riscv64-linux cross,
ok for trunk?
Yes.
--
Regards
Robin
I see, reverted. Thanks Robin for the reminder.
Thanks!
BTW and just for open discussion, is this really a good way for such kinds of
tests?
Though most of the tests are similar to this, it may hide possible
unexpected results up to a point.
Yeah we have several flaky tests and in those cas
Hi,
when lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass. This patch does not perform the lift when transparency is not
guaranteed.
This condition is more restrictive than necessary as we can still
pe
Hi Pan,
Richard committed combine patches that restored most of the previous behavior
so we shouldn't need the refinement any more.
AFAICT the tests should now pass in their previous state but definitely fail in
their current state. Do you want to revert this change?
Thanks.
--
Regards
Robi
Hi,
as usual, I forgot to add -mabi=lp64d to the test case. This patch adds
it. Going to push as obvious.
Regards
Robin
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/autovec/pr116595.C: Add -mabi.
---
gcc/testsuite/g++.target/riscv/rvv/autovec/pr116595.C | 2 +-
1 file changed, 1 in
On 4/8/25 16:32, Vineet Gupta wrote:
Yay ! It does work. Awesome.
I've uploaded the further reduced test to PR/119533
Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change
something else on your local tree still?
Yeah I had some debug stuff lying around. In particular
Yay ! It does work. Awesome.
I've uploaded the further reduced test to PR/119533
Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change
something else on your local tree still?
On top, I'm now seeing a ton of vsetvl test failures vs just the one I
reported... No ide
--
Regards
Robin
Hi Vineet,
However we still see lift up using those blocks - the earliest set computed
contained the supposedly elided bbs.
Try lift up 0.
earliest:
Edge(bb 16 -> bb 17): n_bits = 3, set = {1 }
Try lift up 1.
earliest:
Edge(bb 15 -> bb
Hi,
before lifting up a vsetvl (that saves VL in a register) to a block we
need to ensure that this register is not live in the block. Otherwise
we would overwrite the register. There is some conceptual similarity to
LCM's transparency property (or ANTLOC) which deals with overwriting
an expres
Hi,
since r15-9062-g70391e3958db79 we perform vector bitmask initialization
via the vec_duplicate expander directly. This triggered a latent bug on
our side where we missed masking out the single bit, which resulted in an
execution FAIL of pr119114.c
The attached patch adds the 1-masking of the broa
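The 1-masking described above amounts to testing only the lowest bit of each element; a minimal sketch of why a value like -2 must read as "false":

```c
#include <stdint.h>

/* Only the lowest bit of a mask element carries the boolean value.
   An element arriving as -2 has all upper bits set but bit 0 clear,
   so masking with 1 before testing recovers "false".  */
static int
mask_bit (int64_t elt)
{
  return (int) (elt & 1);
}
```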
Note it's not quite "whatever" -- there is a constraint that vl be
monotonically nonincreasing, which in some cases is the only important
property. No denying this is an annoyance, though.
Yes, I was hoping the smiley would convey that "whatever" was not to be taken
literally. In terms of SC
Some of the tests regressed with a fix for the vectorization of
shifts. The riscv cost models need to be adjusted to avoid the
unprofitable optimization. The failure of these tests has been known
since 2024-03-13, without a forthcoming fix, so I suggest we consider
it expected by now. Adjust th
Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if
AVL < (2 * VLMAX)" rule...
+1, spec has some description about this but I am not sure if I really get the
point.
From Spec:
"For example, this permits an implementation to set vl = ceil(AVL
/ 2) for VLMAX <
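The quoted rule can be written out as a small helper (a sketch of the spec's constraint on vl, not compiler code):

```c
/* For AVL <= VLMAX, vl must equal AVL.  For VLMAX < AVL < 2*VLMAX an
   implementation may return any vl with ceil(AVL/2) <= vl <= VLMAX,
   e.g. to split the remaining work evenly.  For AVL >= 2*VLMAX,
   vl is VLMAX.  */
static void
vl_bounds (unsigned avl, unsigned vlmax, unsigned *lo, unsigned *hi)
{
  if (avl <= vlmax)
    *lo = *hi = avl;                /* everything fits: vl == AVL */
  else if (avl < 2 * vlmax)
    {
      *lo = (avl + 1) / 2;          /* ceil(AVL / 2) */
      *hi = vlmax;
    }
  else
    *lo = *hi = vlmax;              /* AVL >= 2*VLMAX: vl == VLMAX */
}
```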
LGTM (even though I still don't like the spec :D).
We still have an implicit assumption in riscv-vsetvl.cc that might modify LMUL:
In prev_ratio_valid_for_next_sew_p and next_ratio_valid_for_prev_sew_p we check
whether the ratio of two LMULs is <= 8. ISTR that with recent changes we only
re-u
So maybe the way to go is to add a field to the uarch tuning structure
indicating the additional cost (if any) of a register-file-crossing vector op
of this nature. Then query that in riscv_rtx_costs or whatever our rtx_cost
function is named.
Default that additional cost to zero initially. Th
Hi Paul-Antoine,
This pattern enables the combine pass to merge a vec_duplicate into a plus-mult
or minus-mult RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v6,fa0
vfmadd.vv v9,v6,v7
After, we get only one:
vfmadd.vf v9,fa0,v7
On SPEC201
This does not only happen on ELEN=32 and VLEN=32; it happens on all
ELEN=32 arches, and one of our internal configurations hit this...
Wait, is there something I keep missing? There must be, I guess.
Disregarding the SEW=8 case because that one is clear, but take for example:
ENTRY (RVVMF4HI,
zve32x_zvl64b will have the same requirement as zve32x_zvl32b.
I mean, e16,mf4 could be allowed on zve32x_zvl64b, but it is also spec
conformant if an implementation decides to raise an illegal instruction on
e16,mf4, which means
e16,mf4 is not safe to use on zve32x/zve32f.
OK I see, thanks. Sometime
- "TARGET_VECTOR"
+ "TARGET_VECTOR && 0"
Would you mind adding a comment here before committing, maybe even reference
the PR? Not that we want to keep this around for long anyway but just to make
sure :)
--
Regards
Robin
Sorry Kito, that we're having so much back and forth here, it's not my
intention to block anything (not that I could anyway). I just want to
make sure I properly understand the rationale (or the spec, rather).
Oh, ok, I got why you were confused about this; the new condition is
a little bit `i
Hi Kito,
So valid range fractional LMUL for SEW=8, 16 32 are:
mf8 = [8, (1/8)*32] = [8, 4] = [], no SEW is valid with mf8 for ELEN = 32
mf4 = [8, (1/4)*32] = [8, 8] = only SEW 8 with mf4 is valid
mf2 = [8, (1/2)*32] = [8, 16] = SEW 8 and 16 with mf2 are valid
[1]
https://github.com/riscvarchi
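Kito's ranges above follow from a single constraint, LMUL * ELEN >= SEW; a small sketch (minimum SEW of 8 is assumed):

```c
/* Largest valid SEW for a fractional LMUL of 1/denom given ELEN.
   Returns 0 when no SEW is valid, since the minimum SEW is 8.
   E.g. for ELEN = 32: mf8 -> none, mf4 -> only SEW 8, mf2 -> 8 and 16.  */
static int
max_sew_for_fractional_lmul (int elen, int denom)
{
  int max_sew = elen / denom;      /* (1/denom) * ELEN */
  return max_sew >= 8 ? max_sew : 0;
}
```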
Hi,
since updating to Fedora 41 I have been seeing ignored python exceptions
like the following when using 'git gcc-verify' =
contrib/gcc_changelog/git_check_commit.py.
Checking 90fcc1f4f1a5537e8d30628895a07cbb2e7e16ff: OK
Exception ignored in:
Traceback (most recent call last):
File "/usr/l
I'm not opposed to refactoring but what's the reason for it? We have a large
number of similar tests that also include all possible types. And aren't all
the tests you touch FAILing anyway right now? (Due to the combine change...)
Yes, the cond_widen_complicate-3 needs some tweaks for the asm
From: Pan Li
Rearrange the test cases of cond_widen_complicate-3 by putting different types
into different files, instead of putting all types together. Then we can
easily narrow down the failure range when an asm check fails.
I'm not opposed to refactoring but what's the reason for it? We have a large
number of simil
Hi,
in the somewhat convoluted vector code of PR119114 we extract
a mask value from a vector mask. After some
middle-end simplifications we end up with a value of -2. Its
lowest bit is correctly unset, representing "false".
When initializing a bitmask vector from values we compare the full
va
Hi,
in PR119115 we end up with an orphaned
vsetvli zero,t1,e16,m1,ta,ma.
t1 originally came from another vsetvl that was fused from
vsetvli a4,a3,e8,mf2,ta,ma
vsetvli t1,a3,e8,mf2,ta,ma (1)
to
vsetvli zero,a3,e16,m1,ta,ma.
This patch checks if t1, the VL operand
Hi,
when merging two vsetvls that both only demand "SEW >= ..." we
use their maximum SEW and keep the LMUL. That may lead to invalid
vector configurations like
e64, mf4.
As we make sure that the SEW requirements overlap we can use the SEW
and LMUL of the configuration with the larger SEW.
Ma J
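The invalidity of e64,mf4 falls out of the same LMUL * ELEN >= SEW constraint; a sketch of the check (assuming ELEN = 64, with LMUL given as a fraction numer/denom):

```c
/* A SEW/LMUL configuration is valid when LMUL * ELEN >= SEW,
   i.e. ELEN * numer >= SEW * denom for LMUL = numer/denom.
   e64,mf4 fails this: 64 * 1 < 64 * 4.  */
static int
valid_sew_lmul_p (int sew, int lmul_numer, int lmul_denom, int elen)
{
  return elen * lmul_numer >= sew * lmul_denom;
}
```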
Hi Jin,
I apologize for the delayed response. I spent quite a bit of time trying to
reproduce
the case, and given the passage of time, it wasn't easy to refine the testing.
Fortunately, you can see the results here.
https://godbolt.org/z/Mc8veW7oT
Using GCC version 14.2.0 should allow you to
Yeah I didn't know how to articulate it (and perhaps this still requires
clarification)
Say we have following
// reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-1.c
main
set_frm (4); // orig global FRM update
test_float_point_frm_run_1 (op1, op2, vl)
set_fr
LGTM.
--
Regards
Robin
What we could do is
prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
prev.set_vlmul (calculate_vlmul (prev.get_sew (), prev.get_ratio ()));
No, that also doesn't work because the ratio can be invalid then.
We fuse two vsetvls. One of them has a larger SEW which w
Okay, let me explain the background of my previous patch.
Prior to applying my patch, for the test case bug-10.c (a reduced example of
a larger program with incorrect runtime results),
the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as
follows:
1. vsetvli zero,a4,e16,m4,ta
It seems the issue is we didn't set "vlmul" ?
Can we do that:
int max_sew = MAX (prev.get_sew (), next.get_sew ());
prev.set_sew (max_sew);
prev.set_vlmul (calculate_vlmul (...));
prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ()));
What we could do is
prev.set_ratio (cal
This patch modifies the sequence:
vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,a4,e8,m2,ta,ma
to:
vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,zero,e8,m2,ta,ma
Functionally, there is no difference. However, this change resolves the
issue with "e64,mf4", and allows the second vsetvli to omit a4, wh
Hi,
when merging two vsetvls that both only demand "SEW >= ..." we
use their maximum SEW and keep the LMUL. That may lead to invalid
vector configurations like
e64, mf4.
As we make sure that the SEW requirements overlap we can use the SEW
and LMUL of the configuration with the larger SEW.
Ma J
Sure thing, will send the v5 for the CI system and commit it if no surprise.
BTW, shall we plan some refactoring of expand_const_vector in the next stage 1?
It has grown to more than 500 lines and is unfriendly for debugging up to a
point.
Yeah, sounds very reasonable.
--
Regards
Robin
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 -march=rv64gcv -flto -mrvv-vector-bits=zvl" } */
Ah, the CI flagged the test in previous versions. It's missing the usual
-mabi=... I keep forgetting this...
--
Regards
Robin
Hi Pan,
+ poly_int64 base1_poly = rtx_to_poly_int64 (base1);
+ bool overflow_smode_p = false;
+
+ if (!step1.is_constant ())
+ overflow_smode_p = true;
+ else
+ {
+ int elem_count = XVECLEN (src, 0);
+ uint64_t step1_val
If you mean the last branch of the interleave, I think it is safe because it
leverages the merge to generate the result, instead of IOR. Only the IOR for
the final result has this issue.
Yep, I meant checking overflow before the initial if
if (known_ge (step1, 0) && known_ge (step2, 0)