Hi Lehua,
> May I ask if the compiler options "-march=rv64gcv_zvfh -mabi=lp64d"
> should be removed? Because we don't specify -march and -mabi when we
> run testcase (so but why we need to specify the -march and -mabi in
> this target check?), we run it with the default values. Assuming that
> t
OK.
Regards
Robin
LGTM.
Regards
Robin
> Thanks for giving it a go. Can you post the latest version of the
> regpressure patch too? The previous on-list version I could find
> seems to be too old.
Oh, sure, attached. Apologies, I added the regpressure_same_class
convenience helper but forgot to re-send it.
Regards
Robin
>From d3f
I have an almost identical patch locally that passed testing as well
but didn't get around to posting it yet. Therefore LGTM.
Regards
Robin
Thanks for looking at it in detail.
> Yeah, I think this is potentially a blocker for propagating A into B
> when A is used elsewhere. Combine is able to combine A and B while
> keeping A in parallel with the result. I think either fwprop would
> need to try that too, or it would need to be rest
Hi,
found in slp-reduc-7.c, this patch prevents optimizing e.g.
COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
unconditionally into just "a".
Currently, we assume that COND_LEN operations can be optimized similarly
to COND operations. As the length is part of the mask (and usually not
compile-ti
Hi,
on riscv gcc.dg/pr70252.c ICEs at gimple-isel.cc:283. This is because
we created the gimple statement
mask__7.36_170 = VEC_COND_EXPR ;
during vrp2.
What happens is that, starting with
maskdest = (vec_cond mask1 1 0) >= (vec_cond mask2 1 0)
we fold to
maskdest = mask1 >= (vec_cond (ma
Hi Juzhe,
glad that we can use the dominator info directly. Could we move the
calculation of the info to the beginning (if it's not available)? That
makes it clearer that it's a prerequisite. Function comments look
good now.
Some general remarks kind of similar to v1:
- I would prefer a hash
Hi,
as Juzhe noticed in gcc.dg/pr92301.c there was still something missing in
the last patch. The attached v2 makes sure we always have a COND_LEN operation
before returning true and initializes len and bias even if they are unused.
Bootstrapped and regtested on aarch64 and x86.
Regards
Robin
Hi Juzhe,
> +max_number_of_live_regs (const basic_block bb,
> + const hash_map &live_ranges,
> + unsigned int max_point, machine_mode biggest_mode,
> + int lmul)
> +{
> + unsigned int max_nregs = 0;
> + unsigned int i;
> + unsigned
I did some benchmarks and, at least for calculix the differences are
miniscule. I'd say we can stick with the current approach and improve
as needed.
However, I noticed ICEs here:
+ gcc_assert (biggest_size >= mode_size);
and here:
+ mode = TYPE_MODE (TREE_TYPE (lhs));
when compiling calcul
> Is calculix big ?
It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops. I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.
> Could you give me the testcase to reproduce it?
OK, I will try to
> This is first version of dynamic LMUL.
> I didn't test it with full GCC testsuite.
>
> My plan is to first pass all GCC testsuite (including vect.exp) with default
> LMUL = M1.
> Then enable dynamic LMUL to test it.
>
> Maybe we could tolerate this ICE issue for now. Then we can test it
> wi
LGTM. We should just keep in mind the restrictions discussed in the
other thread.
Regards
Robin
Maybe you want to add PR target/111337 to the changelog?
The rest LGTM.
Regards
Robin
The PR thing needs to be moved but I can commit it.
Regards
Robin
The current status (for rv64gcv) is:
=== gcc tests ===
Running target unix/-march=rv64gcv
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "optimized: basic block" 2
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optim
> Most (all?) of those are due to:
> f951: Warning: command-line option '-Wno-psabi' is valid for
> C/C++/D/LTO/ObjC/ObjC++ but not for Fortran
> so no real bug.
When pushing this, I'd take the liberty of enabling the recently merged vector
ABI so we don't require -Wno-psabi anymore. All Fortran
> -(define_expand "vec_extract"
> +(define_expand "@vec_extract"
Do we need the additional helper function? If not let's rather not
add them for build-time reasons. The rest is OK, no need for v2.
Regards
Robin
> Yes. We need the additional helper function since I will cal emit_insn
> (gen_vec_extract (mode, mode)
> in the following patch which fixes PR111391 ICE.
OK.
Regards
Robin
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.
Sorry for the late response. I have also been thinking about this and
i
Hi Thomas, Jakub,
is there anything we can do to assist from the riscv side in order to help
with this? I haven't really been involved with it but was wondering
what's missing. If I understand correctly Thomas has a major cleanup
operation in plan but might not get to it soon. The fix he propos
> You mean this patch is ok?
I thought about it a bit more. From my point of view the patch is OK
for now in order to get the bug out of the way.
In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related. I can take care of that
once I have a b
> I must be missing something. Doesn't insn 10 broadcast the immediate
> 0x2 to both elements of r142?!? What am I missing?
It is indeed a bit misleading. The difference is in the mask which
is not displayed in the short form. So we actually use a vec_dup
for a single-element move, essentially
Ping.
Regards
Robin
LGTM.
Regards
Robin
Hi Lehua,
> +(define_expand "vcond_mask_"
> + [(set (match_operand:V_VLS 0 "register_operand")
> +(if_then_else:V_VLS
> + (match_operand: 3 "register_operand")
> + (match_operand:V_VLS 1 "nonmemory_operand")
> + (match_operand:V_VLS 2 "vector_register_or_const_0
> $ riscv64-unknown-linux-gnu-gcc
> --param=riscv-autovec-preference=fixed-vlmax
> gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c -O2 -march=rv64gcv
> -S
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:
> In function 'stach_check_alloca_1':
> ../riscv
> Can you give more comments about Robin's opinion that he want to change into
> "fixed" vs "varying" or "fixed vector size" vs "dynamic vector size" ?
It's not necessary to decide on this now as --params are not supposed
to be stable and can be changed quickly. I was just curious if this had
alr
> ../../gcc/config/riscv/generic.md:28:1: unknown value `smin' for attribute
> `type'
> make[3]: *** [Makefile:2528: s-attrtab] Error 1
>
>From 582c428258ce17ffac8ef1b96b4072f3d510480f Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Fri, 21 Apr 2023 09:38:06 +0200
Subject: [PATCH] riscv: Fix f
Hi Michael,
I have the diff below for the binops in my tree locally.
Maybe something like this works for you? Untested but compiles and
the expander helpers would need to be fortified obviously.
Regards
Robin
--
gcc/ChangeLog:
* config/riscv/autovec.md (3): New binops expander.
Hi,
I figured I'm going to start sending some patches that build on top
of the upcoming RISC-V autovectorization. This one is obviously
not supposed to be installed before the basic support lands but
it's small enough that it shouldn't hurt to send it now.
This patch allows vector constants in r
Hi Juzhe,
I wasn't yet able to check this locally so just some minor comment nits:
> +/* Return the vectorization machine mode for RVV according to LMUL. */
> +machine_mode
> +preferred_simd_mode (scalar_mode mode)
> +{
> + /* We only enable auto-vectorization when TARGET_MIN_VLEN < 128 &&
> +
Hi,
this patch adds LEN_LOAD/LEN_STORE support for z14 and newer.
It defines a bias value of -1 and implements the LEN_LOAD and LEN_STORE
optabs.
It also includes various vll/vstl testcases adapted from Kewen Lin's patch
for Power.
Bootstrapped and regtested on z13-z16.
Is it OK?
Regards
Robi
Hi,
the gimplefe40 and gimplefe41.c tests expect vector capabilities
(vect_float etc.) yet are not in the vect subdirectory. This causes
both to be called unconditionally without prior target-specific vector
setup normally performed by vect/vect.exp.
There is a target-specific option for po
Do the testcases currently fail? How? In principle moving to vect/
is OK but then having the gimplefe testcases in one place is nice ...
yes, they ICE on targets that do not have vector capabilities:
gimplefe-40.c:7:1: internal compiler error: in emit_move_insn, at
expr.c:3821
Normally we
Hi,
checking for an osc break is somewhat brittle especially with many
passes potentially introducing new insns and moving them around.
Therefore, only run the test with -O1 -fschedule-insns in order to limit
the influence of other passes.
Is it OK?
Regards
Robin
--
gcc/testsuite/ChangeLog:
Hi,
while evaluating another patch that introduces more lvalue paradoxical
subregs I ran into an ICE in combine at
wide_int o = wi::insert (rtx_mode_t (outer, temp_mode),
rtx_mode_t (inner, dest_mode),
offset, widt
When if-converting multiple SETs and we encounter a swap-style idiom
if (a > b)
{
tmp = c; // [1]
c = d;
d = tmp;
}
ifcvt should not generate a conditional move for the instruction at
[1].
In order to achieve that, this patch goes through all relevant SETs
and marks
Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump. Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries
Hi,
finally I got around to reworking this patch set for
if conversion so it is rather a v3 already. Since so
much time has passed, I'm not replying to the old thread.
There are several problems with noce_convert_multiple
and most are due to the instructions costs not being
properly counted:
-
Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences: The same one as before
and a second one that tries to re-use the existing CC. Then their costs
are compared and the cheaper one is selected.
---
gcc/ifcvt.c | 94 ++
Add new s390-specific tests that check if we convert two SETs into two
loads on condition. Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
---
gcc/testsuite/gcc.dg/ifcvt-4.c| 2 +-
.../gcc.target/s390/ifcvt-two-insns-bool.c| 39 +++
.../gcc.target/s
This lifts the restriction of not allowing constants for
noce_convert_multiple. The code later checks if a valid sequence
is produced anyway.
---
gcc/ifcvt.c | 13 -
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index eef6490626a..6006055f26a
When noce_convert_multiple is called the original costs are not yet
initialized. Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2). This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.
This patch tempor
---
gcc/config/s390/s390.c | 2 +-
gcc/config/s390/s390.md | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e3b1f99669f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3636,7 +3636,7 @@
Hi Richard,
I incorporated all your remarks (sorry for the hunk from a different
branch) except for this one:
> Think it would be better to make it:
>
> if (use_bias_adjusted_len)
> {
> gcc_assert (i == 0);
>
> But do we need to do this? Code should only care abou
Hi,
I included the outstanding minor remarks and believe everything is OK'ed
now. Still posting the ChangeLogs that I omitted before continuing.
I'd expect some fallout on other targets (hopefully nothing major) since
rtx costs are handled differently now for this code path.
Regards
Robin
--
Posting the ChangeLog before pushing.
--
gcc/ChangeLog:
* ifcvt.c (noce_convert_multiple_sets): Allow constants.
(bb_ok_for_noce_convert_multiple_sets): Likewise.
Posting the ChangeLog before pushing.
--
gcc/ChangeLog:
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Estimate insns costs.
(noce_process_if_block): Use potential costs.
Posting the ChangeLog before pushing.
--
gcc/ChangeLog:
* rtl.h (struct rtx_comparison): New struct that holds an rtx
comparison.
* config/rs6000/rs6000.c (rs6000_emit_minmax): Use struct instead of
single parameters.
(rs6000_emit_swsqrt): Likewise.
Posting the ChangeLog before pushing.
--
gcc/ChangeLog:
* ifcvt.c (cond_exec_get_condition): New parameter to allow getting the
reversed comparison.
(try_emit_cmove_seq): New function to facilitate creating a cmov
sequence.
(noce_convert_multiple_sets): Cr
Posting the ChangeLog before pushing.
--
gcc/testsuite/ChangeLog:
* gcc.dg/ifcvt-4.c: Remove s390-specific check.
* gcc.target/s390/ifcvt-two-insns-bool.c: New test.
* gcc.target/s390/ifcvt-two-insns-int.c: New test.
* gcc.target/s390/ifcvt-two-insns-long.c: New t
Posting the ChangeLog before pushing.
--
gcc/ChangeLog:
* ifcvt.c (noce_convert_multiple_sets_1): New function.
(noce_convert_multiple_sets): Call function a second time if we can
improve the first try.
Hi Richard,
> I think it would be better to fold this into the existing documentation
> a bit more:
[..]
done.
Fixed the remaining nits in the attached v5.
Bootstrap and regtest are good on s390x, Power9 and i386.
Regards
Robin
--
gcc/ChangeLog:
* config/rs6000/vsx.md: Use const0 b
Hi Richard,
this causes a bootstrap error on s390 (where
IRA_HARD_REGNO_ADD_COST_MULTIPLIER is defined). rclass is used in the
#define-guarded area. I guess you also wanted to move this to the new
function ira_caller_save_cost?
Regards
Robin
--
../../gcc/ira-costs.c: In function ‘void ira_tun
Hi Richard,
> Could you try the attached?
build and bootstrap look OK with it. Testsuite shows lots of fallout
but the proper bisect isn't finished yet. The commit before your series
is still fine - the problem could also be after it, though. Will report
back later.
Thanks
Robin
> Could you try the attached?
The series with the patch is OK from a testsuite point of view. The
other problem appears later.
Regards
Robin
>> I think we can CC IBM folks to see whether we can make WHILE_LEN works
>> for both IBM and RVV ?
>
> I've CCed them. Adding WHILE_LEN support to rs6000/s390x would be
> mainly the "easy" way to get len-masked (epilog) loop support. I've
> figured actually implementing WHILE_ULT for AVX512 in
401 - 460 of 460 matches
Mail list logo