> So where do we stand with this? Juzhe asked it to be rebased, but I
> don't see a rebased version in my inbox and I don't see anything that
> looks like this on the trunk.
I missed this one and figured as we're pretty late in the cycle it can
wait until GCC 15. Therefore let's call it "deferred".
LGTM, thanks.
Regards
Robin
> +static bool
> +vsetvl_pre_insn_p (rtx_insn *rinsn)
> +{
> + return recog_memoized (rinsn) >= 0
> + && get_attr_type (rinsn) == TYPE_VSETVL_PRE;
> +}
Indent looks off on my screen. Can you check?
Apart from that LGTM (no need for v2 of course).
Regards
Robin
> Testing is running. OK for trunk if I pass the testing with no
> regression?
OK.
Regards
Robin
> The root cause is this following RTL pattern, after fwprop1:
>
> (insn 82 78 84 9 (set (reg:DI 230)
>         (sign_extend:DI (minus:SI (subreg/s/v:SI (reg:DI 150 [ niters.10 ]) 0)
>             (subreg:SI (reg:DI 221) 0)))) 13 {subsi3_extended}
>      (expr_list:REG_EQUAL (sign_extend:DI (p
OK.
Regards
Robin
Hi,
scalar loads provide offset addressing while unit-stride vector
instructions cannot. The offset must be loaded into a general-purpose
register before it can be used. In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands. If
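To illustrate (plain C sketch, hypothetical example, not from the
patch):

/* Both versions read a[i + 4].  The scalar load can fold the
   constant offset into its 12-bit immediate, e.g. "ld a0, 32(a1)",
   while a unit-stride vector load like vle64.v only accepts a base
   register, so an extra addi must materialize the address first --
   exactly the address arithmetic the heuristic tracks.  */
long
sum_offset (long *a, long n)
{
  long sum = 0;
  for (long i = 0; i < n; i++)
    sum += a[i + 4];
  return sum;
}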
On 2/14/24 20:46, Edwin Lu wrote:
> The testcase pr113742.c is failing for 32 bit targets due to the following cc1
> error:
> cc1: error: ABI requires '-march=rv64'
I think we usually just add exactly this to the test options (so
it is always run rather than just on a 64-bit target).
Regards
Robin
> Ah oops I glanced over the /* { dg-do compile } */part. It should be
> fine to add '-march=rv64gc' instead then?
Hmm it's a bit tricky. So generally -mcpu=sifive-p670 includes rv64
but it does not override a previously specified -march=rv32 (that might
have been added by the test harness or the
OK.
Regards
Robin
> For calling-convention-*.c, LGTM, but one nit about the changelog: using
> **Update** here may make it hard for others to see what you did to
> the file. You could say something like "Rearrange and adjust the
> asm-checker times" or similar. Of course, you can refine the
> changelog when committing.
>> * g
Hi,
for a vec_init (_a, _a, _a, _a) with _a of mode DImode we try to
construct a "superword" of two "_a"s. This only works for modes < Pmode
when we can "shift and or" two halves into one Pmode register.
This patch disallows the optimization for inner_mode == Pmode and emits
a simple broadcast in
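As an illustration (C sketch with assumed values, not the actual
expander code): for an inner mode smaller than Pmode, say SImode on
rv64, the superword fits into a single GPR:

#include <stdint.h>

/* "Shift and or" two SImode halves into one DImode (Pmode) value,
   which is then broadcast in one go.  */
uint64_t
superword (uint32_t a)
{
  return ((uint64_t) a << 32) | a;
}

For inner_mode == Pmode (DImode on rv64) no wider GPR exists, so we
emit a plain broadcast of the DImode value instead.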
> +/* { dg-final { scan-assembler-times "vmv\.v\.i\tv\[0-9\],0" 0 } } */
>
> I think you should use "scan-assembler-not"
Thanks, going to commit with that change.
Regards
Robin
On 2/24/24 00:10, Edwin Lu wrote:
> Given the recent change with adding the scheduler pipeline descriptions,
> many scan-dump failures emerged. Relax the expected assembler output
> conditions on the affected tests to reduce noise.
I'm not entirely sure yet about relaxing the scans like this.
Ther
Hi,
This has been sitting on my local tree - I've been wanting to post it
for a while but somehow forgot.
This patch makes segment loads and stores more expensive. It adds
segment_load and segment_store cost fields to the common vector costs
and adds handling to adjust_stmt_cost. In the future
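Roughly (sketch only, simplified from the description; the exact
layout in the committed version may differ):

/* New fields in the common vector costs, consulted by
   adjust_stmt_cost when it sees a segment load/store,
   e.g. vlseg<nf>e32.v / vsseg<nf>e32.v.  */
struct segment_costs_sketch
{
  int segment_load_cost;
  int segment_store_cost;
};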
> This patch looks odd to me.
> I don't see memrefs in the trunk code.
It's on top of the vle/vse offset handling patch from
a while back that I haven't committed yet.
> Also, I prefer to list all costs in the cost tune info for NF = 2 ~ 8
> like ARM SVE does:
I don't mind having separate costs for each
> I suggest specifying -fno-schedule-insns so the tests' assembler output
> never changes for any scheduling model.
We already do that and that's the point - as I mentioned before, no
scheduling is worse than default scheduling here (for some definition
of worse). The way to reduce the number of vsetvls is
> I think it makes more sense to remove the whole
> --param=riscv-autovec-preference since we should use
> -fno-tree-vectorize instead of --param=riscv-autovec-preference=none,
> which is a more reasonable compile option for users.
>
> --param is just an internal testing option that we added before,
On 2/29/24 02:38, Li, Pan2 wrote:
>> So it's going to check if V2SF can be tied to DI and V4QI with SI. I
>> suspect those are going to fail for RISC-V as those aren't tieable.
>
> Yes, you are right. Different REG_CLASSes are not allowed to be tied in
> RISC-V.
>
> static bool
> riscv_modes_
Hi Joshua,
> For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
> and floating-point compare instructions, an illegal instruction
> exception will be raised if the destination vector register overlaps
> a source vector register group.
>
> To handle this issue, we use "group_overla
Hi Juzhe,
> The reason we want to switch to the generic vector cost model is that
> the default cost model generates inferior codegen for various benchmarks.
>
> For example, in PR113247 we have a performance bug where we end up with
> an over 70% performance drop in SHA256. Currently, no matter how we adapt c
LGTM.
Regards
Robin
On 1/10/24 15:40, 钟居哲 wrote:
> I need to add these costs for segment load/stores:
>
> /* Generic costs for VLA vector operations. */
> static const scalable_vector_cost generic_vla_vector_cost = {
> {
> 1, /* int_stmt_cost */
> 1, /* fp_stmt_cost */
> 1, /* gather_load_cost */
>
> Current generic cost model makes dynamic-lmul2-7.c generate inferior codegen.
>
> I found if I tweak the cost a little bit then dynamic-lmul2-7.c codegen can
> be recovered.
> However, it makes other tests fail.
> It's a complicated story
Ok, makes sense. So the plan seems to be:
(1)
LGTM.
Regards
Robin
Hi Edwin,
> This patch copies the vector reservations from generic-ooo.md and
> inserts them into generic.md and sifive.md. Creates new vector crypto related
> insn reservations.
In principle, the changes look good to me but I wonder if we could
split off the vector parts from generic-ooo into th
> Since all the pipelines should be tuned to their cost model, they
> would be different anyway. If it would be simpler for now, I could
> separate the files out.
> I think I'm getting a bit confused. Is there a reason why we would
> want to exchange scheduler descriptions like the example you
> pr
LGTM now, thanks. I find it much more readable that way.
Regards
Robin
On 1/11/24 10:46, juzhe.zh...@rivai.ai wrote:
> Oh, I see. I think I have done something wrong here.
>
> I should adjust the cost for VEC_EXTRACT, not VEC_SET.
>
> But it's odd, I didn't see the loop vectorizer scanning the
> scalar_to_vec cost in the vect dump.
The slidedown/vmv.x.s part is of course vec_extract but we
>> The slidedown/vmv.x.s part is of course vec_extract but we indeed
>> don't seem to cost it as vec_to_scalar here.
>
> It looks like a vectorized live operation as it's not in the loop body
> (and thus really irrelevant for costing in practice). This has
>
> /* ??? Enable for loop costi
> Yeah... I just noticed. I should set it to 4 to fix it with the biggest
> VLEN size,
> that is, -march=rv64gcv_zvl4096b --param=riscv-autovec-lmul=m8...
>
> I am confused now how to fix this case.
4 is definitely too high compared to a regular instruction.
vmv.vx could even be zero-cost for const
On 1/11/24 11:20, juzhe.zh...@rivai.ai wrote:
> Ok I see your idea and we need to adjust scalar_to_vec accurately. Inside the
> loop we have these 2 scalar_to_vec:
>
> 1. MIN_EXPR 1 times scalar_to_vec costs 1 in prologue
>
> This scalar_to_vec cost should be 0 or 1 since it only generates si
> I think we shouldn't vectorize it with any vlen, since the non-vectorized
> codegen is much better.
> And also, I have tested -msve-vector-bits=2048, ARM SVE doesn't vectorize it.
> -zvl65536b, RVV Clang also doesn't vectorize it.
Of course I agree that optimizing everything to return 0 is
what
> 32872 spends 2 scalar instructions + 1 scalar_to_vec cost:
>
> li      a4,-32768
> addiw   a4,a4,104
> vmv.v.x v16,a4
>
> It seems reasonable but can only fix the test with -march=rv64gcv_zvl256b;
> it still fails with -march=rv64gcv_zvl4096b.
The scalar version also needs both instructions:
li a0,32
> 1. This patch sets the scalar_to_vec cost to 2 instead of 1 since a
>    scalar move instruction is slightly more costly than normal RVV
>    instructions (e.g. vadd.vv).
We can go with 2 or 3 (if needed) for now but should later
really incorporate reg-move costs in this IMHO. Just like e.g.
static const
> Tested on both RV32/RV64, no regression. OK for trunk?
Yes, thanks!
Btw out of curiosity, did you see why we actually fail to
optimize away the VLA loop? We should open a bug for that
I suppose.
Regards
Robin
OK, thanks.
Regards
Robin
LGTM.
Regards
Robin
I gave it another shot now by introducing a separate function as
Richard suggested. It's probably not at the location he intended.
The way I read the discussion there hasn't been any consensus
on how (or rather where) to properly tackle the problem. Any
other ideas still?
Regards
Robin
Found
Hi Juzhe,
the change itself is OK but I don't think we should add binary
files like this. Even if not ideal, if you want to go forward
IMHO let's skip the test for now and add it at a (not much) later
time.
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/fortran/spec2017_cam4/ppgrid.mod
>
OK.
Regards
Robin
Ah, interesting that this was it. Thanks for fixing and also
thanks to Andrew for suggesting that fix.
Regards
Robin
Hi Juzhe,
in principle this seems ok to me but I wonder about:
> We shouldn't worry about subreg:...VL_REGNUM since it's impossible
> to have such a situation,
I think we allow this in legitimize_move for situations like
(subreg:SI (reg:V4QI)). That was not added for correctness but
op
> No, we didn't undo the optimization.
>
> We just disallow move pattern for (set (reg) (VL_REGNUM)).
Ah, what I referred to was the opposite direction. We allow
(subreg:V8QI (reg:DI ...)) which is not touched by this patch.
Then it is OK.
Regards
Robin
LGTM.
Regards
Robin
> The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the
> VSETVL pass: it consumes over 33 GB of memory, which makes it
> impossible to compile SPEC 2017 wrf on a laptop.
>
> The root cause is these memory-wasting variables:
LGTM. The new code matches compute_lcm_local_properties mor
Hi,
on Solaris/SPARC several vector tests appeared to be regressing. They
were never vectorized but the checks before r14-3612-ge40edf64995769
would match regardless of whether a loop was actually vectorized.
The refined checks only match a successful vectorization attempt
but are run unconditiona
The non-test parts are OK IMHO.
Regards
Robin
LGTM, thanks.
Regards
Robin
Thanks, that looks better IMHO.
> +;; Copyright (C) 2011-2024 Free Software Foundation, Inc.
> +;; Contributed by Andrew Waterman (and...@sifive.com).
> +;; Based on MIPS target for GNU compiler.
You might want to change that, as well as the date. While at
it you can also fix the broken date in
> Use default cost model scheduling on these test cases. All these tests
> introduce scan dump failures with -mtune generic-ooo. Since the vector
> cost models are the same across all three tunes, some of the tests
> in PR113249 will be fixed with this patch series.
This is OK, thanks.
> 39 addit
> /* If we ever encounter an insn without an insn reservation, trip
> an assert so we can find and fix this problem. */
> -#if 0
> + if (! insn_has_dfa_reservation_p (insn)) {
> +    print_rtl (stderr, insn);
> +    fprintf (stderr, "%d", get_attr_type (insn));
> + }
> gcc_assert (insn
Hi,
init_all_optabs initializes a very large number of patterns for riscv targets. This
leads to pathological situations in dataflow analysis (which can occur
with many adjacent stores).
To alleviate this this patch makes genopinit split the init_all_optabs
function into several init_optabs_xx functions that ea
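The generated code then has roughly this shape (chunk names assumed,
illustration only):

static void
init_optabs_00 (struct target_optabs *optabs)
{
  /* First chunk of pattern initializations.  */
}

static void
init_optabs_01 (struct target_optabs *optabs)
{
  /* Second chunk.  */
}

void
init_all_optabs (struct target_optabs *optabs)
{
  init_optabs_00 (optabs);
  init_optabs_01 (optabs);
  /* ...one call per emitted chunk, keeping each function small.  */
}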
> I think removing the is_inorder attribute should be ok. I added it
> because I wanted to avoid having two matching insn reservations
> defined since matching solely on the type attribute should also match
> on all subsets as well (i.e. if eventually we add an insn reservation
> checking for type
LGTM.
Regards
Robin
Hi,
in PR113607 we see an invalid fold of
_429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0,
... });
vect_prephitmp_129.51_282 = _429;
vect_iftmp.55_287 = VEC_COND_EXPR <...>;
to
Applying pattern match.pd:9607, gimple-match-10.cc:3817
gimple_simplified to vect_iftm
> + NULL, /* vector cost */
> +};
Does the P600 series include a vector unit? From what I found on
the web it looks like it. If so I would suggest specifying at least
the default (generic) vector cost model here. We fall back to the
default one for
Hi Juzhe,
in general looks OK to me.
Just a question for understanding:
> - if (header_info.valid_p ()
> - && (anticipated_exp_p (header_info) || block_info.full_available))
Why is full_available true if we cannot use it?
> +/* { dg-do compile } */
It would be nice if we could make
> I didn't choose to run it since I didn't have the issue running on my
> local simulator, no matter qemu or spike.
Yes it was flaky. That's kind of expected with the out-of-bounds
writes we did. They can depend on runtime environment and other
factors. Of course it's a bit counterintuitive to add a (befo
Thanks. The attached v2 goes with your suggestion and adds a
vec_extractbi expander. Apart from that it keeps the
MODE_PRECISION changes from before and uses
insn_data[icode].operand[0]'s mode.
Otherwise no changes on the riscv side.
Bootstrapped and regtested on x86 and aarch64. On cfar
> Do you mean add some comments in tests?
I meant add it as a run test as well and comment that the test
has caused out-of-bounds writes before and passed by the time of
adding it (or so) and is kept regardless.
Regards
Robin
Thanks, LGTM but please add a comment like:
These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior. Depending on the
execution environment they can either pass or fail. As of now,
with the latest QEMU version, they will pass even without the
und
Hi,
this is a bit of a follow-up to the latest expmed change.
In extract_bit_field_1 we try to get a better vector mode before
extracting from it. Better refers to the case when the requested target
mode does not equal the inner mode of the vector to extract from and we
have an equivalent tieabl
> It looks like:
>
> FOR_EACH_MODE_FROM (new_mode, new_mode)
> if (known_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
> && known_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
> && targetm.vector_mode_supported_p (new_mode)
> &&
Hi Juzhe,
in general looks OK.
> + /* We need to use precomputed mask for such situation and such mask
> + can only be computed in compile-time known size modes. */
> + if (GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) == 8
> +     && maybe_ge (vec_len, 256)
> + && !vec_len.is_constant ())
> +
On 12/15/23 13:16, juzhe.zh...@rivai.ai wrote:
>
>>> bool indices_fit_selector = maybe_ge (vec_len, 2 << GET_MODE_BITSIZE
>>> (GET_MODE_INNER (vmode)));
> No, I think it will make us miss some optimization.
>
> For example, for poly value [16,16] maybe_ge ([16,16], 65536) which makes us
> miss
> Oh, I think it should be renamed to not_fit.
>
> Does the following make sense to you?
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_not_fit_selector_p
> = maybe_ge (vec_len, 2 <
On 12/15/23 13:52, juzhe.zh...@rivai.ai wrote:
> Do you mean :
>
> /* We need to use precomputed mask for such situation and such mask
> can only be computed in compile-time known size modes. */
> bool indices_fit_selector_p
> = GET_MODE_BITSIZE (GET_MODE_INNER (vmode)) > 8 || known_
LGTM.
Regards
Robin
Hi Juzhe,
> -} elseif [istarget riscv64-*-*] {
> +} elseif [istarget riscv*-*-*] {
> if [check_effective_target_riscv_v] {
> lappend DEFAULT_VECTCFLAGS "--param" "riscv-vector-abi"
> set dg-do-what-default run
Yes, that's reasonable. A bit further down we have
LGTM.
Regards
Robin
LGTM.
Regards
Robin
OK.
Regards
Robin
Hi,
found in PR112971, this patch adds folding support for bitwise operations
of const duplicate zero vectors and stepped vectors.
On riscv we have the situation that a folding would perpetually continue
without simplifying because e.g. {0, 0, 0, ...} & {7, 6, 5, ...} would
not fold to {0, 0, 0, ...}.
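A scalar analogue of the fold (plain C, illustration only, not the
simplify-rtx code):

#include <stdio.h>

int
main (void)
{
  /* Element-wise AND with an all-zeros duplicate is always zero,
     so {0, 0, 0, ...} & {7, 6, 5, ...} can fold straight to the
     zero vector, letting the folding terminate.  */
  int stepped[4] = { 7, 6, 5, 4 };
  for (int i = 0; i < 4; i++)
    printf ("%d ", 0 & stepped[i]);   /* prints: 0 0 0 0 */
  return 0;
}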
> 1). We not only have vashl_optab,vashr_optab,vlshr_optab which vectorize
> shift with vector shift amount,
> that is, vectorization of 'a[i] >> x[i]', the shift amount is loop variant.
> 2). But also, we have ashl_optab, ashr_optab, lshr_optab which can vectorize
> shift with scalar shift amoun
> +/* Return true if it is a whole register-register move. */
> +bool
> +whole_reg_to_reg_move_p (rtx *ops, machine_mode mode)
> +{
> + if (register_operand (ops[0], mode)
> + && register_operand (ops[3], mode)
> + && satisfies_constraint_vu (ops[2])
> + && satisfies_constraint_Wc1 (ops
> > + if (is_gimple_min_invariant (op))
> > + return true;
> > + if (SSA_NAME_IS_DEFAULT_DEF (op)
> > + || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (op))))
> > + return true;
> > + return gimple_uid (SSA_NAME_DEF_STMT (op)) & 1;
> > +}
OK. I'm still a bit unsure about whether SSA_NAME_IS_DEFAULT_DEF
can really occur but if it does it's not wrong to treat it as
loop invariant.
Regards
Robin
> While working on overlap for widening instructions, I realized that we
> set vwadd.wx/vfwadd.wf as earlyclobber, which is incorrect.
>
> Since according to RVV ISA:
> "The destination EEW equals the source EEW."
>
> For both vwadd.wx/vfwadd.wf the source vector and dest vector operands
> have the same EEW
OK.
Regards
Robin
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
We would only ever increment the source address by 1 regardless of
the input type.
The
Yes, OK, thanks for that. CC'ing Juzhe as this is his pass.
Regards
Robin
LGTM
Regards
Robin
Hi,
this fixes a bug in the rawmemchr implementation by incrementing the
source address by vl * element_size instead of just vl.
This is normally harmless as we will just scan the same region more than
once but, in combination with an older qemu version, would lead to
an execution failure in SPEC
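In C terms the fix amounts to this (simplified scalar model, not the
actual expander code):

#include <stddef.h>
#include <stdint.h>

/* Each iteration stands in for one vector chunk of vl elements.  */
const uint16_t *
rawmemchr16 (const uint16_t *p, uint16_t needle)
{
  const size_t vl = 8;   /* stand-in for the hardware vector length */
  for (;;)
    {
      for (size_t i = 0; i < vl; i++)
        if (p[i] == needle)
          return p + i;
      /* Buggy: p = (const uint16_t *) ((const char *) p + vl);
         advanced only vl bytes and rescanned part of each chunk.
         Fixed: advance vl elements, i.e. vl * element_size bytes,
         which C pointer arithmetic does implicitly: */
      p += vl;
    }
}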
Hi,
now split into multiple patches.
In preparation for the vectorized strlen and strcmp support this NFC
patch unifies the stringop strategy handling a bit. The "auto"
strategy now is a combination of scalar and vector and an expander
should try the strategies in their preferred order.
For the
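Conceptually (sketch, names assumed rather than taken from the
patch):

enum stringop_strategy_bits
{
  STRAT_SCALAR = 1 << 0,
  STRAT_VECTOR = 1 << 1,
  /* "auto" is no separate third strategy but the combination of
     both; an expander tries the enabled bits in its preferred
     order.  */
  STRAT_AUTO = STRAT_SCALAR | STRAT_VECTOR
};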
Hi,
this patch implements a vectorized strlen by re-using and slightly
adjusting the rawmemchr implementation. Rawmemchr returns the address
of the needle while strlen returns the difference between needle address
and start address.
As before, strlen expansion is guarded by -minline-strlen.
Whi
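The relationship is essentially (C sketch, hypothetical helper
name):

#include <stddef.h>

extern const char *rawmemchr_nul (const char *); /* hypothetical */

size_t
strlen_via_rawmemchr (const char *start)
{
  /* rawmemchr returns the address of the needle (the '\0' here);
     strlen is the distance from the start address to it.  */
  const char *needle = rawmemchr_nul (start);
  return (size_t) (needle - start);
}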
Hi,
this patch adds a vectorized strcmp implementation and tests. Similar
to strlen, expansion is still guarded by -minline-strcmp. I just
realized I forgot to make it a series but this one is actually
dependent on the NFC patch and the rawmemchr fix before.
Regards
Robin
gcc/ChangeLog:
Split it into four separate patches now.
Regards
Robin
LGTM.
Regards
Robin
Hi,
this changes the vec_extract path of extract_bit_field to use QImode
instead of BImode when extracting from mask vectors and changes
GET_MODE_BITSIZE to GET_MODE_PRECISION. This fixes an ICE on riscv
where we did not find a vec_extract optab and continued with the generic
code that requires 1
> +(define_mode_attr widen_ternop_dest_constraint [
> + (RVVM8QI "=vd, vr, vd, vr, vd, vr, ?&vr")
> + (RVVM4QI "=vd, vr, vd, vr, vd, vr, ?&vr")
> + (RVVM2QI "=vd, vr, vd, vr, vd, vr, ?&vr")
> + (RVVM1QI "=vd, vr, vd, vr, vd, vr, ?&vr")
> + (RVVMF2QI "=vd, vr, vd, vr, vd, vr, ?&vr")
> + (RVVMF
Hi,
recent -std changes caused testsuite failures. Fix those by adding
-std=gnu99 and -Wno-incompatible-pointer-types.
Going to commit as obvious.
Regards
Robin
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112552.c: Add
-Wno-incompatible-pointer-types.
*
LGTM.
Regards
Robin
I'd suggest the same thing as in the other patch, i.e. not having
the large number of identical lines in the iterator. That's just
my opinion, though. Rest LGTM.
Regards
Robin
OK.
Regards
Robin
> But how do we know BImode fits in QImode?
I was kind of hoping that a "bit" always fits in a "byte"/unit
but yeah, I guess we don't always know :/
> I think the issue is more that we try to extract an element from
> the mask vector? How is element extraction defined for VLA vectors
> anyway?
Yes, makes sense. Just one clarifying question.
> +{
> + if (GET_MODE_CLASS (vls_mode) != MODE_VECTOR_BOOL
> + && !ordered_p (TARGET_MAX_LMUL * BITS_PER_RISCV_VECTOR,
> + GET_MODE_PRECISION (vls_mode)))
> + /* We enable VLS modes which are aligned with TARGE
> I think the VLS modes are excluded exactly as we expected.
> For example, when zvl128b, LMUL = 1.
> We allow VLS modes <= 128 bits and exclude VLS modes > 128 bits.
> We have the same behavior as ARM SVE.
I just found the ordered_p a bit unintuitive here at first sight.
But when thinking about
Hi,
PR112854 shows a problem on rv32 with zvl1024b. During the course of
expand_constructor we try to overlay/subreg a 64-element mask by a
scalar (Pmode) register. This works for zvl512b and its maximum of
32 elements (32 mask bits still fit a 32-bit Pmode register) but
fails for rv32 and 64 elements, whose mask bits no longer fit.
To circumvent this, the patch adds a ve
LGTM.
+ /* Don't perform earliest fusion on unrelated edge. */
+ if (bitmap_count_bits (e) != 1)
+   continue;
This could still use a comment why e is "unrelated" in that case
(no v2 needed).
Regards
Robin