https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #3 from Robin Dapp ---
> probably -fwhole-program is enough, -flto not needed(?)
Yes, -fwhole-program is sufficient.
>
> # vectp_g.248_1401 = PHI
> ...
> _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]);
> ..
> vect__19
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #4 from Robin Dapp ---
Ok, it looks like we do 5 iterations with the last one being length-masked to
length 2 and then in the "live extraction" phase use "iteration 6".
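To make the iteration count concrete, here is a minimal C sketch of a length-controlled loop. The vector factor of 4 and the element count of 18 are assumptions chosen only so that the loop runs five iterations with a final length of 2. The helper names are hypothetical and this is not GCC code. What it illustrates is that a value live after the loop has to come from the last active lane of the fifth iteration, because a sixth iteration never executes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a length-controlled vector loop.  */
struct loop_summary
{
  size_t iterations;   /* Number of vector iterations executed.  */
  size_t last_len;     /* Active length of the final iteration.  */
  size_t last_index;   /* Scalar index of the last active lane.  */
};

/* Simplified stand-in for .SELECT_VL: process at most VF elements.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Walk N scalar elements in chunks of at most VF and record where
   the last active lane of the last iteration ends up.  */
static struct loop_summary
run_length_controlled_loop (size_t n, size_t vf)
{
  assert (n > 0 && vf > 0);
  struct loop_summary s = { 0, 0, 0 };
  size_t done = 0;
  while (done < n)
    {
      size_t len = select_vl (n - done, vf);
      s.iterations++;
      s.last_len = len;
      s.last_index = done + len - 1;
      done += len;
    }
  return s;
}
```

Calling `run_length_controlled_loop (18, 4)` under these assumptions reports five iterations, a final length of 2, and scalar index 17 as the element a live extraction should read.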
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #5 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #5 from Robin Dapp ---
What happens is that code sinking does:
Sinking # VUSE <.MEM_1235>
vect__173.251_1238 = .MASK_LEN_LOAD (_911, 32B, { -1, -1, -1, -1 },
loop_len_1064, 0);
from bb 3 to bb 4
so we have
vect__173.251_1238 = .M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
Robin Dapp changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #8 from Robin Dapp ---
Created attachment 58037
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58037&action=edit
Expand dump
Dump attached. Insn 209 is the problematic one.
The changing from _911 to 1078 happens in internal-f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #10 from Robin Dapp ---
Yes it helps. Great that get_gimple_for_ssa_name is right below
get_rtx_for_ssa_name that I stepped through several times while debugging and I
didn't realize the connection, g.
But thanks! Good thing i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196
--- Comment #7 from Robin Dapp ---
I can barely build a compiler on gcc185 due to disk space. I'm going to set up
a cross toolchain (that I need for other purposes as well) in order to test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #18 from Robin Dapp ---
A bit of a follow-up: I'm working on a patch for reassociation that can handle
the mentioned cases and some more but it will still require a bit of time to
get everything regression free and correct. What it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
--- Comment #2 from Robin Dapp ---
Thanks, I was just about to open a PR.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281
--- Comment #29 from Robin Dapp ---
Just to document again: The test case should not be vectorized and at some
point we will adjust the cost model so it is not going to be. I'd prefer to
base that decision on real uarchs rather than adjust the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
Bug ID: 115340
Summary: Loop/SLP vectorization possible inefficiency
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336
--- Comment #2 from Robin Dapp ---
It looks to me as if we're expecting the result of a gather_load to be zero
when it's masked out (semantics of mask_gather_load) but for
mask_len_gather_load we actually describe it as undefined. Here the mask
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382
--- Comment #1 from Robin Dapp ---
Would something like this work? The testcase ran successfully with Intel's SME
with that change (and aarch64 qemu with SVE).
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 028692614bb..f9bf6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382
--- Comment #3 from Robin Dapp ---
For the record - the hunk before bootstrapped and regtested on the cfarm
machines and tested successfully on aarch64 qemu with sve. I still need to set
up a regtest environment with SME.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #23 from Robin Dapp ---
For the lack of a better idea (and time constraints as looking for compiler
bottlenecks is slow and tedious) I went with Kito's suggestion of splitting
insn-emit.cc
This reduces this part of the compilation w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #25 from Robin Dapp ---
At least here locally the maximum I saw was 1.4 GB of RES for insn-emit-10.cc.
That's still not ideal (especially when 8 or 10 of those files compile in
parallel) but at least no 8 GB for a single file anymor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #26 from Robin Dapp ---
So insn-opinit.cc still takes 2-3 minutes to compile here, even though the file
is not gigantic.
With the same GCC 13.1 x86 host compiler I see:
phase opt and generate : 170.28 ( 99%) 0.75 ( 48
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #4 from Robin Dapp ---
Just to mention here as well. As this seems ninstance++ where the
adjust_precision thing comes back to bite us, I'm going to go back and check if
the issue why it was introduced (DCE?) cannot be solved differe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #5 from Robin Dapp ---
Disregarding the reasons for the precision adjustment, for this case here, we
seem to fail at:
/* We do not handle bit-precision changes. */
if ((CONVERT_EXPR_CODE_P (code)
|| code == VIEW_CONVERT_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #7 from Robin Dapp ---
vectp.4_188 = x_50(D);
vect__1.5_189 = MEM [(int *)vectp.4_188];
mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
mask_patt_156.7_191 = VIEW_CONVERT_EXPR>(mask__2.6_190);
_1 = *x_50(D);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #9 from Robin Dapp ---
Yes, that's from pattern recog:
slp.c:11:20: note: === vect_pattern_recog ===
slp.c:11:20: note: vect_recog_mask_conversion_pattern: detected: _5 = _2 &
_4;
slp.c:11:20: note: mask_conversion pattern rec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #10 from Robin Dapp ---
From what I can tell with my barely working connection no regressions on x86,
aarch64 or power10 with the adjusted check.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #4 from Robin Dapp ---
This is a scalar popcount and as Kito already noted we will just emit
cpop a0, a0
once the zbb extension is present.
As to the question what is actually being vectorized here, I'm not so sure :D
It looks l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109
Bug ID: 112109
Summary: Missing riscv vectorized strcmp (and other) expanders
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Compo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #30 from Robin Dapp ---
On my machine it is not nearly as bad as insn-emit.cc. What dominates for me
with a GCC 13 host compiler is the already fixed insn-opinit problem.
How long does it take for you (maybe in % of the total build)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311
--- Comment #10 from Robin Dapp ---
As a general remark: Some of those are present on other backends as well, some
have been introduced by recent common-code changes and some are bogus test
prerequisites or checks. I'm not saying we are in per
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361
--- Comment #2 from Robin Dapp ---
I can have a look. Of course I tested it but neither the compile farm machine
(gcc188) I used nor my local device have AVX512 run capability. Anywhere else
I can test it?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112363
--- Comment #1 from Robin Dapp ---
This test was introduced in order to check that we correctly "reduce" with -0.0
as neutral element, i.e. a reduction preserves an initial -0.0 and doesn't turn
it into 0.0 by adding 0.0. Kernel aborted means an
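The neutral-element point can be checked with plain IEEE 754 arithmetic. The following C sketch is an illustration only, not the test from the PR: the function names are hypothetical, and it assumes the default round-to-nearest mode and no `-ffast-math`. It adds a padding value a number of times to an initial accumulator, the way inactive lanes of a reduction would contribute their neutral element.

```c
#include <math.h>
#include <stddef.h>

/* Return nonzero if X is exactly negative zero.  */
static int
is_negative_zero (double x)
{
  return x == 0.0 && signbit (x) != 0;
}

/* Add NEUTRAL to INIT PAD times, modelling inactive lanes that
   contribute the chosen neutral element to an add reduction.  */
static double
reduce_with_padding (double init, double neutral, size_t pad)
{
  double acc = init;
  for (size_t i = 0; i < pad; i++)
    acc += neutral;
  return acc;
}
```

With `-0.0` as padding, an initial `-0.0` survives because `-0.0 + -0.0` is `-0.0`. With `0.0` as padding, the sign is lost because `-0.0 + 0.0` is `+0.0` in round-to-nearest.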
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361
--- Comment #6 from Robin Dapp ---
So "before" we created
vect__3.12_55 = MEM [(float *)vectp_a.10_53];
vect__ifc__43.13_57 = VEC_COND_EXPR ;
// _ifc__43 = _24 ? _3 : 0.0;
stmp__44.14_58 = BIT_FIELD_REF ;
stmp__44.14_59 = r3_29 + stmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112359
--- Comment #2 from Robin Dapp ---
Would something like
+ bool allow_cond_op = flag_tree_loop_vectorize
+&& !gimple_bb (phi)->loop_father->dont_vectorize;
in convert_scalar_cond_reduction be sufficient or are there more conditions to
check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #6 from Robin Dapp ---
How does the test suite look without bootstrapping? Are there still new FAILs?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #7 from Robin Dapp ---
Ah, thanks, I can reproduce this on the cfarm/gcc185.
We don't expand:
vect__ifc__141.81_358 = .COND_ADD (vect_cst__356,
vect_GetImageChannelMoments_M00_0_lsm.74_338, { 1.0e+0, ... },
vect_GetImageChannelMomen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #8 from Robin Dapp ---
Ah of course it's not the first argument but the mask. During vectorization we
already create
fail1.c:15:10: note: add new stmt: vect__ifc__141.81_358 = .COND_ADD
(vect_cst__356, vect_GetImageChannelMoments_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #9 from Robin Dapp ---
I believe the problem is that in
if (vectype)
vector_type = vectype;
else if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
&& VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
vec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #11 from Robin Dapp ---
Thanks, this is helpful.
I have a patch that I just bootstrapped and ran the testsuite with on aarch64.
Going to post it soon, maybe Richi still has a better idea how to work around
this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #1 from Robin Dapp ---
We fail at:
void
vect_finish_replace_stmt (vec_info *vinfo,
stmt_vec_info stmt_info, gimple *vec_stmt)
{
gimple *scalar_stmt = vect_orig_stmt (stmt_info)->stmt;
gcc_assert (gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #2 from Robin Dapp ---
I tested
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a544bc9b059..257fd40793e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7084,7 +7084,7 @@ vectorize_fold_left_reduc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #4 from Robin Dapp ---
Is there another way to make it more robust?
Or does the existing
void
vect_finish_replace_stmt (vec_info *vinfo,
stmt_vec_info stmt_info, gimple *vec_stmt)
{
gimple *scalar_stmt =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #11 from Robin Dapp ---
Thanks for figuring that out. No idea if the pattern is the problem, most
likely not? I rather suppose there is still a missing fixup somewhere in the
vectorizer that I didn't encounter with my testing.
So
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112481
Robin Dapp changed:
What|Removed |Added
CC||palmer at dabbelt dot com
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
Bug ID: 112527
Summary: RVV integer vector instructions generated with
rv64gc_zvfh
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Pr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
--- Comment #2 from Robin Dapp ---
Ah, thanks, so it depends on zve32f which implies zve32x. Ok, then all good
and we can close this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
Robin Dapp changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112531
--- Comment #2 from Robin Dapp ---
Yes, I'd also argue in favor of -fno-tree-vectorize here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112531
--- Comment #4 from Robin Dapp ---
Personally, I don't mind having some FAILs as long as we know them and
understand the reason for them. I wouldn't insist on "fixing" them but don't
mind if others prefer to have the results "clean". Probably
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112552
--- Comment #7 from Robin Dapp ---
Ah, it's even easier to trigger then. I already have a somewhat working
solution by going with Richi's suggestion and adding the handling for COND_OPs
in vect patterns. Still needs a bit more polishing and te
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #34 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #29)
> --- gcc/tree-vect-loop.cc.jj 2023-11-14 10:35:52.0 +0100
> +++ gcc/tree-vect-loop.cc 2023-11-15 22:42:32.782007408 +0100
> @@ -4105,9 +4105,9 @@ pop:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #35 from Robin Dapp ---
What does get rid of the comparison failures in the three last posted reduced
examples is:
gcall *call = dyn_cast (op_use_stmt);
internal_fn ifn;
if (call && gimple_call_internal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #40 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #37)
[..]
> The above isn't complete, so one just has to guess what you mean outside of
> that, but the above doesn't seem to be correct. There are many internal
> cal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #46 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #43)
> Now, the patch changed it to allow one extra use in certain cases (but I
> think only on use_stmt, because there should be one use on use_stmt and if
> there is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #47 from Robin Dapp ---
And, just to confirm: Testsuite is unchanged on riscv with your patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970
--- Comment #18 from Robin Dapp ---
I did a quick testsuite run on rv32 and can confirm that this fixes the issue
for me.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #3 from Robin Dapp ---
I cannot reproduce this either. Just started with binop/* and don't see any
fails locally. Patrick, could you check what caused this?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #15 from Robin Dapp ---
Hmm, that's definitely related to the original change but most likely not to
the fixes.
gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
|| code == IFN_COND_MUL || code == IFN_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #17 from Robin Dapp ---
Thanks, I reproduced it on the compile farm with this example. Going to have a
look. riscv doesn't fail in a similar way this time.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #18 from Robin Dapp ---
Already in ifcvt we have:
_ifc__60 = .COND_ADD (_2, _6, MADPictureC1_lsm.10_25, MADPictureC1_lsm.10_25);
which we should not. This is similar on riscv.
But during value numbering it still is
Value numberi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #20 from Robin Dapp ---
Not really depending on an order but rather expecting that the reduction
variable is in op[1] (as created by ifcvt).
That might already be the problem because here the reduction index is 2. It
just never hap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #21 from Robin Dapp ---
Grml,
../../gcc/tree-vect-loop.cc:12248:1: fatal error: error writing to
/tmp/ccsMqSV2.s: No space left on device
on cfarm185, cannot even build anymore.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661
--- Comment #1 from Robin Dapp ---
Confirmed, smaller example:
program main
implicit none
integer, parameter :: n=5
character(len=6), dimension(n,n) :: a
character(len=6), dimension(n) :: r1
integer :: i
logical, dimension(n,n) :: m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661
--- Comment #3 from Robin Dapp ---
Yes, as agreed. Though today I probably won't be able to do much due to private
matters.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670
--- Comment #1 from Robin Dapp ---
The problem is exposed with the ipa copy propagation pass. I haven't narrowed
it down yet but will continue tomorrow.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #11 from Robin Dapp ---
On Friday I looked into one of the Fortran fails, class_67.f90 and debugged it
independently without reading here further. It is also due to the same reason
- alias analysis finds that the predicated store de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #13 from Robin Dapp ---
It looks like the takeaway from the other thread is that there are many
likewise assumptions about masked stores in the middle end. It's probably
difficult to get them all right in a short time. Therefore I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
Targe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #15 from Robin Dapp ---
Does the =m fix your issue? Or is the code gen different then and we're just
lucky? For my problem it doesn't help because we still don't recognize an
alias between load and store and the load is moved.
Ric
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #8 from Robin Dapp ---
Thanks for the testcase. It looks pretty similar to the situation why I
introduced the bitmask extract in the first place and I don't think that's the
root cause.
As last time the problem is that the generic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #9 from Robin Dapp ---
Ok, it's not the fold_extract_last expander. It just appeared that way here
because I disabled some other things.
What we want to do is extract the last element from a vector. This works as
long as we have a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #11 from Robin Dapp ---
When I define a vec_extract...bi pattern we don't enter the if (vec_extract) in
expmed because e.g.
bitsize = {1, 0}
bitnum = {3, 4}
and GET_MODE_BITSIZE (innermode) = {1, 0} with innermode = BImode.
This f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #13 from Robin Dapp ---
Mostly an issue because our expander is definitely not prepared to handle that
:)
It looks like aarch64's is, though, and ours can/should be changed then.
aarch64 doesn't need to implement a qi/bi extract fro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #10 from Robin Dapp ---
I didn't yet look at all those closer because they are more dump failures than
real execution failures.
The ones I checked are
expected
"^foobar$" but got:
"foobar"
so I considered this rather an environmen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #11 from Robin Dapp ---
Verified they work locally but also fail on a different server. Also fail
without vector and at -O0. Maybe it's different tcl versions or the shell
doing wonky stuff?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #12 from Robin Dapp ---
Ok, on my server the difference is that I didn't add vext_spec=v1.0 to the qemu
options. This caused the qemu diagnostic which would of course not match the
expected output.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #14 from Robin Dapp ---
Yes, that's the culprit. I already pushed a fix yesterday.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112854
--- Comment #2 from Robin Dapp ---
Hehe I was hoping we wouldn't hit a vec_set on a mask but apparently this
happens as well. We don't have a pattern for that either, yet.
Thanks for the test. I would expect this to be fixed in a similar way
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #5 from Robin Dapp ---
Can confirm. The scalable build works with qemu vlen=128 but fails with
vlen=256. That's a good data point as I'm not sure we're already covering this
with the current runs?
I'm going to start a testsuite ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112854
--- Comment #3 from Robin Dapp ---
The problem seems to be that we can overlay a 32-bit bitmask with an SImode
subreg and work with it. For zvl1024b on rv32 we don't allow this causing the
ICE.
We might be able to work around it by providing a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #6 from Robin Dapp ---
I indeed see more failures with _zvl128b, vlen=256 (than with _zvl128b,
vlen=128):
FAIL: gcc.dg/vect/pr66251.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr66251.c execution test
FAIL: gcc.dg/vec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #7 from Robin Dapp ---
Ah, forgot three tests:
FAIL: gcc.dg/vect/bb-slp-cond-1.c execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
On vlen=5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112872
--- Comment #2 from Robin Dapp ---
Thanks. Yes that's similar and also looks fixed by the introduction of the
vec_init expander. Added this test case to the patch and will push it soon.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #8 from Robin Dapp ---
With Juzhe's latest fix that disables VLS modes >= 128 bit for zvl128b x264
runs without issues here and some of the additional execution failures are
gone.
Will post the current comparison later.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #6 from Robin Dapp ---
This seems to be gone when simple vsetvl (instead of lazy) is used or with
-fno-schedule-insns which might indicate a vsetvl pass problem.
We might have a few more of those. Maybe it would make sense to run t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #7 from Robin Dapp ---
Here
0x105c6 vse8.v v8,(a5)
is where we overwrite m. The vl is 128 but the preceding vsetvl gets a4 =
46912504507016 as AVL which seems already broken.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #9 from Robin Dapp ---
In the good version the length is 32 here because directly before the vsetvl we
have:
li a4,32
That seems to get lost somehow.
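The effect described in comments #7 and #9 can be sketched with a simplified model of vsetvl. This is an assumption-laden illustration: VLMAX is taken to be 128 (inferred from the observed vl), the destination is assumed to hold 32 elements (matching the `li a4,32` of the good version), and the function names are hypothetical. The RVV specification leaves some freedom when AVL lies between VLMAX and 2*VLMAX; capping at VLMAX is one permitted choice and is what the model uses.

```c
#include <stdint.h>

/* Simplified vsetvl model: grant the requested AVL, capped at VLMAX.  */
static uint64_t
model_vsetvl (uint64_t avl, uint64_t vlmax)
{
  return avl < vlmax ? avl : vlmax;
}

/* Number of elements a store of VL elements writes past a buffer
   that only has room for CAPACITY elements.  */
static uint64_t
elements_out_of_bounds (uint64_t vl, uint64_t capacity)
{
  return vl > capacity ? vl - capacity : 0;
}
```

Under these assumptions the bogus AVL yields vl = 128, so a store into a 32-element buffer overruns it by 96 elements, while the intended AVL of 32 stays in bounds.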
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #13 from Robin Dapp ---
I just built from the most recent commit and it still fails for me.
Could there be a difference in qemu? I'm on qemu-riscv64 version 8.1.91 but
yours is even newer so that might not explain it.
You could ste
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #10 from Robin Dapp ---
I just realized that I forgot to post the comparison recently. With the patch
now upstream I don't see any differences for zvl128b and different vlens
anymore. What I haven't fully tested yet is zvl256b or h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110559
--- Comment #1 from Robin Dapp ---
This can be improved in parts by enabling register-pressure aware scheduling.
The rest is due to the default issue rate of 1. Setting proper instruction
latency will then obviously cause a bit more reordering
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #15 from Robin Dapp ---
I think we need to make sure that we're not writing out of bounds. In that
case anything might happen and if we just don't happen to overwrite this
variable we might hit another one but the test can still pas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #2 from Robin Dapp ---
It doesn't look like the same issue to me. The other bug is related to TImode
handling in combination with mask registers. I will also have a look at this
one.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #3 from Robin Dapp ---
In match.pd we do something like this:
;; Function e (e, funcdef_no=0, decl_uid=2751, cgraph_uid=1, symbol_order=4)
Pass statistics of "forwprop":
Matching expression match.pd:2771, gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #5 from Robin Dapp ---
Yes that's what I just tried. No infinite loop anymore then. But that's not a
new simplification and looks reasonable so there must be something special for
our backend.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #8 from Robin Dapp ---
Yes, can confirm that this helps.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
Bug ID: 112999
Summary: riscv: Infinite loop with mask extraction
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
--- Comment #1 from Robin Dapp ---
What actually gets in the way of vec_extract here is changing to a "better"
vector mode (which is RVVMF4QI here). If we tried to extract from the mask
directly everything would work directly.
I have a patch l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #2 from Robin Dapp ---
Yes, that's right.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #4 from Robin Dapp ---
Richard has posted it and asked for reviews. I have tested it and we have
several testsuite regressions with it but no severe ones. Most or all of them
are dump fails because we combine into vx variants that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #16 from Robin Dapp ---
I'd hope it was not fixed by this but just latent because we chose a VLS-mode
vectorization instead. Hopefully we're better off with the fix than without :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED