https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118832
--- Comment #3 from Robin Dapp ---
The mechanism that introduces those unsplit instructions always seems to be via
reload. At some point we see a REG_EQUAL note with a const_vector. As we,
generally, can rematerialize constants we try to do th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
--- Comment #9 from Robin Dapp ---
Bisecting further only leads to the commit that introduced the vector ABI.
Comparing the dumps with and without vector ABI is very tedious because a lot
of things differ.
It looks like we cannot create a relo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #18 from Robin Dapp ---
> But the point here really here is we don't need the widening semantics, more
> twice. The min+max+sub in loops with a final reducing sum should do the
> trick.
OK I guess it can be argued that
minus (max
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #15 from Robin Dapp ---
(In reply to Vineet Gupta from comment #14)
> (In reply to Li Pan from comment #7)
> > Created attachment 59661 [details]
> > with usad pattern
>
> Can you please post the patch, lest we duplicate your effort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117990
--- Comment #7 from Robin Dapp ---
Does it work when we use the old gather-load code path instead of the strided
load or does this have the same problem?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #12 from Robin Dapp ---
Could you please check if the patch helped?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117383
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #2 from Robin Dapp ---
I think depending on the performance of strided loads/stores this can be
profitable to vectorize. Looks like we need loop versioning to account for the
possible aliasing but once this is out of the way we coul
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117682
--- Comment #3 from Robin Dapp ---
The issue is in the way we construct an interleaved VLA const pattern. For
efficiency we try to use a larger element width, here 16 bits, to initialize
two values in one. I believe this doesn't go along well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
--- Comment #6 from Robin Dapp ---
(In reply to Richard Biener from comment #5)
> I would expect this to be always slower when vectorized unless the core is
> seriously bottle-necked on the frontend. The loads/stores need to be
> decomposed to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116242
Bug 116242 depends on bug 115995, which changed state.
Bug 115995 Summary: RISC-V: Can't generate portable RVV code for rv64gcv_zvl512b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995
What|Removed |Added
--
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118032
--- Comment #15 from Robin Dapp ---
> Based on earlier builds this file will take 2.5 to 3 hours to build (while
> all other cores are idle).
insn-attrtab.c doesn't consist of many functions so a split won't help. Given
that we have a number o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118032
--- Comment #12 from Robin Dapp ---
It looks like the insn-recog split didn't help here but maybe one of the
mentioned commits slowed down the compilation of insn-attrtab.c?
Has anybody made progress with narrowing down the problem?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116146
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111619
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #14 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Bug 84402 depends on bug 116146, which changed state.
Bug 116146 Summary: Split insn-recog.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116146
What|Removed |Added
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116166
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #32 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117878
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 117878, which changed state.
Bug 117878 Summary: RISC-V: ICE when build spec17 526.blender_r with -O3
-march=rv64gcv_zvl256b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117878
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117353
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117990
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117878
--- Comment #11 from Robin Dapp ---
I'm not really sure. For now I hope not. If we hit similar problems again
that are not easily fixable we can reconsider.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #9 from Robin Dapp ---
I think I'll post a patch to increase vec_construct costs first. It's just too
cheap right now. That should already help with the default settings.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #5 from Robin Dapp ---
According to Li Pan's results this is "just" vector strict align again?
We should be vectorizing the first loop, in particular after the SLP-grouping
changes.
I realize it's annoying having to resort to strict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #7 from Robin Dapp ---
> The problem is GCC-15 has performance regression compare to GCC-14 on both
> strict align and we should fix it, we can't specify use no strict align in
> GCC-15 to pretend that we don't have such performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118036
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Assignee|unassigned at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #10 from Robin Dapp ---
Ah I see - the actual vector code isn't even that bad and the vec_constructs
aren't either. The problem is rather that we have slow unaligned (scalar)
access with the default tune model. Thus we need to load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118140
--- Comment #8 from Robin Dapp ---
The optimized tree looks good apart from 'd' and the return value :)
[local count: 76665171]:
e_lsm.8_12 = e;
_55 = .MASK_LEN_LOAD (&MEM <_Bool[17]> [(void *)&f + 4B], 8B, { -1, ... },
_54(D), 13, 0);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #11 from Robin Dapp ---
(In reply to Li Pan from comment #9)
> Created attachment 59663 [details]
> before_vs_after when outer loop is 128
Ok, that's a different loop then. I'm seeing vmv1rs in the current version, is
that what you
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722
--- Comment #8 from Robin Dapp ---
So the difference is
(usad expansion)
vmax
vmin
vsub
vsext
vadd
vs (right now)
vwsub
vneg
vmax
vwadd
Why is that preferable?
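To make the two sequences easier to compare, here is a minimal scalar sketch in C of both ways of computing a sum of absolute differences. The function names, the unsigned 8-bit element type and the 32-bit accumulator are assumptions made for illustration only. They are not taken from the PR's test case, and the mapping to the vector instructions listed above is only approximate.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant A (hypothetical name): max, min and sub in the narrow type,
   then extend the difference and add it to the accumulator.  */
uint32_t sad_minmax(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t hi = a[i] > b[i] ? a[i] : b[i];   /* max */
        uint8_t lo = a[i] < b[i] ? a[i] : b[i];   /* min */
        uint8_t diff = (uint8_t)(hi - lo);        /* sub, hi >= lo so no wrap */
        sum += diff;                              /* extend + add */
    }
    return sum;
}

/* Variant B (hypothetical name): widening subtract, negate, max of the
   two (which is the absolute value), then a widening add.  */
uint32_t sad_widen(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int16_t d = (int16_t)(a[i] - b[i]);       /* widening sub */
        int16_t neg = (int16_t)-d;                /* neg */
        int16_t absd = d > neg ? d : neg;         /* max (d, -d) == |d| */
        sum += (uint32_t)absd;                    /* widening add */
    }
    return sum;
}
```

Both variants return the same value for the same input. The difference discussed in the comment is only in which operations run at the narrow element width and which at the widened one.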
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118182
--- Comment #1 from Robin Dapp ---
We should probably always populate the initial value as the VL=0 only refers
(should refer) to the actual reduction?
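One way to read the remark above, as a scalar model in C. This is a sketch only: `model_redsum` and `reduce_populated` are hypothetical names, and the model assumes the RVV rule that a reduction executed with VL=0 performs no operation and leaves its destination unchanged. It does not reproduce the code in the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a sum reduction over the first vl elements.
   Assumption: with vl == 0 nothing is executed, so *dest keeps whatever
   it held before.  */
void model_redsum(uint32_t *dest, uint32_t init, const uint32_t *src,
                  size_t vl)
{
    if (vl == 0)
        return;                 /* destination is not written */
    uint32_t acc = init;
    for (size_t i = 0; i < vl; i++)
        acc += src[i];
    *dest = acc;
}

/* Populate the result with the initial value unconditionally, so the
   vl == 0 case still yields the initial value.  */
uint32_t reduce_populated(uint32_t init, const uint32_t *src, size_t vl)
{
    uint32_t dest = init;       /* always populated */
    model_redsum(&dest, init, src, vl);
    return dest;
}
```

Under that assumption, skipping the unconditional `dest = init` would leave the result undefined whenever `vl` is zero, which is the situation the comment appears to describe.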
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
--- Comment #10 from Robin Dapp ---
It's also odd to see single-register spills for an LMUL8 register group, that
doesn't seem right.
(insn 174 169 180 2 (set (reg:RVVM1SF 247)
(reg:RVVM1SF 112 v16)) "pr115458.c":48:33 2749 {*movrvvm1sf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118140
--- Comment #18 from Robin Dapp ---
Fixed on trunk. Guess we still need a backport.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118154
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118140
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118019
--- Comment #15 from Robin Dapp ---
I think it's r15-2820-gab18785840d7b8.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118154
Robin Dapp changed:
What|Removed |Added
Component|tree-optimization |target
--- Comment #6 from Robin Dapp ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #2 from Robin Dapp ---
> The stores are not considered "grouped" because they have gaps.
> To do better we'd have to improve the store dataref analysis to see
> that a vectorization factor of four would "close" the gaps, or more
> g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118154
--- Comment #5 from Robin Dapp ---
Confirmed, funnily only happens with a QEMU VLEN=128 and not with VLEN >= 256.
-fwrapv and -fno-strict-aliasing are not necessary for me.
Another "funny" thing:
vect__5.15_44 = .COND_LEN_MAX ({ -1, ... },
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118154
--- Comment #3 from Robin Dapp ---
Uh, what a nice small test case ;) I'll have a look when I'm back mid next
week.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118832
--- Comment #8 from Robin Dapp ---
I think for vec_duplicate the idea is the same as for all the other splits -
keep it in simple shape so we can combine/fwprop etc. It also helps converting
e.g.
vmv.v.x v3,a3 vadd.vv v1, v2, v3
into
vad
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118832
--- Comment #6 from Robin Dapp ---
Thanks for laying it out so clearly. Helps to put things into perspective.
I believe all our insn_and_split patterns already have can_create_pseudo_p in
their condition so shouldn't match after reload. Warra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116351
--- Comment #3 from Robin Dapp ---
I started with a fix here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671939.html
but, due to other priorities, dropped the ball :/ Feel free to pick up from
there.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
Robin Dapp changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116686
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #7 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114516
--- Comment #1 from Robin Dapp ---
The issue is that we're not considering pattern statements for costing. It's
rather straightforward to include those as well which would fix this PR.
I'm going to test a patch locally.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
--- Comment #5 from Robin Dapp ---
Yeah, the original statement is recognized as a mask conversion pattern:
pr118950.c:9:21: note: vect_recog_mask_conversion_pattern: detected: _152 =
.MASK_LOAD (_230, 8B, _229, 0);
pr118950.c:9:21: note: m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
--- Comment #11 from Robin Dapp ---
I figured this particular problem on RISC-V won't be fixed on GCC 14 because we
don't have the zeroing of masked elements there. But you're referring to
backporting just this patch, right?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118595
--- Comment #2 from Robin Dapp ---
Hmm I'm not seeing those locally with -march=rv64gcv_zvl256b at least. Which
exact options were used to run the test suite? Or have those fails disappeared
in the meantime?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114516
Robin Dapp changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116773
Bug 116773 depends on bug 114516, which changed state.
Bug 114516 Summary: RISC-V: TSVC2 s315 has spill with dynamic lmul
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114516
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
Robin Dapp changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
--- Comment #4 from Robin Dapp ---
It indeed appears as if we need zeroing of the loaded gather values but
bool type_mode_padding_p
= TYPE_PRECISION (scalar_type) < GET_MODE_PRECISION (GET_MODE_INNER
(mode));
is false.
The last of the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117955
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #7 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115703
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116242
Bug 116242 depends on bug 115703, which changed state.
Bug 115703 Summary: [15 Regression] rv64gcv_zvl256b miscompile since
r15-1579-g792f97b44ff
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115703
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #9 from Robin Dapp ---
I suspect the problem lies somewhere here:
_11 = .VEC_EXTRACT (mask__83.22_110, 0);
_23 = MEM[(short int *)&t + 20B];
_24 = _23 & _132;
_25 = _24 != 0;
_121 = () _25;
_157 = _11 ^ _121;
For
_121
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #17 from Robin Dapp ---
> No you got it wrong.
> _121 will either be -1 or 0. _11 should be -1 or 0 too.
> So the question is what was the VEC_EXTRACT doing the right thing? Is it
> 0/-1 or 0/1?
I literally mentioned VEC_EXTRACT in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #4 from Robin Dapp ---
Very weird indeed. It looks like we're not even vectorizing? I mean, sure, we
use vector instructions but they are all broadcast from scalars?
(VMAT_INVARIANT) And in the end we extract the first element wit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #10 from Robin Dapp ---
The test passes with -fno-vrp, so maybe the optimized tree isn't correct after
all?
Folding statement: _157 = _26 ? -1 : 0;
Matching expression match.pd:161, gimple-match-10.cc:33
Matching expression match.pd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #6 from Robin Dapp ---
As convoluted (and redundant) as it looks, the optimized tree looks at least
correct to me. Maybe a backend issue?
But I don't see costing for what we emit in the vectorizer and I didn't yet
find where we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #11 from Robin Dapp ---
/* In GIMPLE, getting rid of 2 conversions for one new results
in smaller IL. */
(simplify
(convert (bitop:cs@2 (nop_convert:s @0) @1))
(if (GIMPLE
&& TREE_CODE (@1) != INTEGER_CST
&&
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #20 from Robin Dapp ---
Hmm, so right now we return "1" or "0" when extracting from a mask, not "-1" or
"0" and that's what aarch64/SVE does as well. We cannot start returning a
sign-extended -1 all of a sudden.
There is an inconsi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
--- Comment #22 from Robin Dapp ---
> Is that not happening? What value does _164 actually end up being?
>
> In other words, if the XOR is happening in GPRs, it doesn't matter whether
> the register holds 1 or -1 (or 3) for a true boolean. Th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119224
--- Comment #2 from Robin Dapp ---
I'm afraid that's due to scheduling (and not RA spilling). Of course there
shouldn't be any vector stores in this loop and with -fno-schedule-insns there
aren't any.
It's much worse for zvl128b even. While t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119224
--- Comment #4 from Robin Dapp ---
Ah, sorry, I always specify -mno-vector-strict-align by default. It's always
that option that allows us to unroll, otherwise unrolling will lead to
misaligned accesses. And -mtune=generic-ooo defaults to
-mno
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119224
--- Comment #7 from Robin Dapp ---
> So this why you weren't seeing it but I'm confused about the rationale...
> I unpack above to following statements
>
> 1. -mno-vector-strict-align allows us to unroll - seems ok.
> 2. Otherwise (-mvector-str
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119115
--- Comment #5 from Robin Dapp ---
The problematic vsetvl is
vsetvli zero,a3,e16,m1,ta,ma
which was a
vsetvli a4,a3,e8,mf2,ta,ma
vsetvli t1,a3,e8,mf2,ta,ma
with the simple strategy.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119115
--- Comment #2 from Robin Dapp ---
(In reply to Andrew Pinski from comment #1)
> Could this be another one of the vsetivli failures?
100% as I get "0" with --param=vsetvl-strategy=simple. But at first sight
unrelated to the previous ones. Wil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119115
Robin Dapp changed:
What|Removed |Added
Last reconfirmed||2025-3-4
--- Comment #3 from Robin Dapp -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119115
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117955
Robin Dapp changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119361
--- Comment #4 from Robin Dapp ---
(In reply to Edwin Lu from comment #3)
> I'm not familiar enough with how the two modes interact with each other but
> I guess my question is, why do we have so many conversions between the two
> modes? What's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119361
--- Comment #2 from Robin Dapp ---
I looked into this some more and it points to a general deficiency in how we
handle the split between VLA and VLS modes.
With ...bits=zvl the RVVM1SI etc. modes become VLS modes. In turn, this means
that whene
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119114
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119361
--- Comment #1 from Robin Dapp ---
The issue is due to:
_279 = BIT_FIELD_REF <_480, 64, 0>;
_330 = BIT_FIELD_REF <_480, 64, 64>;
_340 = BIT_FIELD_REF <_481, 64, 0>;
_350 = BIT_FIELD_REF <_481, 64, 64>;
Ideally they expand to simple sl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119672
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #6 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119672
--- Comment #8 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #7)
> Thanks, I've posted it to gcc-patches in case some CI picks it up too:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680408.html
Testing looked good on rv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #5 from Robin Dapp ---
Do you happen to have an execution test ready so I can have a look?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119577
Bug ID: 119577
Summary: RISC-V: Redundant vector IV roundtrip.
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119572
Robin Dapp changed:
What|Removed |Added
Priority|P1 |P3
--- Comment #3 from Robin Dapp ---
(In
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #17 f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #16 from Robin Dapp ---
> Yes, it is precisely the issue I have encountered in cvtScale8s64f (actually
> in cvt_64f). After the commit 34ae3a99, the default value of
> LOGICAL_OP_NON_SHORT_CIRCUIT has changed from 0 into 1, it will c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116595
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #15 from Robin Dapp ---
> Yes, it is precisely the issue I have encountered in cvtScale8s64f (actually
> in cvt_64f). After the commit 34ae3a99, the default value of
> LOGICAL_OP_NON_SHORT_CIRCUIT has changed from 0 into 1, it will c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #12 from Robin Dapp ---
> I recompile the opencv application with current gcc(commit b6aafe9a5b), and
> it still reproduce this bug. Do you have apply the patch of step 3 which
> enable vector implement of cvt_64f function?
Yes, I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #13 from Robin Dapp ---
Hmm, now I compiled with -O3 on top of --param logical-op-non-short-circuit=0
(which shouldn't actually be necessary or change anything as it's the default)
but there is a segmentation fault in
_ZN2cv12cpu_b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #9 from Robin Dapp ---
> cmake --build cross-build/$BUILD_DIR-gcc --target opencv_test_core -j10
> ```
> 4. run
> ```
> export LD_LIBRARY_PATH=//lib
> ./opencv_test_core --gtest_filter="Core_ConvertScale/ElemWiseTest.accuracy/0"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116595
Robin Dapp changed:
What|Removed |Added
See Also||https://gcc.gnu.org/bugzill
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116595
--- Comment #7 from Robin Dapp ---
Ah, not a regression but just a checking assert, sorry.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
--- Comment #10 from Robin Dapp ---
> 4. run
> ```
> export LD_LIBRARY_PATH=//lib
> ./opencv_test_core --gtest_filter="Core_ConvertScale/ElemWiseTest.accuracy/0"
> ```
[==] Running 1 test from 1 test case.
[--] Global test e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119373
--- Comment #5 from Robin Dapp ---
> The analysis of SPEC2017's 510.parest_r shows that the topmost basic block
> is a tight loop (see attached reducer). Once vectorised, by unrolling and
> mutualising 4 instructions, AArch64 achieves a 22% redu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119832
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120067
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #4 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119547
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119577
--- Comment #3 from Robin Dapp ---
I managed to have a quick look at the code now. It looks like we force live
every induction and build slp instances for the IV increments.
I don't think adjusting the actual IV creation in vectorizable_inducti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
--- Comment #13 from Robin Dapp ---
Going to push this to the 14 branch later today if the x86 testsuite shows no
regressions.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118950
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW