[Bug tree-optimization/100756] [12 Regression] vect: Superfluous epilog created on s390x

2023-02-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756 --- Comment #8 from rdapp at gcc dot gnu.org --- For completeness: haven't observed any fallout on s390 since and the regression is fixed.

[Bug middle-end/106527] New: ICE with modulo scheduling dump (-fdump-rtl-sms)

2022-08-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106527 Bug ID: 106527 Summary: ICE with modulo scheduling dump (-fdump-rtl-sms) Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component:

[Bug rtl-optimization/105988] [10/11/12/13 Regression] ICE in linemap_ordinary_map_lookup, at libcpp/line-map.cc:1064 since r6-4873-gebedc9a3414d8422

2022-08-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105988 rdapp at gcc dot gnu.org changed: What|Removed |Added Target|x86_64-pc-linux-gnu |x86_64-pc-linux-gnu s390 ---

[Bug target/106701] Compiler does not take into account number range limitation to avoid subtract from immediate

2022-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106701 rdapp at gcc dot gnu.org changed: What|Removed |Added Target|s390|s390 x86_64-linux-gnu

[Bug target/106701] Compiler does not take into account number range limitation to avoid subtract from immediate

2022-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106701 --- Comment #3 from rdapp at gcc dot gnu.org --- I thought expand (or combine) were independent of value range. What would be the proper place for it then?

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 rdapp at gcc dot gnu.org changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- C

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 --- Comment #8 from rdapp at gcc dot gnu.org --- Hacked something together, inspired by the other cases that try two different sequences. Does this go into the right direction? Works for me on s390. I see some regressions related to predictive c

[Bug middle-end/91213] Missed optimization: (sub X Y) -> (xor X Y) when Y <= X and isPowerOf2(X + 1)

2022-08-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91213 --- Comment #9 from rdapp at gcc dot gnu.org --- The regressions are unrelated and due to another patch that I still had on the same branch.
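
The identity in the bug title can be checked exhaustively for small values. The following is only a minimal sketch in Python, not GCC code; the helper names `is_power_of_2` and `sub_equals_xor` are made up for illustration. The reasoning: when X + 1 is a power of two, X has all of its low bits set, so subtracting any Y with 0 <= Y <= X never borrows and X - Y matches X ^ Y bit for bit.

```python
def is_power_of_2(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def sub_equals_xor(x: int) -> bool:
    """Check X - Y == X ^ Y for every Y in 0..X (inclusive)."""
    return all(x - y == (x ^ y) for y in range(x + 1))


# All-ones masks 0b1, 0b11, 0b111, ...: X + 1 is a power of two.
masks = [(1 << k) - 1 for k in range(1, 11)]
assert all(is_power_of_2(m + 1) and sub_equals_xor(m) for m in masks)

# Counter-example: X = 5 (0b101), X + 1 = 6 is not a power of two.
# With Y = 2: 5 - 2 == 3 but 5 ^ 2 == 7.
assert not is_power_of_2(5 + 1)
assert not sub_equals_xor(5)
```

The check only covers the arithmetic identity; whether xor is actually cheaper than sub depends on the target, which is what the comments above are about.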

[Bug target/106919] [13 Regression] RTL check: expected code 'set' or 'clobber', have 'if_then_else' in s390_rtx_costs, at config/s390/s390.cc:3672on s390x-linux-gnu since r13-2251-g1930c5d05ceff2

2022-09-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106919 --- Comment #8 from rdapp at gcc dot gnu.org --- Yes, one of dst and dest is superfluous. Looks good like that. I bootstrapped the same patch locally already, no regressions.

[Bug tree-optimization/100756] vect: Superfluous epilog created on s390x

2022-10-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756 rdapp at gcc dot gnu.org changed: What|Removed |Added CC||rdapp at gcc dot gnu.org ---

[Bug middle-end/107617] New: SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 Bug ID: 107617 Summary: SCC-VN with len_store and big endian Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end

[Bug middle-end/107617] SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 rdapp at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P4

[Bug middle-end/107617] SCC-VN with len_store and big endian

2022-11-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107617 --- Comment #1 from rdapp at gcc dot gnu.org --- For completeness, the mailing list thread is here: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602252.html

[Bug target/113827] New: MrBayes benchmark redundant load

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 Bug ID: 113827 Summary: MrBayes benchmark redundant load Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target

[Bug target/113827] MrBayes benchmark redundant load on riscv

2024-02-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827 --- Comment #1 from Robin Dapp --- x86 (-march=native -O3 on an i7 12th gen) looks pretty similar: .L3: movq (%rdi), %rax vmovups (%rax), %xmm1 vdivps %xmm0, %xmm1, %xmm1 vmovups %xmm1, (%rax) addq

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-02-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #4 from Robin Dapp --- Judging by the graph it looks like it was slow before, then got faster and now slower again. Is there some more info on why it got faster in the first place? Did the patch reverse something or is it rather a

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 Robin Dapp changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Last reconfir

[Bug target/114027] [14] RISC-V vector: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027 --- Comment #9 from Robin Dapp --- Argh, I actually just did a gcc -O3 -march=native pr114027.c -fno-vect-cost-model on cfarm188 with a recent-ish GCC but realized that I used my slightly modified version and not the original test case. long a

[Bug target/114028] [14] RISC-V rv64gcv_zvl256b: miscompile at -O3

2024-02-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114028 --- Comment #2 from Robin Dapp --- This is a target issue. It looks like we try to construct a "superword" sequence when the element size is already == Pmode. Testing a patch.

[Bug middle-end/114109] New: x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 Bug ID: 114109 Summary: x264 satd vectorization vs LLVM Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Prio

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #2 from Robin Dapp --- It is vectorized with a higher zvl, e.g. zvl512b, see https://godbolt.org/z/vbfjYn5Kd.

[Bug middle-end/114109] x264 satd vectorization vs LLVM

2024-02-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109 --- Comment #4 from Robin Dapp --- Yes, as mentioned, vectorization of the first loop is debatable.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #6 from Robin Dapp --- Honestly, I don't know how to analyze/debug this without a zen4, in particular as it only seems to happen with PGO. I tried locally but of course the execution time doesn't change (same as with zen3 according

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #1 from Robin Dapp --- Took me a while to analyze this... needed more time than I'd like to admit to make sense of the somewhat weird code created by fully unrolling and peeling. I believe the problem is that we reload the output re

[Bug middle-end/114196] [13/14 Regression] Fixed length vector ICE: in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9454

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196 Robin Dapp changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #3 from Robin Dapp --- *** Bug 114202 has been marked as a duplicate of this bug. ***

[Bug target/114202] [14] RISC-V rv64gcv: miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114202 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #7 from Robin Dapp --- I built executables with and without the commit (-Ofast -march=znver4 -flto). There is no difference so it must really be something that happens with PGO. I'd really need access to a zen4 box or the pgo execut

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #10 from Robin Dapp --- (In reply to Sam James from comment #9) > (In reply to Filip Kastl from comment #8) > > I'd like to help but I'm afraid I cannot send you the SPEC binaries with PGO > > applied since SPEC is licensed nor can I

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #16 from Robin Dapp --- Thank you! I'm having a problem with the data, though. Compiling with -Ofast -march=znver4 -mtune=znver4 -flto -fprofile-use=/tmp. Would you mind showing your exact final options for compilation of e.g. pbeam

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #18 from Robin Dapp --- Hmm, doesn't help unfortunately. A full command line for me looks like: x86_64-pc-linux-gnu-gcc -c -o pbeampp.o -DSPEC_CPU -DNDEBUG -DWANT_STDC_PROTO -Ofast -march=znver4 -mtune=znver4 -flto=32 -g -fprofil

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #20 from Robin Dapp --- No change with -std=gnu99 unfortunately.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #22 from Robin Dapp --- Still the same problem unfortunately. I'm a bit out of ideas - maybe your compiler executables could help?

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #24 from Robin Dapp --- I rebuilt GCC from scratch with your options but still have the same problem. Could our sources differ? My SPEC version might not be the most recent but I'm not aware that mcf changed at some point. Just to

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #27 from Robin Dapp --- Can you try it with a simpler (non SPEC) test? Maybe there is still something weird happening with SPEC's scripting.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #29 from Robin Dapp --- Yes, that also appears to work here. There was no lto involved this time? Now we need to figure out what's different with SPEC.

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Robin Dapp changed: What|Removed |Added Target|riscv*-*-* |x86_64-*-* riscv*-*-* --- Comment #2 from

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #3 from Robin Dapp --- -O3 -mavx2 -fno-vect-cost-model -fwrapv seems to be sufficient.

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #7 from Robin Dapp --- diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 4375ebdcb49..f8f7ba0ccc1 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init (gimple

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #8 from Robin Dapp --- No fallout on x86 or aarch64. Of course using false instead of TYPE_SIGN (utype) is also possible and maybe clearer?

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vector-cost-mode (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-03-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #5 from Robin Dapp --- So the result is -9 instead of 9 (or vice versa) and this happens (just) with vectorization. We only vectorize with -fwrapv. From a first quick look, the following is what we have before vect: (loop) [lo

[Bug tree-optimization/114485] [13/14 Regression] Wrong code with -O3 -march=rv64gcv on riscv or `-O3 -march=armv9-a` for aarch64

2024-03-27 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485 --- Comment #4 from Robin Dapp --- Yes, the vectorization looks ok. The extracted live values are not used afterwards and therefore the whole vectorized loop is being thrown away. Then we do one iteration of the epilogue loop, inverting the ori

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-04-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515 Robin Dapp changed: What|Removed |Added CC||ewlu at rivosinc dot com,

[Bug rtl-optimization/108412] RISC-V: Negative optimization of GCSE && LOOP INVARIANTS

2023-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108412 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #3 fr

[Bug tree-optimization/111136] New: ICE in RISC-V test case since r14-3441-ga1558e9ad85693

2023-08-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111136 Bug ID: 111136 Summary: ICE in RISC-V test case since r14-3441-ga1558e9ad85693 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Comp

[Bug target/108271] Missed RVV cost model

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108271 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #3 fr

[Bug tree-optimization/111136] ICE in RISC-V test case since r14-3441-ga1558e9ad85693

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111136 --- Comment #4 from Robin Dapp --- All gather-scatter tests pass for me again (the given example in particular) after applying this.

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #1 from Robin Dapp --- We seem to decide that a slightly more expensive loop (one instruction more) without an epilogue is better than a loop with an epilogue. This looks intentional in the vectorizer cost estimation and is not spec

[Bug target/110559] Bad mask_load/mask_store codegen of RVV

2023-08-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110559 --- Comment #3 from Robin Dapp --- I got back to this again today, now that pressure-aware scheduling is the default. As mentioned before, it helps but doesn't get rid of the spills. Testing with the "generic ooo" scheduling model it looks lik

[Bug target/111311] New: RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable

2023-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311 Bug ID: 111311 Summary: RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable Product: gcc Version: 14.0 Status: UNCONFIRMED Severi

[Bug c/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #1 fr

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #8 from Robin Dapp --- Yes, I doubt we would get much below 4 instructions with riscv specifics. A quick grep yesterday didn't reveal any aarch64 or gcn patterns for those (as long as they are not hidden behind some pattern replacem

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #10 from Robin Dapp --- I would be OK with the riscv implementation, then we don't need to touch isel. Maybe a future vector extension will also help us here so we could just switch the implementation then.

[Bug middle-end/111337] ICE in gimple-isel.cc for RISC-V port

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111337 --- Comment #12 from Robin Dapp --- Yes, as far as I know. I would also go ahead and merge the test suite patch now as there is already a v2 fix posted. Even if it's not the correct one it will be done soon so we should not let that block enab

[Bug target/111317] RISC-V: Incorrect COST model for RVV conversions

2023-09-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111317 --- Comment #1 from Robin Dapp --- I think the default cost model is not too bad for these simple cases. Our emitted instructions match gimple pretty well. The thing we don't model is vsetvl. We could ignore it under the assumption that it is

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #2 from Robin Dapp --- With the current trunk we don't spill anymore: (VLS) .L4: vle32.v v2,0(a5) vadd.vv v1,v1,v2 addi a5,a5,16 bne a5,a4,.L4 Considering just that loop I'd say costing works

[Bug c/111153] RISC-V: Incorrect Vector cost model for reduction

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153 --- Comment #4 from Robin Dapp --- Yes, with VLS reduction this will improve. On aarch64 + sve I see loop inside costs: 2 This is similar to our VLS costs. And their loop is indeed short: ld1w z30.s, p7/z, [x0, x2, lsl 2] a

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #2 fr

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 --- Comment #3 from Robin Dapp --- Several other things came up, so I'm just going to post the latest status here without having revised or tested it. Going to try fixing it and testing tomorrow. --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect

[Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS

2023-09-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401 --- Comment #6 from Robin Dapp --- Created attachment 55902 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit Tentative You're referring to the case where we have init = -0.0, the condition is false and we end up wrongly do

[Bug target/111488] New: ICE ion riscv gcc.dg/vect/vect-126.c

2023-09-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488 Bug ID: 111488 Summary: ICE ion riscv gcc.dg/vect/vect-126.c Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/111488] ICE ion riscv gcc.dg/vect/vect-126.c

2023-09-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488 Robin Dapp changed: What|Removed |Added CC||juzhe.zhong at rivai dot ai --- Comment #1

[Bug target/111428] RISC-V vector: Flaky segfault in {min|max}val_char_{1|2}.f90

2023-09-21 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428 --- Comment #2 from Robin Dapp --- Reproduced locally. The identical binary sometimes works and sometimes doesn't so it must be a race...

[Bug target/111506] RISC-V: Failed to vectorize conversion from INT64 -> _Float16

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506 Robin Dapp changed: What|Removed |Added CC||joseph at codesourcery dot com --- Comment

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 Robin Dapp changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #12 fro

[Bug target/111506] RISC-V: Failed to vectorize conversion from INT64 -> _Float16

2023-10-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506 --- Comment #5 from Robin Dapp --- Ah, thanks Joseph, so this at least means that we do not need !flag_trapping_math here. However, the vectorizer emulates the 64-bit integer to _Float16 conversion via an intermediate int32_t and now the riscv

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #16 from Robin Dapp --- Confirming that it's the compilation of insn-emit.cc which takes > 10 minutes. The rest (including auto generating of files) is reasonably fast. Going to do some experiments with it and see which pass takes

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #18 from Robin Dapp --- Just finished an initial timing run, sorted, first 10: Time variable usr sys wall GGC phase opt and generate : 567.60 ( 97%) 38.23

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #20 from Robin Dapp --- Mhm, why is your profile so different from mine? I'm also on an x86_64 host with a 13.2.1 host compiler (Fedora). Is it because of the preprocessed source? Or am I just reading the timing report wrong?

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-10-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #22 from Robin Dapp --- Ah, then it's not that different, your machine is just faster ;) callgraph ipa passes : 69.77 ( 11%) 5.97 ( 13%) 76.05 ( 12%) 2409M ( 10%) integration: 91.95 ( 15

[Bug target/111428] RISC-V vector: Flaky segfault in {min|max}val_char_{1|2}.f90

2023-10-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428 --- Comment #3 from Robin Dapp --- Still difficult to track down. The following is a smaller reproducer: program main implicit none integer, parameter :: n=5, m=3 integer, dimension(n,m) :: v real, dimension(n,m) :: r do call r

[Bug tree-optimization/111760] risc-v regression: COND_LEN_* incorrect fold/simplify in middle-end

2023-10-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org,

[Bug tree-optimization/111760] risc-v regression: COND_LEN_* incorrect fold/simplify in middle-end

2023-10-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760 --- Comment #6 from Robin Dapp --- Yes, thanks for filing this bug separately. The patch doesn't disable all of those optimizations, of course I paid special attention not to mess with them. The difference here is that we valueize, add stateme

[Bug bootstrap/116146] Split insn-recog.cc

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116146 --- Comment #3 from Robin Dapp --- On riscv insn-output is the largest file right now as well. I have a local patch that splits it - it's a bit cumbersome because the static initializer needs to be made non-static i.e. the initialization must b

[Bug target/111600] [14/15 Regression] RISC-V bootstrap time regression

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #37 from Robin Dapp --- > The size of the partitions is a little uneven though. Using > --with-emitinsn-partitions=48 I get some empty partitions and some bigger > than 2MB: > Another problematic file is insn-recog.cc which is 19MB

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #1 from Robin Dapp --- > Still present when rvv_ta_all_1s=true is omitted. My result is '0' when rvv_ta_all_1s=false, is that what you meant? I didn't have time to check this in detail but it's not the missing else for masked loads

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #2 from Robin Dapp --- Correction, it's actually just the wx adds with a length of 1 and those should be "tu". Quite likely this only got exposed recently with the late-combine change in place.

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #3 from Robin Dapp --- It looks like the problem is a wrong mode_idx attribute for the wx variants of the adds. The widening adds's mode is the one of the non-widened input operand but for the wx/scalar variants this is a scalar mod

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-08-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED
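
Comment #2 above says the length-1 adds should be "tu" (tail undisturbed). As a rough illustration of why the tail policy matters, here is a toy model in Python. It is only a sketch under stated assumptions: it models a plain (non-widening) vector-scalar add on 8-bit elements, the function name `vadd_vx` is hypothetical, and the tail-agnostic case is modeled as always writing all 1s, which is one of the behaviours the RVV specification permits (the other being to leave the tail unchanged).

```python
ALL_ONES_8 = 0xFF  # all-1s pattern for the toy 8-bit element width


def vadd_vx(dest, src, scalar, vl, tail_undisturbed):
    """Toy model of a vector-scalar add over the first `vl` elements.

    Elements at index >= vl form the tail. Under tail-undisturbed they
    keep the old destination value; under tail-agnostic this model
    overwrites them with all 1s.
    """
    out = []
    for i, (d, s) in enumerate(zip(dest, src)):
        if i < vl:
            out.append((s + scalar) & ALL_ONES_8)  # body: computed result
        elif tail_undisturbed:
            out.append(d)                          # tail: old value kept
        else:
            out.append(ALL_ONES_8)                 # tail: clobbered
    return out


dest = [10, 20, 30, 40]
src = [1, 2, 3, 4]
tu = vadd_vx(dest, src, scalar=5, vl=1, tail_undisturbed=True)
ta = vadd_vx(dest, src, scalar=5, vl=1, tail_undisturbed=False)
print(tu)  # [6, 20, 30, 40]
print(ta)  # [6, 255, 255, 255]
```

If later code still reads elements 1..3 of the destination, only the tail-undisturbed variant is correct, which matches the kind of miscompile that becomes visible when the tail is filled with all 1s.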

[Bug target/116202] RISC-V: Miscompile at -O3 with zvl256b

2024-08-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116202 Robin Dapp changed: What|Removed |Added CC||pan2.li at intel dot com --- Comment #1 fr

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 Robin Dapp changed: What|Removed |Added Component|rtl-optimization|middle-end --- Comment #6 from Robin Dapp

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 --- Comment #7 from Robin Dapp --- Ah, hmm, this doesn't seem to occur on trunk anymore for me. It's still likely latent. Patrick, does it still happen for you?

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/116242] [meta-bug] Tracker for zvl issues in RISC-V

2024-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116242 Bug 116242 depends on bug 116086, which changed state. Bug 116086 Summary: RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 What|Removed

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #1 from Robin Dapp --- For the record, with the default -march=rv64gcv I don't see any LOAD_LANES, with -march=rv64gcv -mrvv-vector-bits=zvl I do.

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #3 from Robin Dapp --- Actually we're already supposed to be handling all constant permutes. Maybe what's in the way is /* FIXME: Explicitly disable VLA interleave SLP vectorization when we may encounter ICE for poly size (1

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #4 from Robin Dapp --- I just sent a patch to get rid of this early exit in our backend. However, with the testsuite compile options -O3 -march=rv64gcv -fno-vect-cost-model I still see MASK_LEN_LOAD_LANES.

[Bug tree-optimization/116573] [15 Regression] Recent SLP work appears to generate significantly worse code on RISC-V

2024-09-17 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573 --- Comment #7 from Robin Dapp --- I'm testing a patch that basically does what Richi proposes. I was also playing around with mixed lane configurations where we potentially reuse the pointer increment from another pointer update. To me the co

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-04-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #8 from Robin Dapp --- I tried some things (for the related bug without -fwrapv) then got busy with some other things. I'm going to have another look later this week.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #5 from Robin Dapp --- This fixes the test case for me locally, thanks. I can run the testsuite with it later if you'd like.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #6 from Robin Dapp --- Testsuite looks unchanged on rv64gcv.

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #1 from Robin Dapp --- Hmm, my local version is a bit older and seems to give the same result for both -O2 and -O3. At least a good starting point for bisection then.

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #2 from Robin Dapp --- Checked with the latest commit on a different machine but still cannot reproduce the error. PR114668 I can reproduce. Maybe a copy and paste problem?

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 --- Comment #2 from Robin Dapp --- This, again, seems to be a problem with bit extraction from masks. For some reason I didn't add the VLS modes to the corresponding vec_extract patterns. With those in place the problem is gone because we go th

[Bug target/114686] Feature request: Dynamic LMUL should be the default for the RISC-V Vector extension

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686 --- Comment #3 from Robin Dapp --- I think we have always maintained that this can definitely be a per-uarch default but shouldn't be a generic default. > I don't see any reason why this wouldn't be the case for the vast majority of > implement

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #5 from Robin Dapp --- Weird, I tried your exact qemu version and still can't reproduce the problem. My results are always FFB5. Binutils difference? Very unlikely. Could you post your QEMU_CPU settings just to be sure?

[Bug middle-end/114733] [14] Miscompile with -march=rv64gcv -O3 on riscv

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114733 --- Comment #1 from Robin Dapp --- Confirmed, also shows up here.

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #1 from Robin Dapp --- Confirmed.
