[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-03-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug rtl-optimization/99462] Enhance scheduling to split instructions

2021-03-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462 --- Comment #3 from Alexander Monakov --- (for context, the above patch was for PR 98856, but it's based on incorrect latency analysis, see bug 98856 comment #38 ) Right now schedulers cannot easily split instructions for that purpose, it would

[Bug rtl-optimization/99469] ICE: qsort checking failed with selective scheduling on aarch64

2021-03-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469 Alexander Monakov changed: Blocks added 82407 --- Comment #2 from Alexander

[Bug middle-end/99619] New: fails to infer local-dynamic TLS model from hidden visibility

2021-03-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99619 Bug ID: 99619 Summary: fails to infer local-dynamic TLS model from hidden visibility Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: missed-optimizatio
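
For illustration, a minimal sketch assuming an ELF target with `__thread` support (the variable and function names are hypothetical): GCC's `tls_model` attribute can request the local-dynamic model by hand for a hidden thread-local variable, which is what the report expects GCC to infer from `visibility("hidden")` alone when compiling with `-fpic`.

```c
/* Hypothetical hidden-visibility TLS counter. The explicit tls_model is a
   manual stand-in for the inference PR 99619 asks GCC to perform itself. */
__thread int tls_counter
    __attribute__((visibility("hidden"), tls_model("local-dynamic")));

/* Increments the calling thread's copy of the counter and returns it. */
int bump_tls_counter(void)
{
    return ++tls_counter;
}
```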

[Bug c++/99728] code pessimization when using wrapper classes around SIMD types

2021-03-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug target/99582] No intrinsics to access rcl or rcr instruction on x86_64

2021-03-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99582 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug rtl-optimization/100225] [8/9/10/11/12 Regression] ICE in add_cross_iteration_register_deps, at ddg.c:291

2021-04-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100225 Alexander Monakov changed: What|Removed |Added Blocks|85099 | CC|

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #8 from Alexander Monakov --- No, -msoft-stack-reserve-local is really meant to be in bytes: it may not exceed the amount of .local memory reserved by CUDA driver (which is just 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx

[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 --- Comment #5 from Alexander Monakov --- afaict LRA is just following IRA decisions, and IRA allocates that pseudo to memory due to costs. Not sure where strange cost is coming from, but it depends on x86 tuning options: with -mtune=skylake we

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 --- Comment #11 from Alexander Monakov --- Yes, that.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug target/97734] GCC using branches when a conditional move would be better

2020-11-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97734 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #30 from Alexander Monakov --- Asm operand binding should work by looking at bound lvalue: "c"(a) binds an lvalue so if 'a' is a register var the compiler must remember its associated register; "c"(a+0) binds an rvalue, so what kind o

[Bug libstdc++/98226] Slow std::countr_one

2020-12-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98226 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm
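
The report body is cut off here, so this is only a hedged reference point (the function name is hypothetical and this is not libstdc++'s code): counting trailing one bits equals counting trailing zero bits of the complement, which GCC can lower on x86 to a single `bsf`/`tzcnt`-style instruction plus a guard for the all-ones input.

```c
#include <stdint.h>

/* Counts consecutive 1 bits starting from the least significant bit.
   countr_one(x) == countr_zero(~x); __builtin_ctz is undefined for 0,
   so the all-ones input (whose complement is 0) is handled separately. */
unsigned countr_one_u32(uint32_t x)
{
    uint32_t inv = ~x;
    return inv != 0 ? (unsigned)__builtin_ctz(inv) : 32u;
}
```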

[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

2020-09-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127 --- Comment #16 from Alexander Monakov --- Mostly because prior to register allocation the compiler does not naturally see that x = *mem + a*b will need an extra mov when both 'a' and 'b' are live (as in that case registers allocated for them can

[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

2020-09-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127 --- Comment #17 from Alexander Monakov --- To me this suggests that in fact it's okay to carry the combined form in RTL up to register allocation, but RA should decompose it to load+fma instead of inserting a register copy that preserves the live
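
A minimal C sketch of the pattern under discussion (function and parameter names are hypothetical; assumes x86-64 with FMA3 enabled, e.g. `-O2 -mfma`, where GCC's default `-ffp-contract=fast` fuses `*m + a * b` into one FMA). FMA3 instructions overwrite one of their register operands, so folding the load of `*m` into the FMA needs a destination register holding `a` or `b`; because both stay live here, that form costs an extra register copy, whereas loading `*m` first and accumulating into the loaded register does not.

```c
/* Returns *m + a*b (a candidate for contraction into a single FMA) and
   also stores a + b, which keeps both a and b live after the FMA. */
double fma_keep_live(const double *m, double a, double b, double *out_sum)
{
    double x = *m + a * b;  /* load+fma avoids clobbering a or b */
    *out_sum = a + b;       /* a and b are still needed here */
    return x;
}
```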

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #9 from Alexander Monakov --- (In reply to Richard Biener from comment #8) > Note that currently RTL expansion forces a local vector typed variable > to the stack (instead of allocating a pseudo) when there are > variable-index access

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #11 from Alexander Monakov --- Yeah, for inserts such tactic would be inappropriate due to bad store forwarding stalls anyway. As you've shown in earlier comments, inserts have a very nice generic way to expand them (that does not tou

[Bug target/97194] optimize vector element set/extract at variable position

2020-09-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194 --- Comment #14 from Alexander Monakov --- I see, there are more weaknesses than I thought. For CSE (or rather fwprop?) I was thinking about a simpler case where the extracted-from value is loaded from memory, but even in trivial cases RTL optimi
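
For concreteness, a small GNU C sketch of the operations discussed in this thread (type and function names are hypothetical). The insert uses a compare-and-blend formulation that never touches memory; it is shown as one such approach, not necessarily the exact expansion the comments refer to.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Variable-index extract via GNU C vector subscripting; the index is
   masked so the subscript always stays in range. */
int v4si_extract(v4si v, int i)
{
    return v[i & 3];
}

/* Memory-free variable-index insert: comparing a constant lane-index
   vector with the broadcast index yields -1 in the selected lane and 0
   elsewhere, which is then used as a blend mask. If i is outside 0..3,
   no lane matches and v is returned unchanged. */
v4si v4si_insert(v4si v, int i, int x)
{
    const v4si lanes = {0, 1, 2, 3};
    v4si idx = {i, i, i, i};
    v4si val = {x, x, x, x};
    v4si mask = lanes == idx;
    return (v & ~mask) | (val & mask);
}
```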

[Bug libgomp/97291] [SIMT] Move SIMT_XCHG_* out of non-uniform execution region

2020-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97291 --- Comment #1 from Alexander Monakov --- Reshuffling statements and piling up extra abstraction doesn't help solve the core issue that GIMPLE passes can duplicate any basic block, but basic blocks of SIMT loop epilogue should be protected from t

[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp

2020-10-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 Alexander Monakov changed: What|Removed |Added Known to fail||9.3.0 Known to work|9.3.0

[Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call

2020-10-09 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Comm

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #5 from Alexander Monakov --- One possible solution is -foffload=-fno-openmp Another possible solution is separate compilation and linking, with only OpenACC enabled at link step (needs explicit -lgomp): gfortran -fopenmp -fopenacc

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #8 from Alexander Monakov --- (In reply to Chinoune from comment #7) > $ gfortran-10 -O3 -fopenmp -fopenacc -c bug_omp_acc.f90 > $ gfortran-10 bug_omp_acc.o -lgomp -o test.x Contrary to my suggestion, you have omitted -fopenacc from

[Bug libgomp/98258] Can't compile programs for both OpenMP (CPU) + OpenACC (GPU)

2021-01-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98258 --- Comment #10 from Alexander Monakov --- Thanks for checking. As for this: > Please, stop suggesting untested workarounds. Yes, I should have mentioned those are untested. I was typing the response late at night without access to offloading-c

[Bug tree-optimization/98906] New: [8/9/10/11 Regression] Miscompiles code even at -O1

2021-01-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98906 Bug ID: 98906 Summary: [8/9/10/11 Regression] Miscompiles code even at -O1 Product: gcc Version: 11.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal

[Bug tree-optimization/98906] [8/9/10/11 Regression] Miscompiles code even at -O1

2021-02-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98906 --- Comment #6 from Alexander Monakov --- Ah, -fsanitize=float-cast-overflow catches it, but it needs to be enabled explicitly (not implied by -fsanitize=undefined). Thank you!
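
As a hedged illustration of the class of bug this sanitizer reports (the function name is hypothetical): in C, converting a floating value whose truncated result does not fit the target integer type is undefined, and `-fsanitize=float-cast-overflow` diagnoses exactly that at run time. A range check before the cast avoids it:

```c
#include <limits.h>

/* Converts d to int only when the result is representable.
   Assumes int is at most 53 bits wide, so the bounds are exact in double.
   Returns 1 and stores the result on success, 0 otherwise. */
int checked_double_to_int(double d, int *out)
{
    /* Conversion truncates toward zero, so any d strictly inside
       (INT_MIN - 1, INT_MAX + 1) is in range; NaN fails both tests. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return 0;
    *out = (int)d;
    return 1;
}
```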

[Bug rtl-optimization/86096] [8 Regression] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 0)

2021-02-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86096 --- Comment #8 from Alexander Monakov --- It was fixed on the trunk only, so as the title says it remains an issue on the gcc-8 branch (which is still open). Bugzilla doesn't have separate resolutions for different branches, we cannot have this "

[Bug tree-optimization/100363] gcc generating wider load/store than warranted at -O3

2021-05-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug c/93031] Wish: When the underlying ISA does not force pointer alignment, option to make GCC not assume it

2021-05-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93031 --- Comment #7 from Alexander Monakov --- In comment #2 I touched upon a potentially more practical way to offer -fno-strict-alignment: Run early work with ABI alignments: compute __alignof correctly, lay out composite types as required by ABI,
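
Without such an option, the portable way to express a possibly misaligned access in C is `memcpy`, which GCC typically turns into a single load on targets that permit unaligned access. A minimal sketch (the helper name is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from a possibly misaligned address. Dereferencing a
   misaligned uint32_t * is undefined even where the ISA tolerates it,
   because GCC may assume ABI alignment for the pointed-to type; memcpy
   carries no alignment assumption. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```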

[Bug other/99903] 32-bit x86 frontends randomly crash while reporting timing on Windows

2021-05-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903 Alexander Monakov changed: What|Removed |Added Ever confirmed|1 |0 Status|WAITING

[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition

2021-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug c/100483] Extend -fno-semantic-interposition to global variables

2021-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition

2021-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618 --- Comment #3 from Alexander Monakov --- Furthermore as discussed in bug 100483 this request appears based on a misunderstanding what the 'semantic-' part of the option is about. It does not affect assembly/linker-level binding mechanism, so th
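
A hedged sketch of what the "semantic" part refers to (all names are hypothetical; assumes this file is built into an ELF shared library with `-fpic`): the flag licenses optimizations across references to interposable symbols, such as inlining, rather than changing how the dynamic linker binds those symbols.

```c
/* Hypothetical library code. Both symbols have default visibility, so a
   definition in the executable or an earlier-loaded library may interpose
   them at load time. */
int lib_scale = 2;

int lib_helper(int x)
{
    return x * lib_scale;
}

/* By default GCC must assume lib_helper may be replaced and so cannot
   inline it here. -fno-semantic-interposition lets GCC assume any
   interposing definition behaves identically, so inlining becomes legal.
   Whether variable accesses such as lib_scale should be affected is the
   subject of PR 100483. */
int lib_api(int x)
{
    return lib_helper(x) + 1;
}
```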

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #3 from Alexander Monakov --- I understand what you're saying, but it seems we're talking past each other. I agree that if a library is linked with any -Bsymbolic* flag, the main executable is at risk of broken address uniqueness un

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-17 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #5 from Alexander Monakov --- Hm, I still don't think I'm misunderstanding what you're saying. I'm familiar with the ELF standard (and FWIW I have read your blog posts on related matters). I am responding to this sentiment from the o

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-18 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #7 from Alexander Monakov --- Thanks. I agree that inferring address significance on the linker side is problematic. Thinking about your original request, I was about to say that it would be very reasonable to do under -fno-plt flag

[Bug libgomp/100573] [OpenMP] 'omp target teams' fails with nvptx and GCN offloading: FAIL libgomp.c-c++-common/for-3.c + for-9.c

2021-05-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573 --- Comment #14 from Alexander Monakov --- I would break in gdb on cuModuleGetFunction and x/s $rdx to print the failing symbol (it's the third argument to the function). It seems the "inner" entrypoint (which your patch attempted to nullif

[Bug libgomp/100573] [OpenMP] 'omp target teams' fails with nvptx and GCN offloading: FAIL libgomp.c-c++-common/for-3.c + for-9.c

2021-05-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573 --- Comment #17 from Alexander Monakov --- Yes, I'd agree normally it's present in the offload table, but ideally if you're trying to stub out the call, it should not be present in the offload table. I think Tobias is saying that on GIMPLE this

[Bug libgomp/100573] [OpenMP] 'omp target teams' fails with nvptx and GCN offloading: FAIL libgomp.c-c++-common/for-3.c + for-9.c

2021-05-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100573 --- Comment #19 from Alexander Monakov --- Ah, does the issue arise because foo._omp_fn.0 is (before the patch) callable in two contexts, in one it's called from host and should be 'omp target entrypoint', and in the other it's called from offlo

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #10 from Alexander Monakov --- Is there something wrong or undesirable with making this under -fno-plt (or the noplt attribute as in your example)? (after all, it is a kind of PLT-avoidance transformation, just for addressing rather

[Bug target/108322] Using __restrict parameter with -ftree-vectorize (default with -O2) results in massive code bloat

2023-01-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108322 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug target/108322] Using __restrict parameter with -ftree-vectorize (default with -O2) results in massive code bloat

2023-01-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108322 --- Comment #5 from Alexander Monakov --- (In reply to Richard Biener from comment #4) > > For the case at hand loading two vectors from the destination and then > punpck{h,l}bw and storing them again might be the most efficient thing > to do h

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug target/108401] gcc defeats vector constant generation with intrinsics

2023-01-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487 Alexander Monakov changed: Component changed from rtl-optimization to tree-optimization; Keyword

[Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487 Alexander Monakov changed: Component changed from tree-optimization to libstdc++ --- Comment #3 from Alexa

[Bug libgomp/108494] Slow thread creation with nested loops in GFortran

2023-01-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug target/108491] cross compiler does not work: cc1: error: ‘-msecure-plt’ not supported by your assembler

2023-01-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108491 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #1 from Alexander Monakov --- We diverge in sched1 due to extra calls to advance_one_cycle when scheduling a BB that is empty apart from one debug insn. The following patch adds a hexdump of automaton state to make the problem eviden

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #3 from Alexander Monakov --- Ah, a worthy sequel to "Note that I wasn't able to figure out a usable email address for the submitter" from PR 107353. Nevermind then.

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #4 from Alexander Monakov --- Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as for {fmod,remainder,remquo}{,f,l} on i386 without any branches for corner cases. So in practice CPUs apparently implement the

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #7 from Alexander Monakov --- I saw that. That's why I'm pointing out that Glibc (and musl) uses the instruction without any additional checks: real CPUs produce the expected result in st(0), despite the documentation making no promi
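
A hedged sketch of such an `fprem` loop, in the spirit of the Glibc usage described above but not copied from it (the function name is hypothetical; assumes GCC-style inline asm on i386 or x86-64, with a libm fallback elsewhere). `fprem` computes a partial remainder using a truncating quotient, the same rounding C's `fmod` uses, and sets flag C2 (bit 10) of the FPU status word while the reduction is incomplete, so it is repeated until C2 clears:

```c
#include <math.h>

/* fmod via the x87 fprem instruction. The remainder of two doubles is
   exactly representable, so the final conversion back is exact. */
double fmod_x87(double x, double y)
{
#if defined(__x86_64__) || defined(__i386__)
    long double r = x;      /* dividend, then partial remainder: st(0) */
    long double d = y;      /* divisor: st(1), left unchanged by fprem */
    unsigned short sw;      /* FPU status word */
    do {
        __asm__("fprem\n\tfnstsw %%ax" : "+t"(r), "=a"(sw) : "u"(d));
    } while (sw & 0x400);   /* C2 set: reduction incomplete */
    return (double)r;
#else
    return fmod(x, y);
#endif
}
```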

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #9 from Alexander Monakov --- (In reply to Jan Kratochvil from comment #8) > The revert makes it 13x faster. But the produced code still falls back to > calling glibc fmod() as shown in the disassembly in Comment 0. > If I use the "f

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #15 from Alexander Monakov --- That is the fancy-error-handling path that is reached under _LIB_VERSION != _IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_, and then glibc would use the fprem[1] instruct

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #19 from Alexander Monakov --- I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a helper fmod call for setting errno without any flag_errno_math checks in i386.md, i.e. it was already in the middle-end. A

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #22 from Alexander Monakov --- Strange, comment #8 claims the opposite (unless Jan tested the revert not on trunk, but on some branch).

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #3 from Alexander Monakov --- Alan implemented the special case of .localentry 1 in this patch for the BFD linker (that appeared in binutils 2.32 if my calculations are correct): https://sourceware.org/pipermail/binutils/2018-July/10

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #4 from Alexander Monakov --- Let me address one point separately: (In reply to Peter Bergner from comment #1) > CCing Alan, since he probably knows best how this all works, but yes, > -mcpu-power10 changes the ABI, namely it adds p

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #10 from Alexander Monakov --- (In reply to Rui Ueyama from comment #9) > I'm the maintainer of the mold linker. I didn't implement that POWER10 ABI > because I didn't have an access to a POWER10 machine and therefore couldn't > veri

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #14 from Alexander Monakov --- Are you guys really sure you want to blame the user here, considering that all linkers, including the BFD linker, initially misinterpreted the ABI the same way?

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 Alexander Monakov changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #18 from Alexander Monakov --- It seems you are saying that as long as GCC emits code according to the Holy Scripture that is the ABI spec, everything is fine. I imagine on other architectures maintainers are able to consider how the

[Bug c++/104631] Visibility of static member s yields duplicate symbols.

2022-04-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104631 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug rtl-optimization/101347] [11/12 Regression] ICE in cfg_layout_initialize with __builtin_setjmp and -fprofile-generate -fprofile-use

2022-07-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101347 Alexander Monakov changed: What|Removed |Added Summary|[11/12/13 Regression] ICE |[11/12 Regression] ICE in

[Bug middle-end/106421] New: ICE with computed goto from a nested functon

2022-07-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106421 Bug ID: 106421 Summary: ICE with computed goto from a nested functon Product: gcc Version: unknown Status: UNCONFIRMED Keywords: ice-on-invalid-code Severity: normal

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 --- Comment #4 from Alexander Monakov --- Regarding point 1 above, I should mention that Glibc headers mark both 'vfork' and 'raise' as leaf.

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 --- Comment #7 from Alexander Monakov --- I think item 2 from comment #3 (jump threading) still needs to be solved independently of what is decided about item 1 (leaf functions resuming earlier returns_twice call). --- The problem with 'leaf'

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 --- Comment #8 from Alexander Monakov --- I mean the minimized testcase, the original attachment does execve/_exit after vfork.

[Bug ipa/106437] New: Glibc marks functions that resume a returns_twice call as leaf

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106437 Bug ID: 106437 Summary: Glibc marks functions that resume a returns_twice call as leaf Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: wrong-code

[Bug ipa/106437] Glibc marks functions that resume a returns_twice call as leaf

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106437 --- Comment #1 from Alexander Monakov --- With the exception of '_exit', exit family of functions (exit, _Exit, quick_exit) are also marked leaf despite exit and quick_exit invoking atexit/on_exit/at_quick_exit handlers. Only _Exit is specified

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 --- Comment #10 from Alexander Monakov --- The leaf issue is now PR 106437.

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 --- Comment #11 from Alexander Monakov --- A cleaner testcase for jump threading (still ICEs despite presence of ABNORMAL_DISPATCHER): void vfork() __attribute__((__leaf__)); void semanage_reload_policy(char *arg, void cb(void)) { if (!arg) {

[Bug lto/91299] LTO inlines a weak definition in presence of a non-weak definition from an ELF file

2022-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299 --- Comment #11 from Alexander Monakov --- Marxin, you've marked this as WAITING, can you please re-evaluate? The nice testcase from comment #2 is reproducible on trunk as well.

[Bug target/105135] [11/12/13 Regression] Optimization regression for handrolled branchless assignment since r11-4717-g3e190757fa332d32

2022-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105135 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug target/106453] New: Redundant zero extension after crc32q

2022-07-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453 Bug ID: 106453 Summary: Redundant zero extension after crc32q Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug tree-optimization/106422] [13 Regression] ice in duplicate_block, at cfghooks.cc:1115

2022-07-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106422 Alexander Monakov changed: CC added aldyh at gcc dot gnu.org --- Commen

[Bug target/106453] Redundant zero extension after crc32q

2022-07-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453 --- Comment #1 from Alexander Monakov --- Any idea if the following is reasonable? It compiles and achieves the desired result. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index bdde577dd..d82656678 100644 --- a/gcc/config/i3

[Bug middle-end/106470] Subscribed access to __m256i casted to (uint16_t *) produces garbage or a warning

2022-07-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106470 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug middle-end/106470] Subscribed access to __m256i casted to (uint16_t *) produces garbage or a warning

2022-07-29 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106470 --- Comment #8 from Alexander Monakov --- But that's the point of many warnings, isn't it? To help the user understand what's wrong when the code is bad? And bogus warnings just confuse more.
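
Whatever the exact trigger in this report, reading 16-bit lanes through a `uint16_t *` that points into a vector object relies on aliasing that GCC does not guarantee (the `may_alias` on `__m256i` covers accesses through the vector type, not through `uint16_t`). A hedged sketch of two well-defined alternatives, using a GNU C vector type as a stand-in for `__m256i` so it builds without AVX (type and function names are hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* 256-bit GNU C vector of 16 uint16_t lanes, standing in for __m256i. */
typedef uint16_t u16x16 __attribute__((vector_size(32)));

/* Lane read via vector subscripting: no pointer cast, no aliasing issue.
   The index is masked so the subscript always stays in range. */
uint16_t lane_subscript(const u16x16 *v, int i)
{
    return (*v)[i & 15];
}

/* Lane read via memcpy from the vector's storage; lane i starts at byte
   offset i * sizeof(uint16_t). */
uint16_t lane_memcpy(const u16x16 *v, int i)
{
    uint16_t out;
    memcpy(&out, (const unsigned char *)v + (size_t)(i & 15) * sizeof out,
           sizeof out);
    return out;
}
```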

[Bug rtl-optimization/106553] pre-register allocation scheduler is now RMW aware

2022-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106553 Alexander Monakov changed: CC added amonakov at gcc dot gnu.org --- Com

[Bug middle-end/106688] New: leaving SSA emits assignment into the inner loop

2022-08-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106688 Bug ID: 106688 Summary: leaving SSA emits assignment into the inner loop Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal

[Bug tree-optimization/106781] [13 Regression] ICE: verify_flow_info failed (error: returns_twice call is not first in basic block 2)

2022-08-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106781 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Com

[Bug ipa/106783] New: [12/13 Regression] ICE in ipa-modref.cc:analyze_function

2022-08-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106783 Bug ID: 106783 Summary: [12/13 Regression] ICE in ipa-modref.cc:analyze_function Product: gcc Version: 13.0 Status: UNCONFIRMED Keywords: ice-on-valid-code

[Bug tree-optimization/106781] [13 Regression] ICE: verify_flow_info failed (error: returns_twice call is not first in basic block 2) since r13-1754-g7a158a5776f5ca95

2022-08-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106781 --- Comment #4 from Alexander Monakov --- (In reply to Martin Liška from comment #3) > > Also ICEs in ipa-modref when 'noclone' added to 'noinline', a 12/13 > > regression (different cause, needs a separate PR). > > Can't reproduce Alexander, p

[Bug tree-optimization/106781] [13 Regression] ICE: verify_flow_info failed (error: returns_twice call is not first in basic block 2) since r13-1754-g7a158a5776f5ca95

2022-08-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106781 --- Comment #5 from Alexander Monakov --- GCC discovers that 'bar' is noreturn, tries to remove its LHS but unfortunately cgraph.cc:cgraph_edge::redirect_call_stmt_to_callee wants to emit an assignment of SSA default-def to the LHS. fixup_noretu

[Bug middle-end/106804] Poor codegen for selecting and incrementing value behind a reference

2022-09-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106804 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Com

[Bug c++/106834] GCC creates R_X86_64_GOTOFF64 for 4-bytes immediate

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106834 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Com

[Bug c/106835] [i386] Taking an address of _GLOBAL_OFFSET_TABLE_ produces a wrong value

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106835 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org --- Com

[Bug c++/106834] GCC creates R_X86_64_GOTOFF64 for 4-bytes immediate

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106834 Alexander Monakov changed: What|Removed |Added CC||hjl.tools at gmail dot com --- Comm

[Bug c/106835] [i386] Taking an address of _GLOBAL_OFFSET_TABLE_ produces a wrong value

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106835 --- Comment #3 from Alexander Monakov --- It would be unfortunate if that makes it difficult or even impossible to make a R_386_32 relocation for the address of GOT in hand-written assembly. In any case, it seems GCC is not making the rules her

[Bug c++/106834] GCC creates R_X86_64_GOTOFF64 for 4-bytes immediate

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106834 --- Comment #6 from Alexander Monakov --- (In reply to Martin Liška from comment #5) > Do you mean gas or ld? gas > How did you get this output, please (from foo.o or final executable)? From foo.o like in comment #0.

[Bug c++/106834] GCC creates R_X86_64_GOTOFF64 for 4-bytes immediate

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106834 --- Comment #8 from Alexander Monakov --- Right, sorry, due to presence of 'main' I overlooked -fPIC in comment #0, and then after my prompt it got dropped in comment #3. If you modify the testcase as follows and compile it with -fPIC, it's evi

[Bug target/106453] Redundant zero extension after crc32q

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106453 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/106834] GCC creates R_X86_64_GOTOFF64 for 4-bytes immediate

2022-09-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106834 --- Comment #10 from Alexander Monakov --- Okay, so this should have been reported against Binutils, but since we are having the conversation here: the current behavior is not good, gas is silently selecting a different relocation kind for no cl
