[Bug target/99378] [8/9/10/11 Regression] ICE in decompose_normal_address, at rtlanal.c:6710
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99378 --- Comment #2 from Vladimir Makarov --- Thank you for reporting this. I've reproduced the bug. The fix will be ready this week.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #6 from Vladimir Makarov --- (In reply to Joseph S. Myers from comment #0) > Created attachment 50314 [details] > preprocessed source > > Commit 9105757a59b890194ebf5b4fcbacd58db34ef332 ("[PR99378] LRA: Skip > decomposing address for asm insn operand with unknown constraint.") > introduced the following ICE building glibc for i686-linux-gnu. Tested with > an x86_64 compiler; compile the attached .i file with the following options: > -m32 -march=i686 -O2 -pg -S > Sorry for causing the troubles. I'll fix this tomorrow.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #7 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #2) > I see the function is called before selecting a particular alternative, so > perhaps it means to care only about constraints like "X" and "" and not say > that mixed with other constraints etc. > But, shouldn't the code at least skip the =, +, &, % and whitespace from the > start? What about other modifiers (the various disparage slightly etc. > chars)? > And only consider as empty constraint if after those skips constraint is ""? > Not really sure if ",,," constraint is valid... > And, regarding of Eric's change to handle "X" that way, does that really > apply just to MEM and not SUBREG of MEM too? Yes. It seems my bad job on reviewing Richard Sandiford's patch 777e635f1a6c. Before this patch constraint string was checked only for 'p' which can not have modifiers (although spaces are still possible). I am afraid that fixing this mess can result in new failures. But we should do this anyway.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #11 from Vladimir Makarov --- I think I fixed the PR. Although there may be necessity for one more patch to solve other process_address_1 issues. I did not decide this yet.
[Bug target/99461] [11 Regression] ICE in extract_constrain_insn, at recog.c:2670 since r11-7526-g9105757a59b89019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99461 Vladimir Makarov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE CC||vmakarov at gcc dot gnu.org --- Comment #3 from Vladimir Makarov --- It is a duplicate of PR99422. With patch for PR99422, the crash is gone. *** This bug has been marked as a duplicate of bug 99422 ***
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #13 from Vladimir Makarov --- *** Bug 99461 has been marked as a duplicate of this bug. ***
[Bug rtl-optimization/99467] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1006 since r11-7526-g9105757a59b89019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99467 Bug 99467 depends on bug 99461, which changed state. Bug 99461 Summary: [11 Regression] ICE in extract_constrain_insn, at recog.c:2670 since r11-7526-g9105757a59b89019 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99461 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE
[Bug rtl-optimization/99467] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1006 since r11-7526-g9105757a59b89019
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99467 Vladimir Makarov changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #2 from Vladimir Makarov --- I confirm it is a duplicate of PR99422. With the patch for PR99422, the crash is gone. *** This bug has been marked as a duplicate of bug 99422 ***
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #14 from Vladimir Makarov --- *** Bug 99467 has been marked as a duplicate of this bug. ***
[Bug c/99454] internal compiler error: kernel module tg3 tg3_start_xmit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99454 Vladimir Makarov changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #4 from Vladimir Makarov --- I've reproduced it. Sorry for all the troubles. I'll try to fix it tomorrow.
[Bug c/99454] internal compiler error: kernel module tg3 tg3_start_xmit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99454 --- Comment #6 from Vladimir Makarov --- The patch is not enough. It seems that there are other asms in the test which results in LRA crash.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #20 from Vladimir Makarov --- I started to work on another, more safe approach to solve the problems. I hope the patch will be ready today or tomorrow.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #22 from Vladimir Makarov --- Could you check the patch on the failed bootstraps. I have no access to solaris machines. Thank you.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #25 from Vladimir Makarov --- (In reply to Eric Botcazou from comment #24) > This can be reproduced with a minimal Ada cross-compiler, i.e. you just need > the gnat1 compiler, the skeleton of libada and the command line: > $(srcdir)/configure --target=sparc-sun-solaris2.11 > --enable-languages=c,c++,ada > make CFLAGS=-g CXXFLAGS=-g ADAFLAGS="-gnatpga -gnatws" > cd gcc/ada; make gnatlib > cd rts; ../../gnat1 -quiet a-lfztio.ads -gnatpg > > a-ztflau.adb: In function 'Ada.Long_Float_Wide_Wide_Text_Io.Aux_Float.Put': > a-ztflau.adb:101:8: error: insn does not satisfy its constraints: > (insn 32 31 88 2 (set (mem/c:DF (plus:SI (reg/f:SI 30 %fp) > (const_int -5232 [0xeb90])) [29 %sfp+-5232 S8 > A64]) > (reg:DF 40 %f8 [orig:109 _1 ] [109])) "a-ztflau.adb":99:7 153 > {*movdf_insn_sp32} > (nil)) > Thank you. I reproduced it. The patches probably triggered a hidden bug. I'll investigate more and write my findings today.
[Bug target/99422] [11 Regression] ICE in extract_constrain_insn building glibc pthread_create
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99422 --- Comment #26 from Vladimir Makarov --- Here are my findings. Before the patches function process_address_1 used CONSTRAINT__UNKNOWN (taken from '=' of constraint "=T,..." and this is wrong) to check validity address. It was invalid and LRA added reloads for the address. After the patches, the function uses CONTSTRAINT_T (taken from 'T'). For constraint T sparc code says that the memory address is ok and LRA keeps the address and does not generate reloads. That is wrong. Sparc code should say LRA that the address is wrong. Function sparc.c:memory_ok_for_ldd is responsible for this. If I apply the following patch diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index f1504172022..ac83f900964 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -9230,6 +9230,9 @@ memory_ok_for_ldd (rtx op) if (! can_create_pseudo_p () && !strict_memory_address_p (Pmode, XEXP (op, 0))) return 0; + if (can_create_pseudo_p () + && !memory_address_p (Pmode, XEXP (op, 0))) +return 0; return 1; } the problem is gone. I think target code is responsible for the bug and fix should be there not in LRA.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #4 from Vladimir Makarov --- I've reproduced it too and started to work on it.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #6 from Vladimir Makarov --- (In reply to Segher Boessenkool from comment #5) > Thanks Vladimir. It is indeed a problem in LRA (or triggered by it). > We have > 8: {[r121:DI+low(unspec[`*.LANCHOR0',%2:DI] > 47+0x92a4)]=asm_operands;clobber > > so this is an offset that is too big for a machine instruction, those can > take > -32768..32767. > > Changing the constraint to "m" you get in LRA > Inserting insn reload before: >13: r121:DI=high(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4) > > but this doesn't happen if you keep it "o", and it dies later. The problem is that before the patches we used wrongly '=' as a constraint for memory and after the patches we use (rightly) 'o' as the constraint. The culprit is a function added by Richard Sandiford for arm64 sve: commit 1aeffdce2dfe718e1337d75eb4f22c3c300df9bb Author: Richard Sandiford Date: Mon Nov 18 15:26:07 2019 + LRA: handle memory constraints that accept more than "m" LRA allows address constraints that are more relaxed than "p": /* Target hooks sometimes don't treat extra-constraint addresses as legitimate address_operands, so handle them specially. */ if (insn_extra_address_constraint (cn) && satisfies_address_constraint_p (&ad, cn)) return change_p; For SVE it's useful to allow the same thing for memory constraints. The particular use case is LD1RQ, which is an SVE instruction that addresses Advanced SIMD vector modes and that accepts some addresses that normal Advanced SIMD moves don't. Normally we require every memory to satisfy at least "m", which is defined to be a memory "with any kind of address that the machine supports in general". However, LD1RQ is very much special-purpose: it doesn't really have any relation to normal operations on these modes. Adding its addressing modes to "m" would lead to bad Advanced SIMD optimisation decisions in passes like ivopts. LD1RQ therefore has a memory constraint that accepts things "m" doesn't. ... static bool valid_address_p (rtx op, struct address_info *ad, enum constraint_num constraint) { address_eliminator eliminator (ad); /* Allow a memory OP if it matches CONSTRAINT, even if CONSTRAINT is more forgiving than "m". Need to extract memory from op for special memory constraint, i.e. bcst_mem_operand in i386 backend. */ if (MEM_P (extract_mem_from_operand (op)) && (insn_extra_memory_constraint (constraint) || insn_extra_special_memory_constraint (constraint)) =>&& constraint_satisfied_p (op, constraint)) return true; return valid_address_p (ad->mode, *ad->outer, ad->as); } He actually added the if-stmt. And the condition of this if-stmt is true for our case because constraint_satisfied_p returns true for the memory and CONSTRAINT_o. If the condition were false, we would use machine-dependent legitimate_address_p and it would return false and we would reload the memory address. constraint_satisfied_p returns true because **infrastructure** function offsettable_nonstrict_memref_p returns true for the memory. So you are right it is not ppc64 target code problem. But I am stuck right now how to fix the PR w/o breaking arm sve. Right now I see only adding a machine-dependent hook but I don't like it as we already have too many hooks for RA.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #9 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #8) > Rather than a target hook, isn't it a property of a particular constraint? > This constraint implies "m", this one doesn't? > Make the implies "m" behavior the default one and add some syntax in the > *.md files to declare constraints that have the new behavior? > Kind like define_memory_constraint vs. define_special_memory_constraint > difference. Yes, I think it is better solution. I guess aarch64 might have the same problem as in this PR. We should have different treatment of memory constraints even for one target.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #10 from Vladimir Makarov --- (In reply to Segher Boessenkool from comment #7) > The addition of those extra args makes clear that the function is no > longer just testing if it is a valid address. It should be renamed. > I don't like it too. When I first look at the patch I though it is a recursive call. For active C++ programmer function overloading is not the problem but imho it is better to rename the function/
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #11 from Vladimir Makarov --- Introducing a new memory constraint can take some time. I guess we could switch off the offending code meanwhile because it is compiler crash vs unoptimal generated code choice. I'll investigate how switching the code off affects GCC tests on aarhc64.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #12 from Vladimir Makarov --- (In reply to Vladimir Makarov from comment #11) > Introducing a new memory constraint can take some time. > > I guess we could switch off the offending code meanwhile because it is > compiler crash vs unoptimal generated code choice. > > I'll investigate how switching the code off affects GCC tests on aarhc64. Unfortunately, switching off the code results in 480 GCC test failures on aarhc64. I'll work on introducing new memory constraint. I hope to have a patch and submit it for review on Friday.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581 --- Comment #15 from Vladimir Makarov --- I've submitted the patch defining a new memory constraint: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566976.html The patch itself: https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210319/c979c698/attachment-0001.bin
[Bug target/99663] [11 Regression] ICE in extract_constrain_insn, at recog.c:2670 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99663 --- Comment #9 from Vladimir Makarov --- Thank you for reporting this. I've reproduced this crash. ETA of the patch is Monday at worst.
[Bug rtl-optimization/99680] [11 Regression] AddressSanitizer: global-buffer-overflow since g:04b4828c6dd2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99680 --- Comment #2 from Vladimir Makarov --- Sorry for the troubles with my previous patch. I should have not be in hurry to fix PR99663. I'll fix it today. Jakub's patch could be a candidate but I prefer check constraint[0] on '\0'.
[Bug rtl-optimization/99680] [11 Regression] AddressSanitizer: global-buffer-overflow since g:04b4828c6dd2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99680 --- Comment #5 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #4) > I was worried that letters that introduce multi-letter constraints followed > by '\0' could be a problem too. Or do we rely on those being dropped > already earlier? > Something like "=B" on x86_64 etc. In what I've tried it was dropped during > vregs pass though. > And when cn already is CONSTRAINT__UNKNOWN, performing checks whether to set > it to CONSTRAINT__UNKNOWN is just wasted time. I like more direct approach. Just looking at CONSTRAINT_LEN. Multichracter constraints returns their length, all others (including modifiers and '\0') returns 1. Using CONSTRAINT__UNKNOWN adds one more function (lookup_constraint) in the decision chain. If somebody uses starting character of multi-character constraint without all constraint characters, a lot of things will be broken at least in RA. If this happens RA will read besides constraint string anyway in other RA code places and also RA will also consider garbage after the string as additional constraints and make unwanted reloads. Reading behind constraint string in process_address_1 would have less serious consequences.
[Bug target/99766] [11 Regression] ICE: unable to generate reloads with SVE code since r11-7807-gbe70bb5e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766 --- Comment #4 from Vladimir Makarov --- (In reply to Alex Coplan from comment #2) > The above ICEs with just -O3 -march=armv8.2-a+sve. Thank you for reporting. I reproduced it тоо. I think соме constraint was not categorized rightly. It might be simply to find and to fix but it will need a lot of testing.
[Bug target/99766] [11 Regression] ICE: unable to generate reloads with SVE code since r11-7807-gbe70bb5e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766 --- Comment #6 from Vladimir Makarov --- (In reply to rsand...@gcc.gnu.org from comment #5) > I wonder if the CT_RELAXED_MEMORY cases should be following > on from CT_MEMORY rather than CT_SPECIAL_MEMORY. They're really > normal memory constraints that just happen to accept more than > a standard constraint. Yes, this is the right question. The patch I am working on treats CT_SPECIAL_MEMORY the same way as CT_MEMORY everywhere although it is enough to do this only in lra-constraints.c. After finishing testing I'll commit the patch. Unfortunately compiler farm arm64 machines are too slow (2 runs of gcc tests take almost 8 hours to run with -j8). I guess I'll fix it tomorrow.
[Bug target/99787] [11 Regression] ICE in curr_insn_transform, at lra-constraints.c:4133 since r11-7807-gbe70bb5e4babdf9d3d33e8f4658452038407fa8e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99787 Vladimir Makarov changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #2 from Vladimir Makarov --- The patch for PR99766 has fixed this. *** This bug has been marked as a duplicate of bug 99766 ***
[Bug target/99766] [11 Regression] ICE: unable to generate reloads with SVE code since r11-7807-gbe70bb5e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99766 Vladimir Makarov changed: What|Removed |Added CC||marxin at gcc dot gnu.org --- Comment #9 from Vladimir Makarov --- *** Bug 99787 has been marked as a duplicate of this bug. ***
[Bug target/99781] [11 Regression] ICE in partial_subreg_p, at rtl.h:3144
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99781 --- Comment #5 from Vladimir Makarov --- I've reproduced it too and started to work on it. I hope the fix will be ready this week.
[Bug rtl-optimization/96264] [10 Regression] wrong code with -Os -fno-forward-propagate -fschedule-insns -fno-tree-ter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96264 --- Comment #17 from Vladimir Makarov --- (In reply to Peter Bergner from comment #16) > (In reply to seurer from comment #15) > > It still fails on gcc 10, though > > Vlad, can we get this backported to GCC 10? Maybe in time for GCC 10.3? Nobody complained about this patch since its commit. So I believe we can backport it and the patch should be safe for GCC 10 branch.
[Bug rtl-optimization/96264] [10 Regression] wrong code with -Os -fno-forward-propagate -fschedule-insns -fno-tree-ter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96264 --- Comment #19 from Vladimir Makarov --- (In reply to Richard Biener from comment #18) > Please somebody do it quick then (not omitting necessary testing, of course). I am working on it. It is my highest priority work. The patch is ready. If the testing is ok (arm64 machines are a bottleneck for me), I'll commit it today.
[Bug rtl-optimization/96264] [10 Regression] wrong code with -Os -fno-forward-propagate -fschedule-insns -fno-tree-ter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96264 --- Comment #22 from Vladimir Makarov --- I've committed the patch to gcc-10 branch. I also committed patch modifying the test -- see PR99233.
[Bug rtl-optimization/100066] [11 Regression] ICE in lra_assign, at lra-assigns.c:1649
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100066 --- Comment #2 from Vladimir Makarov --- Thank you for reporting this. I've reproduced this bug. It seems something wrong with hard reg live range splitting. This code is complicated so I can not say when it will be fixed but I'll do my best to fix this as soon as possible.
[Bug middle-end/95464] [10 Regression] Miscompilation of mesa on x86_64-linux since r10-6426
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95464 --- Comment #11 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #7) > Vlad, do you plan to backport this to 10.3? Unfortunately, a few days ago people reported a serious problem with the patch (see PR97313). I've just submitted a patch fixing the new problem https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=bb37ad8cc0fc937c7afcdab471e5d65d176041c3 I believe this new patch should be also cherry-picked to gcc-10 branch.
[Bug middle-end/95464] [10 Regression] Miscompilation of mesa on x86_64-linux since r10-6426
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95464 --- Comment #13 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #12) > Certainly. I've just committed it into the branch https://gcc.gnu.org/g:70a66ff0228277b4dd89263a663c0a87eb5d782f
[Bug rtl-optimization/97313] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1004 since r11-937-g5261cf8ce824bfc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97313 --- Comment #5 from Vladimir Makarov --- (In reply to Martin Liška from comment #4) > Thank you Vladimir for the fix. > Can we close it now? There are no complaints about the patch for more a week. So I guess the PR can be closed.
[Bug target/97532] [11 Regression] Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532 --- Comment #4 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #3) > The IRA dump says: > a2(r189,l0) costs: AREG:2000,2000 DREG:2000,2000 CREG:26000,-1000 > BREG:2000,2000 SIREG:2000,2000 DIREG:2000,2000 AD_REGS:2000,2000 > CLOBBERED_REGS > :2000,2000 Q_REGS:2000,2000 TLS_GOTBASE_REGS:2000,2000 > GENERAL_REGS:2000,2000 SSE_FIRST_REG:12000,12000 NO_REX_SSE_REGS:12000,12000 > SSE_REGS:12000, > 12000 ALL_SSE_REGS:12000,12000 MMX_REGS:26000,26000 INT_SSE_REGS:26000,26000 > ALL_REGS:424000,424000 MEM:12000,12000 > so the INT_SSE_REGS cost of 26000 seems might higher than GENERAL_REGS cost. It does not matter what class IRA gives the address reg. LRA should fix this anyway. For usual memory the correct address reg class is checked by legitimate_address_p hook. I guess in this case it is a special memory and its constraint does not check address reg class at all. I believe the bug should be fixed on machine-dependent side of GCC (in constraints and predicates).
[Bug target/97532] [11 Regression] Error: insn does not satisfy its constraints, internal compiler error: in extract_constrain_insn, at recog.c:2196
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97532 --- Comment #9 from Vladimir Makarov --- (In reply to Hongtao.liu from comment #6) > > > Shouldn't memory_operand (XEXP (op, 0), GET_MODE (XEXP (op, 0))) imply > legitimate_address_p? memory_operand does not imply legitimate_address_p. When LRA processes regular memory it calls legitimate_address_p. But for special memory it did not do this. It is a responsibility of special constraint to check it. That is why it is called special (it might require more constraints on addressing or hard registers can be used). I guess you should check it in the constraint that base and index registers are of the right class.
[Bug target/97870] [11 Regression] ICE: verify_flow_info failed (error: too many outgoing branch edges from bb 2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97870 --- Comment #2 from Vladimir Makarov --- (In reply to Richard Biener from comment #1) > Possibly related to the asm goto enhancements. This test should work only for x86-64. Running it on other targets can give an error. So error about inconsistent operand constraints is ok. What is wrong is an internal compiler error. The problem is in deleting wrong asm goto in LRA which results in wrong CFG. I've started work on this PR. The fix will be ready this week.
[Bug bootstrap/97933] [11 Regression] Bootstrap failure on s390x-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97933 --- Comment #3 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #1) > Started with r11-5034-g253c415a1acba50711c82693426391743ac18040 Sorry for causing this error. It is clearly my mistake. I've started to work on this. The fix will be ready tomorrow.
[Bug bootstrap/97983] [11 Regression] Bootstrap failure on s390x-linux related to vec-perm-indices.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97983 --- Comment #3 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #1) > Created attachment 49623 [details] > vec-perm-indices.ii.xz > > The preprocessed file. I've reproduced the problem. The emitted insn got wrong bb. The patch is on the way. I guess it will be ready today or at most tomorrow.
[Bug rtl-optimization/97954] [11 Regression] ICE in maybe_record_trace_start, at dwarf2cfi.c:2360
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97954 --- Comment #2 from Vladimir Makarov --- (In reply to Martin Liška from comment #1) > Started with r11-5002-ge3b3b59683c1e7d3. Before the patch, gcc just reported an error. Now it is a crash. The problem is not the patch itself but in the loop invariant motion. For some reason, loop invariant motion optimization moves asm goto insn out of the loop as the invariant insn. asm goto is a branch. When we move branch as a regular insn, we have what we have. Here is loop2_invariant dump: ... (jump_insn 8 6 11 4 (parallel [ (set (reg:SI 83 [ x ]) (asm_operands:SI ("") ("=a") 0 [] [] [ (label_ref:DI 21) ] z1.c:6)) (clobber (reg:CC 17 flags)) ]) "z1.c":6:3 -1 (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)) -> 21) ... *ending processing of loop 1 ** Set in insn 8 is invariant (0), cost 4, depends on Decided to move invariant 0 -- gain 4 Invariant 0 moved without introducing a new temporary register changing bb of uid 8 from 4 to 2 starting the processing of deferred insns ending the processing of deferred insns I think a maintainer of loop optimizations should look at this.
[Bug middle-end/95464] [10 Regression] Miscompilation of mesa on x86_64-linux since r10-6426
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95464 --- Comment #8 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #7) > Vlad, do you plan to backport this to 10.3? I guess so. We had enough time to test it. I don't see any complaints about this patch. I'll backport it today into release/gcc-10 branch.
[Bug rtl-optimization/97313] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1004 since r11-937-g5261cf8ce824bfc7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97313 --- Comment #1 from Vladimir Makarov --- Thank you for reporting this. I am trying to find the best place to fix this: either in memory subreg elimination or in match_reload itself. I hope the fix will be ready tomorrow.
[Bug rtl-optimization/97978] [11 Regression] ICE in lra_assign, at lra-assigns.c:1648 since r11-5066-gbe39636d9f68c437
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97978 --- Comment #3 from Vladimir Makarov --- Thank you for reporting it. I've started work on the PR. It seems a rare but dangerous bug and its fix might affect many targets and will require a lot of testing but I try to fix the PR on this week.
[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 --- Comment #13 from Vladimir Makarov --- Thank you for reducing the test. I've reproduced the problem and started working on it. I think the fix will be ready on this week.
[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 --- Comment #15 from Vladimir Makarov --- Created attachment 49955 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49955&action=edit a patch fixing the PR
[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 --- Comment #16 from Vladimir Makarov --- (In reply to Przemyslaw Wirkus from comment #14) > Hi Vladimir, > > I'm assigned to the issue and I'm working on it. I think I got the root > cause. > I'm in the process of creating a patch after I complete few tests. > > kind regards > Przemyslaw Sorry, it was not clear for me that somebody was working on the patch. The bug is actually severe and should be classified as P1. In any case, here is my patch fixing this. The patch is in the attachment. May be your patch will use a better approach.
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 --- Comment #6 from Vladimir Makarov --- I've reproduced the bug too. The fix will be on the next week.
[Bug rtl-optimization/98722] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1004 since r11-6615-gcf2ac1c30af0fa783c8d72e527904dda5d8cc330
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98722 --- Comment #1 from Vladimir Makarov --- Sorry, I can not reproduce this on today trunk using reint.cpp test. I guess it is x86-64. I am using the following configuration (meaning with enabled checking) /home/cygnus/vmakarov/build1/gcc-git/gcc/configure --srcdir=/home/cygnus/vmakarov/build1/gcc-git/gcc --prefix=/home/cygnus/vmakarov/build1/gcc-git/64 --disable-bootstrap --enable-languages=c,c++,fortran Could you check it again or provide more info how to reproduce it.
[Bug rtl-optimization/98722] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1004 since r11-6615-gcf2ac1c30af0fa783c8d72e527904dda5d8cc330
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98722 --- Comment #3 from Vladimir Makarov --- (In reply to Martin Liška from comment #2) > It happens for s390x target, so you will need to build a cross compiler with: > --target=s390x-linux-gnu. Thank you, Martin. I've reproduced it on s390x.
[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 --- Comment #21 from Vladimir Makarov --- Additional fix for PR98722 is necessary for this PR. 4334b524274203125193a08a8485250c41c2daa9
[Bug testsuite/98643] [11 regression] r11-6615 causes failure in gcc.target/powerpc/fold-vec-extract- char.p7.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98643 --- Comment #3 from Vladimir Makarov --- I believe a patch for PR98722 fixes this PR too. 4334b524274203125193a08a8485250c41c2daa9
[Bug rtl-optimization/98777] [11 Regression] ICE in update_equiv at gcc/lra-constraints.c:504 since r11-6819-g4334b52427420312
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98777 --- Comment #1 from Vladimir Makarov --- Thank you for reporting this. I've reproduced the bug on riscv64 and started to work on fixing it.
[Bug target/97969] [9/10/11 Regression][ARM/Thumb] Certain combo of codegen options leads to compilation infinite loop with growing memory use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 --- Comment #22 from Vladimir Makarov --- (In reply to Vladimir Makarov from comment #21) > Additional fix for PR98722 is necessary for this PR. > > 4334b524274203125193a08a8485250c41c2daa9 Sorry, one more fix for PR98777 is necessary for the PR: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=68ba1039c7daf0485b167fe199ed7e8031158091
[Bug rtl-optimization/97684] [11 Regression] ICE in reg_preferred_class, at reginfo.c:789 by r11-4577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684 Vladimir Makarov changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #5 from Vladimir Makarov --- I've reproduced x86-64 case and started to work on it. I think the patch will be ready soon.
[Bug target/97701] [10/11 Regression] aarch64: ICE in extract_constrain_insn since r10-4447-g095f78c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97701 --- Comment #7 from Vladimir Makarov --- I've reproduced it on gcc-10 branch. For some reason, LRA does not change register class for reload pseudo. This bug will take some time for a fix as the fix will probably affect very sensitive part of LRA and will need a lot of testing. But I guess the patch will be ready on the next week.
[Bug target/97701] [10/11 Regression] aarch64: ICE in extract_constrain_insn since r10-4447-g095f78c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97701 --- Comment #9 from Vladimir Makarov --- I've committed the patch only to the trunk. I believe the bug on the trunk is still present but not triggered by the test. I'll commit a bit modified patch to gcc 10 branch after some time if there is no problems with the current patch on the trunk.
[Bug target/97701] [10/11 Regression] aarch64: ICE in extract_constrain_insn since r10-4447-g095f78c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97701 --- Comment #11 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #10) > (In reply to Vladimir Makarov from comment #9) > > I've committed the patch only to the trunk. I believe the bug on the trunk > > is still present but not triggered by the test. > > See #c5, by using ? 2 : 1 instead of ? 2 : 0 it should trigger it on the > trunk too. Ok. Thank you, Jakub. I've just checked the modified test with the patch on the trunk. The test is successfully compiled. I'll modify the test on the trunk.
[Bug target/97510] [9/10/11 Regression] ICE in check_bool_attrs, at recog.c:2168 since r9-2793-gf6b95f78f8048e2f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97510 --- Comment #1 from Vladimir Makarov --- I cannot reproduce this on today trunk. The bug might be fixed by some recent patches (probably for PR98777).
[Bug rtl-optimization/98777] [11 Regression] ICE in update_equiv at gcc/lra-constraints.c:504 since r11-6819-g4334b52427420312
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98777 --- Comment #5 from Vladimir Makarov --- (In reply to Peter Bergner from comment #4) > Vlad, is this fixed now and we can close it? It's marked as a P1, so would > be nice to close if fixed. I believe it is fixed and we could close the PR but I think a reporter (or release manager) should do this.
[Bug rtl-optimization/98722] [11 Regression] ICE in lra_set_insn_recog_data, at lra.c:1004 since r11-6615-gcf2ac1c30af0fa783c8d72e527904dda5d8cc330
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98722 --- Comment #6 from Vladimir Makarov --- (In reply to Peter Bergner from comment #5) > Another P1 that looks like it might be fixed. Vlad, can we marked this as > fixed? I believe it is fixed and we could close the PR.
[Bug rtl-optimization/96264] [10/11 Regression] wrong code with -Os -fno-forward-propagate -fschedule-insns -fno-tree-ter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96264 --- Comment #11 from Vladimir Makarov --- I've reproduced this bug and started to work on it. The bug is serious and should be probably considered as P1 one. I try to fix it on this week.
[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 --- Comment #7 from Vladimir Makarov --- Created attachment 50225 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50225&action=edit A candidate patch
[Bug target/97366] [8/9/10/11 Regression] Redundant load with SSE/AVX vector intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97366 --- Comment #8 from Vladimir Makarov --- I've tried different approaches to fix it. The best patch I have now is in the attachment. Unfortunately, the best patch results in two new failures on ppc64 (other patches are even worse): gcc.target/powerpc/dform-3.c scan-assembler-not mfvsrd gcc.target/powerpc/dform-3.c scan-assembler-not mfvsrld I'll think more how to avoid these 2 failures. If I succeed, I'll submit a patch. But there is probability that the PR will not be fixed at all.
[Bug inline-asm/99123] [8/9/10/11 Regression] ICE in decompose_normal_address, at rtlanal.c:6710
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99123 --- Comment #2 from Vladimir Makarov --- (In reply to G. Steinmetz from comment #0) > Affects versions down to at least r5 at -O1+. > Testfile derived from gcc/testsuite/gcc.target/i386/20020729-1.c > with string "1" changed to "" : > > > $ gcc-11-20210214 -c z1.c -O2 > during RTL pass: reload > z1.c: In function 'foo': > z1.c:48:1: internal compiler error: in decompose_normal_address, at > rtlanal.c:6710 >48 | } > | ^ > 0xb10f70 decompose_normal_address > ../../gcc/rtlanal.c:6710 Thank you for reporting this. I've reproduced the crash and started working on it. I guess the fix will be ready this week.
[Bug rtl-optimization/100328] IRA doesn't model matching constraint well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #2 from Vladimir Makarov --- (In reply to Kewen Lin from comment #1) > Created attachment 50715 [details] > ira:consider matching cstr in all alternatives > > With little understanding on ira, I am not quite sure this patch is on the > reasonable direction. It aims to check the matching constraint in all > alternatives, if there is one alternative with matching constraint and > matches the current preferred regclass, it will record the output operand > number and further create one copy for it. Normally it can get the priority > against shuffle copies and the matching constraint will get satisfied with > higher possibility, reload doesn't create extra copies to meet the matching > constraint or the desirable register class when it has to. > > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as > shuffle copies, and later any of A,B,C,D gets assigned by one hardware > register which is a VSX register but not a FP register, which means it has > to pay costs once we can NOT go with VSX alternatives, so at that time we > can increase the freq for the remaining copies related to this, once the > matching constraint gets satisfied further, there aren't any extra costs to > pay. This idea seems a bit complicated in the current framework, so the > proposed patch aggressively emphasizes the matching constraint at the time > of creating copies. > > FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with > Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r > +2.51%, no remarkable degradation is observed. Thank you for working on this issue. The current implementation of ira_get_dup_out_num was specifically tuned for better register allocation for x86-64 div insns. Your patch definitely improves code for power9 and I would say significantly (congratulations!). The patch you proposed makes me think that it might work for major targets as well. I would prefer to avoid introducing new parameter because there are too many of them already and its description is cryptic. It would be nice if you benchmark the patch on x86-64 too, If there is no overall degradation with new behaviour we could remove the parameter and make the new behaviour as a default. If it is not, well we will keep the parameter. As for the patch itself, I don't like some variable names. Sorry. Could you use op_regno, out_regno, and present_alt instead of op_no, out_no, tot. Please, in general use longer variable names reflecting their purpose as GCC developers reads code in many times more than writing it.
[Bug rtl-optimization/108388] LRA generates RTL that violates constraints
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108388 --- Comment #1 from Vladimir Makarov --- Thank you for reporting this. I've been working on this PR. I believe the PR reveals the problem not only for PDP11. I guess the same can happen for some other targets. I hope the patch will be ready the next week as it requires a good testing for several major targets. Unfortunately, practically any change in LRA might have unexpected effect on other targets.
[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552 --- Comment #35 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #34) > Seems right now DECL_NONALIASED is only used on these coverage vars and on > Fortran caf tokens, so perhaps a quick workaround would be on the LRA side > never reread stuff from MEMs with VAR_P && DECL_NONALIASED MEM_EXPRs. CCing > Vlad on that. The following patch can do this: diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index 7bffbc07ee2..d80a6a9f41d 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -515,6 +515,7 @@ get_equiv (rtx x) { int regno; rtx res; + tree expr; if (! REG_P (x) || (regno = REGNO (x)) < FIRST_PSEUDO_REGISTER || ! ira_reg_equiv[regno].defined_p @@ -525,6 +526,10 @@ get_equiv (rtx x) { if (targetm.cannot_substitute_mem_equiv_p (res)) return x; + if ((expr = MEM_EXPR (res)) != NULL + && (expr = get_base_address (expr)) != NULL + && VAR_P (expr) && DECL_NONALIASED (expr)) + return x; return res; } if ((res = ira_reg_equiv[regno].constant) != NULL_RTX)
[Bug rtl-optimization/103541] unnecessary spills around const functions calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103541 Vladimir Makarov changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #4 from Vladimir Makarov --- Honza, thank you for reporting this. Fixing just the following code will not solve the problem as LRA uses only equiv expression valid for the whole function. > ret = valid_combine; > if (!MEM_READONLY_P (memref) > && !RTL_CONST_OR_PURE_CALL_P (insn)) > return valid_none; > By the way, the old reload pass still works on the test and producing the same code as LRA currently, also reserving stack slot and using it around the call instead of reload from a. I've been working on this problem and I hope the fix will be ready on the next week.
[Bug tree-optimization/108500] [11/12 Regression] -O -finline-small-functions results in "internal compiler error: Segmentation fault" on a very large program (700k function calls)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500 --- Comment #20 from Vladimir Makarov --- (In reply to Richard Biener from comment #14) > Thanks for the new testcase. With -O0 (and a --enable-checking=release > built compiler) this builds in ~11 minutes (on a Ryzen 9 7900X) with > > integrated RA : 38.96 ( 6%) 1.94 ( 20%) 42.00 ( > 6%) 3392M ( 23%) > LRA non-specific : 18.93 ( 3%) 1.24 ( 13%) 23.78 ( > 4%) 450M ( 3%) > LRA virtuals elimination : 5.67 ( 1%) 0.05 ( 1%) 5.75 ( > 1%) 457M ( 3%) > LRA reload inheritance : 318.25 ( 49%) 0.24 ( 2%) 318.51 ( > 48%) 0 ( 0%) > LRA create live ranges : 199.24 ( 31%) 0.12 ( 1%) 199.38 ( > 30%) 228M ( 2%) > 645.67user 10.29system 11:04.42elapsed 98%CPU (0avgtext+0avgdata > 30577844maxresident)k > 3936200inputs+1091808outputs (122053major+10664929minor)pagefaults 0swaps > I've tried test-1M.i with -O0 for clang-14. It took about 12hours on E5-2697 v3 vs about 30min for GCC. The most time (99%) of clang is spent in "fast register allocator": Total Execution Time: 42103.9395 seconds (42243.9819 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 41533.7657 ( 99.5%) 269.5347 ( 78.6%) 41803.3005 ( 99.3%) 41942.4177 ( 99.3%) Fast Register Allocator 139.1669 ( 0.3%) 16.4785 ( 4.8%) 155.6454 ( 0.4%) 156.3196 ( 0.4%) X86 DAG->DAG Instruction Selection I've tried the same for -O1. Again gcc took about 30min and I stopped clang (with another used RA algorithm) after 120hours. So the situation with RA is not so bad for GCC. But in any case I'll try to improve the speed for this case. > so register allocation taking all of the time. There's maybe the possibility > to gate some of its features on the # of BBs or insns (or whatever the actual > "bad" thing is - I didn't look closer yet). > > It also seems to use 30GB of peak memory at -O0 ... > I see only 3GB. Improving this is hard task. The IRA for -O0 uses very simple algorithm with usage of very few resources. We could use even simpler method (assigning memory only for all pseudos) but I think it does not worth to do as the generated code will be much bigger and probably will be 1.5-2 times slower.
[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754 --- Comment #1 from Vladimir Makarov --- I think the problem is that cris uses the old reload pass. Could you check the following patch: diff --git a/gcc/ira.cc b/gcc/ira.cc index d0b6ea062e8..9f9af808f63 100644 --- a/gcc/ira.cc +++ b/gcc/ira.cc @@ -3773,7 +3773,7 @@ update_equiv_regs (void) { note = set_unique_reg_note (insn, REG_EQUIV, replacement); } - else + else if (ira_use_lra_p) { /* We still can use this equivalence for caller save optimization in LRA. Mark this. */
[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754 --- Comment #4 from Vladimir Makarov --- (In reply to Hans-Peter Nilsson from comment #3) > (In reply to Vladimir Makarov from comment #1) > > I think the problem is that cris uses the old reload pass. Could you check > > the following patch: > > Will do, thanks! OK. I'll submit the patch then.
[Bug middle-end/108754] [13 Regression] multiple testsuite errors with r13-5761-g10827a92f1a8c3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754 --- Comment #9 from Vladimir Makarov --- (In reply to Hans-Peter Nilsson from comment #8) > My test-run with the suggested change on top of r13-5761-g10827a92f1a8c3 > came out clean (all regressions resolved, no new ones added) so I'll close > this issue. Thanks for promptly taking care of this! Thank you for your help. And sorry for the inconveniences because of my patch. It is hard to do changes in RA as they might affect different targets in some unexpected way.
[Bug rtl-optimization/108774] [13 Regression] ICE: in get_equiv, at lra-constraints.cc:534 with -Os -ftrapv -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108774 --- Comment #1 from Vladimir Makarov --- Thank you for reporting this. I'll try to fix it as soon as possible, today or tomorrow.
[Bug target/108145] [13 regression] ICE in from_reg_br_prob_base, at profile-count.h:259
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108145 --- Comment #6 from Vladimir Makarov --- FYI, I think my patch did not cause this problem. I've just check fresh trunk (w/o my patch and the compilation still fails). So the PR probably should be still open.
[Bug rtl-optimization/108999] Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999 Vladimir Makarov changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #1 from Vladimir Makarov --- Thank you for filling this PR up. I am going to fix this on the next week.
[Bug target/108141] [13 Regression] gcc.target/i386/pr64110.c FAIL since r13-4727 on ia32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108141 --- Comment #7 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #6) > The change has been reverted, so this is no longer a regression. Just for the info. The patch I reverted resulted in wrong calculation of pressure classes (there was a single pressure class ALL_REGS). This affected register pressure calculation and as a consequence using one region only. W/o the patch IRA uses regional register allocation for the loops in the test. I pushed another patch for PR90706. I hope it will not create such problems as the previous patch.
[Bug target/104637] [9/10/11/12 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637 --- Comment #3 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #2) > If I change the testcase to following (so that it doesn't rely on > __builtin_convertvector), it started ICEing with > r0-122162-gb7aa4e9afcd3da4f09d6f982a663ea2094b1f2cf > typedef short __attribute__((__vector_size__ (64))) U; > typedef unsigned long long __attribute__((__vector_size__ (32))) V; > typedef long double __attribute__((__vector_size__ (64))) F; > > int i; > U u; > F f; > > void > foo (char a, char b, _Complex char c, V v) > { > u = (U) { u[0] / 0, u[1] / 0, u[2] / 0, u[3] / 0, u[4] / 0, u[5] / 0, u[6] > / 0, u[7] / 0, > u[8] / 0, u[0] / 0, u[9] / 0, u[10] / 0, u[11] / 0, u[12] / 0, > u[13] / > 0, u[14] / 0, u[15] / 0, > u[16] / 0, u[17] / 0, u[18] / 0, u[19] / 0, u[20] / 0, u[21] / 0, > u[22] > / 0, u[23] / 0, > u[24] / 0, u[25] / 0, u[26] / 0, u[27] / 0, u[28] / 0, u[29] / 0, > u[30] > / 0, u[31] / 0 }; > c += i; > f = (F) { v[0], v[1], v[2], v[3] }; > i = (char) (__imag__ c + i); > } > > In any case, I don't see anything wrong on the GIMPLE side and it isn't > clear on reloading which insn it is ICEing. It is a pitfall of LRA hard reg split subpass. It is a small subpass used as the last resort for LRA when it can not assign a hard reg to a reload pseudo by other ways (e.g. by spilling non-reload pseudos). For simplicity the subpass works on one split base (as each split changes pseudo live range info). To solve the problem the subpass should make as many splits as possible. This requires to check overlapping hard reg splits. In other words, the subpass should be considerably modified. I hope to commit the patch on the next week.
[Bug target/104686] [12 Regression] Huge compile-time regression building SPEC 2017 538.imagick_r with -march=skylake
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686 --- Comment #19 from Vladimir Makarov --- (In reply to Richard Biener from comment #16) > it doesn't make a difference for this testcase but profiling shows that > allocnos_conflict_p is quite expensive so it's best to do it after the other > continue checks like the following. I also notice that the comment of > allocnos_conflict_p says > > /* Return TRUE if allocnos A1 and A2 conflicts. Here we are >interesting only in conflicts of allocnos with intersected allocno >classes. */ > > so doing it after the ira_reg_classes_intersect_p check makes even more > sense(?) > > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > index 8b6db1bb417..a5fd79484eb 100644 > --- a/gcc/ira-color.cc > +++ b/gcc/ira-color.cc > @@ -1572,15 +1572,14 @@ update_conflict_hard_regno_costs (int *costs, enum > reg_class aclass, > else > gcc_unreachable (); > > + another_aclass = ALLOCNO_CLASS (another_allocno); > if (another_allocno == from > + || ALLOCNO_ASSIGNED_P (another_allocno) > + || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p > + || ! ira_reg_classes_intersect_p[aclass][another_aclass] > || allocnos_conflict_p (another_allocno, start)) > continue; > > - another_aclass = ALLOCNO_CLASS (another_allocno); > - if (! ira_reg_classes_intersect_p[aclass][another_aclass] > - || ALLOCNO_ASSIGNED_P (another_allocno) > - || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p) > - continue; > class_size = ira_class_hard_regs_num[another_aclass]; > ira_allocate_and_copy_costs > (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno), > > If it is allocnos_conflict_p takes significant time, this change definitely has sense. On my estimation it will decrease allocnos_conflict_p calls in about 4 times (assuming fp and int reg classes and half allocnos already assigned). In any case, the above change is profitable as allocnos_conflict_p practically always takes more time than the condition tests moved up. > Now, what's more odd is that we sometimes have a nice bitmap representation > for the conflicts but we always iterate. So it _seems_ we should be able > to do sth like > > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > index 8b6db1bb417..682d1ef7562 100644 > --- a/gcc/ira-color.cc > +++ b/gcc/ira-color.cc > @@ -1352,9 +1352,23 @@ allocnos_conflict_p (ira_allocno_t a1, ira_allocno_t > a2) > { >obj = ALLOCNO_OBJECT (a1, word); >/* Take preferences of conflicting allocnos into account. */ > - FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) > - if (OBJECT_ALLOCNO (conflict_obj) == a2) > - return true; > + if (!OBJECT_CONFLICT_VEC_P (obj)) > + { > + for (int w2 = 0; w2 < ALLOCNO_NUM_OBJECTS (a2); w2++) > + { > + ira_object_t obj2 = ALLOCNO_OBJECT (a2, w2); > + if (OBJECT_CONFLICT_ID (obj2) >= OBJECT_MIN (obj) > + && OBJECT_CONFLICT_ID (obj2) <= OBJECT_MAX (obj) > + && TEST_MINMAX_SET_BIT (OBJECT_CONFLICT_BITVEC (obj), > + OBJECT_CONFLICT_ID (obj2), > + OBJECT_MIN (obj), OBJECT_MAX > (obj))) > + return true; > + } > + } > + else > + FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) > + if (OBJECT_ALLOCNO (conflict_obj) == a2) > + return true; > } >return false; > } > > which reduces compile-time from 10s to 1s for me ... the above should > be split out so we can "optimally" use the bit test for > object vs. allocno when possible. > > Vlad - any thoughts about the above two things? Shall I try to polish and > optimize the bit test or would you be willing to pick those two speedups up? This change also has sense. Usually for big functions conflict sets are very sparse and bit vectors are not used. But it seems this is not the case for the PR. Please, polish and optimize the change as you proposed and I approve the final version promptly. Thank you for working on this PR, Richard.
[Bug rtl-optimization/104637] [9/10/11/12 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637 --- Comment #6 from Vladimir Makarov --- (In reply to CVS Commits from comment #5) > The master branch has been updated by Jakub Jelinek : > > https://gcc.gnu.org/g:d7b4c8feee11ea04b83f9996654c96b130588570 > > commit r12-7449-gd7b4c8feee11ea04b83f9996654c96b130588570 > Author: Jakub Jelinek > Date: Wed Mar 2 11:04:35 2022 +0100 > > testsuite: Fix up pr104637 testcase [PR104637] > > This testcase FAILs everywhere for 3 reasons: > 1) the testcase can't work on ia32, where sizeof (long double) == 12 >and as it is not a power of 2, we disallow creating vectors with such >elements, -mx32 and -m64 are fine > 2) the testcase emits a lot of -Wdiv-by-zero warnings, I've just added >-Wno-div-by-zero to dg-options > 3) my fault, when tweaking the testcase I've missed 33 initializers of >a 32 element vector which didn't change anything on the ICE, but is >still reported > > This patch fixes all of it, tested with > RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=pr104637.c' > both without the LRA fix where it ICEs and with it where it passes > everywhere. > > 2022-03-02 Jakub Jelinek > > PR rtl-optimization/104637 > * gcc.target/i386/pr104637.c: Don't run on ia32. Add > -Wno-div-by-zero > to dg-options. > (foo): Remove extraneous initializer. Sorry, I should have been more careful with using the original test. And thank you for fixing this, Jakub.
[Bug rtl-optimization/104961] [9/10/11/12 Regression] compilation never (?) finishes at -Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104961 --- Comment #2 from Vladimir Makarov --- I've reproduced the bug. The mentioned patch is not the cause but a trigger. The origin of the problem is actually a removal of hard reg propagation before RA which happened about year ago. I hope the fix will be ready on Friday or Monday.
[Bug middle-end/105032] Compiling inline ASM x86 causing GCC stuck in an endless loop with 100% CPU usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032 --- Comment #9 from Vladimir Makarov --- Cycling is the worst what can happen to compiler (even crash is better). This is the highest priority PR right now for me. I can not say why the cycle does not finish. It should as it works only for reload pseudos. I'll investigate it more. In any case I hope to fix it on this week. Sorry for inconvenience.
[Bug middle-end/105032] Compiling inline ASM x86 causing GCC stuck in an endless loop with 100% CPU usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032 --- Comment #10 from Vladimir Makarov --- I've reproduced the bug also on the trunk. The loop in question assumes a specific order for reload insns. In this case order of insns involving the reload pseudos is violated because the pseudo is also used for inheritance. We can change the loop condition to guarantee its finish independently of the reload insns order. It might results in failure of hard reg live range splitting for the pseudo. Permitting hard reg splitting for reload pseudo involved in inheritance is questionable with LRA correct work and generated code efficiency. So it has no sense for me to do this. The patch will be pushed to trunk right after finishing testing.
[Bug middle-end/105032] Compiling inline ASM x86 causing GCC stuck in an endless loop with 100% CPU usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032 --- Comment #12 from Vladimir Makarov --- GCC-11 branch needs a bit different patch. I'll commit a modified patch to gcc-11 branch on Friday.
[Bug target/105136] [11/12 regression] Missed optimization regression with 32-bit adds and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105136 --- Comment #4 from Vladimir Makarov --- I am just saying trivial things here that RA is a NP-complete task and there is no optimal solution for all tests. For GCC it is even more complicated as RA solves code selection tasks too. Basically we have for this test p91=di p92=si ... p89=p92+p87 (dead p92) p97=p91>>const (dead p91) p83=flags?p87:p89 (dead p87, p89) ax=p83 RA creates the following relations (to propagate assignment costs) for pseudos p83(ax preferred)---p87---p91(di preferred) \ \--p89---p92(si preferred) Only assignment ax for p89 can create the desired code. Relation costs of p87--p91 and p89--p92 or p83--p87 and p83--p89 are the same even if we use --param ira-consider-dup-in-all-alts=1. To get the right guaranteed solution we need some greedy algorithm which will take a lot of time to work and check results not only at the end of IRA but at the end LRA. I can revert meaningful changes of the patch which resulted in this degradation. But as I can see this creates 3 new test failures for tests avx512fp16-conjugation-1.c and avx512fp16vl-conjugation-1.c. Also I can not guarantee that such change will not result in more serious benchmark (e.g. SPEC) degradation. But in any I can try to do this. Although I am not sure taht it is worth to do this at this stage of gcc-12 release work. Richard and Jakub, what your thoughts about reverting my patch in question?
[Bug target/103676] [10/11/12 Regression] internal compiler error: in extract_constrain_insn, at recog.c:2671
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103676 --- Comment #21 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #19) > r10-3981-gf6ff841bc8dd87ce364deb217dc6d1ec5dc31de8 still doesn't ICE, > r10-3984-g22060d0e575e7754eb1355763d22bbe37c3caa13 already ICEs. > > I guess there is a disagreement between LRA and recog on how exactly they > treat register constraints. > "=lh" for TARGET_THUMB means LO_REGS or HI_REGS classes for the output, bet > LRA sees that LO_REGS or HI_REGS is together GENERAL_REGS and picks a > GENERAL_REGS > (reg:DI 7 r7 [orig:119 tmp ] [119]). But that one has one half in LO_REGS > and another half in HI_REGS and so extract_constrain_insn -> > constrain_operands > doesn't consider it as matching. Interesting case. To find required (reload) register class, LRA (as also the old reload pass) makes some union of register classes in one alternative which contains all or part of the registers of the classes (in this case it is general reg class). The problem can be solved w/o fixing LRA (and reload pass) by using asm volatile( "ldrd %Q[r], %R[r], %[p]\n" : [r]"=l,h"(tmp) : [p]"m,m"(*p64) : "memory" ); The problem can be solved in LRA by more complex representation of required reg classes (still reload should have also such fix). I guess it will complicate LRA and reload code a lot. We could also use more clear description of semantics of constraints currently used by LRA/reload. In this case we still need to output more meaningful error for LRA/reload instead of just internal compiler error.
[Bug target/103676] [10/11/12 Regression] internal compiler error: in extract_constrain_insn, at recog.c:2671
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103676 --- Comment #23 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #22) > If we consider such an inline asm invalid, we could error on it, ICE is not > the right thing. But what exactly should we error on? Alternative I think it is better to fix it in LRA than describing the semantics. I am starting to work on it and will look how the fix is going. If it is too complicated, we could try another solution (with describing the current semantics). In any case, I think it is not worth to fix the same existing problem in the old reload pass. > containing multiple register classes for multi-word operands is still > something used quite commonly in real-world, the problem is when the RA > assigns it a reg spanning across those. Or do most backends restrict > multi-word regs to start at a reg number divisible by the number of words > they need?
[Bug rtl-optimization/104049] [12 Regression] vec_select to subreg lowering causes superfluous moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104049 --- Comment #3 from Vladimir Makarov --- (In reply to Richard Biener from comment #2) > We need to understand the issue at least. I think that it is not an RA problem. IRA assigns quite reasonable registers. LRA just generates 2 reloads for this test, one for insn *add_lsr_si which has only one alternative and one for insn andsi3 which needs reload insns for any alternative and LRA in this case chooses the best one. I guess the problem of the code generation regression is in some recent changes of combiner or most probably aarch64 machine dependent code directing the combiner (as Tamar wrote). It would be nice if somebody bisected and found what commit resulted in the regression. As for double transfer of the value, it could be removed by inheritance in LRA but it is impossible as an input reload pseudo got the same hard register (in LRA assignment subpass) as one of the insn output pseudo (the assignment was done in IRA) and the reloaded value is still used in subsequent insn. Unfortunately it can happen as RA can not make allocation and code selection optimally in general case. Some coordination between LRA-assignment subpass and LRA-inheritance subpass could help to avoid the double transfer but right now I have no idea how to do this. It is also dangerous to implement such coordination at this stage as LRA-inheritance sub-pass is very complicated.
[Bug middle-end/103616] [9/10/11/12 Regression] ICE on ceph with systemtap macro since r8-5608
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103616 --- Comment #1 from Vladimir Makarov --- I can not reproduce ICE on this week GCC. Probably it was fixed (or switched off) by some recent RA patch. As for the second issue (code generation for function foo), I thought for some time how it could be fixed. It seemed that LRA inheritance sub-pass could be extended to work on memory too besides regs. But I got to conclusion that it would complicate already complicated LRA (inheritance subpass) more as we need to add sophisticated analysis (including aliasing) for memory. I guess there is an simpler alternative solution. The problem would disappear if double constant were in asm insn before LRA. I think some pass before RA could this. It could be driven by a target, for example to promote double constants for x86-64. Also the problem might be solved if we had pseudo<-double insn instead of mem<-double insn before LRA, LRA code dealing with equiv could promote double into the asm insn (although I am not 100% sure about this but, if it is not the case, probably code dealing with equiv could be tweaked to do this). So my proposal is to solve the problem somehow outside RA.
[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178 --- Comment #26 from Vladimir Makarov --- (In reply to Richard Biener from comment #7) > make costs in a way that IRA/LRA prefer re-materialization of constants > from the constant pool over spilling to GPRs (if that's possible at all - > Vlad?) LRA rematerialization can not rematerialize constant value from memory pool. It can rematerialize value of expression only consisting of other pseudos (currently assigned to hard regs) and constants. I guess rematerialization pass can be extended to work for constants from constant memory pool. It is pretty doable project opposite to rematerialization of any memory which would require a lot analysis including aliasing and complicated cost calculation benefits. May be somebody could pick this project up.
[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178 --- Comment #27 from Vladimir Makarov --- (In reply to Richard Biener from comment #17) > So in .reload we have (with unpatched trunk) > > 401: NOTE_INSN_BASIC_BLOCK 6 > 462: ax:DF=[`*.LC0'] > REG_EQUAL 9.8506899724167309977929107844829559326171875e-1 > 407: xmm2:DF=ax:DF > 463: ax:DF=[`*.LC0'] > REG_EQUAL 9.8506899724167309977929107844829559326171875e-1 > 408: xmm4:DF=ax:DF > > why??! We can load .LC0 into xmm4 directly. IRA sees > > 401: NOTE_INSN_BASIC_BLOCK 6 > 407: r118:DF=r482:DF > 408: r119:DF=r482:DF > > now I cannot really decipher IRA or LRA dumps but my guess would be that > inheritance (causing us to load from LC0) interferes badly with register > class assignment? > > Changing pseudo 482 in operand 1 of insn 407 on equiv > 9.8506899724167309977929107844829559326171875e-1 > ... > alt=21,overall=9,losers=1,rld_nregs=1 > Choosing alt 21 in insn 407: (0) v (1) r {*movdf_internal} > Creating newreg=525, assigning class GENERAL_REGS to r525 > 407: r118:DF=r525:DF > Inserting insn reload before: > 462: r525:DF=[`*.LC0'] > REG_EQUAL 9.8506899724167309977929107844829559326171875e-1 > > we should have preferred alt 14 I think (0) v (1) m, but that has > > alt=14,overall=13,losers=1,rld_nregs=0 > 0 Spill pseudo into memory: reject+=3 > Using memory insn operand 0: reject+=3 > 0 Non input pseudo reload: reject++ > 1 Non-pseudo reload: reject+=2 > 1 Non input pseudo reload: reject++ > alt=15,overall=28,losers=3 -- refuse > 0 Costly set: reject++ > alt=16: Bad operand -- refuse > 0 Costly set: reject++ > 1 Costly loser: reject++ > 1 Non-pseudo reload: reject+=2 > 1 Non input pseudo reload: reject++ > alt=17,overall=17,losers=2 -- refuse > 0 Costly set: reject++ > 1 Spill Non-pseudo into memory: reject+=3 > Using memory insn operand 1: reject+=3 > 1 Non input pseudo reload: reject++ > alt=18,overall=14,losers=1 -- refuse > 0 Spill pseudo into memory: reject+=3 > Using memory insn operand 0: reject+=3 > 0 Non input pseudo reload: reject++ > 1 Costly loser: reject++ > 1 Non-pseudo reload: reject+=2 > 1 Non input pseudo reload: reject++ > alt=19,overall=29,losers=3 -- refuse > 0 Non-prefered reload: reject+=600 > 0 Non input pseudo reload: reject++ > alt=20,overall=607,losers=1 -- refuse > 1 Non-pseudo reload: reject+=2 > 1 Non input pseudo reload: reject++ > > I'm not sure I can decipher the reasoning but I don't understand how it > doesn't seem to anticipate the cost of reloading the GPR in the alternative > it chooses? > > Vlad? All this diagnostics is just description of voodoo from the old reload pass. LRA choosing alternative the same way as the old reload pass (I doubt that any other approach will not break all existing targets). Simply the old reload pass does not report its decisions in the dump. LRA code (lra-constraints.cc::process_alt_operands) choosing the insn alternatives (as the old reload pass) does not use any memory or register move costs. Instead, the alternative is chosen by heuristics and insn constraints hints (like ? !). The only case where these costs are used, when we have reg:=reg and the register move costs for this is 2. In this case LRA(reload) does not bother to check the insn constraints.
[Bug target/104117] [9,10,11,12 Regression] Darwin ppc64 uses invalid non-PIC address to access constants (in PIC code).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104117 --- Comment #13 from Vladimir Makarov --- I think there are two code spots whose pitfalls resulted in the PR. The first one is in rs6000.cc::legitimate_lo_sum_address_p which permits wrong pic low-sum address. Another one is in lra-constraints.cc::process_address_1 which permits put wrong low-sum address in reg and use the reg in memory. The following patch solves the problem: diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 5404fb18755..306f67f26c4 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -8202,7 +8202,7 @@ legitimate_lo_sum_address_p (machine_mode mode, rtx x, int strict) { bool large_toc_ok; - if (DEFAULT_ABI == ABI_V4 && flag_pic) + if ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && flag_pic) return false; /* LRA doesn't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls push_reload from reload pass code. LEGITIMIZE_RELOAD_ADDRESS diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 30d088afbca..998e82be54f 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -3517,21 +3517,8 @@ process_address_1 (int nop, bool check_only_p, *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr); if (!valid_address_p (op, &ad, cn)) { - /* Try to put lo_sum into register. */ - insn = emit_insn (gen_rtx_SET - (new_reg, -gen_rtx_LO_SUM (Pmode, new_reg, addr))); - code = recog_memoized (insn); - if (code >= 0) - { - *ad.inner = new_reg; - if (!valid_address_p (op, &ad, cn)) - { - *ad.inner = addr; - code = -1; - } - } - + *ad.inner = addr; + code = -1; } } if (code < 0) The patch was successfully tested on x86-64/ppc64 under Linux.
[Bug rtl-optimization/104400] [12 Regression] v850e lra/reload failure after recent change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400 --- Comment #1 from Vladimir Makarov --- Thank you for reporting this, Jeff. I've reproduced the bug. I hope to fix this on this week.
[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178 --- Comment #28 from Vladimir Makarov --- Could somebody benchmark the following patch on zen2 470.lbm. diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index 9cee17479ba..76619aca8eb 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -5084,7 +5089,9 @@ lra_constraints (bool first_p) (x, lra_get_allocno_class (i)) == NO_REGS)) || contains_symbol_ref_p (x ira_reg_equiv[i].defined_p = false; - if (contains_reg_p (x, false, true)) + if (contains_reg_p (x, false, true) + || (CONST_DOUBLE_P (x) + && maybe_ge (GET_MODE_SIZE (GET_MODE (x)), 8))) ira_reg_equiv[i].profitable_p = false; if (get_equiv (reg) != reg) bitmap_ior_into (equiv_insn_bitmap, &lra_reg_info[i].insn_bitmap); If it improves the performance, I'll commit this patch. The expander unconditionally uses memory pool for double constants. I think the analogous treatment could be done for equiv double constants in LRA. I know only x86_64 permits 64-bit constants as immediate for moving them into general regs. As double fp operations is not done in general regs in the most cases, they should be moved into fp regs and this is costly as Jan wrote. So it has sense to prohibit using equiv double constant values in LRA unconditionally. If in the future we have a target which can move double immediate into fp regs we can introduce some target hooks to deal with equiv double constant. But right now I think there is no need for the new hook.
[Bug target/104117] [9,10,11,12 Regression] Darwin ppc64 uses invalid non-PIC address to access constants (in PIC code).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104117 --- Comment #21 from Vladimir Makarov --- (In reply to Iain Sandoe from comment #20) > (In reply to Iain Sandoe from comment #15) > > (In reply to Vladimir Makarov from comment #13) > > > I think there are two code spots whose pitfalls resulted in the PR. > > > > --- a/gcc/config/rs6000/rs6000.c > > > +++ b/gcc/config/rs6000/rs6000.c > > > @@ -8202,7 +8202,7 @@ legitimate_lo_sum_address_p (machine_mode mode, rtx > > > x, > > > int strict) > > > { > > >bool large_toc_ok; > > > > > > - if (DEFAULT_ABI == ABI_V4 && flag_pic) > > > + if ((DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN) && > > > flag_pic) > > > return false; > > On testing, this is not sufficient - one ends up with ICEs when we reject a > valid (UNSPEC-wrapped) address here. So I think that the slightly more > elaborate target changes are required - but the LRA change seems fine! > > ... reg-straps on this old h/w take > 1 day .. so some more time will be > needed for a complete answer. Ian, you have my approval for LRA changes in advance for committing them into the master and the branches when the overall patch is ready. Thank you for working on machine-dependent parts of the patch.
[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178 --- Comment #30 from Vladimir Makarov --- (In reply to Richard Biener from comment #29) > (In reply to Vladimir Makarov from comment #28) > > Could somebody benchmark the following patch on zen2 470.lbm. > > Code generation changes quite a bit, with the patch the offending function > is 16 bytes larger. I see no large immediate moves to GPRs anymore but > there is still a lot of spilling of XMMs to GPRs. Performance is > unchanged by the patch: > > 470.lbm 13740128107 S 13740128107 S > 470.lbm 13740128107 * 13740128107 S > 470.lbm 13740128107 S 13740128107 * > > Thank you very much for testing the patch, Richard. The results mean no go for the patch to me. > Without knowing much of the code I wonder if we can check whether the move > will be to a reg in GENERAL_REGS? That is, do we know whether there are > (besides some special constants like zero), immediate moves to the > destination register class? > There are no such info from the target code. Ideally we need to have the cost of loading *particular* immediate value into register class on the same cost basis as load/store. Still to use this info efficiently choosing alternatives should be based on costs not on the hints and some machine independent general heuristics (as now). > That said, given the result on LBM I'd not change this at this point. > > Honza wanted to look at the move pattern to try to mitigate the > GPR spilling of XMMs. > > I do think that we need to take costs into account at some point and get > rid of the reload style hand-waving with !?* in the move patterns. In general I am agree with the direction but it will be quite hard to do. I know it well from my experience to change register class cost calculation algorithm in IRA (the experimental code can be found on the branch ira-select). I expect huge number of test failures and some benchmark performance degradation practically for any targets and a big involvement of target maintainers to fix them. Although it is possible to try to do this for one target at the time.
[Bug rtl-optimization/104400] [12 Regression] v850e lra/reload failure after recent change
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104400 --- Comment #3 from Vladimir Makarov --- (In reply to Jeffrey A. Law from comment #2) > NP on the timing. My biggest concern (as always) is whether or not this is > a generic issue or a bug in the v850 target files. The former is obviously > much more important. > > If it starts to look like a target issue, then feel free to punt it to me. > While I don't know the v850 fp bits, I have retained a fair amount of > generic v850 knowledge over the decades :-) It is my patch pitfall for very unusual v850 insn constraint 'e!r' where e is even general reg and subset of r. I have a patch to fix this and after testing it I'll commit it today or tomorrow.
[Bug rtl-optimization/103437] gcc/ira-color.c:2813:5: runtime error: signed integer overflow: 15 * 147462000 cannot be represented in type 'int'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103437 --- Comment #5 from Vladimir Makarov --- Thank you for reporting this. This problem seems not that important as it is only about heuristic costs and might be result only in worse performance code generation (but might be in better code -- it is hard to say). Still it is better not to remove this warning. I'll look into this.