[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #6 from Segher Boessenkool ---
(In reply to Richard Biener from comment #4)
> I'm not sure what your proposed not noreturn trap() would do in terms of
> IL semantics compared to a not specially annotated general call?

Nothing I think?  But __builtin_trap *is* very different: it ends BBs.

> "recoverable" likely means resuming after the trap, not on an exception
> path (so it'll not be a throw())?

"recoverable" is super unclear.  For example, on Power the hardware has a concept "recoverable interrupt", which sets MSR[RI]=1, and traps never do.  This is a very different concept from what is wanted here, which has nothing to do with recoverability, and is simply about not being an abort() (which __builtin_trap *is*!)

> The only thing that might be useful to the middle-end would be marking
> the function as not altering the memory state.  But I suppose it should
> still serve as a barrier for code motion of both loads and stores, even
> if those loads/stores are known to not trap.  The only magic we'd have
> for this would be __attribute__((const,returns_twice)).  Which likely
> will be more detrimental to general optimization.
>
> So - what's the "sub-optimal code generation" you refer to from the
> (presumably) volatile asm() you use for the trap?
>
> [yeah, asm() on GIMPLE is less optimized than a call]

The rs6000 backend can optimise the used instructions: we have trap_if instructions, both with registers and with immediates.  A single instruction can do a comparison and a conditional trap.  This works great with __builtin_trap, *if* the kernel's trap handler has abort() semantics.

__builtin_trap_no_abort() maybe?
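As a rough illustration of the compare-and-trap case described in the comment (a sketch, not code from the bug report): a bounds check written with __builtin_trap hands the compiler a comparison followed by a noreturn trap, which is the shape a backend with conditional trap instructions can merge. The helper name checked_get is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: bounds-checked read.
   __builtin_trap is noreturn, so the basic block ends at the trap and
   the load below is only reached when idx < len.  */
int checked_get(const int *a, size_t len, size_t idx)
{
    assert(a != NULL);
    if (idx >= len)
        __builtin_trap();
    return a[idx];
}
```

With abort() semantics this is all that is needed. A non-aborting variant would additionally have to let execution continue at the load after the trap handler returns, which is the difference the comment is about.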
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #7 from Segher Boessenkool ---
(In reply to Franz Sirl from comment #5)
> For the naming I suggest __builtin_debugtrap() to align with clang. Maybe
> with an aliased __debugbreak() on Windows platforms.

Those are terrible names.  This would *not* be used more often than __builtin_trap, for debugging.  In general, builtins should say what they *do*, not what you imagine they will be used for.
[Bug middle-end/99299] Need a recoverable version of __builtin_trap()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #9 from Segher Boessenkool ---
The i386 port has

===
(define_insn "trap"
  [(trap_if (const_int 1) (const_int 6))]
  ""
{
#ifdef HAVE_AS_IX86_UD2
  return "ud2";
#else
  return ASM_SHORT "0x0b0f";
#endif
}
  [(set_attr "length" "2")])
===

which implements __builtin_trap, and can implement __builtin_trap_no_abort just fine as well, if your OS kernel (or similar) can return after a ud2.

If clang uses terribly confusing names (or semantics, or syntax, etc.) we should not copy that from them.  *Especially* when that already conflicts with names they copied from us.
[Bug testsuite/99352] New: check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

            Bug ID: 99352
           Summary: check_effective_target_sqrt_insn for powerpc is wrong
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: testsuite
          Assignee: unassigned at gcc dot gnu.org
          Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

It just says

  [istarget powerpc*-*-*]

but it should test whether the preprocessor symbol "_ARCH_PPCSQ" is defined.
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Target|                            |powerpc*-*-*
   Last reconfirmed|                            |2021-03-02
           Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Segher Boessenkool ---
Mine.
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352 --- Comment #3 from Segher Boessenkool --- rs6000 has check_effective_target_powerpc_fprs already (with slightly different semantics).
[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool ---
Just FYI: There are four Power Linux systems in the cfarm (as well as some AIX).

gcc110  POWER7 BE
gcc203  POWER8 BE
gcc112  POWER8 LE
gcc135  POWER9 LE

The last one is by far the most powerful of these.
[Bug target/98959] ICE in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98959 --- Comment #20 from Segher Boessenkool --- (In reply to Bill Schmidt from comment #14) > We should definitely not be allowing the AltiVec "& ~16" flavors into these > patterns. I'm not certain whether your fix is the best way to achieve that, > but it could well be; I'll defer to Segher on that. Hey, it works, so it is okay for now at least. Longer term we should probably think of something more elegant and less failure-prone.
[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #4 from Segher Boessenkool ---
commit c60ad1c5fe0249f48362be0f989184ca447f9d17
Author: Segher Boessenkool
Date:   Wed Mar 3 20:34:32 2021 +

    rs6000: Fix check_effective_target_sqrt_insn (PR99352)

    The previous version returned true for all PowerPC.  This is incorrect.
    We only support floating point square root instructions if a) we
    support floating point instructions at all, and b) we have _ARCH_PPCSQ
    defined.

    2020-03-09  Segher Boessenkool

    gcc/testsuite/
            * lib/target-supports.exp (check_effective_target_powerpc_sqrt): New.
            (check_effective_target_sqrt_insn): Use it.
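The preprocessor test that the commit message refers to can be sketched in C as follows. This is only an illustration of checking _ARCH_PPCSQ; it is not the Tcl code added to lib/target-supports.exp, and the function name have_ppc_sqrt_insn is hypothetical.

```c
#include <assert.h>

/* Returns 1 when the compiler predefines _ARCH_PPCSQ for the current
   target and options, and 0 otherwise (including on every target that
   is not PowerPC).  */
int have_ppc_sqrt_insn(void)
{
#ifdef _ARCH_PPCSQ
    return 1;
#else
    return 0;
#endif
}
```

Because the macro depends on the selected CPU and options, the answer can differ between two PowerPC compilations, which is why a bare [istarget powerpc*-*-*] test is too coarse.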
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #5 from Segher Boessenkool ---
Thanks Vladimir.

It is indeed a problem in LRA (or triggered by it).  We have

    8: {[r121:DI+low(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)]=asm_operands;clobber

so this is an offset that is too big for a machine instruction; those can take -32768..32767.  Changing the constraint to "m", you get in LRA

  Inserting insn reload before:
     13: r121:DI=high(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)

but this doesn't happen if you keep it "o", and it dies later.
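The arithmetic behind the range problem can be sketched as below. The code only models the numbers: it checks the -32768..32767 range quoted in the comment and splits an offset into a high part and a signed 16-bit low part, in the spirit of the high/low pair in the LRA dump. The function names are hypothetical and nothing here is GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if OFF fits the signed 16-bit displacement range -32768..32767.  */
bool fits_simm16(int64_t off)
{
    return off >= -32768 && off <= 32767;
}

/* Split OFF so that *high * 65536 + *low == off and *low fits in a
   signed 16-bit displacement.  */
void split_offset(int64_t off, int64_t *high, int64_t *low)
{
    assert(high != NULL && low != NULL);
    /* Sign-extend the low 16 bits of OFF.  */
    *low = ((off & 0xffff) ^ 0x8000) - 0x8000;
    /* off - low is an exact multiple of 65536, so the division is exact.  */
    *high = (off - *low) / 65536;
}
```

For the offset in the dump, 0x92a4 is 37540, which is outside the range; the split gives a high part of 1 and a low part of -27996.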
[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496 --- Comment #13 from Segher Boessenkool --- Hi Nathan, I think you didn't push the branch that is on?
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #7 from Segher Boessenkool ---
From the offending patch:

-/* Return true if the eliminated form of AD is a legitimate target address.  */
+/* Return true if the eliminated form of AD is a legitimate target address.
+   If OP is a MEM, AD is the address within OP, otherwise OP should be
+   ignored.  CONSTRAINT is one constraint that the operand may need
+   to meet.  */
 static bool
-valid_address_p (struct address_info *ad)
+valid_address_p (rtx op, struct address_info *ad,
+                 enum constraint_num constraint)

The addition of those extra args makes clear that the function is no longer just testing if it is a valid address.  It should be renamed.  And perhaps most callers should still use the old version, the one that actually tests if something is a valid address?
[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn) since r11-4623
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50040|0                           |1
        is obsolete|                            |

--- Comment #6 from Segher Boessenkool ---
Created attachment 50401
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50401&action=edit
Patch
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|acsawdey at gcc dot gnu.org |segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool ---
That is not where the UNGE and UNLE come from.  I have no idea where they *do* come from though :-/
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 --- Comment #5 from Segher Boessenkool --- It helps if you test the compiler you just built, not something old. Sigh. Patch is testing.
[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #14 from Segher Boessenkool ---
Well, V=m-o (not the same thing, these are sets) -- but, it is clear that "o" should be a subset of "m":

(define_memory_constraint "TARGET_MEM_CONSTRAINT"
  "Matches any valid memory."

(define_memory_constraint "o"
  "Matches an offsettable memory reference."

So yeah, it should get the memory_address_addr_space_p thing.
[Bug testsuite/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |testsuite
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Segher Boessenkool ---
Fixed.
[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708 --- Comment #1 from Segher Boessenkool --- Yes, the __SIZEOF_* macros do not say whether some type can be used. This is true for all targets! What would it be useful for to define these macros? They all are equivalent to #define SIXTEEN 16 :-)
[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #3 from Segher Boessenkool ---
The only such __SIZEOF_* macro that is not about a standards-required type is for int128.  Not the best example ;-)

There are no predefines for __SIZEOF_FLOAT128__ etc. either.

In an ideal world the user can just assume those types exist always.  In a less ideal world, use autoconf?  You have to anyway, if you want to support older compilers at all.
[Bug target/97329] POWER9 default cache and line sizes appear to be wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329 --- Comment #10 from Segher Boessenkool --- GCC 11 stage 4 will be fine. I doubt you can ever measure a difference, but you can try :-)
[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718 --- Comment #5 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #3) > If the non-constant vec_set can't be supported when > !(TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT) I don't see why not? It may need different code, sure, but that is much preferable over contorting the rest of the backend.
[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #6 from Segher Boessenkool ---
(In reply to Jonathan Wakely from comment #5)
> (In reply to Segher Boessenkool from comment #3)
> > In an ideal world the user can just assume those types exist always.

> Arguably a __SIZEOF_xxx__ macro isn't a very sensible macro for types where
> the type has a guaranteed size,

Yes.  And it does not mean the type exists (or is usable), either.

> but we need *something* that says the type
> exists.

Do we?  The types should always exist!

> Since all other targets already use __SIZEOF_xxx__ to say that the
> type exists, it would be consistent and helpful for powerpc to do the same.

Other targets do not have __ieee128 or __ibm128.
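For readers unfamiliar with the idiom under discussion, this is roughly how user code uses a __SIZEOF_xxx__ macro as an existence check. The sketch uses __SIZEOF_INT128__, the one non-standard-type case mentioned in comment #3; the function name int128_size is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size of __int128 if the compiler predefines
   __SIZEOF_INT128__, and 0 otherwise.  */
size_t int128_size(void)
{
#ifdef __SIZEOF_INT128__
    /* The macro only carries a number; sizeof yields the same value.  */
    assert(sizeof(__int128) == __SIZEOF_INT128__);
    return __SIZEOF_INT128__;
#else
    return 0;
#endif
}
```

Note that the macro's value adds nothing beyond the number 16 when it is defined; user code relies only on whether it is defined, which is the convention the two sides of this thread disagree about.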
[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #7 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #6)
> I did not know whether it is implementable (in VSX or in Altivec) for 32-bit
> targets etc., all I was suggesting was what to do if it is not implementable.

Yes.

> If it is implementable, somebody familiar with VSX/Altivec should add the
> implementation, or we can temporarily use the patch that has been posted and
> get back to it later.

I haven't seen a patch posted yet?

> Or if it is partly implementable (e.g. can be done in
> VSX and can't be done in Altivec, etc.), then the patch can still be used
> after amendments for what will and what will not work.

The only thing I am saying is that it should be massively easier to just implement it for -m32 as well, much easier than adding extra conditions (and unavoidably getting that wrong).

> Right now it is a P1 blocker because we ICE on something that worked
> perfectly fine (perhaps slower than it could) in GCC 10.  So something needs
> to be done before GCC 11 and we have ~ a month left for that.

Yup.  I'll review any patch that is sent (cc: me, so that I see it immediately, instead of after 3 to 6 weeks).

Thanks,

Segher
[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718 --- Comment #17 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #10) > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567215.html Ah, that is more recent than anything I have replied to :-(
[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718 --- Comment #18 from Segher Boessenkool --- (In reply to luoxhu from comment #12) > Not sure whether TARGET_DIRECT_MOVE_64BIT is the right MACRO to correctly > differentiate m32 and m64? It is not. It looks at TARGET_POWERPC64 only, and that can be set for -m32 just fine.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #3 from Segher Boessenkool ---
What happens here is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=3294575357bfcb19e589868da34364498a860dcf;hb=HEAD#l1884

"*2_1" for absneg:MODEF has a bare "use".  And then we trigger

     If the USE in INSN was for a pseudo register, the matching
     insn pattern will likely match any register; combining this
     with any other USE would only be safe if we knew that the
     used registers have identical values, or if there was
     something to tell them apart, e.g. different modes.  For
     now, we forgo such complicated tests and simply disallow
     combining of USES of pseudo registers with any other USE.

because both the abs and the neg have a bare use.

The patterns should be rewritten to not have such bare uses.  Alternatively we can add some pretty-much-never-triggered code to combine to handle this case.  Patches welcome.
[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #9 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #5)
> But what is wrong is that try_combine has been called at all, because
> (reg:CCZ 17 flags) is used in 3 instructions rather than just one.

That is not a problem.  If that were true it just would mean that added_sets_2 should be set:

    added_sets_2 = !dead_or_set_p (i3, i2dest);

But, the flags reg actually *is* dead in i3 (insn 108), it dies in i2 (insn 107):

        (expr_list:REG_DEAD (reg:SI 107)

So something earlier is bad already.
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930 --- Comment #8 from Segher Boessenkool --- That patch is no good. The combination is not allowed because it is not known what the "use"s are *for*. Checking if something is from the constant pools is not enough at all.
[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927 --- Comment #11 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #7) > Ah, create_log_links wants to work like that. > So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note. > It doesn't initially, but it is added during 106 -> 108 combination But that combination should never have been made: flags is set in insn 107!
[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927 --- Comment #13 from Segher Boessenkool --- Yes, combine just drops that clobber of flags, that was a thinko :-)
[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927 --- Comment #14 from Segher Boessenkool --- distribute_notes says Any clobbers from i2 or i1 can only exist if they were added by recog_for_combine. which is not true apparently. But all of this code *does* depend on that, it just doesn't make sense otherwise.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323 --- Comment #10 from Segher Boessenkool --- You cannot fix a simplify-rtx problem in much earlier passes! It may be useful of course (I have no idea, I don't know gimple well enough), but it is no solution to the problem at all. The xor/and/xor thing should be simplified to something proper. ((A^B)&C)^A = (A&~C)^(B&C) = (A&~C)|(B&C) This should already be done by the expand pass. At gimple level the logical complement is counted as an operation, making the contorted xor/and/xor form the best form to use, but in a system that considers more than just operation counts (like in RTL) this is not the best form at all. But, anyway, RTL simplification should be able to do this. Similar problems happen all over the place, fwiw -- see the various rl* tests for rs6000, for example.
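The identity in the comment is easy to check mechanically. A minimal sketch in C, with hypothetical function names: both functions select the bits of B where C is set and the bits of A elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Bit select in the xor/and/xor form: fewest operations when the
   complement counts as an operation of its own.  */
uint64_t select_xor(uint64_t a, uint64_t b, uint64_t c)
{
    return ((a ^ b) & c) ^ a;
}

/* The same select in and/andc/or form: bits of B where C is set,
   bits of A where C is clear.  */
uint64_t select_andor(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & ~c) | (b & c);
}
```

Since every operation is bitwise, inputs whose bit patterns cover all eight (A, B, C) combinations are enough to confirm the equivalence. The two terms A&~C and B&C never have a bit in common, which is why the middle xor form and the final or form agree.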
[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930 --- Comment #10 from Segher Boessenkool --- That is a USE of a constant, which is a no-op always. Here we have a USE of a register, which is not. We actually have *two* uses of pseudos, and combine cannot know what that means for the target (all PARALLELs are split up in combine).
[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830 --- Comment #5 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #3) > In normal insns such clobbers would be rejected by recog, but for > DEBUG_INSNs we don't have strict validity tests, but guess we need to throw > away at least the worst garbage. combine puts clobbers of const0_rtx in instructions precisely because those *should* be rejected; it does it to abort a combination attempt. So it isn't clear to me why we end up with this here? Papering over it (as the proposed patch does) is not a good idea imho.
[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830 --- Comment #7 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #6) > In the end on the actual instruction the clobber is optimized away That is a very serious bug.
[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #10 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #8)
> In particular, it is combine_simplify_rtx that is called on:
> (zero_extend:SI (subreg:QI (ior:TI (and:TI (reg/v:TI 103 [ f ])
>                 (const_int -16711681 [0xffffffffff00ffff]))
>             (ashift:TI (and:TI (clobber:TI (const_int 0 [0]))
>                     (const_int 255 [0xff]))
>                 (const_int 16 [0x10]))) 0))
> which simplifies it into
> (and:SI (subreg:SI (reg/v:TI 103 [ f ]) 0)
>     (const_int 255 [0xff]))

That is very wrong.  A clobber of 0 should *never* be removed.  Various parts of generic code know about that already, btw.

A clobber of 0 means "Abort! Abort!"  It does not mean "well, here is something you can optimise away more easily".

Do you want to investigate further, or shall I?
[Bug c/100005] undefined reference to `_rdrand64_step'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100005

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool ---
So the only bug here is that we should give a better error message?  One when taking the address, already.
[Bug c/100005] undefined reference to `_rdrand64_step'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100005

--- Comment #3 from Segher Boessenkool ---
I'm not sure how/why "artificial" should prevent taking the address though?
[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #12 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #11)
> I don't understand what is wrong about that.
> (clobber:TI (const_int 0 [0])) in there stands for couldn't figure out what
> this value is or how to represent it, so it is wildcard for I don't know
> what the value is.

That is not what it means.  It means "This instruction is invalid".  It should never be "optimised" away.

> I'd think if one has say (and:TI (clobber:TI (const_int 0 [0])) (const_int 0
> [0])) one should be able to still simplify it into 0, etc.,

No.  That RTL has no meaning at all, you cannot use a clobber as a RHS!

> and what happens
> here is the same thing, the clobber value, whatever it is, doesn't influence
> in any way the whole expression value, therefore it is optimized away.
> If it remained there, sure, the instruction would fail recog_for_combine.

Yes.  And that is why it should never be removed!
[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #14 from Segher Boessenkool ---
(In reply to Jakub Jelinek from comment #13)
> Seems the exact spot where the clobber is optimized away is e.g. when
> simplify_and_const_int_1 (SImode, (ashift:SI (subreg:SI (and:TI (clobber:TI
> (const_int 0 [0])) (const_int 255 [0xff])) 0) (const_int 16 [0x10])), 255);
> is called.
> It calls nonzero_bits, nonzero_bits sees VARYING << 16 and so returns
> 0x,
>   /* Turn off all bits in the constant that are known to already be zero.
>      Thus, if the AND isn't needed at all, we will have CONSTOP ==
>      NONZERO_BITS
>      which is tested below.  */
>
>   constop &= nonzero;
>
>   /* If we don't have any bits left, return zero.  */
>   if (constop == 0)
>     return const0_rtx;
>
> So, are you suggesting that in all such spots we need to test side_effects_p
> and punt?

Yes, you need to check side_effects_p *everywhere* you can potentially remove a side effect.  This is not specific to combine, even.

> Note, simplify_and_const_int_1 already starts with:
>   if (GET_CODE (varop) == CLOBBER)
>     return NULL_RTX;
> so it would need to use
>   if (side_effects_p (varop))
>     return NULL_RTX;
> instead.

Yeah.  This no longer disallows a VOIDmode clobber, but we should not see those here anyway.

You'll need the same change a few lines later, btw:

  varop = force_to_mode (varop, mode, constop, 0);

  /* If VAROP is a CLOBBER, we will fail so return it.  */
  if (GET_CODE (varop) == CLOBBER)
    return varop;

(you only need that second one, even, force_to_mode immediately returns its arg if it is a clobber).
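The rule argued for here, checking for side effects before folding an expression away, can be modelled with a toy expression tree. This is a sketch under loud assumptions: the types and every toy_ function are hypothetical stand-ins and do not match GCC's rtx, side_effects_p, or nonzero_bits implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy expression tree, loosely modelled on RTL.  Not GCC's real data
   structures; every name here is hypothetical.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_CLOBBER, TOY_AND, TOY_ASHIFT };

struct toy_expr {
    enum toy_code code;
    uint32_t value;                 /* constant value, for TOY_CONST */
    const struct toy_expr *op0;     /* operands, NULL when unused */
    const struct toy_expr *op1;
};

/* True if E contains a clobber anywhere.  */
bool toy_side_effects_p(const struct toy_expr *e)
{
    if (e == NULL)
        return false;
    if (e->code == TOY_CLOBBER)
        return true;
    return toy_side_effects_p(e->op0) || toy_side_effects_p(e->op1);
}

/* Conservative mask of the bits that may be nonzero in E.  */
uint32_t toy_nonzero_bits(const struct toy_expr *e)
{
    switch (e->code) {
    case TOY_CONST:
        return e->value;
    case TOY_AND:
        return toy_nonzero_bits(e->op0) & toy_nonzero_bits(e->op1);
    case TOY_ASHIFT:
        /* Only constant shift counts below 32 are modelled.  */
        if (e->op1->code == TOY_CONST && e->op1->value < 32)
            return toy_nonzero_bits(e->op0) << e->op1->value;
        return UINT32_MAX;
    default:
        return UINT32_MAX;          /* registers and clobbers: unknown */
    }
}

/* May (and E MASK) be folded to constant zero?  The side-effect test
   comes first, so an expression containing a clobber is never folded.  */
bool toy_and_folds_to_zero(const struct toy_expr *e, uint32_t mask)
{
    if (toy_side_effects_p(e))
        return false;
    return (toy_nonzero_bits(e) & mask) == 0;
}
```

In this model, (ashift (and X 255) 16) masked with 255 has no possibly-nonzero bits left, so it folds to zero when X is a register, but the fold is refused when X is a clobber, because folding would silently drop the marker.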
[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org

--- Comment #16 from Segher Boessenkool ---
(In reply to Richard Biener from comment #15)
> So ... the conclusion is?

The conclusion is I have a patch and I will commit it after testing it successfully on enough targets.  This takes time.

I see I forgot to self-assign the bug.  Fixed.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323 --- Comment #13 from Segher Boessenkool --- (In reply to luoxhu from comment #11) > I noticed that you added the below optimization with commit > a62436c0a505155fc8becac07a8c0abe2c265bfe. But it doesn't even handle this > case, cse1 pass will call simplify_binary_operation_1, both op0 and op1 are > REGs instead of AND operators, do you have a test case to cover that piece > of code? This worked at the time. It broke some time ago in simple testcases, triggered by the "don't combine hard registers" thing I did. This is PR98468.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #14 from Segher Boessenkool ---
(In reply to luoxhu from comment #12)
> That code was called by combine pass but fail to match.
>
> pr newpat
> (set (reg:DI 125 [ l ])
>     (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
>                 (reg:DI 127))
>             (const_int 267390975 [0xff00fff]))
>         (reg/v:DI 120 [ l ])))

Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.
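Why 0x0ff00fff cannot be an rlwimi mask: the instruction's mask is described by a begin bit and an end bit, so its 1 bits must form one contiguous run, possibly wrapping around the word. A minimal sketch of that test follows; the function names are hypothetical and this is not the check GCC's rs6000 backend uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if M is one unbroken run of 1 bits (no wrap-around).  */
static bool single_run(uint32_t m)
{
    uint32_t filled = m | (m - 1);      /* set every bit below the run */
    return m != 0 && (filled & (filled + 1)) == 0;
}

/* True if M is a nonzero mask whose 1 bits form one contiguous run,
   where the run is allowed to wrap around the ends of the word.  */
bool contiguous_mask32(uint32_t m)
{
    return m != 0 && (single_run(m) || single_run(~m));
}
```

0x0ff00fff has two separate runs of ones (0x0ff00000 and 0x00000fff), so it fails the test, while a wrapping mask such as 0xff0000ff passes because its complement is a single run.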
[Bug target/97142] __builtin_fmod not optimized on POWER
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97142

--- Comment #11 from Segher Boessenkool ---
(In reply to luoxhu from comment #10)
> If not built with fast-math, gimple_has_side_effects will return true and
> cause the expand_call_stmt fail to expand the "_1 = fmod (x_2(D), y_3(D));"
> to internal function. X86 also produces "bl fmod" for O3 build.
>
>
> xlF expands the fmod to below ASM, no FMA generated?
>
>
> 1900 :
>     1900:       8c 03 01 10     vspltisw v0,1
>     1904:       00 00 24 c8     lfd     f1,0(r4)
>     1908:       00 00 03 c8     lfd     f0,0(r3)
>     190c:       e2 03 40 f0     xvcvsxwdp vs2,vs32
>     1910:       c0 09 62 f0     xsdivdp vs3,vs2,vs1
>     1914:       80 19 80 f0     xsmuldp vs4,vs0,vs3
>     1918:       64 21 a0 f0     xsrdpiz vs5,vs4
>     191c:       88 2d 01 f0     xsnmsubadp vs0,vs1,vs5
>     1920:       18 00 20 fc     frsp    f1,f0
>     1924:       20 00 80 4e     blr

xsnmsubadp is an FMA.  Multiply-subtract in this case, but that is just a sign switch -- I often say FMA for all of fmadd, fnmadd, fnmsub, fmsub, and their VSX counterparts.  "Anything that does a multiply-type operation followed by an addition-type operation".

(And often call integer MADs "FMA" as well, which is totally wrong, but :-) )
[Bug target/100085] Bad code for union transfer from __float128 to vector types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085 --- Comment #3 from Segher Boessenkool --- The rotates in 6 and 7 are not merged, and neither are the vec_selects in 8 and 9. Both should be pretty easy to do, there is no unspec in sight, etc.
[Bug rtl-optimization/99927] Wrong code since r11-39-gf9e1ea10e657af9f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #18 from Segher Boessenkool ---
Fixed for 11.  This still needs backports for 10 and everything before, please don't close the bug.
[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108 --- Comment #4 from Segher Boessenkool --- (In reply to Andrew Pinski from comment #1) > e500 support had been moved to the powerpcspe target; so assuming power9 for > -misel is correct. > > e500mc support is still there though. There never *was* separate e500 support in GCC!
[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108 --- Comment #5 from Segher Boessenkool --- Created attachment 50629 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50629&action=edit Proposed simpler patch A simpler patch. I'll commit this later today (if no one stops me).
[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc--netbsd             |powerpc
   Last reconfirmed|                            |2021-04-19
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org
     Ever confirmed|0                           |1
[Bug libgcc/98952] powerpc*: __trampoline_setup inverted test for trampoline size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98952 --- Comment #4 from Segher Boessenkool --- Fixed on trunk. Needs backports to 11 and whatever else is still an open branch when the backports are done :-)
[Bug target/97329] POWER9 default cache and line sizes appear to be wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329 --- Comment #8 from Segher Boessenkool --- The default -mcpu= for a compiler targeting powerpc64le-linux is normally power8 (you can change this with the --with-cpu= configure option though). -mcpu=powerpc64le is also (currently) equal to -mcpu=power8. But the numbers for Power8 (in power8_cost) are wrong it seems: it has a 64kB L1-D cache, and a 512kB L2 cache (it looks like we have simply copied the Power7 numbers here; 32 and 256 is correct for Power7).
[Bug rtl-optimization/97249] Missing vec_select and subreg optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249 --- Comment #5 from Segher Boessenkool --- (In reply to Richard Biener from comment #3) > Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if > it was appropriately simplified (and simplify_rtx would handle this case). > In any case the vec_select is the same as (subreg:V8QI (V16QI)). This case for vec_select isn't yet handled in simplify-rtx. It looks like it does not yet handle any cases that do not use full vector length? (Or, in other words, it only handles cases where all vectors are the same length.)
[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #5 from Segher Boessenkool ---
Trying 7 -> 9:
    7: r97:SI=0x2a
    9: {flags:CCC=cmp(r97:SI+r98:SI,r97:SI);r99:SI=r97:SI+r98:SI;}
      REG_DEAD r98:SI
      REG_DEAD r97:SI
Failed to match this instruction:
(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (reg:SI 98 [ *b_12(D) ])
                (const_int -42 [0xffffffffffffffd6])))
        (set (reg:SI 99)
            (plus:SI (reg:SI 98 [ *b_12(D) ])
                (const_int 42 [0x2a])))
    ])

On rs6000 we have four special variants for the immediate add-with-carry insn patterns: imm 0, imm -1, imm pos, imm neg.  All of these have different canonical RTL.
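The two RTL forms in the dump correspond to two ways of computing the carry of an unsigned addition. A small C sketch (function names hypothetical) shows both: comparing the sum against an operand, as in insn 9, and comparing the other operand against the negated constant, as in the pattern that failed to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add A and B, store the 32-bit sum in *SUM and return the carry out.
   An unsigned addition wrapped exactly when the sum is smaller than
   one of its operands.  */
unsigned add_with_carry(uint32_t a, uint32_t b, uint32_t *sum)
{
    assert(sum != NULL);
    *sum = a + b;
    return *sum < a;
}

/* Carry out of B + 42, written as an unsigned compare of B against -42.  */
unsigned carry_of_add_42(uint32_t b)
{
    return b >= (uint32_t)-42;
}
```

Both functions agree for every B when A is 42, which is the equivalence the combined pattern depends on.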
[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437 --- Comment #6 from Segher Boessenkool --- I forgot to add: subtract immediate is the same as add immediate for us, we don't change the sense of the carry bit to a "borrow bit" (and instead, we have a subtract-from-immediate). But this doesn't change much at all to the situation here.
[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #3 from Segher Boessenkool ---
AFAICS the point is that this always compiles to just a handful of insns, and the inliner should be able to see that (even if the source is biggish).
[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437 --- Comment #8 from Segher Boessenkool --- So is that something that can/should be improved in ix86_cc_mode?
[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437 --- Comment #10 from Segher Boessenkool --- Not even an alternative SELECT_CC_MODE; just add an argument to it, giving the original mode? We already have that in combine, so we can trivially pass it. Will that work for x86 here?
[Bug bootstrap/94761] host != target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94761 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #2 from Segher Boessenkool --- All of the text of the report is missing, apparently?
[Bug bootstrap/94761] host != target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94761 --- Comment #3 from Segher Boessenkool --- Commit e69bf64be925 added the host and target flags originally, and it seems to have been just a mistake that it used --build=${build_alias} --host=${build_alias}. (Now of course that has spread to many more places.)
[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 --- Comment #31 from Segher Boessenkool --- (In reply to Jan Hubicka from comment #27) > It is because --param inline-insns-single was reduced for -O2 from 200 > to 70. GCC 10 has newly different set of parameters for -O2 and -O3 and > enables auto-inlining at -O2. > > Problem with inlining functions declared inline is that C++ codebases > tends to abuse this keyword for things that are really too large (and > get_order would be such example if it did not have builtin_constant_p > check which inliner does not understand well). So having same limit at > -O2 and -O3 turned out to be problematic with respect to code size and > especially with respect to LTO, where a lot more inlining opportunities > appear. Do the heuristics account for the fact that not inlining a "static inline" results in multiple copies? > I will implement the heuristics to push up inline limits of functions > having builtin_constant_p of parameter which should help a bit in this > case Thank you! > (but not very systematically: as discussed in the PR log it is quite > hard problem to get builtin_constant_p right in the code size metrics > used by inliner before it knows exactly what is going to be constant and > what is not). That is true for many other inlining things as well... builtin_constant_p is worse than most I guess ;-)
[Bug target/43892] PowerPC suboptimal "add with carry" optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892 --- Comment #26 from Segher Boessenkool --- It isn't easy to do. Feel free to try your hand at it :-)
[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360 --- Comment #35 from Segher Boessenkool --- Send it to gcc-patches@ please, with explanation and everything?
[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445 --- Comment #46 from Segher Boessenkool --- (In reply to Christophe Leroy from comment #43) > int g(int x) > { > return __builtin_clz(0); > } > > Gives > > 0018 : > 18: 38 60 00 20 li r3,32 > 1c: 4e 80 00 20 blr That is because rs6000 has /* The cntlzw and cntlzd instructions return 32 and 64 for input of zero. */ #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ ((VALUE) = GET_MODE_BITSIZE (MODE), 2) This says that at RTL level and in the optabs, clz of 0 *is* defined, for rs6000. But the builtin is not valid with an arg of 0!
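Because the builtin itself is undefined for a zero argument, source code that wants a defined answer for zero has to guard the call explicitly, whatever the target's RTL-level definition says. A minimal sketch (the wrapper name is hypothetical):

```c
#include <limits.h>

/* Count leading zeros with a defined result for zero: the bit width of
   unsigned int.  __builtin_clz is only called with a nonzero argument,
   which is the only case for which the builtin is defined.  */
int clz_or_width(unsigned int x)
{
    if (x == 0)
        return (int)(sizeof x * CHAR_BIT);
    return __builtin_clz(x);
}
```

On a target where the hardware instruction already returns the bit width for zero, the compiler may be able to drop the branch, but the source should not rely on that.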
[Bug target/43892] PowerPC suboptimal "add with carry" optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892 --- Comment #29 from Segher Boessenkool --- Yup, and that is a more elegant way of writing this anyway. But we still do not handle the exact testcase code optimally ;-)
[Bug target/43892] PowerPC suboptimal "add with carry" optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892 --- Comment #31 from Segher Boessenkool --- Performing a jump based on the carry bit is not something we can easily do (there are no simple insns for it, and those sequences that will do the trick are expensive). But I'll look at that, thanks for the hint! At least in the __builtin_add_overflow case most of it will be optimised away :-)
[Bug libgcc/97543] powerpc64le: libgcc has unexpected long double in .gnu_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97543 --- Comment #3 from Segher Boessenkool --- This part of the attribute (all but the low 2 bits) is not documented in the as manual, btw.
[Bug rtl-optimization/97583] New: Unknown mode_attribute (or iterator) ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97583 Bug ID: 97583 Summary: Unknown mode_attribute (or iterator) ignored Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- This leads to errors at compiler runtime instead of at compiler build time. See https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556998.html . Code from md_reader::apply_iterator_to_string : p = start + 1; *end = 0; v = map_attr_string (loc, p); *end = '>'; if (v == 0) continue; It could report an error instead.
[Bug libgcc/97543] powerpc64le: libgcc has unexpected long double in .gnu_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97543 --- Comment #9 from Segher Boessenkool --- Yes, that looks correct.
[Bug rtl-optimization/97676] New: "*" should skip a constraint, not just one char of it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97676 Bug ID: 97676 Summary: "*" should skip a constraint, not just one char of it Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- See https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557759.html and the thread leading up to it.
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 Segher Boessenkool changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED|REOPENED Last reconfirmed||2020-11-03 Ever confirmed|0 |1 --- Comment #3 from Segher Boessenkool --- Yes, exactly. GCC silently does the wrong thing, contradicting its documentation.
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 Segher Boessenkool changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED|REOPENED --- Comment #5 from Segher Boessenkool --- The only supported use for this feature is to specify registers for input and output operands when calling Extended @code{asm} (@pxref{Extended Asm}). This may be necessary if the constraints for a particular machine don't provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable's declaration. Then use the local variable for the @code{asm} operand and specify any constraint letter that matches the register: Stop marking this as invalid. It is not. "r" *is* valid. And even if it was not, the compiler should just error, not silently do the wrong thing!
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #19 from Segher Boessenkool --- Documenting that GCC behaves differently is just documenting a bug :-( It should not be hard to detect this and give an error somewhere? Saying "the user did something wrong" is true of course, but then saying "so the compiler can do whatever" might be technically true, but doesn't help the user, who would rather the compiler did not silently do the opposite of what the user asked it to do!
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #21 from Segher Boessenkool --- register float foo asm ("xmm0") = 0.99f; asm volatile("movl %0, %%r8d\n\t" "vmcall\n\t" :: "g" (foo)); The user said operands[0] should go in xmm0, but that hard reg is not valid for its constraint. """ Then use the local variable for the asm operand and specify any constraint letter that matches the register: """ Not following that rule, causing a reload, is the user error. The reload you get is diametrically opposite to what local register vars are *for*, so it would be good if we could give an error.
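By contrast, a use that follows the quoted rule names a register and then uses a constraint that admits that register. A minimal sketch for x86 with a plain C fallback for other targets; the function name is hypothetical, and the example assumes AT&T assembler syntax:

```c
/* Add one to a value that is forced into %ecx for the asm.
   The "c" constraint matches the register named in the declaration,
   so no reload into a different location is needed.  */
unsigned int add_one_in_ecx(unsigned int value)
{
#if defined(__x86_64__) || defined(__i386__)
    register unsigned int v __asm__("ecx") = value;
    __asm__ ("addl $1, %0" : "+c" (v));
    return v;
#else
    /* Other targets: no inline asm in this sketch.  */
    return value + 1;
#endif
}
```

Using "r" instead of "c" would also satisfy the rule here, since %ecx is a general register; "g" with an xmm register, as in the example above, does not.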
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 Segher Boessenkool changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #23 from Segher Boessenkool --- The user said that foo should be in xmm1 when used in an asm. That is what local register asm does, nothing more, nothing less. Reloading it is never allowed.
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #27 from Segher Boessenkool --- (In reply to Alexander Monakov from comment #24) > Segher, did you really mean to mark the bug resolved/fixed? No, if I did that, I have no idea how :-) > Given that the only supported use of local register variables is passing > operands to inline asm in specific registers, I really think that GCC > shouldn't silently change the operand's location like that. Yes, exactly.
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #28 from Segher Boessenkool --- (In reply to Jakub Jelinek from comment #25) > Even if we wanted to do something about it (which I disagree with, e.g. > given that the implementation matches the documentation), you run into the > problem that neither GIMPLE nor RTL differentiates between: > void > foo (void) > { > register int a __asm ("eax") = 1; > __asm ("# %0 " : : "c" (a+0)); > __asm ("# %0 " : : "c" (a)); > } > And "c" (a+0) unquestionably must be valid, it is just an expression that > happens to be equal to a value of local register variable. The documentation says """ The only supported use for this feature is to specify registers for input and output operands when calling Extended @code{asm} (@pxref{Extended Asm}). This may be necessary if the constraints for a particular machine don't provide sufficient control to select the desired register. To force an operand into a register, create a local variable and specify the register name after the variable's declaration. Then use the local variable for the @code{asm} operand and specify any constraint letter that matches the register: @smallexample register int *p1 asm ("r0") = @dots{}; register int *p2 asm ("r1") = @dots{}; register int *result asm ("r0"); asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2)); @end smallexample """ Note the "use the local variable *for* the asm operand". Not *in* the asm operand. We really do care about the identity here (for all asm operands), not the value contained in the operand. So (a+0) is not valid. It is of course likely this will be optimised to just (a) and might even work, but that is not guaranteed. (The documentation here could be much improved, of course.)
[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708 --- Comment #29 from Segher Boessenkool --- (In reply to Richard Biener from comment #26) > So it would need to be diagnosed in the FE (only), making a + 0 valid and > a not. Eh. We do not *have* to diagnose anything, certainly not things that just happen to work (if "a+0" is optimised to just "a", say). But it would be good if we could diagnose the obvious and certainly wrong cases we do not diagnose now -- like a register asm that does not match the operand constraint!
[Bug rtl-optimization/97784] New: Expressions evaluated as long chain instead of as tree or the like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784 Bug ID: 97784 Summary: Expressions evaluated as long chain instead of as tree or the like Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- When compiling something like #define O + long x4(long x, long a, long b, long c, long d) { return x O a O b O c O d; } we end up with machine code like add 3,3,4# 10 [c=4 l=4] *adddi3/0 add 3,3,5# 11 [c=4 l=4] *adddi3/0 add 3,3,6# 12 [c=4 l=4] *adddi3/0 add 3,3,7# 18 [c=4 l=4] *adddi3/0 blr # 30 [c=4 l=4] simple_return Each of those "add" insns depends on the result of the previous one, making this slower than necessary: it has the latency of 4 add insns in series, while some can be done in parallel. This problem is there on gimple level already: _1 = x_4(D) + a_5(D); _2 = _1 + b_6(D); _3 = _2 + c_7(D); _9 = _3 + d_8(D); return _9; A very similar problem also happens as a result of RTL unrolling.
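The difference between the two evaluation orders can be written out by hand. A minimal sketch using unsigned arithmetic, so that reassociation is always value-preserving; the function names are hypothetical, and the compiler remains free to reassociate either form, so the source shape does not dictate the generated code:

```c
/* Linear chain, as in the report: each add needs the previous result,
   so the critical path is four dependent adds.  */
unsigned long sum_chain(unsigned long x, unsigned long a, unsigned long b,
                        unsigned long c, unsigned long d)
{
    return (((x + a) + b) + c) + d;
}

/* Tree form: (x + a) and (b + c) are independent and can issue in
   parallel, so the critical path is three dependent adds.  */
unsigned long sum_tree(unsigned long x, unsigned long a, unsigned long b,
                       unsigned long c, unsigned long d)
{
    return ((x + a) + (b + c)) + d;
}
```

Both functions return the same value for all inputs, including inputs that wrap around.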
[Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784 --- Comment #2 from Segher Boessenkool --- No, it is exactly the same with unsigned types :-( Use -Dlong="unsigned long" or use #define O ^ (as in my original test). I forgot about this signed thing, but it has nothing to do with it (that matters on gimple level, sure, but the problem exists in pure RTL as well).
[Bug target/97786] New: rs6000 isinf etc. are pretty horrible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97786 Bug ID: 97786 Summary: rs6000 isinf etc. are pretty horrible Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- int isfinite(double x) { return __builtin_isfinite (x); } int isinf(double x) { return __builtin_isinf (x); } int isinf_sign(double x) { return __builtin_isinf_sign (x); } int isnan(double x) { return __builtin_isnan (x); } int isnormal(double x) { return __builtin_isnormal (x); } int fpclassify(double x) { return __builtin_fpclassify (5, 6, 7, 8, 9, x); } We can generate much better code for all these than the generic code we use now.
[Bug target/97784] Expressions evaluated as long chain instead of as tree or the like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784 --- Comment #6 from Segher Boessenkool --- (In reply to Richard Biener from comment #3) > There is targetm.sched.reassociation_width which specifies how re-association > should make such sequence "wide". Ah cool, thank you :-) > Andrew is correct that we don't do this > for any types that are TYPE_OVERFLOW_UNDEFINED. Yes; but I see the sub-optimal behaviour for unsigned, too. > And powerpc has > > static int > rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED, > machine_mode mode) > { > switch (rs6000_tune) > { > case PROCESSOR_POWER8: > case PROCESSOR_POWER9: > case PROCESSOR_POWER10: > if (DECIMAL_FLOAT_MODE_P (mode)) > return 1; > if (VECTOR_MODE_P (mode)) > return 4; > if (INTEGRAL_MODE_P (mode)) > return 1; Yeah this last 1 is the problem :-) > thus you get width 1 which means a linear chain (even if the user wrote > a tree). Yup. > Note RTL doesn't do any such thing like re-association (I guess in principle > scheduling could, and that's the only place where it would make sense > on RTL). RTL unrolling can, actually! "Variable expansion" is its horrible name (and it makes a lot of sense there: it allows breaking a big linear chain into pieces).
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 Segher Boessenkool changed: What|Removed |Added Status|NEW |WAITING --- Comment #1 from Segher Boessenkool --- I cannot reproduce this? Not with any -mcpu= either, or any -O option.
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #8 from Segher Boessenkool --- The fmadd;frsp sequence is correct for this source code. It does double rounding of the result (first to DP float, then to SP float), so using just fmadds is only correct for -ffast-math or similar.
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 --- Comment #3 from Segher Boessenkool --- I can now reproduce it, with a compiler built yesterday (previous was a few days older), and -O0. Confirmed.
[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847 --- Comment #4 from Segher Boessenkool --- This was caused (or exposed) by e3b3b59683c1: commit e3b3b59683c1e7d31a9d313dd97394abebf644be Author: Vladimir N. Makarov Date: Fri Nov 13 12:45:59 2020 -0500 [PATCH] Implementation of asm goto outputs
[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926 --- Comment #1 from Segher Boessenkool --- Confirmed (needs -O0).
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #15 from Segher Boessenkool --- Why does that compiler default to -mcpu=power10?
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #16 from Segher Boessenkool --- Oh, it's a different testcase, in comment 6. Yeah a new PR would have been better ;-/
[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972 --- Comment #2 from Segher Boessenkool --- Confirmed.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #19 from Segher Boessenkool --- (In reply to Arseny Solokha from comment #17) > (In reply to Segher Boessenkool from comment #16) > > Oh, it's a different testcase, in comment 6. Yeah a new PR would > > have been better ;-/ > > Do you want me to reopen PR97963 and copy comment 14 there until it's not > too late? Nah, it already is too late... Just keep it in mind for the future :-) It is easy to join two PRs. It is very hard / annoying to separate PRs; it is much easier if separate bugs just start out separate, so don't piggy-back it onto a PR that you think may have to do with it (you can always point to the existing PR!)
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #20 from Segher Boessenkool --- (In reply to Peter Bergner from comment #18) > So why don't we default to the Altivec ABI with -m32 on cpus that have > Altivec and VSX units??? History. I'm not sure all our ABIs are compatible with vectors enabled, either. Since always, you have needed to use -mabi=altivec on 32-bit.
[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972 --- Comment #3 from Segher Boessenkool --- #0 moving_insn_creates_bookkeeping_block_p (through_insn=0x3fffb5b23138, insn=0x3fffb5b736c0) at /home/segher/src/gcc/gcc/sel-sched.c:2031 It crashes here because the insn is not in any BB; which is correct actually, because the insn has been deleted! It is deleted in sel-sched, and it was created there as well. I don't see anything wrong in the earlier debug dump; afaics this was just exposed by the 2-2 combine thing.
[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791 --- Comment #23 from Segher Boessenkool --- Changing the ABI (silently, even!) is never an expected thing. All of the four 32-bit ABIs we support have an AltiVec variant that isn't fully compatible to the non-AltiVec base variant. It would be a huge disservice to the user to change the ABI from under his/her feet. Anyway, patch in testing.
[Bug rtl-optimization/98179] New: gcc.dg/pr97954.c fails on (at least) BE powerpc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98179 Bug ID: 98179 Summary: gcc.dg/pr97954.c fails on (at least) BE powerpc Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: segher at gcc dot gnu.org Target Milestone: --- /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c: In function 'foo': /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: error: too many outgoing branch edges from bb 4 during RTL pass: loop2_invariant /home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: internal compiler error: verify_flow_info failed 0x10435cb3 verify_flow_info() /home/segher/src/gcc/gcc/cfghooks.c:269 0x10876cc7 checking_verify_flow_info /home/segher/src/gcc/gcc/cfghooks.h:212 0x10876cc7 move_loop_invariants() /home/segher/src/gcc/gcc/loop-invariant.c:2299 0x1087142f execute /home/segher/src/gcc/gcc/loop-init.c:530 This happens because this pass moved insn 8 from bb 4 to 2: (jump_insn 8 2 22 2 (parallel [ (set (reg:SI 118 [ x ]) (asm_operands:SI ("") ("=r") 0 [] [] [ (label_ref:DI 22) ] pr97954.c:10)) (clobber (reg:SI 98 ca)) ]) "pr97954.c":10:3 -1 (expr_list:REG_UNUSED (reg:SI 98 ca) (nil)) -> 22) We shouldn't allow such a move at all (not of any jump_insn!)
[Bug rtl-optimization/98178] Combine splitter does not split to single instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98178 --- Comment #3 from Segher Boessenkool --- Yup, this is true in general, we almost never say why we don't combine so far. Patches welcome! (Make sure you use TDF_DETAILS for such prints).
[Bug target/98020] PPC: mfvsrwz+extsw not merged to mtvsrwa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020 Segher Boessenkool changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2020-12-08 Ever confirmed|0 |1 --- Comment #1 from Segher Boessenkool --- mtvsrwa is the wrong way around, and mfvsrwa does not exist. Am I missing anything?
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 --- Comment #18 from Segher Boessenkool --- Why is it correct to convert the double x to single precision here?!
[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326 --- Comment #20 from Segher Boessenkool --- Yes, that is clear... But we have ***double*** x in that example even, as the declared type of the parameter, so converting that to float is almost certainly a bad idea?
[Bug target/97329] POWER9 default cache and line sizes appear to be wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329 Segher Boessenkool changed: What|Removed |Added Last reconfirmed||2020-10-08 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||segher at gcc dot gnu.org --- Comment #3 from Segher Boessenkool --- At least as far back as GCC 5 we report D-L1 size 64kB (for most CPUs, not just p9). Confirmed.
[Bug target/97329] POWER9 default cache and line sizes appear to be wrong
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329 --- Comment #5 from Segher Boessenkool --- So both the cache line size and the cache size are wrong for GCC 10 and before, but okay on trunk, on all compilers I tested (I tested on Linux only so far).