https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103616

            Bug ID: 103616
           Summary: [9/10/11/12 Regression] ICE on ceph with systemtap macro
                    since r8-5608
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

Since r8-5608-gd555138e648961fdc572d8afdb234b52978828f9 the following testcase
ICEs at -O2 -fPIC on x86_64-linux:

long a, b;
void bar (char *, long);
void baz (char, char);
void qux (char *, char *);

void
foo (void)
{
  while (1)
    {
      char c, d, e, f;
      bar (&c, a);
      bar (&d, b);
      baz (c, d);
      qux (&e, &f);
      double g = 0;
      __asm__("" : : "norfxy" (g));
    }
}

during RTL pass: reload
dump file: rh2027386.c.301r.reload
rh2027386.c: In function ‘foo’:
rh2027386.c:19:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
   19 | }
      | ^
0x11156b7 lra_constraints(bool)
        ../../gcc/lra-constraints.c:5084
0x10fe2de lra(_IO_FILE*)
        ../../gcc/lra.c:2336
0x10a590d do_reload
        ../../gcc/ira.c:5932
0x10a5dfc execute
        ../../gcc/ira.c:6118
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Similar LRA looping on these "norfxy" constraints has been fixed with
r9-9463-g49cc1253d079bbefc1 but not in this testcase.

One thing is it would be nice to avoid the LRA looping (dunno what is at fault,
whether LRA or the backend).

Another one is I wonder if the cheapest reload when the insn allows memory
wouldn't be to use the literal pool memory.  E.g. on

void
foo (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "m" (d), "m" (e));
}

void
bar (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mr" (d), "mr" (e));
}

void
baz (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrx" (d), "mrx" (e));
}

void
qux (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrfx" (d), "mrfx" (e));
}

for foo we emit a weird load of the floating point constants from constant
pool, store those on stack and use those stack memories as operands (this isn't
RA fault, but expansion fault), while for bar-qux the combiner combines the
constant pool memories into the inline asm and they survive RA there.

So, after the looping is fixed, it would be nice if the RA also considered
moving constant pool MEMs (they are constant, can't be clobbered by function
calls etc. in between) to input operands that accept memory.

Note, systemtap changed recently the norfxy to norx for x86_64, I think both
the y and f in there are too dangerous, but even with norx constraint, if a
floating point constant is used and combiner doesn't combine it for some reason
(e.g. multiple uses), it would be nice if for the systemtap macros they were as
cheap as possible and thus avoiding runtime code to compute the values when
possible.
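
The "multiple uses" case mentioned above could be sketched roughly as below.
This is not taken from the report and is not the real systemtap macro:
FAKE_PROBE, PROBE_ARG and multi_use are hypothetical names, and the "nor"
fallback for targets other than x86_64 is an assumption made only so the
snippet builds elsewhere.  The same floating point constant feeds two asm
input operands and is then used again, so comparing the -O2 -S output with a
single-use variant may show whether the constant pool MEM reaches the asm
operands or gets copied somewhere first.

```c
/* Hypothetical stand-in for a systemtap-style probe argument.
   "norx" is the constraint the report says systemtap now uses on x86_64;
   the "nor" fallback for other targets is an assumption of this sketch.  */
#if defined (__x86_64__)
# define PROBE_ARG "norx"
#else
# define PROBE_ARG "nor"
#endif

/* Empty asm template, as in the ICE testcase: no instructions are emitted,
   the compiler only has to make V available in a location that satisfies
   the constraint.  */
#define FAKE_PROBE(v) __asm__ __volatile__ ("" : : PROBE_ARG (v))

/* The constant 7.8 is used by two asm statements and by the return value,
   so a single substitution into one insn no longer covers every use.  */
double
multi_use (void)
{
  double g = 7.8;
  FAKE_PROBE (g);
  FAKE_PROBE (g);
  return g + g;
}
```

The asm statements have no outputs and no clobbers, so they cannot change g;
the function's result does not depend on where the operands end up.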