[Bug middle-end/103616] New: [9/10/11/12 Regression] ICE on ceph with systemtap macro since r8-5608

jakub at gcc dot gnu.org via Gcc-bugs Wed, 08 Dec 2021 01:44:44 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103616


            Bug ID: 103616
           Summary: [9/10/11/12 Regression] ICE on ceph with systemtap
                    macro since r8-5608
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

Since r8-5608-gd555138e648961fdc572d8afdb234b52978828f9 the following testcase
ICEs at -O2 -fPIC on x86_64-linux:
long a, b;
void bar (char *, long);
void baz (char, char);
void qux (char *, char *);

void
foo (void)
{
  while (1)
    {
      char c, d, e, f;
      bar (&c, a);
      bar (&d, b);
      baz (c, d);
      qux (&e, &f);
      double g = 0;
      __asm__("" : : "norfxy" (g));
    }
}

during RTL pass: reload
dump file: rh2027386.c.301r.reload
rh2027386.c: In function ‘foo’:
rh2027386.c:19:1: internal compiler error: maximum number of generated reload
insns per insn achieved (90)
   19 | }
      | ^
0x11156b7 lra_constraints(bool)
        ../../gcc/lra-constraints.c:5084
0x10fe2de lra(_IO_FILE*)
        ../../gcc/lra.c:2336
0x10a590d do_reload
        ../../gcc/ira.c:5932
0x10a5dfc execute
        ../../gcc/ira.c:6118
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Similar LRA looping on these "norfxy" constraints was fixed by
r9-9463-g49cc1253d079bbefc1, but that fix does not cover this testcase.

First, it would be nice to avoid the LRA looping (I don't know yet what is at
fault, whether LRA or the backend).

Second, I wonder whether the cheapest reload, when the insn allows memory,
wouldn't be to use the literal pool memory directly.  E.g. on
void
foo (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "m" (d), "m" (e));
}

void
bar (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mr" (d), "mr" (e));
}

void
baz (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrx" (d), "mrx" (e));
}

void
qux (void)
{
  double d = 0.0, e = 7.8;
  __asm ("# %0 %1" : : "mrfx" (d), "mrfx" (e));
}

For foo we emit a weird load of the floating point constants from the constant
pool, store them on the stack and use those stack slots as operands (this isn't
the RA's fault, but an expansion fault), while for bar through qux the combiner
combines the constant pool memories into the inline asm and they survive RA
there.
So, after the looping is fixed, it would be nice if the RA also considered
moving constant pool MEMs (they are constant, so they can't be clobbered by
function calls etc. in between) to input operands that accept memory.

Note, systemtap recently changed norfxy to norx for x86_64; I think both the y
and f in there are too dangerous.  But even with the norx constraint, if a
floating point constant is used and the combiner doesn't combine it for some
reason (e.g. multiple uses), it would be nice if the operands of the systemtap
macros were as cheap as possible, avoiding runtime code to compute the values
when possible.
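
A minimal sketch of the "multiple uses" shape mentioned above, assuming a
hypothetical PROBE_ARG macro (the macro name, the "# %0" template and the
function name are made up for illustration; this is not systemtap's actual
<sys/sdt.h> code).  The same floating point constant feeds two inline asms
with the norx constraint, which is the kind of case where the combiner may
decline to fold the constant pool memory into each asm:

```c
/* Hypothetical probe macro; illustrative only, not systemtap's real macro.
   On x86_64 it passes its argument to an empty (comment-only) inline asm
   using the "norx" constraint discussed above; elsewhere it just evaluates
   the argument so the sketch still compiles.  */
#if defined(__x86_64__)
# define PROBE_ARG(x) __asm__ volatile ("# %0" : : "norx" (x))
#else
# define PROBE_ARG(x) ((void) (x))
#endif

/* Uses the same double constant as an asm input twice and returns how
   many probe sites were passed.  */
int
probe_twice (void)
{
  double g = 7.8;
  int hits = 0;

  PROBE_ARG (g);   /* first use of the constant */
  hits++;
  PROBE_ARG (g);   /* second use of the same constant */
  hits++;
  return hits;
}
```

Compiling this with -O2 -fPIC -S and looking at the operands printed in the
"#" comment lines is one way to check whether the constant pool memory or a
register/stack copy ends up as the asm input.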
