https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91869

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-24
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  One issue is the use of volatile in the testcase, which on
GIMPLE forces an intermediate initialization of an aggregate that is
then volatile-copied to the destination.

If you remove the volatile qualifications, code generation improves, but
we still see

main ()
{
  <bb 2> [local count: 1073741825]:
  MEM[(struct Reg_T *)&Reg_0] = 0;
  MEM[(struct Reg_T *)&Reg_1] = 64;
  MEM[(struct Reg_T *)&Reg_2] = 8;
  Reg_3 = *.LC0;
  MEM[(struct Reg_T *)&Reg_4] = 4;
  Reg_5 = *.LC1;
  Reg_6 = *.LC2;
  Reg_7 = *.LC3;
  Reg_A = 0;
  Reg_B = 72;
  Reg_C = 255;
  return 0;

}
thus 1-byte constant pool entries end up being used:

        movzbl  .LC0(%rip), %eax
...

On the GIMPLE level this isn't cleaned up because of the aggregate-ness
(plus the constructor involves bitfields, and we are lazy and give up on
native-interpreting those in the constant folding code, even though the
ctor emission code already has code to deal with this).

The *.LCN uses come from the gimplifier heuristic that applies when
there is more than one non-zero initializer:

        Reg_0.a = 0;
        Reg_0.b = 0;
        Reg_0.c = 0;
        Reg_1.a = 0;
        Reg_1.b = 0;
        Reg_1.c = 4;
        Reg_2.a = 0;
        Reg_2.b = 1;
        Reg_2.c = 0;
        Reg_3 = *.LC0;
        Reg_4.a = 4;
        Reg_4.b = 0;
        Reg_4.c = 0;
        Reg_5 = *.LC1;
        Reg_6 = *.LC2;
        Reg_7 = *.LC3;
        Reg_A = 0;
        Reg_B = 72;
        Reg_C = 255;

The heuristics are a bit odd here, given that we don't use pre-init here
or in the other cases, and thus don't save anything?!

Note I'd rather have the gimplifier always use *.LCN aggregate assigns
and leave the optimization to the optimization passes (which we
obviously have to improve, as can be seen here).

The immediate "refactoring" possible is to try to unify the ctor
emission code in varasm.c with the native_encode stuff.  Then it could
be SRA's job to optimally scalarize the aggregate init.
