https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121527

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |mjambor at suse dot cz

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Eric Botcazou from comment #11)
> The missing stores are visible in the .optimized file wrt a compiler without
> the problematic change (r16-3156-g5294840e3c7bf9) spotted by Sam.

More specifically, I see a difference in c391002_4.adb.272t.optimized:

@@ -1001,17 +1001,15 @@
   <bb 4> [local count: 1073312328]:
   VIEW_CONVERT_EXPR<struct c391002_4__creator__S1b>(A11b) =
VIEW_CONVERT_EXPR<struct c391002_4__creator__S1b>(*plugs_6(D));
   _2 = VIEW_CONVERT_EXPR<struct
c391002_4__creator__S1b>(*plugs_6(D))._parent.band;
-  A11b._parent._parent.band = _2;
   _3 = VIEW_CONVERT_EXPR<struct
c391002_4__creator__S1b>(*plugs_6(D)).the_command;
-  A11b._parent.the_command = _3;
   A11b.tc_mc = -1;

where this is redundant store removal done in FRE1:

 Value numbering store A11b._parent._parent.band to _3
 Setting value number of .MEM_15 to .MEM_15 (changed)
+Deleted redundant store A11b._parent._parent.band = _3;
...
 Value numbering store A11b._parent.the_command to _4
 Setting value number of .MEM_16 to .MEM_16 (changed)
+Deleted redundant store A11b._parent.the_command = _4;

The apparent redundancy is visible above, we aggregate copy *plugs_6
to A11b and then copy separately a field of it.

The stored value is of 1 byte size enumeral type.  The store offset is
8 bytes to A11b, the load offset as well.

The aggregate copy is done using TImode load/stores, the remaining
loads to _2/_3 are QImode.

I fail to see how there's anything wrong.  Or are those not the stores
"missing"?  In the assembly I also see the stores to A11b (which is
on the stack) and the corresponding load feeding the compares.

In p.adb I see

   MEM <void (*ada__tags__prim_ptr) (void)[1:1] * {ref-all}> [(struct p__T43b *
{ref-all})&A50b] = &MEM <struct c391002_3__Telectronics_moduleTS> [(void
*)&c391002_3__electronics_moduleT + 32B];
-  MEM <unsigned char> [(struct p__T43b * {ref-all})&A50b + 8B] = 2;
   MEM <integer> [(struct p__T43b * {ref-all})&A50b + 52B] = 66;
   VIEW_CONVERT_EXPR<struct p__T43b>(A63b) = MEM[(struct p__T43b *
{ref-all})&A50b];
-  A63b._parent._parent.band = 2;
-  A63b._parent.the_command = 0;

the latter two look OK at the time they happen.

The 2nd is difficult to track down.  SRA seems to do quite stupid things
around aggregate copies like

   VIEW_CONVERT_EXPR<struct p__B51b__T52b>(A50b) = A53b;

where it does scalarize A50b, but before such V_C_E accesses restores
A50b and afterwards re-loads all components.

Disabling (non-early) SRA for p.adb resolves the issue and the A63b stores
above are still gone (but there's a lot of difference otherwise).

The late SRA beavior with/without the patch differs substantially.

This definitely needs more analysis - there's no FRE after late SRA, so for
now I think it's an error in SRA (apart from the fact it doing stupid
things around such aggregate copies).

Reply via email to