https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99785

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
            Version|unknown                     |11.0

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Did anybody check the actual output from clang as to whether it performs the
desired optimizations?  I only have clang 9 around and that rejects the TU
(maybe there are clang-specific code paths and the preprocessed source is not
representative here).

Inlining blend_pixels without first constant propagating 'blend_key' (I suppose
it is eventually supposed to be constant propagated on all call paths somehow?)
looks quite stupid given the large switch.  Sure, saving %xmm registers around
calls can have a cost, but trashing the icache should be worse.  If all of this
is auto-generated, the auto-generation might also be able to improve the
blend_key dispatch.
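
To illustrate what a friendlier dispatch could look like, here is a minimal
sketch.  Everything in it is hypothetical: the real blend_pixels signature and
the actual blend_key values are not shown in this report, so the enum, the
parameters and the blend_copy/blend_add wrappers are stand-ins.  The idea is
only that each entry point passes a literal key, so after inlining the switch
sees a constant and can fold to a single case.

```c
/* Hypothetical stand-ins: the real blend_pixels signature and key values
   are not part of this report.  */
enum blend_key { BLEND_COPY, BLEND_ADD };

static inline unsigned
blend_pixels (enum blend_key key, unsigned dst, unsigned src)
{
  switch (key)   /* imagine many more, much larger cases here */
    {
    case BLEND_COPY: return src;
    case BLEND_ADD:  return dst + src;
    }
  return dst;
}

/* One thin entry point per key (hypothetical names).  Each call passes a
   compile-time constant, so once blend_pixels is inlined the compiler can
   drop the cases that cannot be reached.  */
unsigned blend_copy (unsigned dst, unsigned src)
{ return blend_pixels (BLEND_COPY, dst, src); }

unsigned blend_add (unsigned dst, unsigned src)
{ return blend_pixels (BLEND_ADD, dst, src); }
```

Whether the generator can emit something like this depends on how the keys are
produced, which I have not checked.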

Another strategy might be to not put always_inline on everything
(because that in turn will cause exponential growth) but instead inline
everything into the finally important function(s) via 'flatten'.

That is, you do something like

static __attribute__((always_inline)) inline void large_leaf () { /* large */ }

static __attribute__((always_inline)) inline void inter1 () { large_leaf (); }

static __attribute__((always_inline)) inline void inter2 ()
{ inter1 (); inter1 (); }

static __attribute__((always_inline)) inline void inter3 ()
{ inter2 (); inter2 (); }

and what you get is 8 (intermediate) copies of the large_leaf body.  This
is because we inline-expand from the leaves rather than first inlining the
small always-inline wrappers (and throwing them away before inlining into
them).  I suppose we could try to not inline into always-inline functions at
the expense of needing to iterate on inlined always-inline bodies.  Or somehow
at least delay inlining large bodies into always-inline bodies.

Anyway, marking such large functions as always-inline is asking for trouble.
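
For comparison, a minimal sketch of the 'flatten' variant of the example
above.  The names large_leaf and inter1..inter3 are reused from it; 'entry'
and the 'leaf_calls' counter are hypothetical additions so that the effect is
observable.  The wrappers are plain static inline and only the function that
finally matters carries the attribute, which asks GCC to inline every call in
its body (including calls that such inlining exposes) where possible.

```c
/* Sketch only: 'entry' and 'leaf_calls' are made up for illustration.  */
static int leaf_calls;

/* No always_inline here: the wrappers are ordinary inline candidates.  */
static inline void large_leaf (void) { leaf_calls++; /* large body */ }

static inline void inter1 (void) { large_leaf (); }

static inline void inter2 (void) { inter1 (); inter1 (); }

static inline void inter3 (void) { inter2 (); inter2 (); }

/* Only the important function is flattened; GCC inlines the whole call
   tree below it into this one body if it can.  */
__attribute__((flatten)) void entry (void) { inter3 (); }
```

A call to entry () still executes large_leaf four times; the difference is
only in where the inlined copies are requested.  How much intermediate growth
this saves in practice depends on what the inliner decides for the wrappers
on its own.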
