https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60621

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |mjambor at suse dot cz
   Last reconfirmed|                            |2024-12-15
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=94960

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With recent changes to std::string (including not yet reviewed
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html)
and std::vector we now get:

jh@ryzen3:~> ~/trunk-install2/bin/g++ -O2 -std=c++11 empl2.C  ; size   a.out 
   text    data     bss     dec     hex filename
   3792     656       8    4456    1168 a.out
jh@ryzen3:~> ~/trunk-install2/bin/g++ -O2 -std=c++11 empl2.C -DEMPLACE_BACK ; size   a.out 
   text    data     bss     dec     hex filename
   5095     680       8    5783    1697 a.out

Note that the text size also includes EH tables.  With emplace_back we now get:

int main ()
{
  void * D.46214;
  struct S * const vs$8;
  struct S * const vs$D40641$_M_impl$D39953$_M_start;
  ptrdiff_t __dif;
  const char * c;
  const char * b;
  const char * a;
  struct vector vs;
  int _7;
  long int _15;
  void * _66;

  <bb 2> [local count: 1073741824]:
  MEM[(struct _Vector_impl_data *)&vs] ={v} {CLOBBER(bob)};
  MEM[(struct _Vector_impl_data *)&vs]._M_end_of_storage = 0B;
  a = "a";
  b = "b";
  c = "c";
  MEM <vector(2) long unsigned int> [(struct vector *)&vs] = { 0, 0 };
  std::vector<S>::_M_realloc_append<const char*&, const char*&, const char*&> (&vs, &a, &b, &c);

  <bb 3> [local count: 1073741824]:
  vs$D40641$_M_impl$D39953$_M_start_143 = MEM <struct S * const> [(struct vector *)&vs];
  vs$8_144 = MEM <struct S * const> [(struct vector *)&vs + 8B];
  _15 = vs$8_144 - vs$D40641$_M_impl$D39953$_M_start_143;
  __dif_16 = _15 /[ex] 96;
  _7 = (int) __dif_16;
  std::vector<S>::~vector (&vs);
  vs ={v} {CLOBBER(eos)};
  a ={v} {CLOBBER(eos)};
  b ={v} {CLOBBER(eos)};
  c ={v} {CLOBBER(eos)};
  return _7;

  <bb 4> [count: 0]:
<L3>:
  std::vector<S>::~vector (&vs);
  _66 = __builtin_eh_pointer (2);
  __builtin_unwind_resume (_66);

}

So _M_realloc_append is offlined and quite large, since it constructs strings
and we do not know that the strings fit into the local buffer.
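
To make the "local buffer" remark concrete, here is a minimal sketch.  It
assumes a std::string implementation with a small-string buffer stored inside
the object (as in libstdc++'s __cxx11 ABI); uses_local_buffer and
check_examples are hypothetical helpers written for illustration, not library
functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: reports whether the character data of `s` lives
// inside the std::string object itself (the local buffer) rather than in
// a separate heap allocation.
bool uses_local_buffer(const std::string& s) {
    auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    auto obj_end = obj_begin + sizeof(std::string);
    auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}

// Assumption: the implementation has a small-string buffer, so "a" needs
// no allocation while a 100-character string does.
void check_examples() {
    assert(uses_local_buffer(std::string("a")));
    assert(!uses_local_buffer(std::string(100, 'x')));
}
```

The out-of-line function has to keep both paths (local buffer and
allocation) because it only sees a const char*, not the length.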

Without emplace_back everything gets inlined.  The main difference is that in
that case the construction of S happens in main() itself.

Now inlining is limited since we know that main is called once.  Modifying
the testcase:

jh@ryzen3:~> cat empl2.C
#include <vector>
#include <string>

struct S {
#ifdef USE_CHAR
    S(const char*a, const char*b, const char*c)
#else
    S(std::string const&a, std::string const&b, std::string const &c)
#endif
        : a(a), b(b), c(c) {}
    std::string a, b, c;
};

int main2() {
    std::vector<S> vs;
#ifdef USE_STRING
        std::string a("a"),b("b"),c("c");
#else
        char const* a = "a", *b = "b", *c = "c";
#endif
#ifdef EMPLACE_BACK
    vs.emplace_back(a, b, c);
#elif defined(EMPLACE_BACK_NOTHROW)
    vs.emplace_back(std::string(a), std::string(b), std::string(c));
#else
    vs.push_back(S{a, b, c});
#endif
    return vs.size();
}

int main()
{
        return main2();
}

I get:
int main2 ()
{
  <bb 2> [local count: 1073741824]:
  return 1;

}       
int main ()
{
  <bb 2> [local count: 1073741824]:
  return 1;

}

which is as small as it can get :)
With emplace_back we get everything inlined and optimized only if --param
max-inline-insns-auto=160 is used.  The default is 15 for -O2 and 30 for -O3.

Inline summary is:
IPA function summary for void std::vector<_Tp, _Alloc>::_M_realloc_append(_Args&& ...) [with _Args = {const char*&, const char*&, const char*&}; _Tp = S; _Alloc = std::allocator<S>]/760 inlinable
  global time:     840.049387
  self size:       81
  global size:     309
  min size:       288
  self stack:      123
  global stack:    123
  estimated growth:4
    size:167.500000, time:388.798296
    size:3.000000, time:2.000000,  executed if:(not inlined)
    size:3.000000, time:3.000000,  executed if:(op0 not sra candidate) && (not inlined)
    size:3.000000, time:3.000000,  executed if:(op0 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op3 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op3 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op2 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op2 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op1 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op1 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op0 not sra candidate),  nonconst if:(op0[ref offset: 64] changed) && (op0 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op0 not sra candidate),  nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate)
    size:8.000000, time:8.000000,  nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 0] changed)

Getting the size down from 160 to 15 will be quite some work.  I think that to
do so we need to understand that the lengths of the strings are known, i.e. at
IPA-prop time understand that _5 in:

  _5 = __builtin_strlen (__s_7(D));
  std::__cxx11::basic_string<char>::_M_construct<true> (this_3(D), __s_7(D), _5);

will be constant and, since it is smaller than 15 bytes, the constructor will
optimize.  This will need an extension of jump functions.
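
As a source-level illustration of the fact the optimizer would have to
derive, the sketch below computes the literal's length at compile time and
checks it against the local-buffer capacity.  const_strlen, fits_local and
kLocalCapacity are hypothetical names; the value 15 is an assumption based on
libstdc++'s std::string for char.  It only shows what "the length is a known
constant" means, not how IPA-prop or jump functions would implement it.

```cpp
#include <cstddef>

// Hypothetical C++11-compatible compile-time strlen.
constexpr std::size_t const_strlen(const char* s) {
    return *s ? 1 + const_strlen(s + 1) : 0;
}

// Assumption: local-buffer capacity of libstdc++'s std::string for char.
constexpr std::size_t kLocalCapacity = 15;

// True when a string of this length needs no heap allocation.
constexpr bool fits_local(const char* s) {
    return const_strlen(s) <= kLocalCapacity;
}

// The literals from the testcase all fit, so the allocating path of the
// constructor would be dead code if the length were propagated.
static_assert(fits_local("a"), "\"a\" fits into the local buffer");
static_assert(fits_local("b"), "\"b\" fits into the local buffer");
static_assert(fits_local("c"), "\"c\" fits into the local buffer");
```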

Also, at the moment _M_create is not instantiated (see PR94960), which prevents
additional propagation during early opts.
