https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60621
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |mjambor at suse dot cz
   Last reconfirmed|                            |2024-12-15
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94960

--- Comment #9 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
With recent changes to std::string (including not yet reviewed
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671599.html)
and std::vector we now get:

jh@ryzen3:~> ~/trunk-install2/bin/g++ -O2 -std=c++11 empl2.C ; size a.out
   text    data     bss     dec     hex filename
   3792     656       8    4456    1168 a.out
jh@ryzen3:~> ~/trunk-install2/bin/g++ -O2 -std=c++11 empl2.C -DEMPLACE_BACK ; size a.out
   text    data     bss     dec     hex filename
   5095     680       8    5783    1697 a.out

Note that the text size also includes EH tables.

With emplace_back we now get:

int main ()
{
  void * D.46214;
  struct S * const vs$8;
  struct S * const vs$D40641$_M_impl$D39953$_M_start;
  ptrdiff_t __dif;
  const char * c;
  const char * b;
  const char * a;
  struct vector vs;
  int _7;
  long int _15;
  void * _66;

  <bb 2> [local count: 1073741824]:
  MEM[(struct _Vector_impl_data *)&vs] ={v} {CLOBBER(bob)};
  MEM[(struct _Vector_impl_data *)&vs]._M_end_of_storage = 0B;
  a = "a";
  b = "b";
  c = "c";
  MEM <vector(2) long unsigned int> [(struct vector *)&vs] = { 0, 0 };
  std::vector<S>::_M_realloc_append<const char*&, const char*&, const char*&> (&vs, &a, &b, &c);

  <bb 3> [local count: 1073741824]:
  vs$D40641$_M_impl$D39953$_M_start_143 = MEM <struct S * const> [(struct vector *)&vs];
  vs$8_144 = MEM <struct S * const> [(struct vector *)&vs + 8B];
  _15 = vs$8_144 - vs$D40641$_M_impl$D39953$_M_start_143;
  __dif_16 = _15 /[ex] 96;
  _7 = (int) __dif_16;
  std::vector<S>::~vector (&vs);
  vs ={v} {CLOBBER(eos)};
  a ={v} {CLOBBER(eos)};
  b ={v} {CLOBBER(eos)};
  c ={v} {CLOBBER(eos)};
  return _7;

  <bb 4> [count: 0]:
<L3>:
  std::vector<S>::~vector (&vs);
  _66 = __builtin_eh_pointer (2);
  __builtin_unwind_resume (_66);

}

So _M_realloc_append is offlined and quite large, since it constructs
strings and we do not know that the strings fit in the local buffer.

Without emplace_back everything gets inlined. The main difference is
that here the construction happens in main(). Now inlining is limited
since we know that main is called once.

Modifying testcase:

jh@ryzen3:~> cat empl2.C
#include <vector>
#include <string>

struct S
{
#ifdef USE_CHAR
  S(const char*a, const char*b, const char*c)
#else
  S(std::string const&a, std::string const&b, std::string const &c)
#endif
    : a(a), b(b), c(c) {}
  std::string a, b, c;
};

int main2()
{
  std::vector<S> vs;
#ifdef USE_STRING
  std::string a("a"),b("b"),c("c");
#else
  char const* a = "a", *b = "b", *c = "c";
#endif
#ifdef EMPLACE_BACK
  vs.emplace_back(a, b, c);
#elif defined(EMPLACE_BACK_NOTHROW)
  vs.emplace_back(std::string(a), std::string(b), std::string(c));
#else
  vs.push_back(S{a, b, c});
#endif
  return vs.size();
}

int main()
{
  return main2();
}

I get:

int main2 ()
{
  <bb 2> [local count: 1073741824]:
  return 1;

}

int main ()
{
  <bb 2> [local count: 1073741824]:
  return 1;

}

which is as small as it can get :)

With emplace_back we only get everything inlined and optimized if
--param max-inline-insns-auto=160 is used. Default is 15 for -O2 and 30
for -O3.

Inline summary is:

IPA function summary for void std::vector<_Tp, _Alloc>::_M_realloc_append(_Args&& ...) [with _Args = {const char*&, const char*&, const char*&}; _Tp = S; _Alloc = std::allocator<S>]/760 inlinable
  global time:  840.049387
  self size:    81
  global size:  309
  min size:     288
  self stack:   123
  global stack: 123
  estimated growth:4
    size:167.500000, time:388.798296
    size:3.000000, time:2.000000,  executed if:(not inlined)
    size:3.000000, time:3.000000,  executed if:(op0 not sra candidate) && (not inlined)
    size:3.000000, time:3.000000,  executed if:(op0 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op3 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op3 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op2 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op2 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op1 not sra candidate) && (not inlined)
    size:0.500000, time:0.500000,  executed if:(op1 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op0 not sra candidate),  nonconst if:(op0[ref offset: 64] changed) && (op0 not sra candidate)
    size:0.500000, time:0.500000,  executed if:(op0 not sra candidate),  nonconst if:(op0[ref offset: 0] changed) && (op0 not sra candidate)
    size:8.000000, time:8.000000,  nonconst if:(op0[ref offset: 64] changed || op0[ref offset: 0] changed)

Getting size down from 160 to 15 will be quite some work. I think to do
that we need to understand that the lengths of the strings are known.
I.e. at IPA-prop time understand that _5 in:

  _5 = __builtin_strlen (__s_7(D));
  std::__cxx11::basic_string<char>::_M_construct<true> (this_3(D), __s_7(D), _5);

will be constant, and since it is smaller than 15 bytes the constructor
will optimize. This will need an extension of jump functions.

Also at the moment _M_create is not instantiated (see PR94960), which
prevents additional propagation during early opts.
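
For illustration only, here is a minimal source-level sketch of the
property the inliner would need to see. The helper names
fits_local_buffer and make_from_cstr are hypothetical and are not part
of libstdc++; the sketch assumes a small-string implementation whose
empty std::string already reports its local buffer as capacity(), as
the __cxx11 string does. It says nothing about how jump functions
would actually be extended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of libstdc++): true when a string of
// length LEN fits in the capacity an empty std::string already has,
// i.e. when constructing it needs no heap allocation.  Assumes a
// small-string implementation.
bool fits_local_buffer (std::size_t len)
{
  return len <= std::string ().capacity ();
}

// Mirrors the two GIMPLE statements quoted above: compute the length,
// then construct from pointer and length.  When LEN is a known
// constant for which fits_local_buffer (len) holds, the allocating
// path of the constructor is dead code.
std::string make_from_cstr (const char *s)
{
  std::size_t len = std::strlen (s);  // _5 = __builtin_strlen (__s_7(D));
  return std::string (s, len);        // _M_construct<true> (this, __s, _5);
}
```

In the testcase every argument is a one-character literal, so the
length check would fold at compile time if the constant length were
propagated into the constructor.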