https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88775
Bug ID: 88775
Summary: [8/9 Regression] Optimize std::string assignment
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---

#include <bits/c++config.h>
#undef _GLIBCXX_EXTERN_TEMPLATE
#define _GLIBCXX_EXTERN_TEMPLATE 0
#include<string>
__attribute__((flatten)) std::string f(){
  std::string s;
  s="hello";
  return s;
}

Yes, I have to go through some lengths to convince gcc to at least try to optimize...

With gcc-7, I get

  <bb 2> [14.44%]:
  _3 = &MEM[(struct basic_string *)s_2(D)].D.21635._M_local_buf;
  MEM[(struct _Alloc_hider *)s_2(D)]._M_p = _3;
  MEM[(size_type *)s_2(D) + 8B] = 0;
  MEM[(char_type &)s_2(D) + 16] = 0;
  if (_3 != "hello")
    goto <bb 3>; [75.00%]
  else
    goto <bb 4>; [25.00%]

  <bb 3> [1.43%]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 5>; [100.00%]

  <bb 4> [0.97%]:
  __builtin_memcpy ("hello", &MEM[(void *)"hello" + 5B], 5);

  <bb 5> [14.43%]:
  MEM[(size_type *)s_2(D) + 8B] = 5;
  MEM[(char_type &)s_2(D) + 21] = 0;
  return s_2(D);

which is kind of OK. It would be much better if we folded _3 != "hello", but it is already small enough.

With gcc-9, I get something that starts with

  __x.7_6 = (long unsigned int) "hello";
  __y.8_7 = (long unsigned int) _3;
  if (__x.7_6 < __y.8_7)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 38463891]:
  if (__x.7_6 > __y.8_7)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

ifcombine would kindly turn this into __x.7_6 != __y.8_7, but it doesn't look like this yet when ifcombine runs.
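To make the ifcombine remark concrete, here is a minimal sketch of the two shapes of the test, written as ordinary C++ on integer values. The function names and the sample values are hypothetical and only serve the illustration; this is not libstdc++ code and not what the pass literally operates on.

```cpp
#include <cassert>
#include <cstdint>

// Shape seen in the gcc-9 dump: two separate ordered comparisons on the
// addresses after conversion to an unsigned integer type.
bool differs_two_branches(std::uintptr_t x, std::uintptr_t y) {
    if (x < y) return true;   // first branch: x < y
    if (x > y) return true;   // second branch: x > y
    return false;             // neither holds, so x == y
}

// Shape the combined condition amounts to: a single inequality test.
bool differs_single_test(std::uintptr_t x, std::uintptr_t y) {
    return x != y;
}

// Hypothetical helper: checks that both forms agree on a few sample values.
void check_samples() {
    const std::uintptr_t samples[] = {0, 1, 5, 4096, UINTPTR_MAX};
    for (std::uintptr_t x : samples)
        for (std::uintptr_t y : samples)
            assert(differs_two_branches(x, y) == differs_single_test(x, y));
}
```

For unsigned integers, "x < y or x > y" holds exactly when x != y, which is why a single comparison would be enough here.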

We also have equivalent blocks (reached by goto, not fallthrough)

  <bb 4> [local count: 19039626]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 16>; [100.00%]

  <bb 6> [local count: 3173271]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 16>; [100.00%]

  <bb 16> [local count: 114817586]:
  # prephitmp_14 = PHI <pretmp_16(13), pretmp_25(15), _3(4), _3(8), _3(6), _3(14)>

that we fail to merge.

In the end, we have 4 times more code than we used to...

This is most likely caused by a change in libstdc++, but I am categorizing it as tree-optimization because I believe we need some improvements there, whatever libstdc++ decides to do.
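
As a rough source-level picture of the duplicated blocks above, the sketch below shows two branches with identical bodies next to the merged form one would hope for. The `path` parameter and the function names are hypothetical; `path` only stands in for whatever conditions lead to each block in the real dump.

```cpp
#include <cassert>
#include <cstring>

// Two distinct paths that end in the same copy, like <bb 4> and <bb 6>.
void copy_duplicated(char* dst, int path) {
    if (path == 0) {
        std::memcpy(dst, "hello", 5);
    } else if (path == 1) {
        std::memcpy(dst, "hello", 5);
    }
}

// Same behaviour with the identical bodies merged into one block.
void copy_merged(char* dst, int path) {
    if (path == 0 || path == 1) {
        std::memcpy(dst, "hello", 5);
    }
}

// Hypothetical helper: true if both versions leave the buffer in the same state.
bool same_result(int path) {
    char a[8] = {0};
    char b[8] = {0};
    copy_duplicated(a, path);
    copy_merged(b, path);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Both functions write the same bytes for every value of `path`; the merged version simply has one copy of the block for the control flow to jump to.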