https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88775
Bug ID: 88775
Summary: [8/9 Regression] Optimize std::string assignment
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
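#include <bits/c++config.h>
// Turn off the extern template declarations so that the std::string members
// are instantiated in this translation unit and can be inlined.
#undef _GLIBCXX_EXTERN_TEMPLATE
#define _GLIBCXX_EXTERN_TEMPLATE 0
#include <string>
__attribute__((flatten))
std::string f(){
  std::string s;
  s="hello";
  return s;
}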
Yes, I have to go to some lengths to convince gcc to at least try to
optimize...
With gcc-7, I get
<bb 2> [14.44%]:
_3 = &MEM[(struct basic_string *)s_2(D)].D.21635._M_local_buf;
MEM[(struct _Alloc_hider *)s_2(D)]._M_p = _3;
MEM[(size_type *)s_2(D) + 8B] = 0;
MEM[(char_type &)s_2(D) + 16] = 0;
if (_3 != "hello")
goto <bb 3>; [75.00%]
else
goto <bb 4>; [25.00%]
<bb 3> [1.43%]:
__builtin_memcpy (_3, "hello", 5);
goto <bb 5>; [100.00%]
<bb 4> [0.97%]:
__builtin_memcpy ("hello", &MEM[(void *)"hello" + 5B], 5);
<bb 5> [14.43%]:
MEM[(size_type *)s_2(D) + 8B] = 5;
MEM[(char_type &)s_2(D) + 21] = 0;
return s_2(D);
which is kind of OK. It would be much better if we folded _3 != "hello", but
the code is already small enough.
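For illustration, here is a minimal sketch of the shape of that comparison in plain C++. The names Buf and assign_hello are hypothetical and not part of libstdc++. The sketch only mirrors the structure of the dump (destination is a buffer inside the returned object, source is a string literal) and makes no claim about what any GCC version does with it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the small-string buffer inside std::string.
struct Buf {
    char local[16];
};

// Mirrors the shape of the gcc-7 dump: compare the destination with the
// literal, then copy. Assuming out points to a valid Buf, dst is storage
// inside *out and can never be the literal, so the else branch is dead.
void assign_hello(Buf* out) {
    assert(out != nullptr);
    char* dst = out->local;
    const char* src = "hello";
    if (dst != src) {
        std::memcpy(dst, src, 5);   // the only path that can actually run
    } else {
        std::memmove(dst, src, 5);  // overlap-handling path, never taken
    }
    dst[5] = '\0';
}
```

Folding the test to true would leave only the memcpy path, which is the improvement asked for above.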
With gcc-9, I get something that starts with
__x.7_6 = (long unsigned int) "hello";
__y.8_7 = (long unsigned int) _3;
if (__x.7_6 < __y.8_7)
goto <bb 4>; [50.00%]
else
goto <bb 3>; [50.00%]
<bb 3> [local count: 38463891]:
if (__x.7_6 > __y.8_7)
goto <bb 4>; [50.00%]
else
goto <bb 5>; [50.00%]
ifcombine would kindly turn this into __x.7_6 != __y.8_7, but the code does not
have this shape yet when ifcombine runs. We also have equivalent blocks (reached
by goto, not fallthrough)
<bb 4> [local count: 19039626]:
__builtin_memcpy (_3, "hello", 5);
goto <bb 16>; [100.00%]
<bb 6> [local count: 3173271]:
__builtin_memcpy (_3, "hello", 5);
goto <bb 16>; [100.00%]
<bb 16> [local count: 114817586]:
# prephitmp_14 = PHI <pretmp_16(13), pretmp_25(15), _3(4), _3(8), _3(6), _3(14)>
that we fail to merge. In the end, we have 4 times more code than we used to...
This is most likely caused by a change in libstdc++, but I am categorizing it
as tree-optimization because I believe we need some improvements there,
whatever libstdc++ decides to do.
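
The ifcombine remark above relies on a simple identity: for unsigned integers, x < y || x > y is the same test as x != y. A minimal sketch of that identity follows. The helper names are hypothetical, and std::uintptr_t stands in for the long unsigned int casts seen in the gcc-9 dump.

```cpp
#include <cstdint>

// Two-test form, shaped like the gcc-9 dump (__x < __y, then __x > __y).
constexpr bool differs_two_tests(std::uintptr_t x, std::uintptr_t y) {
    return x < y || x > y;
}

// Single-test form that the report expects ifcombine to produce.
constexpr bool differs_one_test(std::uintptr_t x, std::uintptr_t y) {
    return x != y;
}

// Both forms agree for smaller, larger and equal operands.
static_assert(differs_two_tests(1, 2) == differs_one_test(1, 2), "x < y");
static_assert(differs_two_tests(2, 1) == differs_one_test(2, 1), "x > y");
static_assert(differs_two_tests(3, 3) == differs_one_test(3, 3), "x == y");
```

With the single-test form, both branches that copy "hello" into _3 are guarded by one condition, which is also what would make the duplicated memcpy blocks easier to merge.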