https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88775

            Bug ID: 88775
           Summary: [8/9 Regression] Optimize std::string assignment
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

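#include <bits/c++config.h>
// Disable the extern template declarations for std::string so that the
// member functions are instantiated here and can be inlined.
#undef _GLIBCXX_EXTERN_TEMPLATE
#define _GLIBCXX_EXTERN_TEMPLATE 0
#include <string>
// flatten: inline every call made from f, where possible.
__attribute__((flatten))
std::string f(){
  std::string s;
  s="hello";
  return s;
}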

Yes, I have to go to some lengths to convince gcc to at least try to
optimize this...

With gcc-7, I get

  <bb 2> [14.44%]:
  _3 = &MEM[(struct basic_string *)s_2(D)].D.21635._M_local_buf;
  MEM[(struct _Alloc_hider *)s_2(D)]._M_p = _3;
  MEM[(size_type *)s_2(D) + 8B] = 0;
  MEM[(char_type &)s_2(D) + 16] = 0;
  if (_3 != "hello")
    goto <bb 3>; [75.00%]
  else
    goto <bb 4>; [25.00%]

  <bb 3> [1.43%]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 5>; [100.00%]

  <bb 4> [0.97%]:
  __builtin_memcpy ("hello", &MEM[(void *)"hello" + 5B], 5);

  <bb 5> [14.43%]:
  MEM[(size_type *)s_2(D) + 8B] = 5;
  MEM[(char_type &)s_2(D) + 21] = 0;
  return s_2(D);

which is kind of OK. It would be much better if we folded _3 != "hello" (the
local buffer of s can never be at the address of a string literal), but the
code is already small enough.
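
For readers less used to GIMPLE, the gcc-7 dump above can be read roughly as
the following C++. This is only a sketch: MiniString, assign_hello and
check_assign_hello are hypothetical names, not libstdc++ code, and the field
order just mirrors the offsets seen in the dump (pointer at 0, length at 8,
local buffer at 16, assuming a 64-bit target). The second memcpy in <bb 4> is
left out because it sits on the branch where the pointers compare equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the layout visible in the dump.
struct MiniString {
  char* p;            // data pointer (the _M_p store in the dump)
  std::size_t len;    // length, at offset 8 in the dump
  char local[16];     // local buffer, at offset 16 in the dump
};

// Rough C++ reading of the gcc-7 GIMPLE for: std::string s; s = "hello";
void assign_hello(MiniString& s) {
  s.p = s.local;      // point at the local buffer
  s.len = 0;          // empty string
  s.local[0] = '\0';
  const char* src = "hello";
  if (s.p != src)     // the test we would like to see folded to true
    std::memcpy(s.p, src, 5);
  s.len = 5;
  s.p[5] = '\0';      // the store to offset 21 in the dump
}

// Small self-check of the sketch.
void check_assign_hello() {
  MiniString s;
  assign_hello(s);
  assert(s.len == 5);
  assert(std::strcmp(s.p, "hello") == 0);
}
```

Since s.p always points into s itself here, the comparison can only be true,
which is why folding it would leave a single unconditional memcpy.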

With gcc-9, I get something that starts with

  __x.7_6 = (long unsigned int) "hello";
  __y.8_7 = (long unsigned int) _3;
  if (__x.7_6 < __y.8_7)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 38463891]:
  if (__x.7_6 > __y.8_7)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

ifcombine would kindly turn this pair of tests into __x.7_6 != __y.8_7, but the
code does not have this shape yet when ifcombine runs. We also have equivalent
blocks (reached by goto, not fallthrough)

  <bb 4> [local count: 19039626]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 16>; [100.00%]

  <bb 6> [local count: 3173271]:
  __builtin_memcpy (_3, "hello", 5);
  goto <bb 16>; [100.00%]

  <bb 16> [local count: 114817586]:
  # prephitmp_14 = PHI <pretmp_16(13), pretmp_25(15), _3(4), _3(8), _3(6), _3(14)>

that we fail to merge. In the end, we have 4 times as much code as we used to...
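
To make the point about the two comparisons concrete, here is a small sketch
of why the pair of tests from the gcc-9 dump is the same as one inequality.
The function names are made up for illustration; the casts to an unsigned
integer mirror the (long unsigned int) conversions in the dump.

```cpp
#include <cassert>
#include <cstdint>

// Shape of the gcc-9 dump: two ordered tests on the addresses as integers.
bool differs_two_tests(const char* a, const char* b) {
  const auto x = reinterpret_cast<std::uintptr_t>(a);
  const auto y = reinterpret_cast<std::uintptr_t>(b);
  if (x < y) return true;   // first test, branches to <bb 4>
  if (x > y) return true;   // second test, also branches to <bb 4>
  return false;             // only reached when x == y
}

// Shape we would like to get: a single inequality test.
bool differs_one_test(const char* a, const char* b) {
  const auto x = reinterpret_cast<std::uintptr_t>(a);
  const auto y = reinterpret_cast<std::uintptr_t>(b);
  return x != y;
}

// Compare both forms on every pair of addresses inside a small array.
void check_equivalence() {
  const char buf[3] = {'a', 'b', 'c'};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      assert(differs_two_tests(buf + i, buf + j) ==
             differs_one_test(buf + i, buf + j));
}
```

Both branches of the two-test form lead to the same block, so once the tests
are combined into != the remaining question is the same one as with gcc-7:
whether the comparison against the literal can be folded.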

This is most likely caused by a change in libstdc++, but I am categorizing it
as tree-optimization because I believe we need some improvements there,
whatever libstdc++ decides to do.
