https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112666

            Bug ID: 112666
           Summary: Missed optimization: Value initialization
                    zero-initializes members with user-defined constructor
           Product: gcc
           Version: 11.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: paisanafc at gmail dot com
  Target Milestone: ---

Judging by the memset calls in the generated assembly, gcc appears to
zero-initialize class members that have user-defined default constructors,
even though such members should not need to be zero-initialized.

The example benchmark is below, along with a godbolt link for convenience
(https://godbolt.org/z/158q6sfen). I used the Google Benchmark library because
I didn't know an easy way to reproduce the effect of `benchmark::DoNotOptimize`
without it. I hope that's ok.
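
For a reproducer that does not depend on the benchmark library, one possible
stand-in for `benchmark::DoNotOptimize` is an empty inline-asm statement that
takes the object's address and clobbers memory. This is only a sketch: the
names `escape`, `create_b` and `self_check` are hypothetical, the code relies
on GCC/Clang extended asm, and it is not claimed to match the library's own
implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for benchmark::DoNotOptimize (GCC/Clang only).
// The empty asm statement receives the object's address as an input and
// declares a memory clobber, so the compiler has to assume the object is
// read and possibly modified at this point.
template <class T>
inline void escape(T& value) {
    asm volatile("" : : "g"(&value) : "memory");
}

struct B {
    B() {}  // user-defined ctor, as in the report
    ~B() { escape(c); }
    std::array<char, 50000> member;
    char c;
};

// Creates one B with `{}` and keeps the object observable to the optimizer.
inline void create_b() {
    B b{};
    escape(b);
}

// Escaping a value must not change it.
inline void self_check() {
    int x = 42;
    escape(x);
    assert(x == 42);
}
```

Compiling a translation unit that calls `create_b()` with -O2 and searching
the assembly for memset should then be enough to inspect the code generation
without linking against the benchmark library.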

---

#include <benchmark/benchmark.h>
#include <array>

struct A {
    A() = default;
    ~A() {
      benchmark::DoNotOptimize(c); // avoid inlining
    }
    std::array<char, 50000> member;
    char c;
};

struct B {
    B() {}  // user-defined ctor
    ~B() {
      benchmark::DoNotOptimize(c); // avoid inlining
    }
    std::array<char, 50000> member;
    char c;
};

struct C {
    // no user-defined ctor
    B b;
    int dummy;
};

// The benchmark code:

static void ACreation(benchmark::State& state) {
  for (auto _ : state) {
    A a{};
    benchmark::DoNotOptimize(a);
  }
}
BENCHMARK(ACreation);
static void BCreation(benchmark::State& state) {
  for (auto _ : state) {
    B b{};
    benchmark::DoNotOptimize(b);
  }
}
BENCHMARK(BCreation);
static void CCreation(benchmark::State& state) {
  for (auto _ : state) {
    C c{};
    benchmark::DoNotOptimize(c);
  }
}
BENCHMARK(CCreation);
BENCHMARK_MAIN();

---

When I run this with https://github.com/google/benchmark, I get the following
results (with g++ 11.4 and above):

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
ACreation         736 ns          736 ns       933741
BCreation        3.62 ns         3.62 ns    191180154
CCreation         755 ns          754 ns       944906

The struct "C", which is just a "B" plus an int, is much slower to initialize
than B when value initialization (via {}) is used. However, my understanding of
the C++ standard is that members with a user-defined default constructor do not
need to be zero-initialized in this situation. Looking at the godbolt assembly
output, I see that both `A a{}` and `C c{}` generate a memset call, while
`B b{}` doesn't. Clang, on the other hand, seems to initialize C almost as fast
as B.
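
The reading of the standard above can be sketched with a few compile-time and
run-time checks. The sketch assumes C++17 (for `std::is_aggregate`) and leaves
out the destructors from the benchmark for brevity. The function name
`check_guaranteed_zeroing` is hypothetical. "User-provided" is the standard's
term for what this report calls a user-defined constructor.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

struct A {
    A() = default;  // defaulted on first declaration: not user-provided
    std::array<char, 50000> member;
    char c;
};

struct B {
    B() {}  // user-provided default constructor
    std::array<char, 50000> member;
    char c;
};

struct C {  // no constructors declared
    B b;
    int dummy;
};

// C declares no constructors, so it is an aggregate and `C c{}` initializes
// each member from an empty initializer list.
static_assert(std::is_aggregate<C>::value, "C is an aggregate");
// B has a user-provided constructor, so it is neither an aggregate nor
// trivially default constructible; `B{}` only has to run B::B().
static_assert(!std::is_aggregate<B>::value, "B is not an aggregate");
static_assert(!std::is_trivially_default_constructible<B>::value,
              "B::B() is user-provided");

// Checks only what is guaranteed. The members of a B created with `{}` are
// left indeterminate, so they are deliberately not read here.
inline void check_guaranteed_zeroing() {
    A a{};  // all members of A are zeroed
    C c{};  // c.dummy is zeroed; c.b is only required to run B::B()
    assert(a.member[0] == 0 && a.member[49999] == 0 && a.c == 0);
    assert(c.dummy == 0);
}
```

Under this reading, only the 4 bytes of `dummy` have to be zeroed for
`C c{}`, not the whole object.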

This potentially missed optimization in gcc is particularly nasty for structs
with large embedded storage (e.g. structs that contain C-arrays, std::arrays,
or static_vectors).
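
A possible workaround, untested against the benchmark above, is to give the
outer struct its own user-provided constructor, so that `{}` calls that
constructor and no zero-initialization of the whole object is required. The
names `C2` and `check_c2` are hypothetical, and whether gcc then drops the
memset would still need to be confirmed in the generated assembly.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

struct B {
    B() {}  // user-provided default constructor, as in the report
    std::array<char, 50000> member;
    char c;
};

// Hypothetical variant of C with a user-provided constructor.
struct C2 {
    C2() : dummy(0) {}  // initializes dummy explicitly; b runs B::B()
    B b;
    int dummy;
};

// With a user-provided constructor, C2 is no longer an aggregate, so
// `C2 c{}` is value-initialization that only calls C2::C2().
static_assert(!std::is_aggregate<C2>::value, "C2 is not an aggregate");

// dummy is still zero because the constructor sets it.
inline void check_c2() {
    C2 c{};
    assert(c.dummy == 0);
}
```

The trade-off is that every member that should start at zero has to be
initialized by hand in the constructor.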
