https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112666
Bug ID: 112666
Summary: Missed optimization: Value initialization
zero-initializes members with user-defined constructor
Product: gcc
Version: 11.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: paisanafc at gmail dot com
Target Milestone: ---
Judging by the calls to memset in the generated assembly, gcc appears to
zero-initialize class members that have user-defined constructors and that
therefore shouldn't need to be zero-initialized.
I share below the example benchmark and a godbolt link for convenience
(https://godbolt.org/z/158q6sfen). I used the benchmark library because I
didn't know an easy way to reproduce the effect of `benchmark::DoNotOptimize`
without it. I hope that's ok.
---
#include <benchmark/benchmark.h>
#include <array>

struct A {
    A() = default;
    ~A() {
        benchmark::DoNotOptimize(c); // avoid inlining
    }
    std::array<char, 50000> member;
    char c;
};

struct B {
    B() {} // user-defined ctor
    ~B() {
        benchmark::DoNotOptimize(c); // avoid inlining
    }
    std::array<char, 50000> member;
    char c;
};

struct C {
    // no user-defined ctor
    B b;
    int dummy;
};

// The benchmark code:
static void ACreation(benchmark::State& state) {
    for (auto _ : state) {
        A a{};
        benchmark::DoNotOptimize(a);
    }
}
BENCHMARK(ACreation);

static void BCreation(benchmark::State& state) {
    for (auto _ : state) {
        B b{};
        benchmark::DoNotOptimize(b);
    }
}
BENCHMARK(BCreation);

static void CCreation(benchmark::State& state) {
    for (auto _ : state) {
        C c{};
        benchmark::DoNotOptimize(c);
    }
}
BENCHMARK(CCreation);

BENCHMARK_MAIN();
---
When I run this with https://github.com/google/benchmark, I get the following
results (with g++ 11.4 and above):
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
ACreation         736 ns          736 ns       933741
BCreation        3.62 ns         3.62 ns    191180154
CCreation         755 ns          754 ns       944906
The struct "C", which is just "B" plus an int, is much slower to initialize
than B when value initialization (via {}) is used. However, my understanding of
the C++ standard is that members with a user-defined default constructor do not
need to be zero-initialized in this situation. Looking at the godbolt assembly
output, I see that both `A a{}` and `C c{}` generate a call to memset, while
`B b{}` doesn't. Clang, on the other hand, seems to initialize C almost as fast
as B.
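To make the expectation concrete, here is a minimal sketch of what `C c{}` has
to guarantee under that reading of the standard: B's user-provided constructor
runs once and `dummy` ends up zero. The counter `ctor_calls` and the helper
`check_c_init` are hypothetical names added for illustration and are not part of
the benchmark. The sketch deliberately never reads `member`, since its bytes
would not be required to hold any particular value.

```cpp
#include <array>
#include <cassert>

struct B {
    static int ctor_calls;           // hypothetical counter, for illustration only
    B() { ++ctor_calls; }            // user-provided ctor; initializes no members
    std::array<char, 50000> member;  // intentionally left uninitialized, never read
    char c;
};
int B::ctor_calls = 0;

struct C {
    // no user-declared ctor
    B b;
    int dummy;
};

// Returns true if `C c{}` called B's constructor exactly once and
// left `dummy` equal to zero.
bool check_c_init() {
    const int before = B::ctor_calls;
    C c{};
    return B::ctor_calls == before + 1 && c.dummy == 0;
}
```

Calling `assert(check_c_init());` should pass with either compiler; the
difference reported here is only in whether the 50000 bytes of `member` are
cleared as well.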
This potentially missed optimization in gcc is particularly nasty for structs
with large embedded storage (e.g. structs that contain C-arrays, std::arrays,
or static_vectors).
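For looking at the assembly without the benchmark library, something along the
lines of the sketch below might be enough. The `escape` helper is a hypothetical
stand-in for `benchmark::DoNotOptimize`, not the library's implementation; it
relies on the GCC/Clang extended asm extension, and the function names
`create_b` and `create_c` are made up for this example.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for benchmark::DoNotOptimize (GCC/Clang only).
// The empty asm takes the pointer as an input and clobbers memory, so the
// compiler has to assume the pointed-to object may be read or written.
inline void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

struct B {
    B() {}                           // user-provided ctor, initializes nothing
    ~B() { escape(&c); }
    std::array<char, 50000> member;
    char c;
};

struct C {
    // no user-declared ctor
    B b;
    int dummy;
};

// Compare the code generated for these two functions.
int create_b() {
    B b{};
    escape(&b);
    return 0;
}

int create_c() {
    C c{};
    escape(&c);
    return c.dummy;                  // zero, since `dummy` is value-initialized
}
```

Compiling this with `g++ -O2 -S` and searching the output for memset in
`create_b` and `create_c` should show whether the extra zeroing is emitted.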