https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78180
Bug ID: 78180
Summary: Poor optimization of std::array on gcc 4.8/5.4/6.2 as compared to simple raw array
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: barry.revzin at gmail dot com
Target Milestone: ---

Here is a complete benchmark comparing a bunch of simple operations on a std::array<int64_t, 128> vs an int64_t[128]. I'm using https://github.com/google/benchmark and compiling with -std=c++11 -O3 -D_GLIBCXX_USE_CXX11_ABI=0:

=============================================================
#include <array>
#include <benchmark/benchmark_api.h>

template <class C>
class Rolling {
    C times_{};
    uint32_t idx_;
    const uint32_t size_;

public:
    Rolling(uint32_t size)
    : idx_(0)
    , size_(size)
    { }

    void add(int64_t t) {
        times_[idx_] = t;
        ++idx_;
        if (idx_ == size_) {
            idx_ = 0;
        }
    }

    bool exceeded(int64_t now, int64_t intv) {
        return now - times_[idx_] < intv;
    }
};

template <class C>
void BM_Rolling(benchmark::State& state) {
    Rolling<C> r(100);
    int64_t i = 0;
    int64_t exc = 0;

    while (state.KeepRunning()) {
        for (int i = 0; i < state.range(0); ++i) {
            r.add(i);
            if (r.exceeded(i, 1000000)) {
                benchmark::DoNotOptimize(++exc);
            }
        }
    }
}

#define JOIN(...) __VA_ARGS__
BENCHMARK_TEMPLATE(BM_Rolling, int64_t[128])->Range(8, 8<<10);
BENCHMARK_TEMPLATE(BM_Rolling, JOIN(std::array<int64_t, 128>))->Range(8, 8<<10);

BENCHMARK_MAIN();
=============================================================

This yields the following performance numbers (similar across 4.8.2, 5.4.0, and 6.2.0):

Run on (16 X 3199.66 MHz CPU s)
2016-11-01 15:56:13
Benchmark                                             Time        CPU Iterations
-------------------------------------------------------------------------------------
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8         18 ns      18 ns   39568747
BM_Rolling<JOIN(std::array<int64_t, 128>)>/64       135 ns     134 ns    5218330
BM_Rolling<JOIN(std::array<int64_t, 128>)>/512     1084 ns    1031 ns     678795
BM_Rolling<JOIN(std::array<int64_t, 128>)>/4k      8221 ns    8185 ns      85583
BM_Rolling<JOIN(std::array<int64_t, 128>)>/8k     16975 ns   16520 ns      42752
BM_Rolling<int64_t[128]>/8                           15 ns      15 ns   45940368
BM_Rolling<int64_t[128]>/64                         112 ns     111 ns    6301196
BM_Rolling<int64_t[128]>/512                        821 ns     817 ns     858168
BM_Rolling<int64_t[128]>/4k                        6538 ns    6496 ns     108570
BM_Rolling<int64_t[128]>/8k                       12957 ns   12902 ns      53582

That is a large performance gap between std::array and raw array, where I wouldn't expect any. When compiling with clang, I don't see any gap at all (though for both containers, the performance is significantly worse than gcc's).
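
To illustrate why no gap would be expected, here is a minimal sketch that repeats the benchmark's inner loop without the benchmark library and checks that both containers compute the same result. The helper name count_exceeded is hypothetical and not part of the report. The size check is an assumption that holds for libstdc++ and is not strictly guaranteed by the standard. The sketch says nothing about timing; it only shows that the two instantiations are semantically identical, so any difference has to come from code generation.

```cpp
#include <array>
#include <cstdint>

// Same class as in the report, repeated so this sketch is self-contained.
template <class C>
class Rolling {
    C times_{};
    uint32_t idx_;
    const uint32_t size_;

public:
    explicit Rolling(uint32_t size) : idx_(0), size_(size) { }

    void add(int64_t t) {
        times_[idx_] = t;
        ++idx_;
        if (idx_ == size_) {
            idx_ = 0;
        }
    }

    bool exceeded(int64_t now, int64_t intv) const {
        return now - times_[idx_] < intv;
    }
};

// Hypothetical helper: the benchmark's inner loop, run n times,
// returning how often exceeded() was true.
template <class C>
int64_t count_exceeded(int n) {
    Rolling<C> r(100);
    int64_t exc = 0;
    for (int i = 0; i < n; ++i) {
        r.add(i);
        if (r.exceeded(i, 1000000)) {
            ++exc;
        }
    }
    return exc;
}

// Assumption: std::array adds no storage overhead over the raw array
// (true for libstdc++; not strictly required by the standard).
static_assert(sizeof(std::array<int64_t, 128>) == sizeof(int64_t[128]),
              "std::array is expected to be layout-equivalent to a raw array");
```

Calling count_exceeded<int64_t[128]>(n) and count_exceeded<std::array<int64_t, 128>>(n) with the same n should return the same count. Comparing the assembly of the two instantiations (for example with -O3 -S) is one way to see where the generated code diverges.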