[Bug libstdc++/101923] New: std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

            Bug ID: 101923
           Summary: std::function's move ctor is slower than the copy one for empty source objects
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dartdart26 at gmail dot com
  Target Milestone: ---

std::function's move constructor calls swap() regardless of whether the source object is empty. In contrast, the copy constructor first checks whether the source object is empty; if it is, it does nothing further, because the `this` object has already been constructed in an empty state by _Function_base().

Calling swap() on an empty source requires more work, because some data still has to be copied. For example, the POD data cannot be moved.

Could the move constructor check whether the source is empty too, as the copy constructor does? Please let me know if I am missing a rule that prevents that.

I noticed this on version 9.3.0, but I see the code is the same in current master at:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/std_function.h;hb=c22bcfd2f7dc9bb5ad394720f4a612327dc898ba#l391

I have tested on a MacBook M1, and the copy ctor for empty sources is almost 2x faster than the move ctor:

Benchmark         Time           CPU     Iterations
copy          0.945 ns      0.945 ns      555789159
move           1.83 ns       1.83 ns      382183169

I have made a YouTube video describing my findings and the benchmark results:
https://www.youtube.com/watch?v=WA3mKab-tn8
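The check being asked for can be sketched on a toy wrapper. This is only an illustration of the idea, not the libstdc++ code: ToyFunction, g_swap_count and twice are hypothetical names invented for the sketch, and the real std::function keeps a manager pointer as well as an invoker. The move constructor below skips swap() entirely when the source is empty, in the same way the copy constructor skips the copy.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical counter, used only to observe how often swap() runs.
int g_swap_count = 0;

// Toy stand-in for a type-erased callable wrapper (not libstdc++ code).
struct ToyFunction {
    using Invoker = int (*)(const void*, int);

    // Small inline buffer, loosely analogous to std::function's POD storage.
    alignas(void*) unsigned char storage[2 * sizeof(void*)] = {};
    Invoker invoker = nullptr;  // a null invoker means "empty"

    ToyFunction() = default;

    // Store a plain function pointer in the inline buffer.
    explicit ToyFunction(int (*fn)(int)) {
        std::memcpy(storage, &fn, sizeof fn);
        invoker = [](const void* s, int x) {
            int (*f)(int);
            std::memcpy(&f, s, sizeof f);
            return f(x);
        };
    }

    // Copy: nothing to do for an empty source, *this is already empty.
    ToyFunction(const ToyFunction& other) {
        if (other) {
            std::memcpy(storage, other.storage, sizeof storage);
            invoker = other.invoker;
        }
    }

    // Move: the same emptiness check, so an empty source costs no swap.
    ToyFunction(ToyFunction&& other) noexcept {
        if (other)
            swap(other);
    }

    void swap(ToyFunction& other) noexcept {
        ++g_swap_count;
        std::swap(storage, other.storage);  // bytes can only be copied, not moved
        std::swap(invoker, other.invoker);
    }

    explicit operator bool() const noexcept { return invoker != nullptr; }

    int operator()(int x) const {
        assert(invoker != nullptr);
        return invoker(storage, x);
    }
};

// Hypothetical target function for the usage example.
int twice(int x) { return 2 * x; }
```

Moving from a default-constructed ToyFunction leaves g_swap_count untouched, while moving from ToyFunction(&twice) performs exactly one swap and leaves the source empty.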
[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #1 from Petar Ivanov ---
Benchmark code (using Google Benchmark):

#include <benchmark/benchmark.h>
#include <functional>
#include <utility>

struct Car {};

// Copy-construct from an empty std::function.
static void copy(benchmark::State& state) {
  for (auto _ : state) {
    const auto f = std::function<void(const Car&)>{};
    const auto copied = f;
    benchmark::DoNotOptimize(copied);
  }
}

// Move-construct from an empty std::function.
static void move(benchmark::State& state) {
  for (auto _ : state) {
    auto f = std::function<void(const Car&)>{};
    const auto moved = std::move(f);
    benchmark::DoNotOptimize(moved);
  }
}

BENCHMARK(copy);
BENCHMARK(move);

BENCHMARK_MAIN();
[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #3 from Petar Ivanov ---
Thank you for pointing out the output on x86!

Following that, I checked O2 and O3 on ARM64 and I see differences, though I cannot say what their actual impact is:
O2: https://godbolt.org/z/P9Garznef
O3: https://godbolt.org/z/Yb1q33YP3

As for x86, I ran the benchmark in Quick Bench (I assume it runs on x86, as that is what the disassembly shows) and the results are similar to my findings on ARM64, with move being slower:
https://quick-bench.com/q/vK9eSYngutKGo4QSPcdra9gUOI0

The benchmark code seems correct to me, but I might be missing something: I might be misusing DoNotOptimize(), or there might be some side effects.

I am sure this is not a big deal. I was just wondering whether adding an if statement is doable; if it is, it seems like a quick and easy win.
[Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #5 from Petar Ivanov ---
(In reply to Andrew Pinski from comment #4)
> Hmm
>
>   __tmp = MEM[(union _Any_data & {ref-all})&f];
>   MEM[(union _Any_data * {ref-all})&f] = MEM[(union _Any_data & {ref-all})&moved];
>   MEM[(union _Any_data * {ref-all})&moved] = __tmp;
>   __tmp ={v} {CLOBBER};
>   _13 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24];
>   _14 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24];
>   MEM[(void (*) (const union _Any_data & {ref-all}, const struct Car &) &)&f + 24] = _14;
>   MEM[(void (*) (const union _Any_data & {ref-all}, const struct Car &) &)&moved + 24] = _13;
>
> So a missed optimization at the gimple level.
> But note the arm64 compiler on godbolt is a few months old, 20210528. There
> might have been some fixes which improve this already.

I see, thank you. Do you think the x86 results on quick bench are something worth improving? From a user's perspective, I assume the expectation is that moves are at least as fast as copies.

Could you please advise on how I can proceed with this report? Can a change be made in libstdc++ or should it be considered a compiler issue?

Thank you!
[Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Petar Ivanov changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Petar Ivanov ---
(In reply to CVS Commits from comment #8)
> The master branch has been updated by Jonathan Wakely:
>
> https://gcc.gnu.org/g:0808b0df9c4d31f4c362b9c85fb538b6aafcb517
>
> commit r12-2959-g0808b0df9c4d31f4c362b9c85fb538b6aafcb517
> Author: Jonathan Wakely
> Date:   Tue Aug 17 11:30:56 2021 +0100
>
>     libstdc++: Optimize std::function move constructor [PR101923]
>

Thank you! On ARM64, it is now identical to copy:

Benchmark         Time           CPU     Iterations
copy          0.948 ns      0.948 ns      558822565
move          0.952 ns      0.952 ns      729210032