[Bug libstdc++/101923] New: std::function's move ctor is slower than the copy one for empty source objects

2021-08-15 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

            Bug ID: 101923
           Summary: std::function's move ctor is slower than the copy one
                    for empty source objects
           Product: gcc
           Version: 9.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dartdart26 at gmail dot com
  Target Milestone: ---

std::function's move constructor calls swap() regardless of whether the
source object is empty. In contrast, the copy constructor first checks
whether the source object is empty and, if it is, does nothing, because the
`this` object has already been constructed in an empty state by
_Function_base().

Calling swap() on an empty source requires more work, because some data still
has to be copied: for example, the POD data cannot be moved.

Could the move constructor check if the source is empty too, as the copy one
does? Please let me know if I am missing a rule that prevents that.
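
To illustrate the difference, here is a minimal sketch using a hypothetical
ToyFunction type. The names (ToyFunction, Storage, ViaSwap, answer,
empty_sources_stay_empty) are made up for this example and do not mirror the
libstdc++ members; the sketch only assumes a wrapper that keeps a small storage
buffer plus an invoker pointer, where a null invoker means "empty".

```cpp
#include <cassert>
#include <utility>

// Hypothetical small-buffer storage; stands in for the wrapper's POD data.
struct Storage { void* words[2] = {nullptr, nullptr}; };

// Hypothetical, heavily simplified callable wrapper (not libstdc++ code).
class ToyFunction {
public:
    using Invoker = int (*)(const Storage&);

    ToyFunction() noexcept = default;  // constructs the empty state
    ToyFunction(Invoker inv, Storage s) noexcept : storage_(s), invoker_(inv) {}

    // Copy: an empty source is detected up front and nothing else is done.
    ToyFunction(const ToyFunction& other) noexcept {
        if (other) {
            storage_ = other.storage_;
            invoker_ = other.invoker_;
        }
    }

    // Move variant A (tagged so both variants can coexist in this sketch):
    // unconditional swap, which shuffles storage even for an empty source.
    struct ViaSwap {};
    ToyFunction(ToyFunction&& other, ViaSwap) noexcept { swap(other); }

    // Move variant B: same emptiness check as the copy constructor.
    ToyFunction(ToyFunction&& other) noexcept {
        if (other) {
            storage_ = other.storage_;
            invoker_ = other.invoker_;
            other.invoker_ = nullptr;  // leave the source empty
        }
    }

    void swap(ToyFunction& other) noexcept {
        std::swap(storage_, other.storage_);
        std::swap(invoker_, other.invoker_);
    }

    explicit operator bool() const noexcept { return invoker_ != nullptr; }

    int operator()() const {
        assert(invoker_ != nullptr);  // calling an empty wrapper is a bug here
        return invoker_(storage_);
    }

private:
    Storage storage_;
    Invoker invoker_ = nullptr;
};

// Illustrative target used to build a non-empty wrapper.
inline int answer(const Storage&) { return 42; }

// All three construction paths must yield an empty object from an empty source.
inline bool empty_sources_stay_empty() {
    ToyFunction a, b;
    ToyFunction copied(a);
    ToyFunction swapped(std::move(a), ToyFunction::ViaSwap{});
    ToyFunction checked(std::move(b));
    return !copied && !swapped && !checked;
}
```

Both move variants produce the same observable state; the checked one simply
skips the storage traffic when there is nothing to transfer, which is the
behaviour being requested here.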

I noticed this on version 9.3.0, but I see the code is the same on current
master:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/std_function.h;hb=c22bcfd2f7dc9bb5ad394720f4a612327dc898ba#l391

I have tested on a MacBook M1 and the copy ctor for empty sources is almost 2x
faster than the move ctor:

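-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
copy            0.945 ns        0.945 ns    555789159
move             1.83 ns         1.83 ns    382183169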

I have made a YouTube video describing my findings and the benchmark
results:
https://www.youtube.com/watch?v=WA3mKab-tn8

[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-15 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #1 from Petar Ivanov  ---
Benchmark code (using Google Benchmark):

#include <benchmark/benchmark.h>

#include <functional>
#include <utility>

struct Car {};

// Copy-construct from an empty std::function.
static void copy(benchmark::State& state) {
  for (auto _ : state) {
    const auto f = std::function<void(const Car&)>{};
    const auto copied = f;
    benchmark::DoNotOptimize(copied);
  }
}

// Move-construct from an empty std::function.
static void move(benchmark::State& state) {
  for (auto _ : state) {
    auto f = std::function<void(const Car&)>{};
    const auto moved = std::move(f);
    benchmark::DoNotOptimize(moved);
  }
}

BENCHMARK(copy);
BENCHMARK(move);

BENCHMARK_MAIN();

[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-16 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #3 from Petar Ivanov  ---
Thank you for pointing out the output on x86!

Following that, I checked O2 and O3 on ARM64 and I see differences, though I
cannot say what their actual impact is:

O2: https://godbolt.org/z/P9Garznef

O3: https://godbolt.org/z/Yb1q33YP3

As for x86, I ran the benchmark in Quick Bench (I assume x86, as that is what
the disassembly shows) and the results are similar to my findings on ARM64,
with move being slower:
https://quick-bench.com/q/vK9eSYngutKGo4QSPcdra9gUOI0

The benchmark code seems correct to me, but I might be missing something, might
be misusing DoNotOptimize() or there might be some side effects.

I am sure this is not a big deal. I was just wondering if adding an if
statement is doable and, if yes, it seems like a quick and easy win.

[Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-16 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #5 from Petar Ivanov  ---
(In reply to Andrew Pinski from comment #4)
> Hmm
> 
>   __tmp = MEM[(union _Any_data & {ref-all})&f];
>   MEM[(union _Any_data * {ref-all})&f] = MEM[(union _Any_data &
> {ref-all})&moved];
>   MEM[(union _Any_data * {ref-all})&moved] = __tmp;
>   __tmp ={v} {CLOBBER};
>   _13 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct
> Car &) &)&f + 24];
>   _14 = MEM[(void (*type) (const union _Any_data & {ref-all}, const struct
> Car &) &)&moved + 24];
>   MEM[(void (*) (const union _Any_data & {ref-all}, const struct Car
> &) &)&f + 24] = _14;
>   MEM[(void (*) (const union _Any_data & {ref-all}, const struct Car
> &) &)&moved + 24] = _13;
> 
> So a missed optimization at the gimple level.
> But note the arm64 compiler on godbolt is a few months old, 20210528.  There
> might have been some fixes which improve this already.

I see, thank you.

Do you think the x86 results on quick bench are something worth improving? From
a user's perspective, I assume the expectation is that moves are at least as
fast as copies.

Could you please advise on how I can proceed with this report? Can a change be
made in libstdc++ or should it be considered a compiler issue?

Thank you!

[Bug tree-optimization/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-17 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Petar Ivanov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Petar Ivanov  ---
(In reply to CVS Commits from comment #8)
> The master branch has been updated by Jonathan Wakely :
> 
> https://gcc.gnu.org/g:0808b0df9c4d31f4c362b9c85fb538b6aafcb517
> 
> commit r12-2959-g0808b0df9c4d31f4c362b9c85fb538b6aafcb517
> Author: Jonathan Wakely 
> Date:   Tue Aug 17 11:30:56 2021 +0100
> 
> libstdc++: Optimize std::function move constructor [PR101923]
> 

Thank you!

On ARM64, it is now identical to copy:

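-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
copy            0.948 ns        0.948 ns    558822565
move            0.952 ns        0.952 ns    729210032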