[Bug c++/96750] New: 10-12% performance decrease in benchmark going from GCC8 to GCC9/GCC10

2020-08-22 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96750

Bug ID: 96750
   Summary: 10-12% performance decrease in benchmark going from
GCC8 to GCC9/GCC10
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mattreecebentley at gmail dot com
  Target Milestone: ---

Created attachment 49102
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49102&action=edit
Compiler output

I have recently been working on a new version of the plf::colony container
(plflib.org) and found that GCC9 gives ~10% worse performance on average in a
given benchmark than GCC8. Further investigation showed GCC10 is just as bad.

The effect is repeatable across architectures - I've tested on Xubuntu, on
Windows running Nuwen MinGW, and on Core2 and Haswell CPUs, with and without
-march=native specified.

Compiler flags are: -O2;-march=native;-std=c++17

The code presented is an absolute minimum use case - other benchmarks, both
simpler and more complex, have not shown such strong performance differences.
So I cannot reduce it further; please do not ask me to do so.

The benchmark in question initially inserts into a container, then repeatedly
iterates over the container's elements, randomly erasing and/or inserting new
elements.
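
For reference, a rough sketch of that pattern (assuming plf::colony's public
interface - insert() and erase() returning an iterator to the next element -
with placeholder sizes and std::rand() standing in for the benchmark's RNG;
the actual benchmark code is in the attachment):

#include <cstdlib>
#include "plf_colony.h"

int main()
{
    plf::colony<int> elements;

    for (int i = 0; i != 100000; ++i)  // initial insertion phase
        elements.insert(i);

    for (int cycle = 0; cycle != 100; ++cycle)  // repeated iterate + random erase/insert
    {
        for (auto it = elements.begin(); it != elements.end();)
        {
            if ((std::rand() & 7) == 0)
                it = elements.erase(it);  // erase returns an iterator to the next element
            else
                ++it;
        }

        for (int i = 0; i != 1000; ++i)  // re-insert some random values
            elements.insert(std::rand());
    }

    return static_cast<int>(elements.size());  // keep the container observable
}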

Compilers/environments used:
Xubuntu 20: GCC8.4, GCC9.3, GCC10.0.1
Windows 7: Nuwen mingw GCC8.2, nuwen mingw GCC9.2

The attached code output is from the Xubuntu environment.

If you have any questions, let me know. I will help where I can, but my
knowledge of assembly is limited.

Information on code components:
Nanotimer is a ~nanosecond-precision sub-timeslice cross-platform timer.
Colony is a bucket-array-like unordered sequence container.

The attached zip contains the build logs and compiler preprocessed outputs for
GCC 8.4, 9.3 and 10.0.1

Thanks-
Mat

[Bug middle-end/96750] 10-12% performance decrease in benchmark going from GCC8 to GCC9/GCC10

2020-08-24 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96750

--- Comment #4 from Matt Bentley  ---
(In reply to Marc Glisse from comment #2)
> (In reply to Martin Liška from comment #1)
> > after:
> > 1794240.0
> > 
> > before:
> > 1802710.0
> 
> That's less than 1% of difference (with "after" better than "before"), not
> the 10% regression claimed, maybe there is another relevant commit?

See the .ods spreadsheet in the zip for my results with the same code.

[Bug libstdc++/86471] New: GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

Bug ID: 86471
   Summary: GCC/libstdc++ outputs inferior code for std::fill and
std::fill_n vs std::memset on c-style arrays
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mattreecebentley at gmail dot com
  Target Milestone: ---

Test setup: 

Xubuntu 18, Core2Duo E8500 CPU, GCC 7.3



Results in release mode (-O2 -march=native):
=

2018-07-10 21:28:11
Running ./google_benchmark_test
Run on (2 X 3800.15 MHz CPU s)
CPU Caches:
  L1 Data 32K (x2)
  L1 Instruction 32K (x2)
  L2 Unified 6144K (x1)
-
Benchmark  Time   CPU Iterations
-
memory_filln   16488 ns  16477 ns  42460
memory_fill16493 ns  16493 ns  42440
memory_memset   8414 ns   8408 ns  83022



Results in debug mode (-O0):
=

2018-07-10 21:48:09
Running ./google_benchmark_test
Run on (2 X 3800.15 MHz CPU s)
CPU Caches:
  L1 Data 32K (x2)
  L1 Instruction 32K (x2)
  L2 Unified 6144K (x1)
-
Benchmark  Time   CPU Iterations
-
memory_filln   87209 ns  87139 ns   8029
memory_fill94593 ns  94533 ns   7411
memory_memset   8441 ns   8434 ns  82833




Code:
==

// Uses Google Benchmark. Rearrange the code any way you want; the results stay
// similar:
#include <benchmark/benchmark.h>
#include <cstring>
#include <algorithm>
#include <iterator>


static void memory_memset(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::memset(ints, 0, sizeof(int) * 5);
    }
}


static void memory_filln(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::fill_n(ints, 5, 0);
    }
}


static void memory_fill(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::fill(std::begin(ints), std::end(ints), 0);
    }
}


// Register the function as a benchmark
BENCHMARK(memory_filln);
BENCHMARK(memory_fill);
BENCHMARK(memory_memset);


int main (int argc, char ** argv)
{
    benchmark::Initialize (&argc, argv);
    benchmark::RunSpecifiedBenchmarks ();
    return 0;
}

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #2 from Matt Bentley  ---
(In reply to Andrew Pinski from comment #1)
> Does -O3 bring the performance back?  -O3 enables loop distribution which
> should be able to detect these loops.

Just tested - yes.
However, this is a fairly trivial testcase, and not all projects compile with
-O3 (most don't).
Clang, by comparison, optimizes this out at -O2.
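
For reference, the kind of loop being discussed - the pattern that -O3's loop
distribution can recognise and replace with a memset call, per Andrew's
comment - is along these lines (an illustrative sketch, not libstdc++ code):

// The kind of loop std::fill/std::fill_n lower to for scalar element types.
// At -O3, GCC's loop-distribution pass can pattern-match this into a call to
// memset; at -O2 (on GCC 7/8) it is left as an ordinary store loop.
void zero_array(int* data, unsigned int size)
{
    for (unsigned int i = 0; i != size; ++i)
        data[i] = 0;
}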

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #3 from Matt Bentley  ---
I thought I should note that there is also a missed optimization opportunity in
the code: Clang optimizes the code I've listed to remove the benchmark loops
entirely, since it detects that the arrays aren't actually being used for
anything.

In order to get Clang to benchmark it properly, I had to add a loop which adds
the array contents to a total post-benchmark, as follows:

#include <benchmark/benchmark.h>
#include <cstring>
#include <algorithm>
#include <iterator>
#include <cstdio>

double total = 0;


static void memory_memset(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::memset(ints, 0, sizeof(int) * 5);
    }

    for (int counter = 0; counter != 5; ++counter)
    {
        total += ints[counter];
    }
}


static void memory_filln(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::fill_n(ints, 5, 0);
    }

    for (int counter = 0; counter != 5; ++counter)
    {
        total += ints[counter];
    }
}


static void memory_fill(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::fill(std::begin(ints), std::end(ints), 0);
    }

    for (int counter = 0; counter != 5; ++counter)
    {
        total += ints[counter];
    }
}


// Register the function as a benchmark
BENCHMARK(memory_filln);
BENCHMARK(memory_fill);
BENCHMARK(memory_memset);


int main (int argc, char ** argv)
{
    benchmark::Initialize (&argc, argv);
    benchmark::RunSpecifiedBenchmarks ();
    printf("Total = %f\n", total);
    getchar();
    return 0;
}
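
As an aside, another way to stop either compiler from discarding the stores -
probably more idiomatic for Google Benchmark, though I haven't re-benchmarked
with it here - is benchmark::DoNotOptimize/ClobberMemory, e.g.:

static void memory_memset_kept(benchmark::State& state)
{
    int ints[5];

    for (auto _ : state)
    {
        std::memset(ints, 0, sizeof(int) * 5);
        benchmark::DoNotOptimize(&ints[0]);  // the address escapes, so the array counts as used
        benchmark::ClobberMemory();          // forces the stores to be treated as observable
    }
}

BENCHMARK(memory_memset_kept);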

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #5 from Matt Bentley  ---
What would be even more useful is a warning for unused data.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #6 from Matt Bentley  ---
Suggested patch for libstdc++, stl_algobase.h, line 688:

  template<typename _ForwardIterator, typename _Tp>
    inline typename
    __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
    __fill_a(_ForwardIterator __first, _ForwardIterator __last,
             const _Tp& __value)
    {
      if (__value != reinterpret_cast<_Tp>(0))
        {
          const _Tp __tmp = __value;
          for (; __first != __last; ++__first)
            *__first = __tmp;
        }
      else
        {
          if (const size_t __len = __last - __first)
            __builtin_memset(reinterpret_cast<void*>(__first), 0,
                             __len * sizeof(_Tp));
        }
    }

Comments?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #8 from Matt Bentley  ---
> This is incorrect for floating point types and non-scalars.  And it
> introduces an extra check at runtime if the value is not known at compile time.

This is the overload for scalar types - read the function description line.
The extra check is negligible compared to the overhead caused by the
alternative looping code vs memset, as benchmarked above.

I had to read up on how floating point is implementation-defined, so yes,
you're right: the specialization would have to be further constrained to
integral and scalar pointer types using __is_integral_helper & __is_scalar.
Whether the commonality of zero-wiping newly allocated arrays outweighs the
overhead of an additional check for non-zero fills is a good question.
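
For illustration only, with the public type traits the constrained overload
might look something like this (a sketch - untested, and not the internal
libstdc++ spelling, which uses the __-prefixed helpers mentioned above):

#include <cstring>
#include <type_traits>

// Sketch: memset-based zero-fill restricted to integral and pointer element types.
// For pointers this assumes the usual all-bits-zero null-pointer representation.
template <typename T>
typename std::enable_if<std::is_integral<T>::value || std::is_pointer<T>::value, void>::type
fill_contiguous(T* first, T* last, const T& value)
{
    if (value != T())  // non-zero fill value: plain element-by-element loop
    {
        for (; first != last; ++first)
            *first = value;
    }
    else if (first != last)  // all-zero fill: forward to memset
        std::memset(first, 0, static_cast<std::size_t>(last - first) * sizeof(T));
}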

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-15 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #10 from Matt Bentley  ---
(In reply to Marc Glisse from comment #9)
> (In reply to Andrew Pinski from comment #7)
> We could also use __builtin_constant_p, if the function is inlined often
> enough (don't know if it is).

Right - so then the proposed function becomes:

template<typename _ForwardIterator, typename _Tp>
  inline typename
  __gnu_cxx::__enable_if<__is_pointer_helper<_Tp>::__value ||
                         __is_integral_helper<_Tp>::__value, void>::__type
  __fill_a(_ForwardIterator __first, _ForwardIterator __last,
           const _Tp& __value)
  {
    if (__builtin_constant_p(__value) == 1)
      {
        if (__value != reinterpret_cast<_Tp>(0))
          {
            const _Tp __tmp = __value;
            for (; __first != __last; ++__first)
              *__first = __tmp;
          }
        else
          {
            if (const size_t __len = __last - __first)
              __builtin_memset(reinterpret_cast<void*>(__first), 0,
                               __len * sizeof(_Tp));
          }
      }
    else
      {
        const _Tp __tmp = __value;
        for (; __first != __last; ++__first)
          *__first = __tmp;
      }
  }

(assuming I'm getting the enable_if syntax correct?)

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #13 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #12)
> Also you're doing a reinterpret_cast from an arbitrary iterator type, which
> is not necessarily a pointer, or even a random access iterator.
> 
> Since you don't have a copyright assignment in place please leave the patch
> to us, this is less than helpful :-)

Well, it's more that you're doing- at any rate, the issue you've noted is
easily bypassed by changing the "reinterpret_cast<void*>(__first)" to
"reinterpret_cast<void*>(&*(__first))".
Cheers.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #14 from Matt Bentley  ---
(In reply to Matt Bentley from comment #13)
> (In reply to Jonathan Wakely from comment #12)
> > Also you're doing a reinterpret_cast from an arbitrary iterator type, which
> > is not necessarily a pointer, or even a random access iterator.
> > 
> > Since you don't have a copyright assignment in place please leave the patch
> > to us, this is less than helpful :-)
> 
> Well, it's more that you're doing- at any rate, the issue you've noted is
> easily bypassed by changing the "reinterpret_cast<void*>(__first)" to
> "reinterpret_cast<void*>(&*(__first))".
> Cheers.

My bad, I missed the point about the memory not necessarily being contiguous.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #16 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #15)

> Look at how this exact problem is already solved elsewhere in the same file.
> All the algorithms that dispatch to a __builtin_memxxx function use similar
> techniques, detecting when the iterators are pointers for a start.

Yup - arguably it would be nice to have an overload for vector iterators that
does the same thing.
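
For illustration, the pointer-detection dispatch being referred to is
essentially overload selection along these lines (a simplified sketch, not the
actual libstdc++ code):

#include <cstring>

// Generic fallback: any forward iterator, element by element.
template <typename ForwardIt>
void zero_fill(ForwardIt first, ForwardIt last)
{
    for (; first != last; ++first)
        *first = 0;
}

// Chosen only when the range really is a raw int array (known contiguous),
// so the zero-fill can safely be forwarded to memset.
inline void zero_fill(int* first, int* last)
{
    std::memset(first, 0, static_cast<std::size_t>(last - first) * sizeof(int));
}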


> And see https://gcc.gnu.org/contribute.html#legal for why code pasted into
> bugzilla without the necessary paperwork is unhelpful (and possibly even
> counter-productive if we end up having to re-invent the same wheel without
> using your code).

I've read it; it says copyright assignments are unneeded for small
contributions. Also, this wasn't patch code, just spitballing code.


> Indeed. It's a forward iterator. Even a random access iterator doesn't
> guarantee you can do that (e.g. std::deque::iterator is random access but
> not contiguous).

Yep, got it.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-17 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #19 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #18)
> (In reply to Matt Bentley from comment #13)
> > Well, it's more that you're doing- at any rate, the issue you've noted is
> > easily bypassed by changing the "reinterpret_cast<void*>(__first)" to
> > "reinterpret_cast<void*>(&*(__first))".
> 
> Also, independent of the non-contiguous problem, using reinterpret_cast here
> is unnecessary (any non-const pointer can be implicitly converted to void*)
> and would prevent adding constexpr to the algorithm (as required for C++2a).

It is to prevent compiler warnings under Clang.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-17 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #20 from Matt Bentley  ---
(In reply to Matt Bentley from comment #19)
> (In reply to Jonathan Wakely from comment #18)
> > (In reply to Matt Bentley from comment #13)
> > > Well, it's more that you're doing- at any rate, the issue you've noted is
> > > easily bypassed by changing the "reinterpret_cast<void*>(__first)" to
> > > "reinterpret_cast<void*>(&*(__first))".
> > 
> > Also, independent of the non-contiguous problem, using reinterpret_cast here
> > is unnecessary (any non-const pointer can be implicitly converted to void*)
> > and would prevent adding constexpr to the algorithm (as required for C++2a).
> 
> It is to prevent compiler warnings under Clang.

Actually, don't quote me on that - I may be thinking of the
'reinterpret_cast<_Tp>(0)' - one of the two.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #22 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #21)
> Surely static_cast is good enough, and doesn't rule out making the function
> constexpr?

It may well be - how does reinterpret_cast cause constexpr to fail, given that
it's also a compile-time construct?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #23 from Matt Bentley  ---

> Actually, don't quote me on that - I may be thinking of the
> 'reinterpret_cast<_Tp>(0)' - one of the two.

Just to confirm: "reinterpret_cast<void*>(__first)" is not required in this
context, but either "reinterpret_cast<_Tp>(0)" or "static_cast<_Tp>(0)" *is*
required to avoid warnings in Clang when _Tp is a pointer. Either works fine.
I understand that reinterpret_cast isn't allowed inside constexpr, but not why,
and I can't find any resources explicitly stating the reasoning.
But __builtin_constant_p allows it, so its use is a matter of programmer
choice, at least in this context.
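
For what it's worth, here is a minimal illustration of the restriction (my own
example, not taken from the discussion): the run-time use compiles, while the
constant-expression use is rejected.

#include <cstdint>

int main()
{
    // Fine at run time: reinterpret_cast from an integer to a pointer.
    const int* runtime_ptr = reinterpret_cast<const int*>(std::uintptr_t(64));

    // Rejected if uncommented: reinterpret_cast is not allowed in a constant
    // expression, so it cannot appear in anything that must be constant-evaluated
    // (constexpr variables, compile-time-evaluated constexpr function calls, etc.).
    // constexpr const int* compile_time_ptr = reinterpret_cast<const int*>(std::uintptr_t(64));

    return runtime_ptr != nullptr;
}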

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #24 from Matt Bentley  ---
Ugh. I made a mistake. Clang only throws the warning when comparing NULL with 0
without reinterpret_cast/static_cast, not when comparing pointers in general.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-24 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #26 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #25)

> What warning? Why can't you just pass 0 to __builtin_memset? It's a null
> pointer constant. I don't see any warning from clang when using -Weverything.

As corrected above, clang only issues the warning when comparing NULL to 0
without static_cast or reinterpret_cast.


> reinterpret_cast is forbidden in constexpr functions because its purpose is
> to break the type system and say "trust me, I know what I'm doing", and such
> tricks are not allowed in constant expressions.

Yeah, but... why?


> Using reinterpret_cast to convert 0 (a null pointer constant) into a pointer
> type is just silly and poor style. That conversion can be done implicitly;
> it doesn't need a sledgehammer to be used.

That's a bit of silly hyperbole I can do without.


> > But __builtin_constant_p allows it, so its use is a matter of programmer
> > choice, at least in this context.
> 
> It really isn't if the standard requires the algorithm to be 'constexpr'
> (which we don't implement yet, but there's no point adding constructs which
> will just make life harder in the future).

Fair enough-
M@

[Bug middle-end/96750] 10-12% performance decrease in benchmark going from GCC8 to GCC9/GCC10

2020-09-27 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96750

--- Comment #5 from Matt Bentley  ---
If anyone out there is interested in working on this,
I found the smallest possible change that restores GCC8 performance -
it is literally eliminating one branch in one function (move-insert).

The branch in question checks whether there are existing memory blocks to
re-use or whether we need to create a new memory block. For example, if the
reserve() function has been called, there will be existing memory blocks to
re-use.

However, the performance drop occurs whether or not there are memory blocks to
reuse; the actual if statement is irrelevant. I have tested this: I can remove
all instances of memory block storage (reserve(), erase()) and the problem
still exists as long as this one branch remains in insert.
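
To illustrate the shape of the branch being described (a hypothetical stand-in
only - this is not the plf::colony source, and all of the names are made up):

#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for the structure described above; 'free_blocks' plays the role
// of colony's reusable memory blocks created by reserve()/erase().
template <typename T>
class toy_container
{
    static constexpr std::size_t block_capacity = 256;
    std::vector<std::vector<T>> blocks;       // active memory blocks
    std::vector<std::vector<T>> free_blocks;  // blocks held back for reuse

public:
    void insert(T&& value)
    {
        if (blocks.empty() || blocks.back().size() == block_capacity)
        {
            if (!free_blocks.empty())  // the branch in question: reuse an existing block...
            {
                blocks.push_back(std::move(free_blocks.back()));
                free_blocks.pop_back();
            }
            else                       // ...or create a new memory block
            {
                blocks.emplace_back();
                blocks.back().reserve(block_capacity);
            }
        }

        blocks.back().push_back(std::move(value));
    }
};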

I've attached the source files above to demonstrate this, including a version
of plf_colony.h with the branch removed (renamed to plf_colony_fast.h), so you
can see the difference.
This code is all zlib-licensed and free to share, but it is an early beta, so
please don't redistribute it.

Thanks,
Matt

PS. For consistency I've also removed the non-move-insert and emplace instances
of this branch, even though they won't be called by the benchmark code under a
C++11-compliant compiler.

[Bug middle-end/96750] 10-12% performance decrease in benchmark going from GCC8 to GCC9/GCC10

2020-09-27 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96750

--- Comment #6 from Matt Bentley  ---
Created attachment 49278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49278&action=edit
Demonstration of code which doesn't trigger the performance anomaly.

plf_colony_fast.h does not trigger the problem; it has one branch eliminated in
each insert/emplace function.

[Bug c++/100956] New: Unused variable warnings ignore "if constexpr" blocks where variables are conditionally used

2021-06-07 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100956

Bug ID: 100956
   Summary: Unused variable warnings ignore "if constexpr" blocks
where variables are conditionally used
   Product: gcc
   Version: 9.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mattreecebentley at gmail dot com
  Target Milestone: ---

For example, as part of a container class:

void remove_memory_blocks(pointer back_element_in_final_block)
{
   if constexpr (!std::is_trivially_destructible<element_type>::value)
   {
      // destroy each element in each memory block until back_element_in_final_block is reached
   }

   // remove memory blocks
}


If element_type is trivially destructible, G++ will warn that
back_element_in_final_block is unused every time the function is called.

Ideally this warning check should be performed before the discarded constexpr
branches are removed (I have no idea what the procedure is for GCC - I'm just
guessing here).
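
For the record, the warning can be silenced without removing the parameter by
marking it [[maybe_unused]] (C++17) - a sketch, using stand-in template
parameters rather than the container's own typedefs:

#include <type_traits>

template <typename element_type, typename pointer>
void remove_memory_blocks([[maybe_unused]] pointer back_element_in_final_block)
{
    if constexpr (!std::is_trivially_destructible<element_type>::value)
    {
        // destroy each element in each memory block until back_element_in_final_block is reached
    }

    // remove memory blocks
}

int main()
{
    int x = 0;
    remove_memory_blocks<int>(&x);  // no unused-parameter warning even though the branch is discarded
    return 0;
}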

[Bug c++/100956] Unused variable warnings ignore "if constexpr" blocks where variables are conditionally used

2021-06-08 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100956

--- Comment #2 from Matt Bentley  ---
Thank you - I'm aware GCC might optimize it out (and I failed to test with
GCC10), at least at -O2, but other compilers might not, hence the code.

[Bug c++/117008] New: -march=native pessimization of 25% with bitset

2024-10-07 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

Bug ID: 117008
   Summary: -march=native pessimization of 25% with bitset
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mattreecebentley at gmail dot com
  Target Milestone: ---

Created attachment 59292
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59292&action=edit
ii file

Overview: I found a sequence of code using bitset where compiling with
-march=native and -O2 is 25% slower than with just -O2, on an Intel i7-9750H.
This is repeatable; it also occurs on an Intel i7-3770, but with a much smaller
performance decrease (around ~5%).

At -O2 the runtime is ~15 seconds; at -march=native -O2 it's ~20 seconds.

I have used a PCG-based rand() in the code, since the regular rand() slows the
program down by 5x (the absolute difference in runtime between -march=native
-O2 and -O2 stays the same, but the percentage change is dramatically skewed by
rand() taking all the CPU time).

The code adds values to a total to prevent the loops being optimized out. And
yes, .count() would probably have been more typical code.


GCC version: 13.2.0


System type: x86_64-w64-mingw32, windows, x64


Configured with:

 ../src/configure --enable-languages=c,c++ --build=x86_64-w64-mingw32
--host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --disable-multilib
--prefix=/e/temp/gcc/dest --with-sysroot=/e/temp/gcc/dest
--disable-libstdcxx-pch --disable-libstdcxx-verbose --disable-nls
--disable-shared --disable-win32-registry --enable-threads=posix
--enable-libgomp --with-zstd=/c/mingw


The complete command line that triggers the bug, for -O2 only: 

C:/programming/libraries/nuwen/bin/g++.exe  -c 
"C:/programming/workspaces/march_pessimisation_demo/march_pessimisation_demo.cpp"
-O2 -std=c++23 -s -save-temps -DNDEBUG  -o
build-Release/march_pessimisation_demo.cpp.o -I. -I.
C:/programming/libraries/nuwen/bin/g++.exe -o
build-Release\bin\march_pessimisation_demo.exe @build-Release/ObjectsList.txt
-L.   -O2 -s

Or for -march=native:

C:/programming/libraries/nuwen/bin/g++.exe  -c 
"C:/programming/workspaces/march_pessimisation_demo/march_pessimisation_demo.cpp"
-O2 -march=native -std=c++23 -s -save-temps -DNDEBUG  -o
build-Release/march_pessimisation_demo.cpp.o -I. -I.
C:/programming/libraries/nuwen/bin/g++.exe -o
build-Release\bin\march_pessimisation_demo.exe @build-Release/ObjectsList.txt
-L. -s


The compiler output (error messages, warnings, etc.):

No errors or warnings.

[Bug target/117008] -march=native pessimization of 25% with bitset []

2024-10-08 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #11 from Matt Bentley  ---
Created attachment 59294
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59294&action=edit
-v build output for -O2 only, for the second "if" variant

As requested.

[Bug target/117008] -march=native pessimization of 25% with bitset []

2024-10-08 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #9 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #7)

> I'm certain it won't be, because (apart from vector<bool>) they access
> memory locations that are at least one byte, not repeating very similar
> bitwise ops for every bit in a word.

I have tested four additional scenarios: random access, sequential iteration
with an if statement, the vector<bool> equivalent, and sequential iteration
with an if statement plus a more complicated action.

I can confirm that the issue only occurs in the second of those, i.e.:
{
    std::bitset<1280> values;

    for (unsigned int counter = 0; counter != 500; ++counter)
    {
        for (unsigned int index = 0; index != 1280; ++index)
        {
            values.set(index, plf::rand() & 1);
        }


        for (unsigned int index = 0; index != 1280; ++index)
        {
            if (values[index])
            {
                total += values[index] * 2;
            }
        }
    }
}


While this statement is closer to what you might see in real-world code (i.e.
"if 1, do this"), I'm assuming it gets optimized into a conditional move or
something similar, as the same 25% slowdown occurs.
Once you add in a more complicated action, i.e.:

for (unsigned int index = 0; index != 1280; ++index)
{
    if (values[index])
    {
        total += values[index] * 2;
        total ^= values[index];
    }
}


the problem goes away. So essentially Andrew is correct, unless the user's
chosen action can be trivially optimized.

I will generate the -v output for the second of those scenarios now.

[Bug target/117008] -march=native pessimization of 25% with bitset []

2024-10-08 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #10 from Matt Bentley  ---
Created attachment 59293
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59293&action=edit
-v build output for -O2 -march=native, for the second "if" variant

[Bug c++/116931] False "duplicate 'const'" errors when typedefs used on pointer types in function definitions

2024-10-01 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116931

--- Comment #5 from Matt Bentley  ---
Nevermind, I realised my mistake.

[Bug target/117008] -march=native pessimization of 25% with bitset popcount

2024-10-07 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #4 from Matt Bentley  ---
Yeah, I know - I mentioned that in the report.

It's not a bad benchmark; it's benchmarking access of individual consecutive
bits, not summing. The counting is merely there to prevent the compiler from
optimizing out the loop.

I could equally make it benchmark random indices, and I imagine the problem
would remain, though I haven't checked.

Still, your point is valid in that most non-benchmark code would likely have
more code around the access. It could potentially lead to misleading benchmark
results in other scenarios, though. I haven't tested whether vector/array
indexing triggers the same bad vectorisation.

[Bug target/117008] -march=native pessimization of 25% with bitset popcount

2024-10-07 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #5 from Matt Bentley  ---
(In reply to Andrew Pinski from comment #1)
> Can you provide the output of invoking g++ with -march=native and when
> compiling?

The .ii files were identical, so did you mean the .o files?

[Bug c++/116931] New: False "duplicate 'const'" errors when typedefs used on pointer types in function definitions

2024-10-01 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116931

Bug ID: 116931
   Summary: False "duplicate 'const'" errors when typedefs used on
pointer types in function definitions
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mattreecebentley at gmail dot com
  Target Milestone: ---

Created attachment 59263
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59263&action=edit
Preprocessed file of bug demonstration.

This bug is problematic for scenarios where the developer is obtaining a
pointer type from the allocator, e.g. std::allocator_traits::pointer,
but it will show up anywhere typedefs are used for pointer types.

In short, const int * const works, but const int_pointer const doesn't under
GCC.
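
A reduction showing the shape of the problem (reconstructed from the error
output quoted below - the attachment itself isn't inlined here):

typedef int* integer_pointer;

int test_ok(const int* const p)               // accepted
{
    return *p;
}

int test_fail(const integer_pointer const p)  // GCC 13.2: error: duplicate 'const'
{
    return *p;
}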


the exact version of GCC: 

13.2.0


the system type: 

x86_64-w64-mingw32, windows, x64


Configured with:

 ../src/configure --enable-languages=c,c++ --build=x86_64-w64-mingw32
--host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --disable-multilib
--prefix=/e/temp/gcc/dest --with-sysroot=/e/temp/gcc/dest
--disable-libstdcxx-pch --disable-libstdcxx-verbose --disable-nls
--disable-shared --disable-win32-registry --enable-threads=posix
--enable-libgomp --with-zstd=/c/mingw


the complete command line that triggers the bug: 

C:/programming/libraries/nuwen/bin/g++.exe  -c 
"C:/programming/workspaces/bugreport/bugreport/bug_report.cpp" -g -Wall
-save-temps  -o build-Debug/bug_report.cpp.o -I. -I.


the compiler output (error messages, warnings, etc.):

C:/programming/workspaces/bugreport/bugreport/bug_report.cpp:3:37: error:
duplicate 'const'
3 | int test_fail(const integer_pointer const p)
  | ^
  | -

[Bug c++/116931] False "duplicate 'const'" errors when typedefs used on pointer types in function definitions

2024-10-01 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116931

--- Comment #2 from Matt Bentley  ---
Enlighten me here - how is this invalid C++ code?
A typedef should mean that integer_pointer is int *, so it should work, right?
If it's invalid merely because it's typedef'd instead of written out verbatim,
that's a language defect.

[Bug c++/116931] False "duplicate 'const'" errors when typedefs used on pointer types in function definitions

2024-10-01 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116931

--- Comment #4 from Matt Bentley  ---
Okay, so that's a language defect, because const int * const isn't invalid, so
neither should a typedef of the same be.

[Bug target/117008] -march=native pessimization of 25% with bitset [] and 20% with bool array []

2024-10-15 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #13 from Matt Bentley  ---
Bool C-arrays also display the same problem, but only if a warm-up loop is
present before the main loop (to 'warm up' the cache). I discovered this by
accident the other day. char arrays and int arrays did not display the same
slowdown.

A pre-mainloop warm up is quite common for microbenchmarks to get the data in
cache. I hadn't done that with the previous additional tests mentioned above,
but once I did I found that vector had the same -march=native
pessimisation.

I'm not assuming that the cause of this is the same, but the result certainly
is (between 10%-20% slowdown under -march=native).

Attaching the .ii files for the 2 new cases.
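
The structure in question is roughly as follows (a sketch only - the array size
and iteration counts are placeholders, and std::rand() stands in for the
PCG-based rand(); the real code is in the attached .ii files):

#include <cstdlib>

unsigned int total = 0;

int main()
{
    bool values[1280] = {};

    for (unsigned int index = 0; index != 1280; ++index)  // warm-up pass: touch the array so it's in cache
        values[index] = std::rand() & 1;

    for (unsigned int counter = 0; counter != 500000; ++counter)  // timed main loop
    {
        for (unsigned int index = 0; index != 1280; ++index)
            values[index] = std::rand() & 1;

        for (unsigned int index = 0; index != 1280; ++index)
            total += values[index];  // summing only to stop the loops being optimized out
    }

    return static_cast<int>(total & 0xff);
}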

[Bug target/117008] -march=native pessimization of 25% with bitset [] and 20% with bool array []

2024-10-15 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #14 from Matt Bentley  ---
Created attachment 59355
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59355&action=edit
.ii file for the vector<bool> code causing the same issue

[Bug target/117008] -march=native pessimization of 25% with bitset [] and 20% with bool array []

2024-10-15 Thread mattreecebentley at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117008

--- Comment #15 from Matt Bentley  ---
Created attachment 59356
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59356&action=edit
.ii file for the bool array code causing the same issue