[Bug c/96793] New: __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

Bug ID: 96793
   Summary: __builtin_floor produces wrong result when rounding
direction is FE_DOWNWARD
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Created attachment 49125
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49125&action=edit
Preprocessed test code

If the rounding direction is set to FE_DOWNWARD by fesetround(),
the __builtin_floor() result is -0 where it should be +0. E.g.
__builtin_floor(0.25) == -0.

The outputs were produced with GCC 10.2.0 (x86_64-linux-gnu) using -O2
-frounding-math -lm. The full gcc -v output is in the cli.log file. This is also
reproducible with GCC 9 and trunk.


The test code to reproduce the bug checks that the result's sign bit is 0, as
expected (attached as builtin_floor_test.c):

enum { FE_DOWNWARD = 0x400 };

extern int fesetround(int rounding_direction);

__attribute__((noinline))
float builtin_floorf(float value)
{
    return __builtin_floorf(value);
}

int main()
{
    fesetround(FE_DOWNWARD);
    float result = builtin_floorf(0.25f);
    return __builtin_signbitf(result) != 0;
}


The __builtin_floor() generates the following assembly (part of
builtin_floor_test.s):

        movss   .LC1(%rip), %xmm2
        movss   .LC0(%rip), %xmm4
        movaps  %xmm0, %xmm3
        movaps  %xmm0, %xmm1
        andps   %xmm2, %xmm3
        ucomiss %xmm3, %xmm4
        jbe     .L2
        cvttss2sil      %xmm0, %eax
        pxor    %xmm3, %xmm3
        andnps  %xmm1, %xmm2
        cvtsi2ssl       %eax, %xmm3
        movaps  %xmm3, %xmm4
        cmpnless        %xmm0, %xmm4
        movss   .LC2(%rip), %xmm0
        andps   %xmm0, %xmm4
        subss   %xmm4, %xmm3
        movaps  %xmm3, %xmm0
        orps    %xmm2, %xmm0
.L2:
        ret

[Bug c/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #2 from Paweł Bylica  ---
Created attachment 49127
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49127&action=edit
Assembly output

[Bug c/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #1 from Paweł Bylica  ---
Created attachment 49126
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49126&action=edit
Test source code

[Bug c/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #3 from Paweł Bylica  ---
Created attachment 49128
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49128&action=edit
Compiler invocation log

[Bug c/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #4 from Paweł Bylica  ---
I missed some information:

This affects both the double and float variants: __builtin_floor() and
__builtin_floorf().

This also affects uses of the floor() C standard library function, because the
call is usually replaced with __builtin_floor() in optimized builds.

This also affects libstdc++, where std::floor() is implemented with
__builtin_floor().

[Bug c/96804] New: Arguments are swapped in floating-point addition

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96804

Bug ID: 96804
   Summary: Arguments are swapped in floating-point addition
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Created attachment 49132
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49132&action=edit
C source code

In the following function, when compiled without optimizations (-O0) with GCC
10.2.0 (x86_64-linux-gnu), the arguments to the add instruction are swapped.

float fadd(const float* a, const float* b)
{
    return *a + *b;
}

Assembly snippet:

        movq    %rdi, -8(%rbp)     # a
        movq    %rsi, -16(%rbp)    # b
        movq    -8(%rbp), %rax     # a
        movss   (%rax), %xmm1      # *a
        movq    -16(%rbp), %rax    # b
        movss   (%rax), %xmm0      # *b
        addss   %xmm1, %xmm0       # = *b + *a

This is a problem because when both arguments are NaNs, the result may be
different than the one predicted by IEEE 754.

[Bug c/96804] Arguments are swapped in floating-point addition

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96804

--- Comment #1 from Paweł Bylica  ---
Created attachment 49133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49133&action=edit
Assembly output

[Bug c/96804] Arguments are swapped in floating-point addition

2020-08-26 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96804

--- Comment #3 from Paweł Bylica  ---
Yes, you are right, that is not a violation of IEEE 754 (I assumed wrongly
previously, sorry).

However, it may still be undesirable to get different binary results depending
on whether optimization is enabled or disabled.
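To make the concern concrete, here is a minimal sketch for inspecting which NaN payload survives an addition. The helper names make_quiet_nan and nan_payload are hypothetical and not part of the report. IEEE 754 does not mandate which operand's payload is propagated when both operands are NaNs, so the sketch only exposes the payload for inspection and makes no claim about the value a given target returns.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: build a quiet NaN carrying `payload` in its low
// mantissa bits (binary32: quiet bit is bit 22, payload in bits 0..21).
float make_quiet_nan(std::uint32_t payload) {
    const std::uint32_t bits = 0x7FC00000u | (payload & 0x003FFFFFu);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: extract the payload bits of a binary32 value.
std::uint32_t nan_payload(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits & 0x003FFFFFu;
}

// Same function as in the report.
float fadd(const float* a, const float* b) {
    return *a + *b;
}
```

Calling nan_payload(fadd(&a, &b)) with a = make_quiet_nan(1) and b = make_quiet_nan(2) shows which operand the target propagated; on some targets this depends on the operand order of the add instruction, which is why -O0 and optimized builds may differ.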

[Bug target/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-28 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #8 from Paweł Bylica  ---
Did you consider fixing the __builtin_floor() implementation?

[Bug target/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-28 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #15 from Paweł Bylica  ---
(In reply to Marc Glisse from comment #14)
> (In reply to Marc Glisse from comment #13)
> > if (HONOR_SIGNED_ZEROS (mode))
> >   x2 = copysign (x2, x);
> 
> Hmm, I misread the comment, sorry. We already do that, for both floor and
> ceil. But we don't use a true copysign, we use ix86_sse_copysign_to_positive
> which won't be able to change the sign from - to +. Just changing it to a
> true copysign (one extra and or andn) should be enough then?

Yes, having full copysign should do the job.
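The difference between the two sign-copy variants discussed above can be sketched with plain bit operations. This is an illustration only: the function names are hypothetical, and the OR-only variant is an assumption about how a "copysign to positive" helper behaves (it can set the sign bit but never clear it), not the actual GCC implementation.

```cpp
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kSignMask = 0x80000000u;

std::uint32_t to_bits(float f) {
    std::uint32_t b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

float from_bits(std::uint32_t b) {
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// Assumed behaviour of a "copysign to positive" helper: the sign of
// `sign_source` is OR-ed in, so an already negative value stays negative.
float copysign_or_only(float value, float sign_source) {
    return from_bits(to_bits(value) | (to_bits(sign_source) & kSignMask));
}

// True copysign: clear the sign bit first (the extra and/andn), then OR in
// the sign of `sign_source`.
float copysign_full(float value, float sign_source) {
    return from_bits((to_bits(value) & ~kSignMask) |
                     (to_bits(sign_source) & kSignMask));
}
```

With an intermediate result of -0.0f and a positive input such as 0.25f, copysign_or_only keeps -0 while copysign_full returns +0, which matches the symptom described in this bug.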

[Bug target/96793] __builtin_floor produces wrong result when rounding direction is FE_DOWNWARD

2020-08-28 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96793

--- Comment #16 from Paweł Bylica  ---
I have checked the glibc implementation of floorf().
Source here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/flt-32/s_floorf.c;h=da6c6dfa8ae86129e74d2e4391fac3a3c2ec;hb=HEAD

- It has a variant for SSE 4.1 using ROUNDSS 9 - all good here.
- Otherwise it either uses __builtin_floorf() (so the bug is present here),
- or a generic implementation based on bit manipulations (so it should be
independent of the rounding direction).

[Bug c++/97145] New: Sanitizer pointer-subtract breaks constexpr functions subtracting pointers

2020-09-21 Thread chfast at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145

Bug ID: 97145
   Summary: Sanitizer pointer-subtract breaks constexpr functions
subtracting pointers
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Having a constexpr function that subtracts two pointers does not work in
constexpr context when building with -fsanitize=address,pointer-subtract.

GCC version: starts with 8.1 where pointer-subtract was introduced, up to
trunk.

Minimal code:

constexpr char* a = nullptr;
constexpr auto d = a - a;

:3:22: error: '__builtin___sanitizer_ptr_sub(0, 0)' is not a constant expression
    3 | constexpr auto d = a - a;
      |                    ~~^~~

https://godbolt.org/z/qWxT9v

[Bug rtl-optimization/99620] New: Subtract with borrow (SBB) missed optimization

2021-03-16 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620

Bug ID: 99620
   Summary: Subtract with borrow (SBB) missed optimization
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Hi.

For the 128-bit precision subtraction (SUB + SBB), the optimization depends on
how the carry bit condition is specified in the code. In the first case
below everything works nicely, but in the second we have an unnecessary CMP in
the final code.

I believe the second carry bit condition is simpler (it does not require
unsigned integer wrapping behavior) and does not depend on the first
subtraction.


using u64 = unsigned long;

struct u128
{
    u64 l;
    u64 h;
};

auto sub_good(u128 a, u128 b)
{
    auto l = a.l - b.l;
    auto k = l > a.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}

auto sub_bad(u128 a, u128 b)
{
    auto l = a.l - b.l;
    auto k = a.l < b.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}


sub_good(u128, u128):
mov rax, rdi
sub rax, rdx
sbb rsi, rcx
mov rdx, rsi
ret
sub_bad(u128, u128):
cmp rdi, rdx
mov rax, rdi
sbb rsi, rcx
sub rax, rdx
mov rdx, rsi
ret


If you think this is easy to fix, I would like to give it a try if I could get
some pointers where to start.

[Bug target/99620] Subtract with borrow (SBB) missed optimization

2021-03-17 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620

--- Comment #4 from Paweł Bylica  ---
Can you give me an introduction to where and how to fix it? I have a longer
list of similar issues, so maybe it's a good time to learn how to fix them
myself.

FYI, clang is unifying both cases by changing `k = l > a.l` into `k = a.l <
b.l` and only having SUB_OVERFLOW match for `k = a.l < b.l` case.
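As a sanity check that the two borrow conditions are interchangeable, the following sketch expresses them side by side, together with the overflow builtin that SUB_OVERFLOW refers to. The function names are hypothetical; __builtin_sub_overflow is a GCC/Clang extension, so this is not portable to other compilers.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Borrow derived from the wrapped result, as in sub_good: l > a.l.
bool borrow_from_result(u64 a, u64 b) {
    const u64 l = a - b;  // wraps modulo 2^64
    return l > a;
}

// Borrow derived from the operands alone, as in sub_bad: a.l < b.l.
bool borrow_from_operands(u64 a, u64 b) {
    return a < b;
}

// Borrow reported by the GCC/Clang overflow builtin.
bool borrow_from_builtin(u64 a, u64 b) {
    u64 l;
    return __builtin_sub_overflow(a, b, &l);
}
```

All three agree for every pair of inputs, since an unsigned subtraction wraps exactly when the minuend is smaller than the subtrahend.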

[Bug target/100119] New: [x86] Conversion unsigned int -> double produces -0 (-m32 -msse2 -mfpmath=sse)

2021-04-16 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100119

Bug ID: 100119
   Summary: [x86] Conversion unsigned int -> double produces -0
(-m32 -msse2 -mfpmath=sse)
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When building for 32-bit x86 but with SSE2 floating-point enabled:
-m32 -msse2 -mfpmath=sse

the conversion from unsigned int 0 to double produces the result of -0.0 when
floating-point rounding mode is set to FE_DOWNWARD.

I used -frounding-math and #pragma STDC FENV_ACCESS ON.

This bug is not present in x87 or x86_64 builds.

The bug seems to be present at least since GCC 5.


#include <cfenv>

#pragma STDC FENV_ACCESS ON

__attribute__((noinline)) double u32_to_f64(unsigned x) {
  return static_cast<double>(x);
}

int main() {
  fesetround(FE_DOWNWARD);

  double d = u32_to_f64(0);

  return __builtin_signbit(d) != 0;  // signbit should be 0
}


The assembly:

u32_to_f64(unsigned int):
        sub     esp, 12
        pxor    xmm0, xmm0
        mov     eax, DWORD PTR [esp+16]
        add     eax, -2147483648
        cvtsi2sd        xmm0, eax
        addsd   xmm0, QWORD PTR .LC0
        movsd   QWORD PTR [esp], xmm0
        fld     QWORD PTR [esp]
        add     esp, 12
        ret
main:
        lea     ecx, [esp+4]
        and     esp, -16
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx
        sub     esp, 32
        push    1024
        call    fesetround
        mov     DWORD PTR [esp], 0
        call    u32_to_f64(unsigned int)
        mov     ecx, DWORD PTR [ebp-4]
        add     esp, 16
        fstp    QWORD PTR [ebp-16]
        movsd   xmm0, QWORD PTR [ebp-16]
        leave
        lea     esp, [ecx-4]
        movmskpd        eax, xmm0
        and     eax, 1
        ret
.LC0:
        .long   0
        .long   1105199104


https://godbolt.org/z/rrMWY9jsG
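The assembly above appears to compute the conversion as (double)(int)(x - 2^31) + 2^31, where .LC0 holds 2147483648.0 (high word 1105199104 = 0x41E00000). For x = 0 this adds -2^31 and +2^31, and IEEE 754 specifies that an exact zero sum of operands with opposite signs is -0 when rounding toward negative infinity. The sketch below emulates that sequence in portable C++ under this reading of the listing; the function names are hypothetical, and volatile is used only to keep the addition at run time.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdint>

// Hypothetical emulation of the SSE2 sequence from the listing.
double u32_to_f64_biased(std::uint32_t x) {
    // x - 2^31 always fits in a signed 32-bit integer, so this step is exact.
    volatile double shifted =
        static_cast<double>(static_cast<std::int64_t>(x) - 2147483648LL);
    // The constant from .LC0: 2^31 as a double.
    volatile double bias = 2147483648.0;
    // The only step affected by the rounding direction.
    return shifted + bias;
}

// Returns true if converting 0 yields a negative zero under `rounding_mode`.
bool zero_converts_to_negative_zero(int rounding_mode) {
    const int saved = std::fegetround();
    std::fesetround(rounding_mode);
    volatile double result = u32_to_f64_biased(0);
    std::fesetround(saved);
    const double value = result;
    return std::signbit(value);
}
```

Under FE_TONEAREST the emulation returns +0 for an input of 0, and under FE_DOWNWARD it returns -0, which is the behaviour reported here.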

[Bug sanitizer/97414] New: AddressSanitizer CHECK failed: detect_stack_use_after_return and detect_invalid_pointer_pairs

2020-10-14 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414

Bug ID: 97414
   Summary: AddressSanitizer CHECK failed:
detect_stack_use_after_return and
detect_invalid_pointer_pairs
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

==638106==AddressSanitizer CHECK failed:
../../../../src/libsanitizer/asan/asan_thread.cpp:369 "((bottom)) != (0)" (0x0,
0x0)
#0 0x7f00888e08b8  (/lib/x86_64-linux-gnu/libasan.so.6+0xb98b8)
#1 0x7f00889007ce  (/lib/x86_64-linux-gnu/libasan.so.6+0xd97ce)
#2 0x7f00888e64f0  (/lib/x86_64-linux-gnu/libasan.so.6+0xbf4f0)
#3 0x7f00888dd68b  (/lib/x86_64-linux-gnu/libasan.so.6+0xb668b)
#4 0x7f00888e0269 in __sanitizer_ptr_sub
(/lib/x86_64-linux-gnu/libasan.so.6+0xb9269)
#5 0x55e8cd6641f2 in pointer_diff(int const*, int const*)
/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:2
#6 0x55e8cd664248 in main
/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:10
#7 0x7f008865c0b2 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#8 0x55e8cd66410d in _start
(/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/a.out+0x110d)


When running the program

[[gnu::noinline]] auto pointer_diff(const int *begin, const int *end) {
  return end - begin;
}

int main() {
  constexpr auto size = (2048 / sizeof(int)) + 1;

  auto buf = new int[size];
  auto end = buf + size;
  pointer_diff(end, buf);
  delete[] buf;

  return 0;
}


compiled with
gcc -fsanitize=address,pointer-subtract -g pointer_subtract_crash.cpp

To reproduce the crash, both runtime options must be enabled:
ASAN_OPTIONS=detect_stack_use_after_return=1:detect_invalid_pointer_pairs=1

This bug was previously reported in LLVM's AddressSanitizer project
https://bugs.llvm.org/show_bug.cgi?id=47626, but pointer-subtract is not
supported there.

[Bug libstdc++/97415] New: Invalid pointer comparison in stringbuf::str() (reported by pointer-compare AddressSanitizer)

2020-10-14 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97415

Bug ID: 97415
   Summary: Invalid pointer comparison in stringbuf::str()
(reported by pointer-compare AddressSanitizer)
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When my application is instrumented with -fsanitize=address,pointer-compare
and run under ASAN_OPTIONS=detect_invalid_pointer_pairs=2,
I get the following failure in basic_stringbuf::str():

==3879==ERROR: AddressSanitizer: invalid-pointer-pair: 0x7ffcdf273b66
0x
#0 0x5597a6c6d786 in std::__cxx11::basic_stringbuf, std::allocator >::str() const
/usr/include/c++/10/sstream:184
#1 0x5597a6c6d786 in std::__cxx11::basic_ostringstream, std::allocator >::str() const
/usr/include/c++/10/sstream:678
#2 0x5597a6c6d786 in std::basic_ostream >&
std::__detail::operator<< ,
std::__cxx11::basic_string, std::allocator >
const&>(std::basic_ostream >&,
std::__detail::_Quoted_string, std::allocator > const&, char> const&)
/usr/include/c++/10/bits/quoted_string.h:130
#3 0x5597a6c6d786 in std::basic_ostream >&
std::filesystem::__cxx11::operator<< 
>(std::basic_ostream >&,
std::filesystem::__cxx11::path const&) /usr/include/c++/10/bits/fs_path.h:441
#4 0x5597a6c6d786 in log_total
/home/builder/project/test/spectests/spectests.cpp:675
#5 0x5597a6c48939 in run_tests_from_dir
/home/builder/project/test/spectests/spectests.cpp:708
#6 0x5597a6c48939 in main
/home/builder/project/test/spectests/spectests.cpp:750

Here is the implementation of basic_stringbuf::str() used for compilation:

      __string_type
      str() const
      {
        __string_type __ret(_M_string.get_allocator());
        if (this->pptr())
          {
            // The current egptr() may not be the actual string end.
            if (this->pptr() > this->egptr())
              __ret.assign(this->pbase(), this->pptr());
            else
              __ret.assign(this->pbase(), this->egptr());
          }
        else
          __ret = _M_string;
        return __ret;
      }

In the line `if (this->pptr() > this->egptr())`,
the `this->egptr()` may be nullptr and therefore AddressSanitizer complains
about this comparison.
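For illustration, one way to phrase such a comparison without relating a null pointer to a pointer into the buffer is sketched below. This is not the libstdc++ code or a proposed patch; the function name later_of is hypothetical. It relies on std::less, which is specified to provide a strict total order over pointers even when the built-in operator would not.

```cpp
#include <functional>

// Hypothetical helper: return whichever of the two pointers marks the
// later position, treating a null egptr as "no get area".
const char* later_of(const char* pptr, const char* egptr) {
    if (egptr == nullptr)
        return pptr;  // nothing to compare against
    // std::less gives a well-defined total order for any two pointers.
    return std::less<const char*>{}(egptr, pptr) ? pptr : egptr;
}
```

Whether the sanitizer instruments comparisons made through std::less is not verified here.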

I don't have handy repro code for the issue, but I can try to build one if
desired.

GCC version: cpp (Debian 10.2.0-15) 10.2.0

[Bug libstdc++/97659] New: Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-10-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

Bug ID: 97659
   Summary: Invalid pointer subtraction in vector::insert()
(reported by pointer-subtract AddressSanitizer)
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When vector::insert(iterator pos, InputIt first, InputIt last) is used
the AddressSanitizer additional check "pointer-subtract" reports invalid
pointer pair in c++/10/bits/vector.tcc:729.

The relevant code is this:

  template<typename _Tp, typename _Alloc>
    template<typename _ForwardIterator>
      void
      vector<_Tp, _Alloc>::
      _M_range_insert(iterator __position, _ForwardIterator __first,
                      _ForwardIterator __last, std::forward_iterator_tag)
      {
        if (__first != __last)
          {
            const size_type __n = std::distance(__first, __last);
            if (size_type(this->_M_impl._M_end_of_storage
                          - this->_M_impl._M_finish) >= __n)  // FAILS HERE!
              {


My core code causing the problem is this:

void push(std::vector<uint8_t>& b, uint32_t value)
{
    uint8_t storage[sizeof(value)];
    __builtin_memcpy(storage, &value, sizeof(value));
    b.insert(b.end(), std::begin(storage), std::end(storage));
}


My program pushes single bytes and uint32_t values to a vector using the above
helper, without preallocation. But I was not able to reproduce this issue in
isolation. I will need more time to reduce my code to a proper regression test.

gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
export ASAN_OPTIONS=detect_invalid_pointer_pairs=1 

=
==3327279==ERROR: AddressSanitizer: invalid-pointer-pair: 0x60206e5c
0x60206e5a
#0 0x556e32bfecbf in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:729
#1 0x556e32bfecbf in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665
#2 0x556e32bfecbf in __gnu_cxx::__normal_iterator > >
std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*,
unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383
#3 0x556e32bfecbf in push
/home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26
...

0x60206e5c is located 0 bytes to the right of 12-byte region
[0x60206e50,0x60206e5c)
allocated by thread T0 here:
#0 0x7f0bfa861f17 in operator new(unsigned long)
(/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17)
#1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*)
/usr/include/c++/10/ext/new_allocator.h:115
#2 0x556e32bff1e1 in std::allocator_traits
>::allocate(std::allocator&, unsigned long)
/usr/include/c++/10/bits/alloc_traits.h:460
#3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long)
/usr/include/c++/10/bits/stl_vector.h:346
#4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769
#5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665
#6 0x556e32bff1e1 in __gnu_cxx::__normal_iterator > >
std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*,
unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383
#7 0x556e32bff1e1 in push
/home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26
...

0x60206e5a is located 10 bytes inside of 12-byte region
[0x60206e50,0x60206e5c)
allocated by thread T0 here:
#0 0x7f0bfa861f17 in operator new(unsigned long)
(/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17)
#1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*)
/usr/include/c++/10/ext/new_allocator.h:115
#2 0x556e32bff1e1 in std::allocator_traits
>::allocate(std::allocator&, unsigned long)
/usr/include/c++/10/bits/alloc_traits.h:460
#3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long)
/usr/include/c++/10/bits/stl_vector.h:346
#4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769
#5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vect

[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-10-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #2 from Paweł Bylica  ---
Created attachment 49482
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49482&action=edit
Minimal test case source code

It turned out the problem is related to vector's internal instrumentation
_GLIBCXX_SANITIZE_VECTOR.

The minimal test case is the following:

#define _GLIBCXX_SANITIZE_VECTOR 1
#include <vector>

int main()
{
    std::vector<char> v;
    v.reserve(1);

    char in[1] = {};
    v.insert(v.end(), in, in + 1);

    return 0;
}


export ASAN_OPTIONS=detect_invalid_pointer_pairs=1
g++ pointer_subtract_bug.cpp -fsanitize=address,pointer-subtract
./a.out

[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-11-01 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #4 from Paweł Bylica  ---
I'd like to explain some things here (to my best knowledge):

1. The "pointer-subtract" check is an ASan extension, not enabled by default.
When running my application with this check enabled, I have not detected any
issues in std::vector.

2. The "pointer-subtract" check verifies that both operands of a pointer
subtraction come from the same memory allocation. Allowed values are all
pointers into the memory region plus the "end" pointer one element past the
region. Other subtractions are UB in C, to my knowledge.

3. The issue shows up only when "pointer-subtract" is combined with
_GLIBCXX_SANITIZE_VECTOR. Moreover, the report looks like a false positive
because the subtraction is between the "end" pointer and a pointer inside
the memory region.
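The kind of subtraction that point 2 describes as valid can be sketched as follows. The function name is hypothetical; the pointers mirror the report, where one operand is the one-past-the-end pointer of a 12-byte region and the other lies 10 bytes inside it.

```cpp
#include <cstddef>

// Hypothetical helper mirroring `_M_end_of_storage - _M_finish`: both
// pointers must refer to the same allocation, where the one-past-the-end
// pointer counts as part of it.
std::ptrdiff_t remaining_capacity(const char* finish,
                                  const char* end_of_storage) {
    return end_of_storage - finish;
}
```

For a 12-byte buffer, remaining_capacity(buffer + 10, buffer + 12) is a well-defined subtraction, which is why the sanitizer report looks like a false positive.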

[Bug middle-end/51839] GCC not generating adc instruction for canonical multi-precision add sequence

2021-02-17 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51839

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #1 from Paweł Bylica  ---
This is fixed in GCC 8.1 (at least for add+adc pair).
https://godbolt.org/z/9j4f6r

[Bug c++/97145] Sanitizer pointer-subtract breaks constexpr functions subtracting pointers

2021-02-23 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Paweł Bylica  ---
This looks to be fixed in trunk. Thanks.

[Bug rtl-optimization/96475] direct threaded interpreter with computed gotos generates suboptimal dispatch loop

2022-08-22 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96475

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #25 from Paweł Bylica  ---
Is this issue resolved then?

[Bug tree-optimization/106786] New: Regression in cmp+sbb

2022-08-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786

Bug ID: 106786
   Summary: Regression in cmp+sbb
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I noticed a regression when using the builtin for sbb instruction
(__builtin_ia32_sbb_u64).

typedef unsigned long long u64;

struct R {
    u64 value;
    bool carry;
};

inline R subc(u64 x, u64 y, bool carry) noexcept {
    u64 d;
    const u64 carryout = __builtin_ia32_sbb_u64(carry, x, y, &d);
    return {d, carryout != 0};
}

bool bad(u64 x, u64 y) {
    const R z = subc(x, y, false);
    R a = subc(x, y, z.carry);
    return a.carry;
}

https://godbolt.org/z/f41KKe19q

The expected assembly is
        cmp     rdi, rsi
        sbb     rdi, rsi

But GCC 12.2.0 and trunk produce
        cmp     rdi, rsi
        setb    al
        movzx   eax, al
        add     al, -1
        sbb     rdi, rsi

The regression is in 12.2.0; 11.3.0 optimizes properly.

There are simple changes which will bring back the expected optimization:
- change `const R z` to `R z`,
- change `bool carry` to `u64 carry`.

This may be related to calling convention / ABI because I noticed in one of the
tree optimization outputs for 12.2.0 that the `bool carry` is forced to be in
memory: `MEM  [(struct R *)&z + 8B]`.

https://godbolt.org/z/7zh7GxraK
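For experimenting on targets where the x86-specific builtin is not available, the same subtract-with-borrow semantics can be sketched with the generic overflow builtin. The names subc_portable and bad_portable are hypothetical, __builtin_sub_overflow is a GCC/Clang extension, and whether this form is affected by the same regression is not verified here.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

struct R {
    u64 value;
    bool carry;
};

// Computes x - y - carry and reports the outgoing borrow.
inline R subc_portable(u64 x, u64 y, bool carry) noexcept {
    u64 partial;
    const bool borrow1 = __builtin_sub_overflow(x, y, &partial);
    u64 d;
    const bool borrow2 =
        __builtin_sub_overflow(partial, static_cast<u64>(carry), &d);
    // At most one of the two steps can borrow.
    return {d, borrow1 || borrow2};
}

// Same shape as bad() from the report.
bool bad_portable(u64 x, u64 y) noexcept {
    const R z = subc_portable(x, y, false);
    const R a = subc_portable(x, y, z.carry);
    return a.carry;
}
```

This only reproduces the arithmetic; the generated code has to be compared separately, for example on Compiler Explorer.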

[Bug c++/107434] New: Wrong -Wmissing-field-initializers for C++ designated initializers

2022-10-27 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107434

Bug ID: 107434
   Summary: Wrong -Wmissing-field-initializers for C++ designated
initializers
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

If a struct S has a field c of a class type C with a non-trivial default
constructor (here, from a default member initializer), the
"missing-field-initializers" warning is reported for this field even though
designated initializers are used.

struct C
{
int x = 0;
};

struct S
{
C c;
bool flag = false;

};

S test()
{
return {.flag = true};
}

: In function 'S test()':
:15:25: warning: missing initializer for member 'S::c'
[-Wmissing-field-initializers]
   15 | return {.flag = true};
  | ^

https://godbolt.org/z/sxc8PP7Pq

[Bug c++/96868] C++20 designated initializer erroneous warnings

2022-10-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96868

--- Comment #6 from Paweł Bylica  ---
The workaround is 

MyObj obj = {};

which at least suggests some inconsistency in the compiler internals.

For me this warning should be disabled in C++ when designated initializers are
used and all other fields are value initialized.

[Bug tree-optimization/107837] New: Missed optimization: Using memcpy to load a struct unnecessary uses stack space

2022-11-23 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107837

Bug ID: 107837
   Summary: Missed optimization: Using memcpy to load a struct
unnecessary uses stack space
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I have a simple struct with an array uint64_t[4]. When using memcpy() to load
it from a byte buffer and then performing some additional operations, a
temporary object is created on the stack.


struct uint256
{
    unsigned long v[4];
};

void load_bad(uint256* o, const char* src) noexcept
{
    uint256 x;
    __builtin_memcpy(&x, src, sizeof(x));
    uint256 y;
    y.v[0] = __builtin_bswap64(x.v[3]);
    y.v[1] = __builtin_bswap64(x.v[2]);
    y.v[2] = __builtin_bswap64(x.v[1]);
    y.v[3] = __builtin_bswap64(x.v[0]);
    *o = y;
}


load_bad(uint256*, char const*):
movdqu  xmm0, XMMWORD PTR [rsi]
movdqu  xmm1, XMMWORD PTR [rsi+16]
movaps  XMMWORD PTR [rsp-40], xmm0
mov rdx, QWORD PTR [rsp-32]
mov rax, QWORD PTR [rsp-40]
movaps  XMMWORD PTR [rsp-24], xmm1
mov rsi, QWORD PTR [rsp-16]
mov rcx, QWORD PTR [rsp-24]
bswap   rdx
bswap   rax
mov QWORD PTR [rdi+16], rdx
bswap   rsi
bswap   rcx
mov QWORD PTR [rdi], rsi
mov QWORD PTR [rdi+8], rcx
mov QWORD PTR [rdi+24], rax
ret


The workaround is to use reinterpret_cast.

https://godbolt.org/z/WevYch8nv

[Bug tree-optimization/106786] [12/13 Regression] SRA regression causes extra instructions sometimes

2022-11-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786

--- Comment #4 from Paweł Bylica  ---
Any update on this? I've identified some other similar cases where this is
hurting performance.

[Bug c++/105481] New: ICE: unexpected expression of kind template_parm_index

2022-05-04 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105481

Bug ID: 105481
   Summary: ICE: unexpected expression of kind template_parm_index
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I get 

intx_reduced.cpp: In substitution of ‘template
uint f(const T&) [with unsigned int N = N; T = uint;
 = ]’:
intx_reduced.cpp:18:31:   required from here
intx_reduced.cpp:13:5: internal compiler error: unexpected expression ‘N’ of
kind template_parm_index
   13 | typename = typename std::enable_if>::value>::type>
  | ^~~~


for code:

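#include <type_traits>

template <unsigned N>
struct uint
{
    int words_[N];
};

template <unsigned N>
uint<N> f(const uint<N>& y) noexcept;

template <unsigned N, typename T,
    typename = typename std::enable_if<std::is_convertible<T, uint<N>>::value>::type>
uint<N> f(const T& y) noexcept;

using X = uint<1>;

X (*fp)(X const&) noexcept = &f;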


The reduced version (cvise):

template  struct integral_constant {
  static constexpr _Tp value = __v;
};
using true_type = integral_constant;
using false_type = integral_constant;
template  using __bool_constant = integral_constant;
template  struct conditional;
template  struct __or_;
template 
struct __or_<_B1, _B2> : conditional<_B1::value, _B1, _B2>::type {};
template  struct is_const;
template  struct is_array : false_type {};
template 
struct is_function : __bool_constant::value> {};
template  struct is_const : true_type {};
template , is_array<_To>>::value>
struct __is_convertible_helper {
  template  static true_type __test(int);
  typedef decltype(__test<_To>(0)) type;
};
template 
struct is_convertible : __is_convertible_helper<_From, _To>::type {};
template  struct enable_if { typedef _Tp type; };
template 
struct conditional {
  typedef _Iffalse type;
};
template  struct uint;
template  uint f(const uint &);
template <
unsigned N, typename T,
typename = typename enable_if>::value>::type>
uint f(T);
using X = uint<1>;
X (*fp)(X const &) = f;

[Bug rtl-optimization/114452] New: Functions invoked through compile-time table of function pointers not inlined

2024-03-25 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Bug ID: 114452
   Summary: Functions invoked through compile-time table of
function pointers not inlined
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

In the following example there is a compile-time table of pointers to simple
functions. When the table is used in a simple unrolled loop with constant trip
count the functions invoked by pointers are not inlined.

using F = int (*)(int) noexcept;

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        [](int x) noexcept { return x; },
        [](int x) noexcept { return x; },
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

Generated assembly:

test(int*)::{lambda(int)#1}::_FUN(int):
        mov     eax, edi
        ret
test(int*)::{lambda(int)#2}::_FUN(int):
        mov     eax, edi
        ret
test(int*):
        mov     rdx, rdi
        mov     edi, DWORD PTR [rdi]
        call    test(int*)::{lambda(int)#1}::_FUN(int)
        mov     edi, DWORD PTR [rdx+4]
        mov     DWORD PTR [rdx], eax
        call    test(int*)::{lambda(int)#2}::_FUN(int)
        mov     DWORD PTR [rdx+4], eax
        ret


https://godbolt.org/z/fGqPKh81j

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-03-25 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #2 from Paweł Bylica  ---
I don't think this is related to lambdas. The following is also not optimized:


using F = int (*)(int) noexcept;

inline int impl(int x) noexcept { return x; }

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        impl,
        impl,
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

https://godbolt.org/z/9hPbzo4Px

[Bug tree-optimization/109667] New: [12/13/14 Regression] Unnecessary temporary storage used for 32-byte struct

2023-04-28 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109667

Bug ID: 109667
   Summary: [12/13/14 Regression] Unnecessary temporary storage
used for 32-byte struct
   Product: gcc
   Version: 12.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Reduced reproducer:

struct i256 {
    long v[4];
};
void assign(struct i256 *v, long z) {
    struct i256 r = {};
    for (int i = 0; i < 1; ++i)
        r.v[i] = z;
    *v = r;
}

https://godbolt.org/z/avM74o3r6

The compiler allocates temporary storage on stack for `r`:

assign:
        pxor    xmm0, xmm0
        mov     QWORD PTR [rsp-40], rsi
        movups  XMMWORD PTR [rsp-32], xmm0
        movdqa  xmm1, XMMWORD PTR [rsp-40]
        mov     QWORD PTR [rsp-16], 0
        movdqa  xmm2, XMMWORD PTR [rsp-24]
        movups  XMMWORD PTR [rdi], xmm1
        movups  XMMWORD PTR [rdi+16], xmm2
        ret

This is a regression since GCC 12. GCC 11 compiles it nicely to:

assign:
        mov     QWORD PTR [rdi], rsi
        mov     QWORD PTR [rdi+8], 0
        mov     QWORD PTR [rdi+16], 0
        mov     QWORD PTR [rdi+24], 0
        ret

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2023-05-07 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #32 from Paweł Bylica  ---
For what it's worth, the original code is compiled the same as in Clang since
GCC 10. https://godbolt.org/z/vxorYW815

[Bug rtl-optimization/109771] New: Unnecessary pblendw for vectorized or

2023-05-08 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771

Bug ID: 109771
   Summary: Unnecessary pblendw for vectorized or
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I have an example of vectorization of a 4x64-bit struct (a representation of a
256-bit integer). The implementation just uses a for loop with a trip count
of 4.

This is vectorized in isolation. However, when combined with some non-trivial
control flow and additional wrapping functions, the final assembly contains
weird pblendw instructions.

pblendw xmm1, xmm3, 240           (GCC 13, x86-64-v2)
movlpd  xmm1, QWORD PTR [rdi+16]  (GCC 13, x86-64-v1)
shufpd  xmm1, xmm3, 2             (GCC 12)

I believe this is some kind of regression in GCC 13 because I have a bigger
context where GCC 12 was optimizing it "correctly". However, I lost this
information during test reduction.

https://godbolt.org/z/jzK44h3js

cpp:

struct u256 {
    unsigned long w[4];
};

inline u256 or_(u256 x, u256 y) {
    u256 z;
    for (int i = 0; i < 4; ++i)
        z.w[i] = x.w[i] | y.w[i];
    return z;
}

inline void or_to(u256& z, u256 y) { z = or_(z, y); }

void op_or(u256* t) { or_to(t[1], t[0]); }

void test(u256* t) {
    void* tbl[]{&&CLOBBER, &&OR};
CLOBBER:
    goto *0;
OR:
    op_or(t);
    goto *0;
}


x86-64-v2 asm:

test(u256*):
        xorl    %eax, %eax
        jmp     *%rax
        movdqu  32(%rdi), %xmm3
        movdqu  (%rdi), %xmm1
        movdqu  16(%rdi), %xmm2
        movdqu  48(%rdi), %xmm0
        por     %xmm3, %xmm1
        movups  %xmm1, 32(%rdi)
        movdqa  %xmm2, %xmm1
        pblendw $240, %xmm0, %xmm1
        pblendw $240, %xmm2, %xmm0
        por     %xmm1, %xmm0
        movups  %xmm0, 48(%rdi)
        jmp     *%rax

[Bug middle-end/104151] [10/11/12/13/14 Regression] x86: excessive code generated for 128-bit byteswap

2023-05-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #18 from Paweł Bylica  ---
Not sure if this helps in any way, but this is a 256-bit variant:
https://godbolt.org/z/84fMTs1YP.

[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled

2023-05-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #6 from Paweł Bylica  ---
Confirmed fixed. https://godbolt.org/z/rEqcMqKaz

[Bug middle-end/109844] New: Unnecessary basic block with single jmp instruction

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109844

Bug ID: 109844
   Summary: Unnecessary basic block with single jmp instruction
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

The code

void err(void);

void merge_bb(int y) {
    if (y)
        return err();
}

is

merge_bb:
        test    edi, edi
        jne     .L4
        ret
.L4:
        jmp     err


but could be

merge_bb:
        test    edi, edi
        jne     err
        ret

https://godbolt.org/z/eafPa4o4T

[Bug rtl-optimization/49054] useless cmp+jmp generated for switch when "default:" is unreachable

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49054

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #7 from Paweł Bylica  ---
GCC 13 generates an optimal decision tree for the mentioned modified case.

if id == 3:
    i()
elif id <= 3:
    if id == 0:
        f()
    else:  # 1
        g()
else:
    if id == 4:
        j()
    else:  # 23456
        h()

https://godbolt.org/z/9j6b88qKE

So I think this issue is fixed.

[Bug rtl-optimization/109845] New: Addition overflow/carry flag unnecessarily put in a temporary register

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109845

Bug ID: 109845
   Summary: Addition overflow/carry flag unnecessarily put in a
temporary register
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When we have an addition with an overflow check, and the overflow flag is
combined with some other condition, the codegen may produce a variant in which
the overflow flag is stored in a temporary register.

    unsigned s = y + z;
    _Bool ov = s < y;

    if (x || ov)
        return;

This produces

        add     esi, edx
        setc    al
        test    edi, edi
        jne     .L1
        test    eax, eax
        jne     .L1

while it could be

        add     esi, edx
        jc      .L6
        test    edi, edi
        jne     .L6


There are easy workarounds in the C code that make the assembly optimal:

1. Change the order of the checks
    if (ov || x)

2. Split the if into two
    if (x)
        return;
    if (ov)
        return;

https://godbolt.org/z/rxsrnhPdc
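
For anyone who wants to experiment with the three forms side by side, here is a
minimal self-contained sketch. The function names and the bool return value
(true where the report's code takes the early return) are my own assumptions
for illustration; they are not part of the reduced test case, and nothing here
claims what a particular GCC version emits for each form.

```cpp
// All three functions compute the same predicate; they differ only in how
// the conditions are combined. Function names are hypothetical.

// Form from the report: x is tested first, then the carry.
bool check_original(unsigned x, unsigned y, unsigned z) noexcept {
    unsigned s = y + z;
    bool ov = s < y;  // unsigned wrap-around means the addition carried
    if (x || ov)
        return true;
    return false;
}

// Workaround 1: test the carry first.
bool check_reordered(unsigned x, unsigned y, unsigned z) noexcept {
    unsigned s = y + z;
    bool ov = s < y;
    if (ov || x)
        return true;
    return false;
}

// Workaround 2: split the condition into two separate ifs.
bool check_split(unsigned x, unsigned y, unsigned z) noexcept {
    unsigned s = y + z;
    bool ov = s < y;
    if (x)
        return true;
    if (ov)
        return true;
    return false;
}
```

Since the three are semantically identical, any difference in the generated
code comes purely from how the condition is written.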

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #5 from Paweł Bylica  ---
(In reply to Martin Jambor from comment #4)
> In this testcase all (well, both) functions referenced from the array
> are semantically equivalent which is recognized by ICF but making it
> be able to pass this information to the inliner would be
> non-trivial... and is this the common case worth optimizing for?

I reduced the original code to an array of two identical functions.
Originally, they weren't identical. I can update the test case if that makes
more sense.

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #7 from Paweł Bylica  ---
(In reply to Martin Jambor from comment #6)
> (In reply to Paweł Bylica from comment #5)
> > (In reply to Martin Jambor from comment #4)
> > > In this testcase all (well, both) functions referenced from the array
> > > are semantically equivalent which is recognized by ICF but making it
> > > be able to pass this information to the inliner would be
> > > non-trivial... and is this the common case worth optimizing for?
> > 
> > I reduced the original code to an array of two identical functions.
> > Originally, they weren't identical. I can update the test case if that makes
> > more sense.
> 
> Probably not.  But how many elements does the array have in the original
> code?  Perhaps we could speculatively inline them if there are only few.

5. These are boolean functions from RIPEMD160.
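
To make the shape of the non-reduced case concrete, here is a sketch of a table
with five distinct entries. The five bodies are the standard RIPEMD-160 boolean
functions; the names f1..f5, BoolFn and apply_all, and the three-operand
signature, are assumptions made for this illustration and are not taken from
the original code.

```cpp
#include <cstdint>

// Function-pointer type in the style of `F` from the report, but with the
// three operands the RIPEMD-160 boolean functions take.
using BoolFn = std::uint32_t (*)(std::uint32_t, std::uint32_t, std::uint32_t) noexcept;

// The five RIPEMD-160 boolean functions (names are hypothetical).
inline std::uint32_t f1(std::uint32_t x, std::uint32_t y, std::uint32_t z) noexcept { return x ^ y ^ z; }
inline std::uint32_t f2(std::uint32_t x, std::uint32_t y, std::uint32_t z) noexcept { return (x & y) | (~x & z); }
inline std::uint32_t f3(std::uint32_t x, std::uint32_t y, std::uint32_t z) noexcept { return (x | ~y) ^ z; }
inline std::uint32_t f4(std::uint32_t x, std::uint32_t y, std::uint32_t z) noexcept { return (x & z) | (y & ~z); }
inline std::uint32_t f5(std::uint32_t x, std::uint32_t y, std::uint32_t z) noexcept { return x ^ (y | ~z); }

// Same pattern as test() in the report: a compile-time table indexed in a
// loop with a constant trip count, here with five different callees.
inline void apply_all(std::uint32_t out[5], std::uint32_t x, std::uint32_t y,
                      std::uint32_t z) noexcept {
    static constexpr BoolFn fs[]{f1, f2, f3, f4, f5};
    for (int i = 0; i < 5; ++i)
        out[i] = fs[i](x, y, z);
}
```

With distinct callees, ICF has nothing to merge, so this variant isolates the
question of whether the indirect calls get resolved after unrolling.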

[Bug tree-optimization/110020] New: [13/14 Regression] SHA2 misscompilation at -O3

2023-05-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020

Bug ID: 110020
   Summary: [13/14 Regression] SHA2 misscompilation at -O3
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

This is a test case reduced from a C implementation of SHA256.

void test(unsigned h[8]) {
    for (unsigned i = 0; i < 2; i++) {

        unsigned w[16];
        for (unsigned j = 0; j < 16; j++) {
            if (i == 0)
                w[j] = 0;

            h[7] = h[6];
            h[6] = h[5];
            h[5] = h[4];
            h[4] = h[3];
            h[3] = h[2];
            h[2] = h[1];
            h[1] = h[0];
            h[0] += w[j];
        }
    }
}

It looks like at -O3 the compiler loses track of w[j] = 0 and uses
uninitialized stack storage.

test:
        movl    -36(%rsp), %ecx
        movl    -68(%rsp), %eax
        movq    %rdi, %rdx
        movl    -32(%rsp), %esi
        addl    -72(%rsp), %eax
        addl    -64(%rsp), %eax
        addl    -60(%rsp), %eax
        addl    -56(%rsp), %eax
        addl    -52(%rsp), %eax
        addl    -48(%rsp), %eax
        addl    -44(%rsp), %eax
        addl    -40(%rsp), %eax
        addl    (%rdi), %eax
        addl    %eax, %ecx
        movl    -28(%rsp), %edi
        movl    -24(%rsp), %r8d
        movl    %eax, 28(%rdx)
        addl    %ecx, %esi
        movl    -20(%rsp), %r9d
        movl    -16(%rsp), %r10d
        movl    %ecx, 24(%rdx)
        addl    %esi, %edi
        movl    -12(%rsp), %r11d
        movl    %esi, 20(%rdx)
        addl    %edi, %r8d
        movl    %edi, 16(%rdx)
        addl    %r8d, %r9d
        movl    %r8d, 12(%rdx)
        addl    %r9d, %r10d
        movl    %r9d, 8(%rdx)
        addl    %r10d, %r11d
        movl    %r10d, 4(%rdx)
        movl    %r11d, (%rdx)
        ret


https://godbolt.org/z/ff7E9sd94

[Bug tree-optimization/110020] [13/14 Regression] SHA2 misscompilation at -O3

2023-05-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020

--- Comment #2 from Paweł Bylica  ---
Yes, you are right. Sorry for taking your time.
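
This excerpt does not include the comment being answered, so the following is
only one reading of the test case and an assumption on my part. In the reduced
code, w is declared inside the body of the outer loop, so each iteration gets a
fresh array; w[j] is assigned only when i == 0, and the read in the second
iteration sees an indeterminate value. A hypothetical variant that hoists w out
of the outer loop (the name test_hoisted is made up here) keeps the zeros alive
across iterations:

```cpp
// Hypothetical variant of the reduced test case: `w` is declared once,
// outside the outer loop, so the zeros stored when i == 0 are still
// there when i == 1.
void test_hoisted(unsigned h[8]) {
    unsigned w[16];
    for (unsigned i = 0; i < 2; i++) {
        for (unsigned j = 0; j < 16; j++) {
            if (i == 0)
                w[j] = 0;

            h[7] = h[6];
            h[6] = h[5];
            h[5] = h[4];
            h[4] = h[3];
            h[3] = h[2];
            h[2] = h[1];
            h[1] = h[0];
            h[0] += w[j];
        }
    }
}
```

In this variant every w[j] is zero when it is read, so the function only
shifts h[0] through the array.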

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-05 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #15 from Paweł Bylica  ---
For what it's worth, clang's __builtin_addc is implemented in the frontend
only, as a pair of __builtin_add_overflow calls. The commit from 11 years ago
does not explain why the builtins were added.
https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164

Producing a chain of ADC instructions out of __builtin_add_overflow patterns
was implemented quite recently (~1 year ago), and this work is not fully
finished yet.

On the other hand, Go recently added "addc"-like "builtins" in
https://pkg.go.dev/math/bits, and they are a real pleasure to use in
multi-precision arithmetic.
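
As an illustration of the "pair of __builtin_add_overflow" pattern mentioned
above, here is a minimal sketch of an add-with-carry helper and a 256-bit
addition built on it. The names AddcResult, addc and add256 are hypothetical;
this is not clang's actual implementation, just the same idea written out by
hand with the GCC/Clang overflow builtin.

```cpp
#include <cstdint>

// Result of one add-with-carry step (hypothetical helper type).
struct AddcResult {
    std::uint64_t sum;
    bool carry;
};

// Add-with-carry expressed as two overflow-checked additions.
// At most one of the two additions can overflow, so OR-ing the flags is exact.
inline AddcResult addc(std::uint64_t x, std::uint64_t y, bool carry_in) noexcept {
    std::uint64_t s = 0;
    const bool c1 = __builtin_add_overflow(x, y, &s);
    const bool c2 = __builtin_add_overflow(s, static_cast<std::uint64_t>(carry_in), &s);
    return {s, c1 || c2};
}

// 256-bit addition over four little-endian 64-bit words.
// Returns the carry out of the most significant word.
inline bool add256(std::uint64_t r[4], const std::uint64_t a[4],
                   const std::uint64_t b[4]) noexcept {
    bool carry = false;
    for (int i = 0; i < 4; ++i) {
        const AddcResult t = addc(a[i], b[i], carry);
        r[i] = t.sum;
        carry = t.carry;
    }
    return carry;
}
```

Whether the loop in add256 becomes an add/adc/adc/adc chain depends on the
compiler and version, which is exactly what this bug is about.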

[Bug target/113764] New: [X86] Generates lzcnt when bsr is sufficient

2024-02-05 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764

Bug ID: 113764
   Summary: [X86] Generates lzcnt when bsr is sufficient
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When the lzcnt instruction is enabled (-mlzcnt), the compiler generates lzcnt
for __builtin_clz() in a context where the bsr instruction is sufficient and
better.

unsigned bsr(unsigned x)
{
    return __builtin_clz(x) ^ 31;
}

bsr:
  xor eax, eax
  lzcnt eax, edi
  xor eax, 31
  ret


Without -mlzcnt the generated code is optimal.

bsr:
  bsr eax, edi
  ret


https://godbolt.org/z/5qcTq18nr
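
For readers less familiar with the idiom: for a non-zero 32-bit x,
__builtin_clz(x) is in the range 0..31, and for such n we have n ^ 31 == 31 - n,
so the expression yields the index of the highest set bit, which is what bsr
computes directly. A small sketch with a portable reference loop to compare
against (the names bsr_index and bsr_reference are hypothetical, and a 32-bit
unsigned is assumed):

```cpp
#include <cassert>

// Index of the most significant set bit, using the idiom from the report.
// Assumes `unsigned` is 32 bits wide; __builtin_clz(0) is undefined,
// hence the precondition.
inline unsigned bsr_index(unsigned x) noexcept {
    assert(x != 0);
    return static_cast<unsigned>(__builtin_clz(x)) ^ 31u;
}

// Portable reference: shift right until the value is exhausted.
inline unsigned bsr_reference(unsigned x) noexcept {
    assert(x != 0);
    unsigned index = 0;
    while (x >>= 1)
        ++index;
    return index;
}
```

Both functions should agree for every non-zero input.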

[Bug rtl-optimization/117000] New: Inefficient code for 32-byte struct comparison (ptest missing)

2024-10-07 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117000

Bug ID: 117000
   Summary: Inefficient code for 32-byte struct comparison (ptest
missing)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I was investigating why, in GCC 13.3, the functions test1 and test2 produce
different x86 assembly. They differ only in the placement of the int -> U256
user-defined conversion.

This led to the discovery that the x86-64-v2 code generated for all the
examples is not very efficient. E.g. for some reason a shift instruction
(psrldq) is used.

In GCC 14+, test2 compiles to the same code as test1.

https://godbolt.org/z/r1vfcPone


using uint64_t = unsigned long;

struct U256
{
    uint64_t words_[4]{};

    U256(uint64_t v)
      : words_{v}
    {}
};

bool eq(const U256& x, const U256& y)
{
    uint64_t folded = 0;
    for (int i = 0; i < 4; ++i)
        folded |= (x.words_[i] ^ y.words_[i]);
    return folded == 0;
}

bool eqi(const U256& x, uint64_t y)
{
    return eq(x, U256(y));
}

auto test1(const U256& x)
{
    return eqi(x, uint64_t(0));
}

bool test2(const U256& x)
{
    return eq(x, U256(0));
}


test1(U256 const&):
        movdqu  xmm1, XMMWORD PTR [rdi+16]
        movdqu  xmm0, XMMWORD PTR [rdi]
        por     xmm0, xmm1
        movdqa  xmm1, xmm0
        psrldq  xmm1, 8
        por     xmm0, xmm1
        movq    rax, xmm0
        test    rax, rax
        sete    al
        ret
test2(U256 const&):
        mov     rax, QWORD PTR [rdi]
        or      rax, QWORD PTR [rdi+8]
        or      rax, QWORD PTR [rdi+16]
        or      rax, QWORD PTR [rdi+24]
        sete    al
        ret

[Bug tree-optimization/117000] Inefficient code for 32-byte struct comparison (ptest missing)

2024-10-10 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117000

--- Comment #6 from Paweł Bylica  ---
Thanks for fixing this. Is there a way to get a similar effect in GCC 14?