[Bug target/53929] Bug in the use of Intel asm syntax when a global is named "and"

2020-09-05 Thread u1049321969 at caramail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

tk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |u1049321969 at caramail dot com

--- Comment #1 from tk  ---
Hello all,

I would like to report that I hit a related issue in GCC 10.0.1.  Besides
complaining about "and", the assembly pass also complains if I use a symbol
whose name happens to match a register name, e.g. "bx".

$ gcc-10 --version
gcc-10 (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master
revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat test.c
int bx[16];

int f(unsigned x)
{
return bx[x];
}
$ gcc-10 -c test.c -O3 -masm=intel
/tmp/ccGtGi2X.s: Assembler messages:
/tmp/ccGtGi2X.s:12: Error: invalid use of register

The offending line in the assembly code says
lea rax, bx[rip]

The problem does _not_ go away even if I quote the symbol name by hand in the
assembly output, e.g.
lea rax, "bx"[rip]

Thank you!
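
One possible workaround, offered only as a sketch and not taken from the report: GCC's asm-label extension can give the variable an assembler-level name that does not collide with a register, while the source keeps using `bx`. The symbol name `bx_data` is an arbitrary assumption, and the bounds check is added purely for illustration.

```cpp
#include <cassert>

// Assumption: renaming the emitted symbol sidesteps the clash, because the
// assembler then sees "bx_data" rather than the register name "bx".
// `bx_data` is a hypothetical name chosen for this sketch.
int bx[16] __asm__("bx_data");

int f(unsigned x)
{
  assert(x < 16);  // illustrative bounds check, not in the original test.c
  return bx[x];
}
```

This changes the symbol that other translation units must link against, so it only helps when every user of the variable sees the same declaration.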

[Bug libstdc++/96942] New: std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-05 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

            Bug ID: 96942
           Summary: std::pmr::monotonic_buffer_resource causes CPU cache
                    misses
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

Created attachment 49183
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49183&action=edit
Original implementation

There is a webpage that compares performance of different programming
languages: C++, C, Rust, Java, etc.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/

There is a "binary trees" test there. In this test, the application creates a
`perfect binary tree` and traverses it.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/binarytrees.html#binarytrees

The fastest solution for this test is written in Rust.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

The C++ implementation of this problem uses the
`std::pmr::monotonic_buffer_resource` class as its memory storage.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-gpp-4.html

I like C++ very much, so I started investigating why the application written
in Rust is faster than the C++ one.

First, I ran the `perf` tool and found that the C++ application generates a
lot of CPU cache misses (54%):

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_orig 21

 Performance counter stats for './bt_orig 21':

        45,104,136      cache-references
        24,448,475      cache-misses              #   54.205 % of all cache refs
    19,904,251,283      cycles
    30,462,013,065      instructions              #    1.53  insn per cycle
     4,834,392,341      branches
           234,796      faults
                 2      migrations

       2.083603709 seconds time elapsed

       5.559471000 seconds user
       0.309529000 seconds sys
```

I thought this was caused by the tree traversal. But after modifying the code,
I found that many of the cache misses are caused by the
`std::pmr::monotonic_buffer_resource` class, which is used as a memory pool.

I modified the sample to pre-allocate the memory required to hold the entire
binary tree instead of growing in geometric progression, but that made things
even worse.

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_orig_prealloc 21

 Performance counter stats for './bt_orig_prealloc 21':

        66,400,545      cache-references
        45,740,962      cache-misses              #   68.886 % of all cache refs
    21,461,610,267      cycles
    31,296,637,782      instructions              #    1.46  insn per cycle
     4,967,611,660      branches
           575,100      faults
                 9      migrations

       2.219161594 seconds time elapsed

       5.464583000 seconds user
       0.854839000 seconds sys
```

That looked really weird, so I implemented my own allocator that behaves like
`std::pmr::monotonic_buffer_resource`. With my memory storage, CPU cache
misses drop to 34%.

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_malloc 21

 Performance counter stats for './bt_malloc 21':

        40,713,525      cache-references
        14,147,648      cache-misses              #   34.749 % of all cache refs
    14,823,743,812      cycles
    22,306,442,507      instructions              #    1.50  insn per cycle
     4,331,968,591      branches
            60,227      faults
                 6      migrations

       1.474751692 seconds time elapsed

       4.282074000 seconds user
       0.092476000 seconds sys
```

Execution time also dropped from 2.12s to 1.52s (on my laptop).
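
For readers without access to the attachment, a monotonic allocator built on `malloc` could look roughly like the sketch below. This is not the code from the attachment; the class name `BumpArena` and its interface are assumptions made for illustration. It takes one block up front, hands out memory by advancing a cursor, and frees everything at once on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical bump ("monotonic") arena: a single malloc up front,
// allocations only advance a cursor, nothing is freed individually.
class BumpArena {
 public:
  explicit BumpArena(std::size_t capacity)
      : begin_(static_cast<unsigned char*>(std::malloc(capacity))),
        cursor_(begin_),
        end_(begin_ ? begin_ + capacity : nullptr) {
    if (begin_ == nullptr) throw std::bad_alloc();
  }
  ~BumpArena() { std::free(begin_); }

  BumpArena(const BumpArena&) = delete;
  BumpArena& operator=(const BumpArena&) = delete;

  // Returns `bytes` bytes aligned to `align`, which must be a power of two.
  void* allocate(std::size_t bytes,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const auto addr = reinterpret_cast<std::uintptr_t>(cursor_);
    const auto padding =
        static_cast<std::size_t>((align - addr % align) % align);
    const auto remaining = static_cast<std::size_t>(end_ - cursor_);
    if (padding > remaining || bytes > remaining - padding)
      throw std::bad_alloc();  // arena exhausted; this sketch never grows
    unsigned char* result = cursor_ + padding;
    cursor_ = result + bytes;
    return result;
  }

  // Bytes consumed so far, including alignment padding.
  std::size_t used() const {
    return static_cast<std::size_t>(cursor_ - begin_);
  }

 private:
  unsigned char* begin_;
  unsigned char* cursor_;
  unsigned char* end_;
};
```

A tree node would then be created with placement new on `arena.allocate(sizeof(Node), alignof(Node))`, where `Node` stands for whatever node type the benchmark uses.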

For completeness, the following is the report for the application compiled in
Rust:

```txt
 Performance counter stats for './rust/target/release/rust 2
```

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-05 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #2 from Dmitriy Ovdienko  ---
Created attachment 49185
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49185&action=edit
Modified solution with custom allocator based on malloc

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-05 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #1 from Dmitriy Ovdienko  ---
Created attachment 49184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49184&action=edit
Original implementation with preallocated buffer

[Bug c++/60304] Including disables -Wconversion-null in C++ 98 mode

2020-09-05 Thread harald at gigawatt dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60304

--- Comment #31 from Harald van Dijk  ---
(In reply to Jonathan Wakely from comment #30)
> I'm curious why the preprocessed code in comment 28 doesn't warn,

This was still bugging me, so I looked into it a little bit, and since I had
trouble finding this written down somewhere I thought it would be worth
including here. The line "# 2 "b.C" 3 4" means that what follows is line 2 of
b.C, and b.C is a C system header. The relevant bits of GCC code to see this
are

 
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/libcpp/directives.c#L1061
 
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/libcpp/internal.h#L358

So this means that "false" is coming from a system header. It is: it is coming
from the macro expansion of "false", and the macro definition was in a system
header. So far, so good.
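
As an illustration of how such a line marker is read, here is a small sketch that splits one into its parts. The flag meanings follow the GCC preprocessor documentation (1 = start of a new file, 2 = return to a file, 3 = system header, 4 = treat as wrapped in extern "C"). The type and function names are hypothetical, and file names containing escaped quotes are not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the fields of a marker: # LINE "FILE" FLAGS...
struct LineMarker {
  unsigned line = 0;
  std::string file;
  bool new_file = false;        // flag 1
  bool return_to_file = false;  // flag 2
  bool system_header = false;   // flag 3
  bool extern_c = false;        // flag 4
};

// Returns false if `text` does not look like a line marker.
bool parse_line_marker(const std::string& text, LineMarker& out) {
  std::istringstream in(text);
  char hash = 0, quote = 0;
  if (!(in >> hash) || hash != '#') return false;
  if (!(in >> out.line)) return false;
  if (!(in >> quote) || quote != '"') return false;
  // Read up to the closing quote; hitting end of input means it was missing.
  if (!std::getline(in, out.file, '"') || in.eof()) return false;
  int flag = 0;
  while (in >> flag) {
    switch (flag) {
      case 1: out.new_file = true; break;
      case 2: out.return_to_file = true; break;
      case 3: out.system_header = true; break;
      case 4: out.extern_c = true; break;
      default: return false;  // unknown flag
    }
  }
  return true;
}

// Checks the exact marker quoted in the comment above.
inline void check_example() {
  LineMarker m;
  const bool ok = parse_line_marker("# 2 \"b.C\" 3 4", m);
  assert(ok);
  assert(m.line == 2 && m.file == "b.C");
  assert(m.system_header && m.extern_c && !m.new_file);
  (void)ok;
}
```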

However, during normal operation, with the integrated preprocessor, when a
warning would be emitted in a system header, that
get_location_for_expr_unwinding_for_system_header function added by the commit
you were asking about,
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/gcc/cp/call.c#L7146,
would change the warning location to that of the macro expansion point, if the
warning location was actually inside a macro definition from a system header.
Such macro unwinding is not possible when the preprocessor is invoked
separately, as this information is missing in the -E output. A
non-system-header effect of this can be seen in this test:

test.h:

  #define FALSE false

test.cc:

  #include "test.h"
  void *p = FALSE;

g++ -std=c++03 -c test.cc:

In file included from test.cc:1:
test.h:1:15: warning: converting ‘false’ to pointer type ‘void*’
[-Wconversion-null]
1 | #define FALSE false
  |   ^
test.cc:2:11: note: in expansion of macro ‘FALSE’
2 | void *p = FALSE;
  |   ^

g++ -std=c++03 -c test.cc -save-temps
test.cc:2:11: warning: converting ‘false’ to pointer type ‘void*’
[-Wconversion-null]
2 | void *p = FALSE;
  |   ^

The addition of -save-temps causes the "note: in expansion of macro ‘FALSE’" to
go missing, because the information needed to produce that note is gone by the
time the warning is emitted: the macro expansion tracking is only available at
preprocessing time. It is that macro expansion tracking functionality that GCC
needs to determine that the warning should be treated as *not* coming from a
system header, even though it really was.

In short: I think there is no lingering bug here, this is just an unfortunate
result of the current design. However, if you disagree, if you think the macro
expansion tracking state should be included somehow in the preprocessor output
so that the compiler always has access to it, I can report that as a new bug if
you like.

[Bug c++/60304] Including disables -Wconversion-null in C++ 98 mode

2020-09-05 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60304

--- Comment #32 from Jonathan Wakely  ---
Nice analysis. Personally I dislike when you get different results from
separate preprocessing, but I don't know if it should be considered a bug.

[Bug c++/96943] New: incomplete type used in nested name specifier

2020-09-05 Thread tangyixuan at mail dot dlut.edu.cn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96943

            Bug ID: 96943
           Summary: incomplete type used in nested name specifier
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tangyixuan at mail dot dlut.edu.cn
  Target Milestone: ---

The following code (maybe valid) is rejected by g++, but is accepted by
clang.

$ cat s.cpp

template < int I > struct CA1{
enum { EA = 0};
};
template < int I > struct CA2{
enum {
EA = 1, EA1 = CA2  :: EA
};
};

$ clang++ -c s.cpp
successful.

$ g++ -c s.cpp

s.cpp:6:43: error: incomplete type ‘CA2<1>’ used in nested name specifier
6 | EA = 1, EA1 = CA2  :: EA
  |

Is this right?

[Bug c++/96944] New: call of overloaded is ambiguous

2020-09-05 Thread tangyixuan at mail dot dlut.edu.cn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96944

            Bug ID: 96944
           Summary: call of overloaded is ambiguous
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tangyixuan at mail dot dlut.edu.cn
  Target Milestone: ---

The following code is rejected by g++. clang++ compiles it successfully.

$ cat s.cpp

template < int I > int F( char [I]);
template < int I > int F( char a = I );
int b = F<0>(0);

$ g++ -c s.cpp
s.cpp:3:15: error: call of overloaded ‘F<0>(int)’ is ambiguous
3 | int b = F<0>(0);
  |   ^
s.cpp:1:24: note: candidate: ‘int F(char*) [with int I = 0]’
1 | template < int I > int F( char [I]);
  |^
s.cpp:2:24: note: candidate: ‘int F(char) [with int I = 0]’
2 | template < int I > int F( char a = I );
  |

[Bug c++/96945] New: optimizations regression when defaulting copy constructor

2020-09-05 Thread federico.kircheis at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96945

            Bug ID: 96945
           Summary: optimizations regression when defaulting copy
                    constructor
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: federico.kircheis at gmail dot com
  Target Milestone: ---

While toying with a piece of code, I've noticed that the code did not get
optimized as expected.

All snippets were compiled with -O3.

A)

#include <vector>
struct c {
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}


gets compiled to: https://godbolt.org/z/s7YaEf


foo():
sub rsp, 24
mov edi, 3
call    operator new(unsigned long)
mov esi, 3
mov rdi, rax
movzx   eax, WORD PTR [rsp+13]
mov WORD PTR [rdi], ax
movzx   eax, BYTE PTR [rsp+15]
mov BYTE PTR [rdi+2], al
add rsp, 24
jmp operator delete(void*, unsigned long)


Adding and defaulting the constructors produces even more optimized code (the
whole vector is optimized out!): https://godbolt.org/z/E4GT9x

B)

#include <vector>
struct c {
    c() = default;
    c(const c&) =default;
    c(c&&) = default;
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}



foo():
ret



Adding and defaulting the constructors, except the move constructor produces
the same code as A): https://godbolt.org/z/ch71fb

B)

#include <vector>
struct c {
    c() = default;
    c(const c&) =default;
    c(c&&) = default;
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}



If the copy or default constructor is implemented and not defaulted, then the
code is optimized as B): https://godbolt.org/z/v8E37b,
https://godbolt.org/z/v3EY69

#include <vector>
struct c {
    c() {};
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}

C)

#include <vector>
struct c {
    c() = default;
    c(const c&) {};
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}


D)

#include <vector>
struct c {
    c() = default;
    c(const c&) {};
    c(c&&) = default;
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}


E)


#include <vector>
struct c {
    c() {}
};
void foo(){
    std::vector<c> vi = {c(),c(),c()};
}




While ideally the code for all those cases would be equivalent (as c has no
state and all snippets are functionally equivalent), I would have expected the
class with compiler-defined operators to have the best codegen, followed by the
class with defaulted operators, and last the class with a non-defaulted
implementation.

Strangely, all constructor calls of `c` are always optimized away, but
depending on how the class is defined, g++ does or does not optimize the whole
vector away.
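
To make the differences between the variants concrete, the sketch below records what the language says about the triviality of their constructors. The struct names are hypothetical (the report calls every variant `c`), and the helper `trivial_ctors` is introduced only for this illustration. It does not explain the codegen difference; it only shows that the implicit and the fully defaulted variants are indistinguishable by these traits even though they compile differently.

```cpp
#include <type_traits>

// Hypothetical names for the variants in the report.
struct Implicit {};                       // like snippet A
struct AllDefaulted {                     // like the first snippet B
  AllDefaulted() = default;
  AllDefaulted(const AllDefaulted&) = default;
  AllDefaulted(AllDefaulted&&) = default;
};
struct UserCopy {                         // like snippet C
  UserCopy() = default;
  UserCopy(const UserCopy&) {}
};
struct UserDefault {                      // like snippet E
  UserDefault() {}
};

// True if default, copy and move construction are all trivial for T.
template <class T>
constexpr bool trivial_ctors() {
  return std::is_trivially_default_constructible<T>::value &&
         std::is_trivially_copy_constructible<T>::value &&
         std::is_trivially_move_constructible<T>::value;
}

static_assert(trivial_ctors<Implicit>(), "implicit constructors are trivial");
static_assert(trivial_ctors<AllDefaulted>(), "defaulted ones are trivial too");
static_assert(!trivial_ctors<UserCopy>(), "user-provided copy is not trivial");
static_assert(!trivial_ctors<UserDefault>(), "user-provided default is not");
```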

[Bug c++/96945] optimizations regression when defaulting copy constructor

2020-09-05 Thread federico.kircheis at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96945

--- Comment #1 from Federico Kircheis  ---
I've made a copy-paste error (I can't change the submitted bug); after B) there
should be C):




Adding and defaulting the constructors, except the move constructor produces
the same code as A): https://godbolt.org/z/ch71fb

C)

#include 
struct c {
c() = default;
c(const c&) =default;
};
void foo(){
std::vector vi = {c(),c(),c()};
}


[Bug libstdc++/96946] New: std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash

2020-09-05 Thread cjdb.ns at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

            Bug ID: 96946
           Summary: std::shared_ptr makes an "unrelated cast" that causes
                    Clang's Control Flow Integrity sanitiser to crash
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cjdb.ns at gmail dot com
  Target Milestone: ---

Created attachment 49186
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49186&action=edit
cfi-error temps

# Compiler details

Ubuntu clang version
11.0.0-++20200829062559+2c6a593b5e1-1~exp1~20200829163219.75
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

# System details

Distributor ID: Ubuntu
Description:Ubuntu 20.04.1 LTS
Release:20.04
Codename:   focal

# Compiler configuration

Unknown: compiler obtained from apt.llvm.org.

# Build trigger

clang++ -std=c++14 -flto -fvisibility=hidden -g -fsanitize=cfi-unrelated-cast
cfi-error.cpp

# Compiler output

Nothing, builds fine.

# Run-time output

$ ./a.out
Illegal instruction

# Thanks

Martin Hořeňovský distilled this from a Catch2 bug to a minimal repro that
shows the problem is embedded in libstdc++'s shared_ptr.

[Bug target/96941] Initial PPC64LE transcendental auto-vectorization functionality

2020-09-05 Thread dje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96941

David Edelsohn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-09-05

--- Comment #1 from David Edelsohn  ---
confirmed

[Bug c++/96242] ICE conditionally noexcept defaulted comparison

2020-09-05 Thread johelegp at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96242

--- Comment #2 from Johel Ernesto Guerrero Peña  ---
Thank you, but am I not exempt?

> The only excuses to not send us the preprocessed sources are [...] if you've 
> reduced the testcase to a small file that doesn't include any other file [...]

[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.

2020-09-05 Thread dmjpp at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #10 from Dimitrij Mijoski  ---
I was wrong in comment #9. The bug and the proposed fix are ok in comment #7.

While writing some tests for error cases, I discovered yet another bug in UTF-8
decoding. See the example:

// 2 code points, both are 4 byte in UTF-8.
const char u8in[] = u8"\U0010\U0010";
const char32_t u32in[] = U"\U0010\U0010";

void
utf8_to_utf32_in_error_7 (const codecvt<char32_t, char, mbstate_t> &cvt)
{
  char in[7] = {};
  char32_t out[3] = {};
  char_traits<char>::copy (in, u8in, 7);
  in[5] = 'z';
  // Last CP has two errors. Its second code unit is malformed and it
  // misses its last code unit. Because it misses its last CU, the
  // decoder returns too early, reporting that it is incomplete.
  // It should return invalid.

  auto state = mbstate_t{};
  auto in_next = (const char *) nullptr;
  auto out_next = (char32_t *) nullptr;
  auto res = codecvt_base::result ();

  res = cvt.in (state, in, in + 7, in_next, out, out + 3, out_next);
  VERIFY (res == cvt.error); //incorrectly returns partial
  VERIFY (in_next == in + 4);
  VERIFY (out_next == out + 1);
  VERIFY (out[0] == u32in[0] && out[1] == 0 && out[2] == 0);
}

I published the full testsuite on GitHub, licensed under GPL v3+ of course:
https://github.com/dimztimz/codecvt_test/blob/master/codecvt.cpp . I was
thinking of sending a patch, but after this latest bug, the 4th, I see this
needs more time. Maybe a testsuite from another library like ICU can be
incorporated? Well, whatever, I will pause my work on this.
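
The expected behaviour in the test above (report an error rather than "partial" when an available continuation byte is already malformed) can be sketched as follows. This is not libstdc++'s implementation; the names `Utf8Status` and `classify_first_sequence` are hypothetical, and the sketch checks only lead-byte ranges and the `10xxxxxx` pattern of continuation bytes. Overlong forms, surrogates and values above U+10FFFF are not rejected.

```cpp
#include <cassert>
#include <cstddef>

enum class Utf8Status { ok, partial, error };

// Classifies the UTF-8 sequence that starts at p, with n >= 1 bytes available.
// Every continuation byte that is present is validated BEFORE the sequence
// is declared incomplete, so malformed input wins over truncated input.
Utf8Status classify_first_sequence(const unsigned char* p, std::size_t n) {
  assert(n >= 1);
  std::size_t len = 0;
  if (p[0] <= 0x7F) len = 1;
  else if (p[0] >= 0xC2 && p[0] <= 0xDF) len = 2;
  else if (p[0] >= 0xE0 && p[0] <= 0xEF) len = 3;
  else if (p[0] >= 0xF0 && p[0] <= 0xF4) len = 4;
  else return Utf8Status::error;  // invalid lead byte

  const std::size_t avail = n < len ? n : len;
  for (std::size_t i = 1; i < avail; ++i)
    if ((p[i] & 0xC0) != 0x80) return Utf8Status::error;

  return n < len ? Utf8Status::partial : Utf8Status::ok;
}
```

With the last code point of the test (three of four bytes present, the second one replaced by 'z'), this returns `Utf8Status::error`, whereas a well-formed but truncated three-byte prefix returns `Utf8Status::partial`.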