[Bug target/53929] Bug in the use of Intel asm syntax when a global is named "and"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

tk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |u1049321969 at caramail dot com

--- Comment #1 from tk ---
Hello all,

I would like to report that I hit a related issue in GCC 10.0.1. Besides complaining about "and", the assembler also complains if I use a symbol whose name happens to be the same as a register name, e.g. "bx".

$ gcc-10 --version
gcc-10 (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat test.c
int bx[16];
int f(unsigned x) { return bx[x]; }

$ gcc-10 -c test.c -O3 -masm=intel
/tmp/ccGtGi2X.s: Assembler messages:
/tmp/ccGtGi2X.s:12: Error: invalid use of register

The offending line in the assembly code is:

        lea     rax, bx[rip]

The problem does _not_ go away even if I quote the symbol name by hand in the assembly output, e.g.:

        lea     rax, "bx"[rip]

Thank you!
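One possible workaround, not proposed in the report: GCC's asm-label extension lets a declaration keep its source-level name while emitting a different assembler symbol, so the Intel-syntax operand no longer collides with a register name. The replacement symbol `bx_array` is an arbitrary choice for this sketch.

```cpp
#include <cassert>

// Keep the source-level name `bx`, but emit the assembler symbol `bx_array`
// (GCC asm-label extension), so that -masm=intel output references
// `bx_array[rip]` instead of the register-like `bx[rip]`.
// "bx_array" is a hypothetical name chosen for this example.
int bx[16] __asm__("bx_array");

int f(unsigned x) {
    assert(x < 16);  // stay within the array bounds
    return bx[x];
}
```

Every translation unit that declares `bx` must use the same asm label, otherwise the symbols will not match at link time. Compiling with `-O3 -masm=intel -S` should then show the load referring to `bx_array`.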
[Bug libstdc++/96942] New: std::pmr::monotonic_buffer_resource causes CPU cache misses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

            Bug ID: 96942
           Summary: std::pmr::monotonic_buffer_resource causes CPU cache misses
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

Created attachment 49183
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49183&action=edit
Original implementation

There is a website that compares the performance of different programming languages (C++, C, Rust, Java, etc.):
https://benchmarksgame-team.pages.debian.net/benchmarksgame/

One of its tests is "binary trees", in which the application creates a `perfect binary tree` and traverses it:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/binarytrees.html#binarytrees

The fastest solution for this test is written in Rust:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html

The C++ implementation uses the `std::pmr::monotonic_buffer_resource` class as its memory storage:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-gpp-4.html

I like C++ very much, so I started investigating why the application compiled from Rust is faster than the C++ one. First, I ran the `perf` tool and found that the C++ application generates a lot of CPU cache misses (54%):

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_orig 21

 Performance counter stats for './bt_orig 21':

        45,104,136      cache-references
        24,448,475      cache-misses              #   54.205 % of all cache refs
    19,904,251,283      cycles
    30,462,013,065      instructions              #    1.53  insn per cycle
     4,834,392,341      branches
           234,796      faults
                 2      migrations

       2.083603709 seconds time elapsed

       5.559471000 seconds user
       0.309529000 seconds sys
```

I thought they were caused by the tree traversal.
But after modifying the code, I found that many of the cache misses are caused by the `std::pmr::monotonic_buffer_resource` class, which is used as a memory pool. I modified the sample to preallocate the memory required to hold the entire binary tree instead of letting the pool grow geometrically, but that made things even worse:

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_orig_prealloc 21

 Performance counter stats for './bt_orig_prealloc 21':

        66,400,545      cache-references
        45,740,962      cache-misses              #   68.886 % of all cache refs
    21,461,610,267      cycles
    31,296,637,782      instructions              #    1.46  insn per cycle
     4,967,611,660      branches
           575,100      faults
                 9      migrations

       2.219161594 seconds time elapsed

       5.464583000 seconds user
       0.854839000 seconds sys
```

That looks really weird, so I implemented my own allocator that behaves like `std::pmr::monotonic_buffer_resource`. With my memory storage, CPU cache misses drop to 34%:

```txt
root@E5530:/home/dmytro_ovdiienko/Sources# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./bt_malloc 21

 Performance counter stats for './bt_malloc 21':

        40,713,525      cache-references
        14,147,648      cache-misses              #   34.749 % of all cache refs
    14,823,743,812      cycles
    22,306,442,507      instructions              #    1.50  insn per cycle
     4,331,968,591      branches
            60,227      faults
                 6      migrations

       1.474751692 seconds time elapsed

       4.282074000 seconds user
       0.092476000 seconds sys
```

Execution time also dropped from 2.12s to 1.52s (on my laptop).

For completeness, the following is the report for the application compiled from Rust: ```txt Performance counter stats for './rust/target/release/rust 2
[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #2 from Dmitriy Ovdienko ---
Created attachment 49185
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49185&action=edit
Modified solution with custom allocator based on malloc
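Attachment 49185 is not reproduced in this thread. As a rough illustration of what a monotonic allocator built on malloc can look like, here is a hypothetical `std::pmr::memory_resource` subclass; the class name `MallocBumpResource`, the default chunk size, and the chunk-list layout are assumptions made for this sketch, not the reporter's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory_resource>
#include <new>

// Hypothetical monotonic (bump-pointer) resource backed by std::malloc.
// Memory is carved from chunks; deallocate() is a no-op and every chunk
// is released when the resource is destroyed.
class MallocBumpResource : public std::pmr::memory_resource {
public:
    explicit MallocBumpResource(std::size_t chunk_size = 1 << 20)
        : chunk_size_(chunk_size) {}
    MallocBumpResource(const MallocBumpResource&) = delete;
    MallocBumpResource& operator=(const MallocBumpResource&) = delete;
    ~MallocBumpResource() override {
        while (head_) {  // walk the chunk list and free each chunk
            Chunk* next = head_->next;
            std::free(head_);
            head_ = next;
        }
    }

private:
    struct Chunk { Chunk* next; };  // header at the start of each chunk

    static std::uintptr_t align_up(std::uintptr_t p, std::size_t align) {
        return (p + (align - 1)) & ~(std::uintptr_t(align) - 1);
    }

    void* do_allocate(std::size_t bytes, std::size_t align) override {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::uintptr_t p = align_up(cur_, align);
        if (head_ == nullptr || p + bytes > end_) {
            // Start a new chunk large enough for this request plus alignment slack.
            std::size_t need = sizeof(Chunk) + bytes + align;
            std::size_t size = need > chunk_size_ ? need : chunk_size_;
            void* raw = std::malloc(size);
            if (!raw) throw std::bad_alloc();
            head_ = new (raw) Chunk{head_};
            cur_ = reinterpret_cast<std::uintptr_t>(raw) + sizeof(Chunk);
            end_ = reinterpret_cast<std::uintptr_t>(raw) + size;
            p = align_up(cur_, align);
        }
        cur_ = p + bytes;  // bump past the returned block
        return reinterpret_cast<void*>(p);
    }

    void do_deallocate(void*, std::size_t, std::size_t) override {}  // monotonic

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::size_t chunk_size_;
    Chunk* head_ = nullptr;
    std::uintptr_t cur_ = 0;
    std::uintptr_t end_ = 0;
};
```

Because it derives from `std::pmr::memory_resource`, such a resource can be passed anywhere the benchmark currently passes a `std::pmr::monotonic_buffer_resource`, for example to a `std::pmr::polymorphic_allocator`. Consecutive small allocations land next to each other in the same chunk, which is the property a bump allocator relies on for locality.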
[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #1 from Dmitriy Ovdienko ---
Created attachment 49184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49184&action=edit
Original implementation with preallocated buffer
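Attachment 49184 is likewise not shown here. A minimal sketch of the two pool setups the report compares, assuming a tree of plain two-pointer nodes allocated through `std::pmr::monotonic_buffer_resource`: one pool starts small and grows geometrically, the other is given an `initial_size` hint large enough for the whole tree. The `Node` layout and the helper function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

struct Node {  // hypothetical node layout, not the benchmark's
    Node* left;
    Node* right;
};

// Allocate a perfect binary tree of the given depth from `mr`.
Node* make_tree(std::pmr::memory_resource& mr, int depth) {
    void* mem = mr.allocate(sizeof(Node), alignof(Node));
    Node* n = ::new (mem) Node{nullptr, nullptr};
    if (depth > 0) {
        n->left = make_tree(mr, depth - 1);
        n->right = make_tree(mr, depth - 1);
    }
    return n;
}

// Count nodes; a stand-in for the benchmark's traversal.
long check(const Node* n) {
    return n ? 1 + check(n->left) + check(n->right) : 0;
}

// A perfect binary tree of depth d has 2^(d+1) - 1 nodes.
long nodes_in_tree(int depth) { return (2L << depth) - 1; }

// Variant 1: the pool starts small and grows its buffers geometrically.
long run_growing(int depth) {
    std::pmr::monotonic_buffer_resource pool;
    return check(make_tree(pool, depth));
}  // ~monotonic_buffer_resource releases all nodes at once

// Variant 2: the pool is asked for a first buffer big enough for the whole tree.
long run_preallocated(int depth) {
    std::pmr::monotonic_buffer_resource pool(
        static_cast<std::size_t>(nodes_in_tree(depth)) * sizeof(Node));
    return check(make_tree(pool, depth));
}
```

Both variants build and traverse the same tree; the sketch only shows how each pool is configured and says nothing about which one is faster. Timing them under the `perf stat` command used in the report is what would actually test the cache-miss observation.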
[Bug c++/60304] Including disables -Wconversion-null in C++ 98 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60304

--- Comment #31 from Harald van Dijk ---
(In reply to Jonathan Wakely from comment #30)
> I'm curious why the preprocessed code in comment 28 doesn't warn,

This was still bugging me, so I looked into it a little, and since I had trouble finding this written down anywhere, I thought it would be worth including here.

The line "# 2 "b.C" 3 4" means that what follows is line 2 of b.C, and that b.C is a C system header. The relevant bits of GCC code are:
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/libcpp/directives.c#L1061
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/libcpp/internal.h#L358

So this means that "false" is coming from a system header. And it is: it comes from the macro expansion of "false", and the macro definition was in a system header. So far, so good.

However, in normal operation with the integrated preprocessor, when a warning would be emitted in a system header, the get_location_for_expr_unwinding_for_system_header function added by the commit you were asking about,
https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.2.0/gcc/cp/call.c#L7146,
changes the warning location to the macro expansion point if the warning location is inside a macro definition from a system header. Such macro unwinding is not possible when the preprocessor is invoked separately, because this information is missing from the -E output.
A non-system-header effect of this can be seen in this test:

test.h:
#define FALSE false

test.cc:
#include "test.h"
void *p = FALSE;

g++ -std=c++03 -c test.cc:
In file included from test.cc:1:
test.h:1:15: warning: converting ‘false’ to pointer type ‘void*’ [-Wconversion-null]
    1 | #define FALSE false
      |               ^
test.cc:2:11: note: in expansion of macro ‘FALSE’
    2 | void *p = FALSE;
      |           ^

g++ -std=c++03 -c test.cc -save-temps:
test.cc:2:11: warning: converting ‘false’ to pointer type ‘void*’ [-Wconversion-null]
    2 | void *p = FALSE;
      |           ^

Adding -save-temps makes the "note: in expansion of macro ‘FALSE’" disappear, because the information needed to produce that note is gone by the time the warning is emitted: macro expansion tracking is only available at preprocessing time. It is that macro expansion tracking that GCC needs in order to determine that the warning should be treated as *not* coming from a system header, even though it really did.

In short: I think there is no lingering bug here; this is just an unfortunate consequence of the current design. However, if you disagree and think the macro expansion tracking state should somehow be included in the preprocessor output so that the compiler always has access to it, I can report that as a new bug if you like.
[Bug c++/60304] Including disables -Wconversion-null in C++ 98 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60304

--- Comment #32 from Jonathan Wakely ---
Nice analysis. Personally, I dislike getting different results from separate preprocessing, but I don't know if it should be considered a bug.
[Bug c++/96943] New: incomplete type used in nested name specifier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96943

            Bug ID: 96943
           Summary: incomplete type used in nested name specifier
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tangyixuan at mail dot dlut.edu.cn
  Target Milestone: ---

The following code (possibly valid) is rejected by g++ but accepted by clang.

$ cat s.cpp
template < int I > struct CA1{
enum { EA = 0};
};
template < int I > struct CA2{
enum {
EA = 1, EA1 = CA2 :: EA
};
};

$ clang++ -c s.cpp
successful.

$ g++ -c s.cpp
s.cpp:6:43: error: incomplete type ‘CA2<1>’ used in nested name specifier
    6 | EA = 1, EA1 = CA2 :: EA
      |

Is this right?
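The diagnostic refers to `CA2<1>`, a concrete specialization of the class template whose definition has not been completed at that point. If the intent is simply to reuse `EA` from the enclosing class (an assumption; the report does not say), here is a sketch of two spellings that avoid naming a separate specialization: an unqualified name, or the injected-class-name `CA2`, which denotes the current instantiation. The extra enumerator `EA2` is hypothetical.

```cpp
// Hypothetical variant, not taken from the report.
template <int I> struct CA2 {
    enum {
        EA = 1,
        EA1 = EA,      // unqualified: finds the enumerator declared just above
        EA2 = CA2::EA  // injected-class-name: the current instantiation CA2<I>
    };
};
```

Neither spelling asks the compiler to complete a different specialization while `CA2` is still being defined, which is what the `CA2<1>` in the error message points at.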
[Bug c++/96944] New: call of overloaded is ambiguous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96944

            Bug ID: 96944
           Summary: call of overloaded is ambiguous
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tangyixuan at mail dot dlut.edu.cn
  Target Milestone: ---

The following code is rejected by g++, while clang++ compiles it successfully.

$ cat s.cpp
template < int I > int F( char [I]);
template < int I > int F( char a = I );
int b = F<0>(0);

$ g++ -c s.cpp
s.cpp:3:15: error: call of overloaded ‘F<0>(int)’ is ambiguous
    3 | int b = F<0>(0);
      |               ^
s.cpp:1:24: note: candidate: ‘int F(char*) [with int I = 0]’
    1 | template < int I > int F( char [I]);
      |                        ^
s.cpp:2:24: note: candidate: ‘int F(char) [with int I = 0]’
    2 | template < int I > int F( char a = I );
      |                        ^
[Bug c++/96945] New: optimizations regression when defaulting copy constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96945

            Bug ID: 96945
           Summary: optimizations regression when defaulting copy constructor
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: federico.kircheis at gmail dot com
  Target Milestone: ---

While toying with a piece of code, I noticed that it did not get optimized as expected. All snippets were compiled with -O3.

A)

#include
struct c {
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

gets compiled to (https://godbolt.org/z/s7YaEf):

foo():
        sub     rsp, 24
        mov     edi, 3
        call    operator new(unsigned long)
        mov     esi, 3
        mov     rdi, rax
        movzx   eax, WORD PTR [rsp+13]
        mov     WORD PTR [rdi], ax
        movzx   eax, BYTE PTR [rsp+15]
        mov     BYTE PTR [rdi+2], al
        add     rsp, 24
        jmp     operator delete(void*, unsigned long)

Adding and defaulting the constructors produces even more optimized code (the whole vector is optimized out!): https://godbolt.org/z/E4GT9x

B)

#include
struct c {
    c() = default;
    c(const c&) =default;
    c(c&&) = default;
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

foo():
        ret

Adding and defaulting the constructors, except the move constructor, produces the same code as A): https://godbolt.org/z/ch71fb

B)

#include
struct c {
    c() = default;
    c(const c&) =default;
    c(c&&) = default;
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

If the copy or default constructor is implemented rather than defaulted, the code is optimized as in B): https://godbolt.org/z/v8E37b, https://godbolt.org/z/v3EY69

#include
struct c {
    c() {};
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

C)

#include
struct c {
    c() = default;
    c(const c&) {};
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

D)

#include
struct c {
    c() = default;
    c(const c&) {};
    c(c&&) = default;
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

E)

#include
struct c {
    c() {}
};
void foo(){
    std::vector vi = {c(),c(),c()};
}

Ideally, the code for all of these cases would be equivalent, since c has no state and all snippets are functionally equivalent. Failing that, I would have expected the class with implicitly declared constructors to have the best codegen, followed by the class with defaulted constructors, and last the class with a user-provided implementation.

Strangely, the constructor calls of `c` are always optimized away, but depending on how the class is defined, g++ does or does not optimize the whole vector away.
[Bug c++/96945] optimizations regression when defaulting copy constructor
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96945

--- Comment #1 from Federico Kircheis ---
I made a copy-paste error (I can't edit the submitted bug): the snippet after B) should be C):

Adding and defaulting the constructors, except the move constructor, produces the same code as A): https://godbolt.org/z/ch71fb

C)

#include
struct c {
    c() = default;
    c(const c&) =default;
};
void foo(){
    std::vector vi = {c(),c(),c()};
}
[Bug libstdc++/96946] New: std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

            Bug ID: 96946
           Summary: std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cjdb.ns at gmail dot com
  Target Milestone: ---

Created attachment 49186
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49186&action=edit
cfi-error temps

# Compiler details

Ubuntu clang version 11.0.0-++20200829062559+2c6a593b5e1-1~exp1~20200829163219.75
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

# System details

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal

# Compiler configuration

Unknown: compiler obtained from apt.llvm.org.

# Build trigger

clang++ -std=c++14 -flto -fvisibility=hidden -g -fsanitize=cfi-unrelated-cast cfi-error.cpp

# Compiler output

Nothing, builds fine.

# Run-time output

$ ./a.out
Illegal instruction

# Thanks

Martin Hořeňovský distilled this from a Catch2 bug into a minimal repro, which shows that the problem is embedded in libstdc++'s shared_ptr.
[Bug target/96941] Initial PPC64LE transcendental auto-vectorization functionality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96941

David Edelsohn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-09-05

--- Comment #1 from David Edelsohn ---
Confirmed.
[Bug c++/96242] ICE conditionally noexcept defaulted comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96242

--- Comment #2 from Johel Ernesto Guerrero Peña ---
Thank you, but am I not exempt?

> The only excuses to not send us the preprocessed sources are [...] if you've
> reduced the testcase to a small file that doesn't include any other file [...]
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #10 from Dimitrij Mijoski ---
I was wrong in comment #9. The bug and the proposed fix in comment #7 are OK.

While writing some tests for error cases, I discovered yet another bug in UTF-8 decoding. See the example:

// 2 code points, both are 4 bytes in UTF-8.
const char u8in[] = u8"\U0010\U0010";
const char32_t u32in[] = U"\U0010\U0010";

void
utf8_to_utf32_in_error_7 (const codecvt &cvt)
{
  char in[7] = {};
  char32_t out[3] = {};
  char_traits::copy (in, u8in, 7);
  in[5] = 'z';

  // Last CP has two errors. Its second code unit is malformed and it
  // misses its last code unit. Because it misses its last CU, the
  // decoder returns too early, reporting that it is incomplete.
  // It should return invalid.

  auto state = mbstate_t{};
  auto in_next = (const char *) nullptr;
  auto out_next = (char32_t *) nullptr;
  auto res = codecvt_base::result ();

  res = cvt.in (state, in, in + 7, in_next, out, out + 3, out_next);
  VERIFY (res == cvt.error); // incorrectly returns partial
  VERIFY (in_next == in + 4);
  VERIFY (out_next == out + 1);
  VERIFY (out[0] == u32in[0] && out[1] == 0 && out[2] == 0);
}

I published the full testsuite on GitHub, licensed under GPL v3+ of course:
https://github.com/dimztimz/codecvt_test/blob/master/codecvt.cpp

I was thinking of sending a patch, but after this last bug, the fourth, I see this needs more time. Maybe a testsuite from another library like ICU could be incorporated? Anyway, I will pause my work on this.
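The point of the test above is that a decoder should validate every continuation byte that is actually present before concluding that a sequence is merely incomplete. A minimal standalone sketch of that ordering follows, using a hypothetical `utf8_to_utf32` function and `Result` enum rather than the `std::codecvt` interface; it is not libstdc++'s implementation.

```cpp
#include <cassert>
#include <cstddef>

enum class Result { ok, partial, error };  // mirrors codecvt_base ok/partial/error

// Hypothetical UTF-8 -> UTF-32 decoder. Every continuation byte that is
// present is checked *before* a sequence is reported as incomplete.
Result utf8_to_utf32(const unsigned char* from, const unsigned char* from_end,
                     const unsigned char*& from_next,
                     char32_t* to, char32_t* to_end, char32_t*& to_next) {
    assert(from <= from_end && to <= to_end);
    from_next = from;
    to_next = to;
    while (from_next != from_end) {
        if (to_next == to_end) return Result::partial;  // no room for output
        const unsigned char lead = *from_next;
        int len;
        char32_t cp;
        if (lead < 0x80)                       { len = 1; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
        else return Result::error;  // stray continuation byte or invalid lead byte
        const std::ptrdiff_t avail = from_end - from_next;
        // First validate the continuation bytes we actually have...
        for (int i = 1; i < len && i < avail; ++i) {
            const unsigned char cu = from_next[i];
            if ((cu & 0xC0) != 0x80) return Result::error;
            cp = (cp << 6) | (cu & 0x3F);
        }
        // ...and only then report a well-formed but truncated sequence.
        if (avail < len) return Result::partial;
        // Reject overlong forms, surrogates and values above U+10FFFF
        // (for brevity, only checked once the sequence is complete).
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return Result::error;
        *to_next++ = cp;
        from_next += len;
    }
    return Result::ok;
}
```

Assuming both code points in the test are U+10FFFF (the escapes in the listing above are cut short), the 7-byte input with `in[5] = 'z'` makes this sketch stop at `in + 4` with one code point written and return `error` rather than `partial`.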