[Bug libstdc++/66416] string::find_last_of 3.5 times slower than memrchr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66416

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #2 from AK ---
Could it be because string::find is more optimized than string::rfind? See:
https://gcc.gnu.org/legacy-ml/libstdc++/2017-01/msg00034.html
[Bug libstdc++/66414] string::find ten times slower than strstr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #8 from AK ---
Should we consider this fixed?
[Bug libstdc++/93584] std::string::find_first_not_of is about 9X slower than strspn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93584

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #3 from AK ---
Could it be because string::find_first_not_of is not as optimized as string::find? See:
https://github.com/gcc-mirror/gcc/commit/cb627cdf5c0761f9e1be587a1416db9446a4801b
[Bug libstdc++/94747] New: Undefined behavior: integer overflow in libsupc++/dyncast.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94747

Bug ID: 94747
Summary: Undefined behavior: integer overflow in libsupc++/dyncast.cc
Product: gcc
Version: 7.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

Integer overflow reported by asan with the following stack trace. If this is not sufficient, I can try to provide a reproducer.

gcc/7.x/libstdc++-v3/libsupc++/dyncast.cc:53:11: runtime error: negation of 16 cannot be represented in type 'unsigned long'
    #0 in __dynamic_cast gcc/7.x/libstdc++-v3/libsupc++/dyncast.cc:53
    #1 in bool std::has_facet >(std::locale const&)
       gcc/7.x/.../bits/locale_classes.tcc:110
    #2 in std::basic_ios > >::_M_cache_locale(std::locale const&)
       gcc/7.x/.../bits/basic_ios.tcc:159
    #3 in std::basic_ios > >::init(std::basic_streambuf >*)
       gcc/7.x/.../bits/basic_ios.tcc:132
    #4 in std::basic_ostream > >::basic_ostream(std::basic_streambuf >*)
       gcc/7.x/.../ostream:85
    #5 in std::ios_base::Init::Init()
       gcc/7.x/libstdc++-v3/src/c++98/ios_init.cc:91
    #6 in __cxx_global_var_init gcc/7.x/.../iostream:74
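For context: unsigned arithmetic in C++ is defined modulo 2^N, so negating an `unsigned long` wraps instead of invoking undefined behavior. The message format matches Clang's UndefinedBehaviorSanitizer, whose `-fsanitize=unsigned-integer-overflow` check deliberately reports such wraparound even though it is well-defined. If that check produced this report, the result is surprising but not UB. A minimal sketch of the arithmetic involved (`negate` is a hypothetical helper, not code from dyncast.cc):

```cpp
#include <climits>

// Hypothetical helper, not taken from dyncast.cc. Unsigned negation is
// defined modulo 2^N, so -16 becomes ULONG_MAX - 15. A sanitizer may
// still report it because the mathematical result (-16) is negative.
unsigned long negate(unsigned long x) {
    return -x;
}
```

Adding 16 back to `negate(16)` wraps around to 0, which is the property pointer-adjustment code typically relies on.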
[Bug libstdc++/94823] modulo arithmetic bug in random.tcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #1 from AK ---
Here's a partial stack trace, in case it helps:

```
in bits/random.tcc:3274:20: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned long'
in void std::seed_seq::generate(unsigned int*, unsigned int*)
in std::enable_if::value, void>::type __gnu_cxx::simd_fast_mersenne_twister_engine::seed(std::seed_seq&) ext/random.tcc:110
in __gnu_cxx::simd_fast_mersenne_twister_engine::simd_fast_mersenne_twister_engine(std::seed_seq&) ext/random:104
```
[Bug libstdc++/94823] modulo arithmetic bug in random.tcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

--- Comment #4 from AK ---
Makes sense. Thanks for the explanation.
[Bug libstdc++/94823] modulo arithmetic bug in random.tcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94823

--- Comment #5 from AK ---
> So when __k == 0, then all three of those loads will be _Type(0x8b8b8b8bu)
> really; no matter what the values of __n, __p will be.

Would it be a good idea to add this explanation as a comment? It may be tricky for someone to understand in the future.
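The quoted explanation can be checked in isolation. In unsigned 64-bit arithmetic, `(k - 1) % n` at `k == 0` becomes `(2^64 - 1) % n`, which is generally not `n - 1` (for `n == 3` it is 0). The wrapped index is harmless only because `std::seed_seq::generate` first fills the whole output range with 0x8b8b8b8b, so at `k == 0` every element still holds that value whichever index is read. `prev_index` and `read_initial` are hypothetical helpers that sketch the reasoning; they are not the libstdc++ code in random.tcc.

```cpp
#include <cstdint>
#include <vector>

// (k - 1) % n evaluated in unsigned 64-bit arithmetic; for k == 0 the
// subtraction wraps to 2^64 - 1 before the modulo is applied.
std::uint64_t prev_index(std::uint64_t k, std::uint64_t n) {
    return (k - 1) % n;
}

// Value read at `idx` from a range that, like seed_seq::generate's
// output, starts out filled entirely with 0x8b8b8b8b.
std::uint32_t read_initial(std::uint64_t n, std::uint64_t idx) {
    std::vector<std::uint32_t> buf(n, 0x8b8b8b8bu);
    return buf[idx];
}
```

With `n == 3`, `prev_index(0, 3)` is 0 rather than the mathematically intended 2, yet both indices read the same initial value.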
[Bug tree-optimization/95565] New: [Feature request] add a flag to only instrument function entry.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

Bug ID: 95565
Summary: [Feature request] add a flag to only instrument function entry.
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

The flag -finstrument-functions instruments both the entry and the exit of a function. In many scenarios (cheap profiling, for example), instrumenting only the function entry is sufficient, but GCC instruments the exit as well, which adds unwanted code size.
[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565

--- Comment #1 from AK ---
I believe we need to conditionally disable the following code, but I'm not sure of all the implications. If someone can implement it, that'd be great.

```
gcc/gimplify.c, around line 14997:

  x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
  call = gimple_build_call (x, 1, integer_zero_node);
  tmp_var = create_tmp_var (ptr_type_node, "return_addr");
  gimple_call_set_lhs (call, tmp_var);
  gimplify_seq_add_stmt (&cleanup, call);
  x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_EXIT);
  call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
  gimplify_seq_add_stmt (&cleanup, call);
  tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);
```
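For reference, code built with -finstrument-functions calls two user-supplied hooks, `__cyg_profile_func_enter` and `__cyg_profile_func_exit`, each receiving the instrumented function's address and a call site. In the snippet above, the call site passed to the exit hook is the return address. With the current flag, an entry-only profiler has to define an exit hook that does nothing, yet every instrumented function still pays for the exit call. A minimal sketch; the counter and the `entry_count` accessor are illustrative names, not part of any GCC interface:

```cpp
#include <cstddef>

// Illustrative counter updated by the entry hook.
static std::size_t g_entry_count = 0;

extern "C" {

// Called on function entry in code built with -finstrument-functions.
// The hooks themselves must not be instrumented.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    ++g_entry_count;
}

// Still called on every function exit; for entry-only profiling it is a
// no-op, but the call and the return-address computation remain.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
}

}  // extern "C"

__attribute__((no_instrument_function))
std::size_t entry_count() { return g_entry_count; }
```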
[Bug tree-optimization/92638] New: gcc unable to remove empty loop after loop body is removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92638

Bug ID: 92638
Summary: gcc unable to remove empty loop after loop body is removed
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

char* get(const char* value, const char separator) {
  int separator_index = strchr(value, separator) - value;
  char* result = (char*)malloc(separator_index);
  memcpy(result, value, separator_index);
  result[separator_index] = '\0';
  return result;
}

int main() {
  const char separator = ',';
  clock_t t = clock();
  for (size_t i = 0; i < 1; ++i) {
    free(get("127.0.0.1, 127.0.0.2:", separator));
  }
  float elapsed_seconds = (((double)(clock() - t)) / CLOCKS_PER_SEC);
  printf("%f seconds.\n", elapsed_seconds);
  return 0;
}

$ gcc -O3 -S -o-
get(char const*, char):
        push    rbp
        movsx   esi, sil
        mov     rbp, rdi
        push    rbx
        sub     rsp, 8
        call    strchr
        sub     rax, rbp
        movsx   rbx, eax
        mov     rdi, rbx
        call    malloc
        mov     rdx, rbx
        mov     rsi, rbp
        mov     rdi, rax
        call    memcpy
        mov     BYTE PTR [rax+rbx], 0
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
.LC1:
        .string "%f seconds.\n"
main:
        push    rbx
        call    clock
        mov     rbx, rax
        mov     eax, 1
.L5:
        sub     rax, 1          // <-- empty loop still there
        jne     .L5
        call    clock
        pxor    xmm0, xmm0
        mov     edi, OFFSET FLAT:.LC1
        sub     rax, rbx
        cvtsi2sd        xmm0, rax
        mov     eax, 1
        divsd   xmm0, QWORD PTR .LC0[rip]
        cvtsd2ss        xmm0, xmm0
        cvtss2sd        xmm0, xmm0
        call    printf
        xor     eax, eax
        pop     rbx
        ret
.LC0:
        .long   0
        .long   1093567616
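As an aside unrelated to the missed optimization: the reproducer's get() allocates `separator_index` bytes and then writes `result[separator_index]`, which is one byte past the end of the allocation. It also assumes that strchr finds the separator. A corrected sketch (`get_fixed` is a hypothetical name) that allocates room for the terminating NUL and returns nullptr when the separator is absent or allocation fails:

```cpp
#include <cstdlib>
#include <cstring>

// Copies the prefix of `value` up to the first `separator` into a freshly
// malloc'd, NUL-terminated buffer. Returns nullptr if the separator is not
// found or malloc fails. The caller frees the result.
char* get_fixed(const char* value, char separator) {
    const char* sep = std::strchr(value, separator);
    if (sep == nullptr) return nullptr;
    std::size_t len = static_cast<std::size_t>(sep - value);
    char* result = static_cast<char*>(std::malloc(len + 1));  // +1 for '\0'
    if (result == nullptr) return nullptr;
    std::memcpy(result, value, len);
    result[len] = '\0';
    return result;
}
```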
[Bug tree-optimization/92638] gcc unable to remove empty loop after loop body is removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92638

--- Comment #1 from AK ---
FYI: clang -O3 removes the empty loop.
[Bug tree-optimization/85610] New: Unable to optimize away mov followed by compare into a cmpb in case of atomic_load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85610

Bug ID: 85610
Summary: Unable to optimize away mov followed by compare into a cmpb in case of atomic_load
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat test.cpp
#include <atomic>

std::atomic<bool> flag_atomic{false};
extern void f1();
extern void f2();

void foo() {
  bool b = flag_atomic.load(std::memory_order_relaxed);
  if (b == false) {
    f1();
  } else {
    f2();
  }
}

$ g++-7 -O3 -S -o - test.cpp -std=c++14
__Z3foov:
LFB342:
        movzbl  _flag_atomic(%rip), %eax
        testb   %al, %al
        je      L4
        jmp     __Z2f2v
        .align 4,0x90
L4:
        jmp     __Z2f1v

We could just use `cmpb $0, _flag_atomic(%rip)` and avoid using a register in this case. That is what happens when _flag_atomic is a plain (non-atomic) bool global variable.
[Bug tree-optimization/85611] New: Suboptimal code generation for (potentially) redundant atomic loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611

Bug ID: 85611
Summary: Suboptimal code generation for (potentially) redundant atomic loads
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat test.cpp
#include <atomic>

std::atomic<int> atomic_var{100};
int somevar;
bool cond;

void run1() {
  auto a = atomic_var.load(std::memory_order_relaxed);
  auto b = atomic_var.load(std::memory_order_relaxed);
  // Some code using a and b;
}

void run2() {
  if (atomic_var.load(std::memory_order_relaxed) == 2 && cond) {
    if (atomic_var.load(std::memory_order_relaxed) * somevar > 3) {
      /*...*/
    }
  }
}

$ g++-7 -O3 -std=c++17 -S -o - test.cpp -fno-exceptions
        .text
        .align 4,0x90
        .globl __Z4run1v
__Z4run1v:
LFB339:
        movl    _atomic_var(%rip), %eax
        movl    _atomic_var(%rip), %eax
        ret
LFE339:
        .align 4,0x90
        .globl __Z4run2v
__Z4run2v:
LFB340:
        movl    _atomic_var(%rip), %eax
        cmpl    $2, %eax
        je      L5
L3:
        ret
        .align 4,0x90
L5:
        cmpb    $0, _cond(%rip)
        je      L3
        movl    _atomic_var(%rip), %eax
        ret
[Bug rtl-optimization/87238] New: Redundant Restore of $x0 when memcpy always returns the first argument.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87238

Bug ID: 87238
Summary: Redundant Restore of $x0 when memcpy always returns the first argument.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat test.cpp
struct BigStruct {
  int x[64];
};

void structByValue(BigStruct s);

void callStructByValue(int unused, int unused2, BigStruct s) {
  structByValue(s);
}

$ g++ -O3 -arch arm64 test.cpp -S -o -
callStructByValue(int, int, BigStruct):
        stp     x29, x30, [sp, -272]!
        mov     x1, x2
        mov     x2, 256
        add     x29, sp, 0
        add     x0, x29, 16     <<
        bl      memcpy
        add     x0, x29, 16     << redundant
        bl      structByValue(BigStruct)
        ldp     x29, x30, [sp], 272
        ret

We could just remove the second 'add x0, x29, 16', as memcpy is guaranteed to return the pointer to the destination.
http://man7.org/linux/man-pages/man3/memcpy.3.html

Possibly a duplicate of PR82991, but I'm not sure.
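The C standard specifies that memcpy returns its first argument. That guarantee is what makes the second `add x0, x29, 16` redundant: on AArch64 the return value comes back in x0, so after `bl memcpy` x0 already holds the destination address. A small check of the guarantee (`copy_and_return` is a hypothetical wrapper):

```cpp
#include <cstring>

// Returns whatever memcpy returns, which is always `dst` itself.
void* copy_and_return(void* dst, const void* src, std::size_t n) {
    return std::memcpy(dst, src, n);
}
```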
[Bug tree-optimization/87505] New: Vectorizer generates a lot of code for a small loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87505

Bug ID: 87505
Summary: Vectorizer generates a lot of code for a small loop
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

test.cpp
#include <cstddef>

int bar(int* v, std::size_t base) {
  int sum = 0;
  for (int i = base; i < base + 4; ++i) {
    sum += v[i];
  }
  return sum;
}

$ gcc-8.2 -std=c++17 -O3 -DNDEBUG test.cpp
bar(int*, unsigned long):
        movslq  %esi, %rcx
        leaq    4(%rsi), %r8
        movl    %esi, %edx
        cmpq    %r8, %rcx
        jnb     .L7
        leaq    3(%rsi), %rax
        movq    %r8, %r9
        subq    %rcx, %rax
        subq    %rcx, %r9
        cmpq    $3, %rax
        jbe     .L8
        movq    %r9, %rdx
        leaq    (%rdi,%rcx,4), %rax
        pxor    %xmm0, %xmm0
        shrq    $2, %rdx
        salq    $4, %rdx
        addq    %rax, %rdx
.L5:
        movdqu  (%rax), %xmm2
        addq    $16, %rax
        paddd   %xmm2, %xmm0
        cmpq    %rdx, %rax
        jne     .L5
        movdqa  %xmm0, %xmm1
        movq    %r9, %r10
        psrldq  $8, %xmm1
        andq    $-4, %r10
        paddd   %xmm1, %xmm0
        addq    %r10, %rcx
        leal    (%rsi,%r10), %edx
        movdqa  %xmm0, %xmm1
        psrldq  $4, %xmm1
        paddd   %xmm1, %xmm0
        movd    %xmm0, %eax
        cmpq    %r10, %r9
        je      .L10
.L3:
        addl    (%rdi,%rcx,4), %eax
        leal    1(%rdx), %ecx
        movslq  %ecx, %rcx
        cmpq    %r8, %rcx
        jnb     .L1
        addl    (%rdi,%rcx,4), %eax
        leal    2(%rdx), %ecx
        movslq  %ecx, %rcx
        cmpq    %rcx, %r8
        jbe     .L1
        addl    $3, %edx
        addl    (%rdi,%rcx,4), %eax
        movslq  %edx, %rdx
        cmpq    %rdx, %r8
        jbe     .L1
        addl    (%rdi,%rdx,4), %eax
        ret
.L7:
        xorl    %eax, %eax
.L1:
        ret
.L10:
        ret
.L8:
        xorl    %eax, %eax
        jmp     .L3
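In the reproducer, `int i = base` truncates the `size_t` argument, and `i < base + 4` converts `i` back to `unsigned long`. As a result the compiler apparently cannot treat the trip count as a constant 4, so it emits the general vector loop plus scalar epilogue shown above. A hedged sketch of a source-level workaround that counts 0..3 instead (`bar_fixed_trip` is a hypothetical name). It sums the same elements as bar() as long as `base + 3` fits in an `int`. Whether GCC then emits smaller code has not been checked here.

```cpp
#include <cstddef>

// Same elements as bar(), but the loop counter runs over a constant
// range, so the trip count is known to be exactly 4.
int bar_fixed_trip(const int* v, std::size_t base) {
    int sum = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        sum += v[base + k];
    }
    return sum;
}
```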
[Bug c++/17913] ICE jumping into statement expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17913

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #25 from AK ---
I think this bug is fixed:

void f(void) {
  1 ? 1 : ({ a : 1; 1; });
  goto a;
}

g++-7.3 -O3 -std=c++11 test.c -S -o -
f():
.L2:
        jmp     .L2
[Bug c++/22238] Awful error messages with virtual functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22238

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #24 from AK ---
The recent error messages look much better. Maybe we can close this.

prog.cpp: In member function ‘void A::bar()’:
prog.cpp:6:23: error: could not convert ‘A::foo()’ from ‘void’ to ‘bool’
   void bar() { if (foo()) ; }
[Bug c++/12333] [DR 272] Explicit call to MyClass::~MyClass() not allowed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12333

--- Comment #17 from AK ---
The following workarounds compile without errors, although I'm not sure the second one is a correct workaround:

1. this->~X();
2. X::~X(0);

FYI, ICC 18 has the same bug. The first workaround works with ICC, but the second one does not.
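A small sketch of workaround 1 in a setting where an explicit destructor call is legitimate. The object is created with placement new in caller-provided storage, so running `~X()` through `this` does not lead to a second destruction later. The type `X`, its counter, and `construct_and_destroy` are illustrative and do not come from the PR.

```cpp
#include <new>

struct X {
    static int destroyed;          // counts destructor runs
    ~X() { ++destroyed; }
    // Workaround 1: name the destructor through `this`.
    void destroy() { this->~X(); }
};
int X::destroyed = 0;

// Constructs an X in `storage` (which must be suitably aligned and at
// least sizeof(X) bytes) and destroys it explicitly. No automatic
// destructor runs afterwards because the object was placement-new'd.
void construct_and_destroy(void* storage) {
    X* p = ::new (storage) X;
    p->destroy();
}
```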
[Bug c++/87628] New: Redundant check of pointer when delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628

Bug ID: 87628
Summary: Redundant check of pointer when delete is called
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

https://godbolt.org/z/DY9ruv

void if_delete(char *p) {
  if (p) {
    delete(p);
  }
}

$ gcc-8.2 -Os -fno-exceptions
if_delete(char*):
        test    rdi, rdi
        je      .L1
        mov     esi, 1
        jmp     operator delete(void*, unsigned long)
.L1:
        ret

Clang, by contrast, removes the check at -Oz:

$ clang -Oz -fno-exceptions
if_delete(char*):
        jmp     operator delete(void*)
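The check is redundant because applying `delete` to a null pointer is defined to have no effect. A sketch that makes this observable (the `Tracked` type is illustrative): the guarded and unguarded versions destroy exactly the same objects.

```cpp
// Counts how many objects have actually been destroyed.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Shape of the reproducer: explicit null check before delete.
void if_delete(Tracked* p) {
    if (p) {
        delete p;
    }
}

// Equivalent without the check: delete on a null pointer does nothing.
void plain_delete(Tracked* p) {
    delete p;
}
```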
[Bug libstdc++/79349] unused std::string is not optimized away in presense of a call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

AK changed:

What |Removed |Added
Status |NEW |RESOLVED
Resolution |--- |WORKSFORME

--- Comment #2 from AK ---
The problem is exceptions. When I compile without exceptions (-fno-exceptions), g++ does optimize this away and gives the same output as clang. It seems clang++ compiles without exceptions by default and behaves like g++ when -fexceptions is passed.
[Bug libstdc++/79349] unused std::string is not optimized away in presense of a call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

AK changed:

What |Removed |Added
Status |RESOLVED |NEW
Resolution |WORKSFORME |---

--- Comment #3 from AK ---
(In reply to AK from comment #2)
> The problem is exceptions. When I compile without exceptions
> (-fno-exceptions) g++ does optimize this away and gives same output as
> clang. It seems clang++ compiles without exceptions by default and behaves
> like g++ when -fexceptions is passed.

Correction: for the example

#include <string>
int main() {
  std::string s("abc");
  return 0;
}

clang (with libc++) optimizes away the std::string even with exceptions enabled (-fexceptions). When I compile without exceptions (-fno-exceptions), g++ also optimizes the std::string away.

However, when I introduce a call:

#include <string>
void foo();
int main() {
  std::string s("abc");
  foo();
  return 0;
}

clang++ still optimizes away the std::string, but g++ does not. I think the problem is in libstdc++, because when clang uses libstdc++ I can see the destructor.

Sorry for the confusion.
[Bug libstdc++/78702] New: [libstdc++] class __shim in locale::facet is private
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

Bug ID: 78702
Summary: [libstdc++] class __shim in locale::facet is private
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

In file include/bits/locale_classes.h:

371   class locale::facet
372   {
...
465     class __shim;
466
467     const facet* _M_sso_shim(const id*) const;
468     const facet* _M_cow_shim(const id*) const;

However, in file src/c++11/cxx11-shim_facets.cc, numpunct_shim derives from facet::__shim, which results in a compilation error:

227 namespace // unnamed
228 {
229   template
230     struct numpunct_shim : std::numpunct<_CharT>, facet::__shim
231     {
232       typedef typename numpunct<_CharT>::__cache_type __cache_type;
233
234       // f must point to a type derived from numpunct[abi:other]
235       numpunct_shim(const facet* f, __cache_type* c = new __cache_type)
236       : std::numpunct<_CharT>(c), __shim(f), _M_cache(c)
237       {
238         __numpunct_fill_cache(other_abi{}, f, c);
239       }

What could be a possible fix here?
[Bug libstdc++/78702] [libstdc++] class __shim in locale::facet is private
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

--- Comment #2 from AK ---
Sorry for the confusion: I was using clang++ (trunk) to build libstdc++.
[Bug libstdc++/78702] [libstdc++] class __shim in locale::facet is private
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78702

--- Comment #3 from AK ---
llvm-project/install/bin/clang++ -std=gnu++14 -D_GLIBCXX_SHARED -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -Wabi -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=cxx11-shim_facets.lo -c ../../../src/c++11/cxx11-shim_facets.cc -fPIC -DPIC -D_GLIBCXX_SHARED -o cxx11-shim_facets.o (+ some include flags from Makefile)

../../../src/c++11/cxx11-shim_facets.cc:230:60: error: '__shim' is a private member of 'std::locale::facet'
    struct numpunct_shim : std::numpunct<_CharT>, facet::__shim
                                                         ^
~/s/work/gcc/libstdc++-v3/build/include/bits/locale_classes.h:464:11: note: declared private here
    class __shim;

clang version 4.0.0
Target: x86_64-unknown-linux-gnu
[Bug libstdc++/78717] New: no definition of string::find when lowered to gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

Bug ID: 78717
Summary: no definition of string::find when lowered to gimple
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat test.cpp
#include <string>

int foo(const std::string &s1, const std::string &s2, int i) {
  return s1.find(s2) == i;
}

../gcc/install/usr/bin/g++ -S -o a.s ../a.cpp -fdump-tree-all-all

$ cat a.cpp.004t.gimple
int foo(const string&, const string&, int) (const struct string & s1, const struct string & s2, int i)
{
  intD.9 D.27718;

  # USE = anything
  # CLB = anything
  _1 = _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findERKS4_mD.18492 (s1D.24055, s2D.24056, 0);
  _2 = (long unsigned intD.14) iD.24057;
  _3 = _1 == _2;
  D.27718 = (intD.9) _3;
  return D.27718;
}

The problem is that the inliner cannot see the definition of std::string::find and hence cannot inline it. Maybe this is because std::basic_string is an extern template, but I would hope that the definition is at least visible to the optimizer. That would help improve the performance of programs using string::find.

Thanks,
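To illustrate the mechanism suspected above: an explicit instantiation declaration (`extern template`) tells a translation unit not to implicitly instantiate the non-inline members of that specialization. Those members are instead provided by an explicit instantiation definition elsewhere. libstdc++'s headers contain such a declaration for `basic_string<char>` (guarded by `_GLIBCXX_EXTERN_TEMPLATE`), with the instantiations living in the shared library. `Box` below is a hypothetical stand-in. Both declarations sit in one file only so the sketch is self-contained; the definition must follow the declaration. Whether GCC may still inline such a member when its definition is visible is the open question of this PR, and the sketch does not settle it.

```cpp
// Hypothetical class template with an out-of-class member definition,
// similar in shape to many basic_string members.
template <typename T>
struct Box {
    T value;
    T get() const;
};

template <typename T>
T Box<T>::get() const { return value; }

// Explicit instantiation declaration: do not implicitly instantiate
// Box<int>'s non-inline members in this translation unit.
extern template struct Box<int>;

// Explicit instantiation definition (normally in a separate source file
// compiled into the library).
template struct Box<int>;
```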
[Bug libstdc++/66414] string::find ten times slower than strstr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #6 from AK ---
I have posted a patch for string::find for review, which might help as well:
https://gcc.gnu.org/ml/libstdc++/2016-12/msg00051.html

Please send feedback on how it can be improved.

-Aditya
[Bug libstdc++/78717] no definition of string::find when lowered to gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717

--- Comment #2 from AK ---
With -O3, I see only the following call to find, which calls the real find function that I was expecting to be visible in the gimple:

  D.12805 = _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findERKS4_mD.11635 (s1D.12794, s2D.12795, 0);

std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::find(const std::__cxx11::basic_string<_CharT, _Traits, _Alloc>&, std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::size_type) const

But after I #defined _GLIBCXX_DEBUG in test.cpp (before including <string>), I can see the find function.
[Bug middle-end/79349] New: unused std::string is not optimized away in presense of a call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79349

Bug ID: 79349
Summary: unused std::string is not optimized away in presense of a call
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

g++ version (GCC) 7.0.0 20170118 (experimental)

$ cat t.cpp
#include <string>
void foo();
int main() {
  std::string s("abc");
  foo ();
  return 0;
}

$ install/bin/g++ -O3 t.cpp -S -o t.s
$ cat t.s
main:
.LFB995:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        .cfi_lsda 0x3,.LLSDA995
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        subq    $32, %rsp
        .cfi_def_cfa_offset 48
        leaq    16(%rsp), %rax
        movb    $99, 18(%rsp)
        movq    $3, 8(%rsp)
        movb    $0, 19(%rsp)
        movq    %rax, (%rsp)
        movl    $25185, %eax
        movw    %ax, 16(%rsp)
.LEHB0:
        call    _Z3foov
.LEHE0:
        movq    (%rsp), %rdi
        leaq    16(%rsp), %rax
        cmpq    %rax, %rdi
        je      .L6
        call    _ZdlPv
.L6:
        addq    $32, %rsp
        .cfi_remember_state
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
.L5:
        .cfi_restore_state
        movq    (%rsp), %rdi
        leaq    16(%rsp), %rdx
        movq    %rax, %rbx
        cmpq    %rdx, %rdi
        je      .L4
        call    _ZdlPv
.L4:
        movq    %rbx, %rdi
.LEHB1:
        call    _Unwind_Resume
.LEHE1:
        .cfi_endproc
.LFE995:
        .globl  __gxx_personality_v0
        .section        .gcc_except_table,"a",@progbits

clang++, by contrast, optimizes it away:

clang version 5.0.0 (llvm-project SHA: 28b7c19c2379e17b26571260933467b9f98b449c)
$ ./bin/clang++ -O3 t.cpp -S -o t.s -stdlib=libc++
$ cat t.s
main:                                   # @main
        .cfi_startproc
# BB#0:                                 # %entry
        pushq   %rax
.Lcfi0:
        .cfi_def_cfa_offset 16
        callq   _Z3foov
        xorl    %eax, %eax
        popq    %rcx
        retq
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
        .cfi_endproc

        .ident  "clang version 5.0.0 "
        .section        ".note.GNU-stack","",@progbits
[Bug libstdc++/80331] New: unused const std::string not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331

Bug ID: 80331
Summary: unused const std::string not optimized away
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat t.cpp
#include <string>
int sain() {
  const std::string s("a");
  return 0;
}

# gcc version 7.0.0 20170118 (experimental) (GCC)
$ g++ -S -o t.s t.cpp -O2 -fno-exceptions -std=c++11
$ cat t.s
        .type   _Z4sainv, @function
_Z4sainv:
.LFB940:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $.LC0+1, %esi
        subq    $32, %rsp
        .cfi_def_cfa_offset 48
        leaq    16(%rsp), %rbx
        movq    %rsp, %rdi
        movq    %rbx, (%rsp)
        call    _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S8_St20forward_iterator_tag.isra.16.constprop.20
        movq    (%rsp), %rdi
        cmpq    %rbx, %rdi
        je      .L13
        call    _ZdlPv
.L13:
        addq    $32, %rsp
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE940:
        .size   _Z4sainv, .-_Z4sainv

clang++, on the other hand, completely optimizes away the const string:

        .type   _Z4sainv,@function
_Z4sainv:                               # @_Z4sainv
        .cfi_startproc
# BB#0:
        xorl    %eax, %eax
        retq
.Lfunc_end0:
        .size   _Z4sainv, .Lfunc_end0-_Z4sainv
[Bug tree-optimization/82776] New: Unable to optimize the loop when iteration count is unavailable.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

Bug ID: 82776
Summary: Unable to optimize the loop when iteration count is unavailable.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

Compiling with `g++ -O2 --std=c++14 -msse4 -DENABLE_FORLOOP` versus `g++ -O2 --std=c++14 -msse4` gives dramatically different results: the loop is completely optimized away when the bounded for loop is used instead of the `while(true)` loop. This reproduces with g++-7.2 and g++ trunk.

$ cat test.cpp
#include
#include
#include
#include
#include
#include
#include

struct Chunk {
  std::array tags_;
  uint8_t control_;

  bool eof() const {
    return (control_ & 1) != 0;
  }

  static constexpr unsigned kFullMask = (1 << 14) - 1;

  __m128i const* tagVector() const {
    return static_cast<__m128i const*>(static_cast(&tags_[0]));
  }

  unsigned emptyMask() const {
    auto tagV = _mm_load_si128(tagVector());
    auto emptyTagV = _mm_cmpeq_epi8(tagV, _mm_setzero_si128());
    return _mm_movemask_epi8(emptyTagV) & kFullMask;
  }

  unsigned occupiedMask() const {
    return emptyMask() ^ kFullMask;
  }
};

#define LIKELY(x) __builtin_expect((x), true)
#define UNLIKELY(x) __builtin_expect((x), false)

struct Iter {
  Chunk* chunk_;
  std::size_t index_;

  void advance() {
    // common case is packed entries
    while (index_ > 0) {
      --index_;
      if (LIKELY(chunk_->tags_[index_] != 0)) {
        return;
      }
    }

    // bar only skips the work of advance() if this loop can
    // be guaranteed to terminate
#ifdef ENABLE_FORLOOP
    for (std::size_t i = 1; i != 0; ++i) {
#else
    while (true) {
#endif
      // exhausted the current chunk
      if (chunk_->eof()) {
        chunk_ = nullptr;
        break;
      }
      ++chunk_;
      auto m = chunk_->occupiedMask();
      if (m != 0) {
        index_ = 31 - __builtin_clz(m);
        break;
      }
    }
  }
};

static Iter foo(Iter iter) {
  puts("hello");
  iter.advance();
  return iter;
}

void bar(Iter iter) {
  foo(iter);
}
[Bug tree-optimization/82776] Unable to optimize the loop when iteration count is unavailable.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

--- Comment #9 from AK ---
Are we also taking advantage of this statement in the standard:

> An iteration statement that performs no input/output operations, does not
> access volatile objects, and performs no synchronization or atomic
> operations in its body, controlling expression, or (in the case of a for
> statement) its expression may be assumed by the implementation to terminate.
[Bug rtl-optimization/82889] New: Unnecessary sign extension of int32 to int64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889

Bug ID: 82889
Summary: Unnecessary sign extension of int32 to int64
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

$ cat t.cpp
#include <cstdint>

int lol(int32_t* table, int32_t* ht, uint32_t hash, uint32_t mask) {
  for (uint64_t probe = (uint32_t)hash & mask, i = 1;; ++i) {
    int32_t pos = ht[probe];
    if (pos >= 0) {
      if (table[pos] == 42) {
        return true;
      }
    } else if (pos & 1) {
      return false;
    }
    probe += i;
    probe &= mask;
  }
  // notreached
}

Compile with: gcc -std=c++11 -O3 -S -o -

lol(int*, int*, unsigned int, unsigned int):
        andl    %ecx, %edx
        movl    $1, %r8d
        movl    %ecx, %ecx
        jmp     .L5
.L10:
        cmpl    $42, (%rdi,%rax,4)
        je      .L9
.L4:
        addq    %r8, %rdx
        addq    $1, %r8
        andq    %rcx, %rdx
.L5:
        movslq  (%rsi,%rdx,4), %rax     # <-- sign extended
        testl   %eax, %eax
        jns     .L10
        testb   $1, %al
        je      .L4
        xorl    %eax, %eax
        ret
.L9:
        movl    $1, %eax
        ret

Is it possible to get rid of this sign extension?
[Bug tree-optimization/46186] Clang creates code running 1600 times faster than gcc's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186

AK changed:

What |Removed |Added
CC | |hiraditya at msn dot com

--- Comment #27 from AK ---
It seems PR65855 is related to this.

By the way, it may be worthwhile to try the patch posted by Sebastian:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=22201
[Bug ipa/65972] New: ICE after applying a patch to enable the vectorizer.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

Bug ID: 65972
Summary: ICE after applying a patch to enable the vectorizer.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

I was debugging a compiler crash that occurred while compiling a program with AutoFDO enabled, and I narrowed it down to the small patch below. The issue is that the SSA form is not valid at the point where the compiler analyzes a function for inlining.

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 5d99887..d76f396 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -128,6 +128,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "cilk.h"
 #include "cfgexpand.h"
+#include "tree-ssa.h"

 /* Estimate runtime of function can easilly run into huge numbers with many
    nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE * 2 in an
@@ -2506,6 +2507,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early)
      <0,2>.  */
   basic_block bb;
   struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
+  verify_ssa (false, true);
   int freq;
   struct inline_summary *info = inline_summaries->get (node);
   struct predicate bb_predicate;

Configured with: ../configure --prefix=../install-crash --enable-shared --enable-static --target=arm-foo-linux-gnueabihf --with-sysroot=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1/arm-foo-linux-gnueabihf/sysroot --disable-__cxa_atexit --with-gnu-ld --disable-libssp --disable-multilib --with-gmp=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1 --with-mpfr=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1 --enable-target-optspace --disable-libquadmath --enable-tls --disable-libmudflap --enable-threads --with-mpc=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1 --disable-decimal-float --enable-languages=c,c++,lto --with-build-time-tools=/compiler/tools/cross/arm-foo-linux-gnueabihf/gcc-5.1/2015-04-22-222321-2c9bf40-release-5.1/arm-foo-linux-gnueabihf/bin --disable-libgomp --with-pkgversion='Built at foo from commit SHA:2c9bf40, Rev:222321' --with-bugurl=foo...@samsung.com --with-abi=aapcs-linux --with-cpu=cortex-a15 --with-fpu=neon --with-float=hard --with-mode=arm --disable-bootstrap --enable-languages=c,c++
Thread model: posix

The compiler crashes like this:

../../../libgcc/config/arm/unwind-arm.c: In function ‘__gnu_unwind_pr_common’:
../../../libgcc/config/arm/unwind-arm.c:511:1: error: virtual definition of statement not up-to-date
 }
 ^
# .MEM_102 = VDEF <.MEM_97>
_103 = _Unwind_decode_typeinfo_ptr.isra.0 (_101);
../../../libgcc/config/arm/unwind-arm.c:511:1: internal compiler error: verify_ssa failed
gcc/build-crash/./gcc/xgcc -Bgcc/build-crash/./gcc/ -Bgcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/bin/ -Bgcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/lib/ -isystem gcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/include -isystem gcc/build-crash/../install-crash/arm-foo-linux-gnueabihf/sys-include -g -O2 -g -Os -O2 -g -O2 -g -Os -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fPIC -fno-inline -I. -I. -I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include -DHAVE_CC_TLS -o unwind-arm.o -MT unwind-arm.o -MD -MP -MF unwind-arm.dep -fexceptions -c ../../../libgcc/config/arm/unwind-arm.c
0xb80500 convert_callers_for_node
        ../../gcc/tree-sra.c:4940
0xb853cb cgraph_node::call_for_symbol_and_aliases(bool (*)(cgraph_node*, void*), void*, bool)
        ../../gcc/cgraph.h:3025
0xb853cb convert_callers
        ../../gcc/tree-sra.c:4955
0xb853cb modify_function
        ../../gcc/tree-sra.c:5011
0xb8e861 ipa_early_sra
        ../../gcc/tree-sra.c:5239
0xb8e861 execute
        ../../gcc/tree-sra.c:5286
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
make[2]: *** [unwind-arm.o] Error 1
make[2]: *** Waiting for unfinished jobs
0xc966ab verify_ssa(bool, bool)
        ../../gcc/tree-ssa.c:1068
0x8ef1d0 estimate_function_body_sizes
        ../../gcc/ipa-inline-analysis.c:2510
0x8f1f43 compute_inline_parameters(cgraph_node*, bool)
        ../../gcc/ipa-inline-analysis.c:2973
0xb80500 convert_cal
[Bug ipa/65972] ICE after applying a patch to enable verify_ssa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #1 from AK ---
PS: The bootstrap fails after applying this patch and emits the error reported above.
[Bug ipa/65972] ICE after applying a patch to enable verify_ssa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972

--- Comment #3 from AK ---
Created attachment 35457
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35457&action=edit
Preprocessed unwind-arm.i
[Bug middle-end/65947] Vectorizer misses conditional assignment of constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65947 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #2 from AK --- Can you please explain how the conditional is commutative? That would be very helpful. Thanks,
[Bug ipa/65972] ICE after applying a patch to enable verify_ssa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972 --- Comment #6 from AK --- Your patch did fix the problem. Thanks!
[Bug rtl-optimization/66206] New: Address of stack memory associated with local variable returned to caller
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66206 Bug ID: 66206 Summary: Address of stack memory associated with local variable returned to caller Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Clang static analyzer reported this potential bug. File: gcc/bt-load.c Location: line 234, column 6 Description: Address of stack memory associated with local variable 'x' returned to caller 220 static rtx * 221 find_btr_use (rtx x, rtx *excludep = 0) 222 { 223 subrtx_ptr_iterator::array_type array; 224 FOR_EACH_SUBRTX_PTR (iter, array, &x, NONCONST) 225 { 226 rtx *loc = *iter; 227 if (loc == excludep) 228 iter.skip_subrtxes (); 229 else 230 { 231 const_rtx x = *loc; 232 if (REG_P (x) 233 && overlaps_hard_reg_set_p (all_btrs, GET_MODE (x), REGNO (x))) 234 return loc; // <- Address of stack memory associated with local variable 'x' returned to caller 235 } 236 } 237 return 0; 238 }
[Bug tree-optimization/48052] loop not vectorized if index is "unsigned int"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48052 --- Comment #13 from AK --- We have an updated patch that works for both the cases. https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01991.html
[Bug tree-optimization/67700] New: [graphite] miscompile due to wrong codegen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67700 Bug ID: 67700 Summary: [graphite] miscompile due to wrong codegen Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- A reduced test case is below: compile with trunk gcc: gcc -O2 -fgraphite-identity test.c ./a.out int main(): Assertion `abcd->a[8] == 29' failed struct abc { int a[81]; } *abcd; #define FPMATH_SSE 2 int global; void __attribute__ ((noinline)) foo() { int pos = 0; int i; if (!((global & FPMATH_SSE) != 0)) for (i = 8; i <= 15; i++) abcd->a[pos++] = i; for (i = 29; i <= 36; i++) abcd->a[pos++] = i; } #include #include #include int main() { int i; abcd = (struct abc*) malloc (sizeof (abc)); for (i = 0; i <= 80; i++) abcd->a[i] = 0; foo(); assert (abcd->a[8] == 29); return 0; }
[Bug tree-optimization/67700] [graphite] miscompile due to wrong codegen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67700 --- Comment #1 from AK --- The problem seems to be in static void canonicalize_loop_closed_ssa (loop_p loop), which generates a phi node in the wrong place in this case.
[Bug tree-optimization/67842] Incorrect check in sese.h:bb_in_region
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67842 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #1 from AK --- This check is irrelevant now. I'll put up a patch to remove this check.
[Bug tree-optimization/65850] [5/6 Regression] [graphite]: isl_constraint.c:625: expecting integer value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65850 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #4 from AK --- This bug appears to be a duplicate of #61929. The fix (r225942) has been pushed today. Please verify if this fixes your problem.
[Bug middle-end/64394] ICE: in build_linearized_memory_access, at graphite-interchange.c:121 (isl_constraint.c:558: expecting integer value) with -floop-interchange
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64394 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #3 from AK --- This bug appears to be a duplicate of bug 61929. The fix (r225942) has been pushed today. Please verify if this fixes your problem.
[Bug middle-end/70159] missed CSE optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #11 from AK --- Just as an update, the new gvn-hoist pass in llvm hoists the common computations: @cat test.c float foo_p(float d, float min, float max, float a) { float tmin; float tmax; float inv = 1.0f / d; if (inv >= 0) { tmin = (min - a) * inv; tmax = (max - a) * inv; } else { tmin = (max - a) * inv; tmax = (min - a) * inv; } return tmax + tmin; } clang -c -Ofast test.c -mllvm -print-after-all *** IR Dump Before Early GVN Hoisting of Expressions *** ; Function Attrs: nounwind uwtable define float @_Z5foo_p(float %d, float %min, float %max, float %a) #0 { entry: %div = fdiv fast float 1.000000e+00, %d %cmp = fcmp fast oge float %div, 0.000000e+00 br i1 %cmp, label %if.then, label %if.else if.then: ; preds = %entry %sub = fsub fast float %min, %a %mul = fmul fast float %sub, %div %sub1 = fsub fast float %max, %a %mul2 = fmul fast float %sub1, %div br label %if.end if.else: ; preds = %entry %sub3 = fsub fast float %max, %a %mul4 = fmul fast float %sub3, %div %sub5 = fsub fast float %min, %a %mul6 = fmul fast float %sub5, %div br label %if.end if.end: ; preds = %if.else, %if.then %tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ] %tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ] %add = fadd fast float %tmax.0, %tmin.0 ret float %add } *** IR Dump After Early GVN Hoisting of Expressions *** ; Function Attrs: nounwind uwtable define float @_Z5foo_p(float %d, float %min, float %max, float %a) #0 { entry: %div = fdiv fast float 1.000000e+00, %d %cmp = fcmp fast oge float %div, 0.000000e+00 %sub = fsub fast float %min, %a %mul = fmul fast float %sub, %div %sub1 = fsub fast float %max, %a %mul2 = fmul fast float %sub1, %div br i1 %cmp, label %if.then, label %if.else if.then: ; preds = %entry br label %if.end if.else: ; preds = %entry br label %if.end if.end: ; preds = %if.else, %if.then %tmax.0 = phi float [ %mul2, %if.then ], [ %mul, %if.else ] %tmin.0 = phi float [ %mul, %if.then ], [ %mul2, %if.else ] %add = fadd fast float %tmax.0, %tmin.0 ret float %add }
[Bug tree-optimization/100004] New: Dead write not removed when indirection is introduced.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004 Bug ID: 100004 Summary: Dead write not removed when indirection is introduced. Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- struct Foo { int x; }; struct Bar { int x; }; void alias(Foo* foo, Bar* bar) { foo->x = 5; foo->x = bar->x; } struct Wrap1 { Foo foo; }; struct Wrap2 { Foo foo; }; void assign_direct(Wrap1* w1, Wrap2* w2) { w1->foo.x = 5; w1->foo.x = w2->foo.x; } void assign_via_pointer(Wrap1* w1, Wrap2* w2) { Foo* f1 = &w1->foo; Foo* f2 = &w2->foo; f1->x = 5; f1->x = f2->x; } $ gcc-arm64 -O2 -std=c++17 -fstrict-aliasing -S -o - alias(Foo*, Bar*): ldr w1, [x1] str w1, [x0] ret assign_direct(Wrap1*, Wrap2*): ldr w1, [x1] str w1, [x0] ret assign_via_pointer(Wrap1*, Wrap2*): mov w2, 5 str w2, [x0] ldr w1, [x1] str w1, [x0] ret
[Bug tree-optimization/100004] Dead write not removed when indirection is introduced.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004 --- Comment #1 from AK --- godbolt link: https://gcc.godbolt.org/z/f7Y6G1svf
[Bug tree-optimization/98497] New: [Potential Perf regression] jne to hot branch instead je to cold
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98497 Bug ID: 98497 Summary: [Potential Perf regression] jne to hot branch instead je to cold Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- In the following code generated by gcc 10.2 ``` .L2: movups xmm3, XMMWORD PTR [rax] add rax, 16 addps xmm0, xmm3 cmp rax, rdx je .L6 jmp .L2 matrix_sum_column_major.cold: .L6: movaps xmm2, xmm0 # . ``` I think `jne .L2; jmp .L6` should be more efficient as it avoids one instruction in the hot path. C code: ``` float matrix_sum_column_major(float* x, int n) { n = 32767; float sum = 0; for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) sum += x[j * n + i]; return sum; } ``` gcc -Ofast -floop-nest-optimize -o - ``` matrix_sum_column_major: mov eax, 4294836212 lea rdx, [rdi+131056] pxor xmm1, xmm1 lea rcx, [rdi+rax] .L3: mov rax, rdi pxor xmm0, xmm0 .L2: movups xmm3, XMMWORD PTR [rax] add rax, 16 addps xmm0, xmm3 cmp rax, rdx je .L6 jmp .L2 matrix_sum_column_major.cold: .L6: movaps xmm2, xmm0 addss xmm1, DWORD PTR [rax+8] lea rdx, [rax+131068] add rdi, 131068 movhlps xmm2, xmm0 addps xmm2, xmm0 movaps xmm0, xmm2 shufps xmm0, xmm2, 85 addps xmm0, xmm2 movss xmm2, DWORD PTR [rax+4] addss xmm2, DWORD PTR [rax] addss xmm1, xmm2 addss xmm1, xmm0 cmp rdx, rcx jne .L3 movaps xmm0, xmm1 ret ``` Link to godbolt: https://gcc.godbolt.org/z/ac7YY1
[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #17 from AK --- Now that we have string_view, will it be possible to avoid creating a copy?
[Bug tree-optimization/101116] New: missed peephole optimization not of bitwise and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101116 Bug ID: 101116 Summary: missed peephole optimization not of bitwise and Product: gcc Version: 11.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- $ cat test.c bool foo(unsigned i) { return !(i & 1); } gcc -O2 test.c -S -o- foo(unsigned int): mov eax, edi not eax and eax, 1 ret clang -O2 test.c -S -o- foo(unsigned int): # @foo(unsigned int) testb $1, %dil sete %al retq Ref: https://godbolt.org/z/Tndb1dM8Y
[Bug c++/101138] New: Ambiguous code (with operator==) compiled without error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101138 Bug ID: 101138 Summary: Ambiguous code (with operator==) compiled without error Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- $ cat test.cpp #include using namespace std; template struct D { template bool operator==(Y a) const { cout << "f" < bool operator==(T a, D b) { cout << "fD" < a, b; if (a == b) return 0; return 1; } gcc compiles this code fine, but clang errors out. https://godbolt.org/z/c13EExxeY
[Bug c/108915] New: invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 Bug ID: 108915 Summary: invalid pointer access preserved in optimized code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Testcase has been reduced from u-boot's linker-list macro: https://github.com/u-boot/u-boot/blob/master/include/linker_lists.h#L127 #include char* bar() { static char start_bar[0] __attribute__((aligned(16))) __attribute__((unused)) __attribute__((section("__u_boot_list_2_1"))); char *p = (char *)start_bar; for (int i = p[0]; i < p[9]; i++) printf("asdfasd"); return 0; } $ gcc -O3 -fno-unroll-loops -S -o - .LC0: .string "asdfasd" bar: push rbx movsx eax, BYTE PTR start_bar.1[rip+9] movsx ebx, BYTE PTR start_bar.1[rip] cmp ebx, eax jge .L2 .L3: mov edi, OFFSET FLAT:.LC0 xor eax, eax add ebx, 1 call printf movsx eax, BYTE PTR start_bar.1[rip+9] cmp eax, ebx jg .L3 .L2: xor eax, eax pop rbx ret - $ clang -O3 -fno-unroll-loops -S -o - bar: # @bar xor eax, eax ret
[Bug tree-optimization/108915] invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 AK changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #4 from AK --- Adding `__attribute__((used))` also fixed it. Does it reflect the same behavior as using `asm` as you suggested?
[Bug c++/109017] ICE on unexpanded pack from C++20 explicit-template-parameter lambda syntax
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109017 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #1 from AK --- Example from twitter: https://twitter.com/seanbax/status/1631689332007337985 which had discussion on similar bug. ``` template struct outer1_t { void g() { // Compiles for mysterious reasons. int array[] { [](){ int i = Is2; return i; }.template operator()() ... }; } }; int main() { // Compiles OKAY when this is commented out. // ICEs when it's compiled. outer1_t<1, 5, 10>().g(); } ``` clang issues a compiler error: https://godbolt.org/z/7f6E55svM ``` :6:15: error: initializer contains unexpanded parameter pack 'Is2' int i = Is2; ```
[Bug other/92396] -ftime-trace support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92396 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #12 from AK --- I was building a giant file that takes around 100 minutes. -ftime-report gave nothing useful for finding hotspots. It is also not clear what it is reporting, as there is no documentation for it in man gcc. The percentages don't add up to 100, which makes it confusing. I'm wondering whether making this task a GSoC project would get it more attention.
[Bug libstdc++/78717] no definition of string::find when lowered to gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717 --- Comment #3 from AK --- Even with a high inline limit, string::find was not inlined. g++-11.0.2 -O3 -finline-limit=10 -S -o a.s s.cpp cat a.s ``` _Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_i: .LFB1240: .cfi_startproc endbr64 pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq 8(%rsi), %rcx movslq %edx, %rbx xorl %edx, %edx movq (%rsi), %rsi call _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm@PLT cmpq %rax, %rbx popq %rbx .cfi_def_cfa_offset 8 sete %al movzbl %al, %eax ret ```
[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889 --- Comment #4 from AK --- Seems like clang doesn't sign extend. $ clang -O3 -std=c++14 -g0 ``` .text .intel_syntax noprefix .file "example.cpp" .globl lol(int*, int*, unsigned int, unsigned int) # -- Begin function lol(int*, int*, unsigned int, unsigned int) .p2align 4, 0x90 .type lol(int*, int*, unsigned int, unsigned int),@function lol(int*, int*, unsigned int, unsigned int): # @lol(int*, int*, unsigned int, unsigned int) .cfi_startproc # %bb.0: # kill: def $edx killed $edx def $rdx and edx, ecx mov r8d, ecx mov ecx, 1 jmp .LBB0_1 .p2align 4, 0x90 .LBB0_4: # in Loop: Header=BB0_1 Depth=1 test al, 1 jne .LBB0_5 .LBB0_7: # in Loop: Header=BB0_1 Depth=1 add edx, ecx and edx, r8d inc rcx .LBB0_1: # =>This Inner Loop Header: Depth=1 mov eax, dword ptr [rsi + 4*rdx] test eax, eax js .LBB0_4 # %bb.2: # in Loop: Header=BB0_1 Depth=1 cmp dword ptr [rdi + 4*rax], 42 jne .LBB0_7 # %bb.3: mov eax, 1 ret .LBB0_5: xor eax, eax ret .Lfunc_end0: .size lol(int*, int*, unsigned int, unsigned int), .Lfunc_end0-lol(int*, int*, unsigned int, unsigned int) .cfi_endproc # -- End function .ident "clang version 16.0.0 (https://github.com/llvm/llvm-project.git 5e22ef3198d1686f7978dd150a3eefad4f737bfc)" .section ".note.GNU-stack","",@progbits .addrsig ``` $ gcc -O3 -std=c++14 -g0 ``` lol(int*, int*, unsigned int, unsigned int): and edx, ecx mov r8d, 1 mov ecx, ecx jmp .L5 .L10: cmp DWORD PTR [rdi+rax*4], 42 je .L9 .L4: add rdx, r8 add r8, 1 and rdx, rcx .L5: movsx rax, DWORD PTR [rsi+rdx*4] <--- sign extend test eax, eax jns .L10 test al, 1 je .L4 xor eax, eax ret .L9: mov eax, 1 ret ```
[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889 --- Comment #5 from AK --- Link to compiler explorer: https://godbolt.org/z/dGYG4dG15
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #3 from AK --- Still happening with gcc trunk. https://godbolt.org/z/5K94665GK
[Bug c++/106991] New: new+delete pair not optimized by g++ at -O3 but optimized at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991 Bug ID: 106991 Summary: new+delete pair not optimized by g++ at -O3 but optimized at -Os Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- https://godbolt.org/z/PeYcoqTKn --- #include #include int volatile gv = 0; void* operator new(long unsigned sz ) { ++gv; return malloc( sz ); } void operator delete(void *p) noexcept { --gv; free(p); } class c { int l; public: c() : l(0) {} int get(){ return l; } }; int caller( void ){ c *f = new c(); assert( f->get() == 0 ); delete f; return gv; } --- $ g++ -std=c++20 -O3 operator new(unsigned long): mov eax, DWORD PTR gv[rip] add eax, 1 mov DWORD PTR gv[rip], eax jmp malloc operator delete(void*): mov eax, DWORD PTR gv[rip] sub eax, 1 mov DWORD PTR gv[rip], eax jmp free caller(): sub rsp, 8 mov eax, DWORD PTR gv[rip] mov edi, 4 add eax, 1 mov DWORD PTR gv[rip], eax call malloc mov esi, 4 mov rdi, rax call operator delete(void*, unsigned long) mov eax, DWORD PTR gv[rip] add rsp, 8 ret gv: .zero 4 --- $ g++ -std=c++20 -Os operator new(unsigned long): mov eax, DWORD PTR gv[rip] inc eax mov DWORD PTR gv[rip], eax jmp malloc operator delete(void*): mov eax, DWORD PTR gv[rip] dec eax mov DWORD PTR gv[rip], eax jmp free caller(): mov eax, DWORD PTR gv[rip] ret gv: .zero 4
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #4 from AK --- Seems like clang now added the check: $ clang++ -Oz -fno-exceptions if_delete(char*): # @if_delete(char*) test rdi, rdi jne operator delete(void*)@PLT # TAILCALL ret
[Bug ipa/106991] new+delete pair not optimized by g++ at -O3 but optimized at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991 --- Comment #3 from AK --- Thanks for identifying the underlying issue, @Jan. After modifying the definition of operator delete, gcc does optimize it at -O3 as well. https://godbolt.org/z/1WPqaWrEr // source code #include #include int volatile gv = 0; void* operator new(long unsigned sz ) { ++gv; return malloc( sz ); } void operator delete(void *p, unsigned long) noexcept { --gv; free(p); } class c { int l; public: c() : l(0) {} int get(){ return l; } }; int caller( void ){ c *f = new c(); assert( f->get() == 0 ); delete f; return gv; } $ g++ -std=c++20 -O3 ``` operator new(unsigned long): mov eax, DWORD PTR gv[rip] add eax, 1 mov DWORD PTR gv[rip], eax jmp malloc operator delete(void*, unsigned long): mov eax, DWORD PTR gv[rip] sub eax, 1 mov DWORD PTR gv[rip], eax jmp free caller(): mov eax, DWORD PTR gv[rip] add eax, 1 mov DWORD PTR gv[rip], eax mov eax, DWORD PTR gv[rip] sub eax, 1 mov DWORD PTR gv[rip], eax mov eax, DWORD PTR gv[rip] ret gv: .zero 4 ```
[Bug tree-optimization/107005] New: gcc not exploiting undefined behavior to optimize away the result of division
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107005 Bug ID: 107005 Summary: gcc not exploiting undefined behavior to optimize away the result of division Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include int main() { return INT_MIN / -1; } gcc -O2 main: mov eax, -2147483648 ret clang -O2 main: # @main ret https://godbolt.org/z/Tjxx3KGdK
[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565 --- Comment #2 from AK --- clang has `-finstrument-function-entry-bare` to this effect: https://reviews.llvm.org/D40276
[Bug tree-optimization/107011] New: instruction with undefined behavior not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011 Bug ID: 107011 Summary: instruction with undefined behavior not optimized away Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include int main() { return INT_MIN / -1; } $ gcc -O3 main: mov eax, -2147483648 ret $ clang -O3 main: # @main ret https://godbolt.org/z/393EMqs1E PS: I reported this bug yesterday as well but for some reason it does not appear in bugzilla so I'm creating another one.
[Bug tree-optimization/107011] instruction with undefined behavior not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011 --- Comment #2 from AK --- ah ok. sorry for the noise.
[Bug rtl-optimization/107063] New: [X86_64 codegen] Using inc eax instead of inc dword ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107063 Bug ID: 107063 Summary: [X86_64 codegen] Using inc eax instead of inc dword ptr Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- int volatile gv = 0; void foo() { ++gv; } $ gcc -Os foo(): mov eax, DWORD PTR gv[rip] inc eax mov DWORD PTR gv[rip], eax ret gv: .zero 4 $ clang -Os foo(): # @foo() inc dword ptr [rip + gv] ret gv: .long 0 https://godbolt.org/z/vzq4jr5vj
[Bug tree-optimization/85611] Suboptimal code generation for (potentially) redundant atomic loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611 AK changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #2 from AK --- Don't remember what I was expecting.
[Bug c++/107335] New: call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 Bug ID: 107335 Summary: call to throw_bad_cast even with -fno-exceptions Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Testcase: #include void foo() { std::cout << std::endl; } $ g++ -std=c++17 -O3 -fno-exceptions ```asm foo(): mov rax, QWORD PTR std::cout[rip] push rbx mov rax, QWORD PTR [rax-24] mov rbx, QWORD PTR std::cout[rax+240] test rbx, rbx je .L10 cmp BYTE PTR [rbx+56], 0 je .L5 movsx esi, BYTE PTR [rbx+67] .L6: mov edi, OFFSET FLAT:std::cout call std::basic_ostream >::put(char) pop rbx mov rdi, rax jmp std::basic_ostream >::flush() .L5: mov rdi, rbx call std::ctype::_M_widen_init() const mov rax, QWORD PTR [rbx] mov esi, 10 mov rax, QWORD PTR [rax+48] cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc je .L6 mov rdi, rbx call rax movsx esi, al jmp .L6 .L10: call std::__throw_bad_cast() <--- call to __throw_bad_cast _GLOBAL__sub_I_foo(): sub rsp, 8 mov edi, OFFSET FLAT:_ZStL8__ioinit call std::ios_base::Init::Init() [complete object constructor] mov edx, OFFSET FLAT:__dso_handle mov esi, OFFSET FLAT:_ZStL8__ioinit mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev add rsp, 8 jmp __cxa_atexit ```
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 --- Comment #4 from AK --- I wasn't sure if this is expected. Thanks for clarifying.
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 --- Comment #5 from AK --- Is this the definition of throw_bad_cast? https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/rtti.c#L221
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 AK changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #7 from AK --- not a bug
[Bug c++/105796] New: error: no matching function for call with template function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105796 Bug ID: 105796 Summary: error: no matching function for call with template function Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- test.cpp ``` int func(int, char); template int testFunc(int (*)(TArgs..., char)); int x = testFunc(func); ``` With gcc trunk: g++ -std=c++20 test.cpp -c :6:22: error: no matching function for call to 'testFunc(int (&)(int, char))' 6 | int x = testFunc(func); | ~^~ :4:5: note: candidate: 'template int testFunc(int (*)(TArgs ..., char))' 4 | int testFunc(int (*)(TArgs..., char)); | ^~~~ :4:5: note: template argument deduction/substitution failed: :6:22: note: mismatched types 'char' and 'int' 6 | int x = testFunc(func); | ~^~ Compiler returned: 1
[Bug tree-optimization/105830] New: call to memcpy when -nostdlib -nodefaultlibs flags provided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830 Bug ID: 105830 Summary: call to memcpy when -nostdlib -nodefaultlibs flags provided Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- https://godbolt.org/z/jTEa6ajn3 ``` // test.c /* Nonzero if either X or Y is not aligned on a "long" boundary. */ #define UNALIGNED(X, Y) \ (((unsigned long)X & (sizeof (unsigned long) - 1)) | ((unsigned long)Y & (sizeof (unsigned long) - 1))) #define UNALIGNED1(a) \ ((unsigned long)(a) & (sizeof(unsigned long)-1)) /* How many bytes are copied each iteration of the 4X unrolled loop. */ #define BIGBLOCKSIZE (sizeof (unsigned long) * 4) /* How many bytes are copied each iteration of the word copy loop. */ #define LITTLEBLOCKSIZE (sizeof (unsigned long)) /* Threshold for punting to the byte copier. */ #define TOO_SMALL(LEN) ((LEN) < BIGBLOCKSIZE) void * memcpy (void *__restrict dst0, const void *__restrict src0, unsigned long len0) { unsigned char *dst = dst0; const unsigned char *src = src0; /* If the size is small, or either SRC or DST is unaligned, then punt into the byte copy loop. This should be rare. */ if (len0 >= LITTLEBLOCKSIZE && !UNALIGNED (src, dst)) { unsigned long *aligned_dst; const unsigned long *aligned_src; aligned_dst = (unsigned long*)dst; aligned_src = (const unsigned long*)src; /* Copy one long word at a time if possible. */ do { *aligned_dst++ = *aligned_src++; len0 -= LITTLEBLOCKSIZE; } while (len0 >= LITTLEBLOCKSIZE); /* Pick up any residual with a byte copier. */ dst = (unsigned char*)aligned_dst; src = (const unsigned char*)aligned_src; } for (; len0; len0--) *dst++ = *src++; return dst0; } // ARM gcc trunk gcc -O3 -nostdlib -nodefaultlibs -S -o - memcpy: push {r3, r4, r5, r6, r7, lr} cmp r2, #3 mov r4, r2 mov r5, r0 mov r6, r1 bls .L5 orr r3, r0, r1 lsls r3, r3, #30 beq .L9 .L3: mov r2, r4 mov r1, r6 bl memcpy ; <- call to memcpy mov r0, r5 pop {r3, r4, r5, r6, r7, pc} .L9: subs r7, r2, #4 and r4, r2, #3 bic r7, r7, #3 adds r7, r7, #4 mov r2, r7 add r6, r6, r7 bl memcpy ; <- call to memcpy adds r0, r5, r7 .L5: cmp r4, #0 bne .L3 mov r0, r5 pop {r3, r4, r5, r6, r7, pc}
[Bug tree-optimization/105830] call to memcpy when -nostdlib -nodefaultlibs flags provided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830 --- Comment #3 from AK --- with -ffreestanding the calls to memcpy did disappear. Thanks.
[Bug libstdc++/80331] unused const std::string not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #9 from AK --- Can't repro this with gcc 12.1. Seems like this is fixed? https://godbolt.org/z/e6n94zK4E
[Bug c++/114342] New: suboptimal codegen of vector::vector(range)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342 Bug ID: 114342 Summary: suboptimal codegen of vector::vector(range) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include #include std::vector td() { int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,}; auto b = std::ranges::begin(arr); auto e = std::ranges::end(arr); std::vector dd(b, e); return dd; } What is the reason for calling `rep movsq` twice? $ gcc -O3 -std=c++23 ``` td(): push rbp mov esi, OFFSET FLAT:.LC0 mov ecx, 55 pxor xmm0, xmm0 push rbx mov rbx, rdi sub rsp, 456 mov QWORD PTR [rbx+16], 0 mov rbp, rsp movups XMMWORD PTR [rbx], xmm0 mov rdi, rbp rep movsq mov eax, DWORD PTR [rsi] mov DWORD PTR [rdi], eax mov edi, 444 call operator new(unsigned long) lea rdx, [rax+444] mov QWORD PTR [rbx], rax lea rdi, [rax+8] mov rsi, rbp mov QWORD PTR [rbx+16], rdx mov rcx, QWORD PTR [rsp] and rdi, -8 mov QWORD PTR [rax], rcx mov rcx, QWORD PTR [rsp+436] mov QWORD PTR [rax+436], rcx sub rax, rdi sub rsi, rax add eax, 444 shr eax, 3 mov ecx, eax mov rax, rbx rep movsq mov QWORD PTR [rbx+8], rdx add rsp, 456 pop rbx pop rbp ret mov rbp, rax jmp .L2 td() [clone .cold]: .L2: mov rdi, QWORD PTR [rbx] mov rsi, QWORD PTR [rbx+16] sub rsi, rdi test rdi, rdi je .L3 call operator delete(void*, unsigned long) .L3: mov rdi, rbp call _Unwind_Resume ``` https://godbolt.org/z/5333db8Px
[Bug middle-end/114342] suboptimal codegen of vector::vector(range)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342 AK changed: What|Removed |Added Resolution|--- |DUPLICATE Version|unknown |14.0 Status|NEW |RESOLVED --- Comment #3 from AK --- I see. marking as duplicate. Thanks for clarifying! *** This bug has been marked as a duplicate of bug 59863 ***
[Bug middle-end/59863] const array in function is placed on stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59863 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #9 from AK --- *** Bug 114342 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/107263] Memcpy not elided when initializing struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #3 from AK --- Seems like a duplicate of #59863 ?
[Bug c++/110819] New: Missed optimization: when vector size is 0 but vector::reserve has been called.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819 Bug ID: 110819 Summary: Missed optimization: when vector size is 0 but vector::reserve has been called. Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include void f(int); void use_idx_const_size_reserve() { std::vector v; v.reserve(10); auto s = v.size(); for (std::vector::size_type i = 0; i < s; i++) f(v[i]); } $ g++ -O3 use_idx_const_size_reserve(): sub rsp, 8 mov edi, 40 call operator new(unsigned long) mov esi, 40 add rsp, 8 mov rdi, rax jmp operator delete(void*, unsigned long) $ clang++ -O3 -stdlib=libc++ use_idx_const_size_reserve(): # @use_idx_const_size_reserve() ret
[Bug tree-optimization/110819] Missed optimization: when vector's size is 0 but vector::reserve has been called.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819 --- Comment #2 from AK --- > When compiled with clang, libstdc++'s std::vector uses __builtin_operator_new > which always has the -fassume-sane-operator-new semantics, and so can be > optimized. Yes, clang optimizes with libstdc++ as well. What can be done in gcc for it to detect that the new+delete pair can be optimized away?
[Bug c++/110137] implement clang -fassume-sane-operator-new
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137 --- Comment #3 from AK ---
1. clang also marks the nothrow versions of operator new as noalias. Will `-fassume-sane-operator-new` enable that as well?
2. As per http://eel.is/c++draft/basic.stc.dynamic#allocation-2:
"""If the request succeeds, the value returned by a replaceable allocation function is a non-null pointer value ([basic.compound]) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to a replaceable deallocation function."""
Does this mean that every successful new allocation can be assumed to be noalias as long as the pointer has not been passed to a deallocation function? If so, can the compiler, where possible, infer from a bottom-up analysis that an allocation is noalias?
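The guarantee quoted in point 2 can be observed at run time. While a pointer returned by `new` is still live, a later successful allocation returns a different non-null pointer, so stores through one cannot change the other. The sketch only shows that runtime property. It does not show what GCC or clang actually do with it, and the function names are hypothetical.

```cpp
#include <cassert>
#include <new>

// Two live allocations from a replaceable allocation function are distinct.
bool fresh_allocations_distinct() {
    int* p = new int(1);
    int* q = new int(2);                    // p not yet deallocated, so q != p
    assert(p != nullptr && q != nullptr);   // plain new throws instead of returning null
    const bool distinct = (p != q);
    *q = 3;                                 // a store through q cannot modify *p
    const bool p_unchanged = (*p == 1);
    delete q;
    delete p;
    return distinct && p_unchanged;
}

// The nothrow form reports failure with nullptr, so only a non-null
// result comes with the same guarantee.
bool nothrow_new_checked() {
    int* p = new (std::nothrow) int(42);
    if (p == nullptr) return false;
    const bool ok = (*p == 42);
    delete p;
    return ok;
}
```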
[Bug c++/110909] New: Suboptimal codegen in vector copy assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909 Bug ID: 110909 Summary: Suboptimal codegen in vector copy assignment Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---

#include

using Container = std::vector;

int copy_assignment(const Container &v1, Container &v2) {
    v2 = v1;
    return 0;
}

I'd expect this to generate only a memcpy, but I'm not sure why memmoves are generated.

$ gcc -std=c++2a -O3 -fno-exceptions

copy_assignment(std::vector > const&, std::vector >&):
    cmp rsi, rdi
    je .L21
    push r13
    push r12
    push rbp
    mov rbp, rdi
    push rbx
    mov rbx, rsi
    sub rsp, 8
    mov rax, QWORD PTR [rdi+8]
    mov r13, QWORD PTR [rdi]
    mov rdx, QWORD PTR [rsi+16]
    mov rdi, QWORD PTR [rsi]
    mov r12, rax
    sub r12, r13
    sub rdx, rdi
    cmp rdx, r12
    jb .L25
    mov rcx, QWORD PTR [rsi+8]
    mov rdx, rcx
    sub rdx, rdi
    cmp rdx, r12
    jnb .L26
    cmp rdx, 4
    jle .L12
    mov rsi, r13
    call memmove
    mov rcx, QWORD PTR [rbx+8]
    mov rdi, QWORD PTR [rbx]
    mov rax, QWORD PTR [rbp+8]
    mov r13, QWORD PTR [rbp+0]
    mov rdx, rcx
    sub rdx, rdi
.L13:
    lea rsi, [r13+0+rdx]
    sub rax, rsi
    mov rdx, rax
    cmp rax, 4
    jle .L14
    mov rdi, rcx
    call memmove
    mov rax, QWORD PTR [rbx]
    add rax, r12
.L8:
    mov QWORD PTR [rbx+8], rax
    add rsp, 8
    xor eax, eax
    pop rbx
    pop rbp
    pop r12
    pop r13
    ret
.L21:
    xor eax, eax
    ret
.L25:
    movabs rax, 9223372036854775804
    cmp rax, r12
    jb .L27
    mov rdi, r12
    call operator new(unsigned long)
    mov rbp, rax
    cmp r12, 4
    jle .L5
    mov rdx, r12
    mov rsi, r13
    mov rdi, rax
    call memcpy
.L6:
    mov rdi, QWORD PTR [rbx]
    test rdi, rdi
    je .L7
    mov rsi, QWORD PTR [rbx+16]
    sub rsi, rdi
    call operator delete(void*, unsigned long)
.L7:
    lea rax, [rbp+0+r12]
    mov QWORD PTR [rbx], rbp
    mov QWORD PTR [rbx+16], rax
    jmp .L8
.L26:
    cmp r12, 4
    jle .L10
    mov rdx, r12
    mov rsi, r13
    call memmove
    mov rax, QWORD PTR [rbx]
    add rax, r12
    jmp .L8
.L14:
    lea rax, [rdi+r12]
    jne .L8
    mov edx, DWORD PTR [rsi]
    mov DWORD PTR [rcx], edx
    jmp .L8
.L12:
    jne .L13
    mov esi, DWORD PTR [r13+0]
    mov DWORD PTR [rdi], esi
    jmp .L13
.L10:
    lea rax, [rdi+r12]
    jne .L8
    mov edx, DWORD PTR [r13+0]
    mov DWORD PTR [rdi], edx
    jmp .L8
.L5:
    mov eax, DWORD PTR [r13+0]
    mov DWORD PTR [rbp+0], eax
    jmp .L6
.L27:
    call std::__throw_bad_array_new_length()

Ideally, the above C++ code should translate to roughly the equivalent of the following pseudocode:

using Container = std::vector;

int copy_assignment(const Container &v1, Container &v2) {
    v2.reserve(v1.size());
    std::memcpy(&v2[0], &v1[0], v1.size()*sizeof(int));
    // change the size: v2.size() = v1.size()
    return 0;
}
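The listing has three paths. When `v2`'s capacity is too small (`jb .L25`), it reallocates and uses `memcpy`. When `v2` is already at least as large as `v1` (`jnb .L26`), it does one `memmove`. Otherwise it falls through to two `memmove` calls: one overwrites the existing elements and one fills the remaining capacity. The sketch below models those three cases with a hypothetical `IntBuf` type. It is a simplified model of a vector copy assignment for trivially copyable elements, not libstdc++'s code. libstdc++ appears to copy such elements with `__builtin_memmove`, and GCC can only narrow that to `memcpy` when it can prove the ranges do not overlap. In the fresh-allocation path that proof is possible, and in the other two paths apparently it is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical minimal int buffer modelling std::vector<int> copy assignment.
struct IntBuf {
    std::unique_ptr<int[]> data;
    std::size_t size = 0;
    std::size_t cap = 0;

    IntBuf& assign_from(const IntBuf& other) {
        if (this == &other) return *this;   // self-assignment check
        const std::size_t n = other.size;
        if (n > cap) {
            // Case 1: reallocate. The new buffer cannot overlap the source.
            std::unique_ptr<int[]> fresh(new int[n]);
            std::memcpy(fresh.get(), other.data.get(), n * sizeof(int));
            data = std::move(fresh);
            cap = n;
        } else if (n <= size) {
            // Case 2: enough live elements; overwrite the first n.
            if (n) std::memmove(data.get(), other.data.get(), n * sizeof(int));
        } else {
            // Case 3: size < n <= cap; overwrite live elements, then fill
            // the tail of the existing capacity (two separate copies).
            if (size) std::memmove(data.get(), other.data.get(), size * sizeof(int));
            std::memmove(data.get() + size, other.data.get() + size,
                         (n - size) * sizeof(int));
        }
        size = n;
        assert(size <= cap);
        return *this;
    }
};
```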
[Bug tree-optimization/111393] New: ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 Bug ID: 111393 Summary: ICE: Segmentation fault src/gcc/toplev.cc:314 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---
riscv64-linux-gnu (Debian 13.1): building llvm-project (GlobalModuleIndex.cpp) crashed with an ICE.
src/gcc/toplev.cc:314
profile_count::operator==(profile_count const&) const ../../src/gcc/profile-count.h:865
profile_count::apply_probability(profile_probability) const ../../src/gcc/profile-count.h:1104
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #1 from AK --- oot/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build# ninja clang check-clang [100/845] Building CXX object tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o FAILED: tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/lib/Serialization -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o -MF tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o.d -o 
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o -c /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp In file included from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMapInfo.h:20, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMap.h:17, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include/clang/Serialization/GlobalModuleIndex.h:18, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp:13: /usr/include/c++/13/tuple: In instantiation of ‘struct std::_Tuple_impl<0, clang::ModuleFileExtensionReader*, std::default_delete >’: /usr/include/c++/13/tuple:1232:11: required from ‘class std::tuple >’ /usr/include/c++/13/bits/unique_ptr.h:232:27: required from ‘class std::__uniq_ptr_impl >’ /usr/include/c++/13/bits/unique_ptr.h:239:12: required from ‘struct std::__uniq_ptr_data, true, true>’ /usr/include/c++/13/bits/unique_ptr.h:283:33: required from ‘class std::unique_ptr’ /usr/include/c++/13/bits/stl_vector.h:367:35: required from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = std::unique_ptr; _Alloc = std::allocator >]’ /usr/include/c++/13/bits/stl_vector.h:528:7: required from here /usr/include/c++/13/tuple:269:7: internal compiler error: Segmentation fault 269 | _M_head(_Tuple_impl& __t) noexcept { return _Base::_M_head(__t); } | ^~~ 0x85d7c5 crash_signal ../../src/gcc/toplev.cc:314 0xa0d5e0 profile_count::operator==(profile_count const&) const ../../src/gcc/profile-count.h:865 0xa0d5e0 profile_count::apply_probability(profile_probability) const ../../src/gcc/profile-count.h:1104 0xa0d5e0 edge_def::count() const ../../src/gcc/basic-block.h:639 0xa0d5e0 eliminate_tail_call ../../src/gcc/tree-tailcall.cc:982 0xa0d5e0 optimize_tail_call ../../src/gcc/tree-tailcall.cc:1053 0xa0d5e0 
tree_optimize_tail_calls_1 ../../src/gcc/tree-tailcall.cc:1193 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See for instructions.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #3 from AK ---
gcc -v
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6)
root@lpi4a:/media/root/d2fc9f48-c166-4a9
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #5 from AK --- Created attachment 55890 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55890&action=edit GlobalModuleIndex.cpp preprocessed files Every time, the crash is in a different file. It could just be due to memory issues.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #8 from AK ---
> this does seem like a HW issue. Are you sure you have a decent RISCV machine
> without any memory issues?
> I suspect ninja is building with all of the cores which pushes the memory
> usage high.

Possibly. I have the https://sipeed.com/licheepi4a (Lichee Pi 4A board).

> Maybe lower the clock speed of the CPU you are using.

Will do, thanks.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #9 from AK --- I think it is okay to close this bug, as this doesn't seem to be related to GCC.
[Bug c/111420] New: relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 Bug ID: 111420 Summary: relocation truncated to fit: R_RISCV_JAL against `.L12287' Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- CGBuiltin.cpp:(.text._ZN5clang7CodeGen15CodeGenFunction20EmitRISCVBuiltinExprEjPKNS_8CallExprENS0_15ReturnValueSlotE+0x10d0): relocation truncated to fit: R_RISCV_JAL against `.L12287' command: : && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -Wl,-z,defs -Wl,-z,nodelete -Wl,-rpath-link,/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/./lib -Wl,--gc-sections -shared -Wl,-soname,libclangCodeGen.so.18git -o lib/libclangCodeGen.so.18git tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfoImpl.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGAtomic.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBlocks.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBuiltin.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDANV.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDARuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXX.cpp.o 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXXABI.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCall.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGClass.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCleanup.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCoroutine.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDebugInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDecl.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDeclCXX.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGException.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExpr.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprAgg.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprCXX.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprComplex.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprConstant.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprScalar.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGGPUBuiltin.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGHLSLRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGLoopInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGNonTrivialStruct.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjC.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCGNU.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCMac.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenCLRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntimeGPU.cpp.o 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGRecordLayoutBuilder.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmt.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmtOpenMP.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTT.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTables.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenABITypes.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenAction.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenFunction.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenPGO.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTBAA.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTypes.cpp.o tools/clang/lib/CodeGen/CMakeFiles/ob
[Bug c/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #1 from AK ---
I got this error while building clang (ninja clang) on a RISC-V machine.

root@lpi4a:~# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper
Target: riscv64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=32
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.0 (Debian 13.1.0-6)
--
root@lpi4a:~# uname -a
Linux lpi4a 5.10.113-g7b352f5ac2ba #1 SMP PREEMPT Wed Apr 12 12:06:11 UTC 2023 riscv64 GNU/Linux
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #4 from AK --- Good catch. By mistake I built at -O0; I wanted to build at -O3.
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 AK changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |MOVED --- Comment #5 from AK --- Created: https://sourceware.org/bugzilla/show_bug.cgi?id=30855
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #6 from AK --- To confirm what Andrew mentioned, the release build (-O3) built successfully.
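R_RISCV_JAL is the relocation for the JAL instruction. JAL's immediate is a signed 21-bit byte offset whose bit 0 is always zero, so a single JAL reaches roughly +/- 1 MiB from itself. The target `.L12287` is a local label inside `EmitRISCVBuiltinExpr`. A plausible reading, not confirmed in the thread, is that this function grew large enough at -O0 for an internal jump to exceed that range, while the -O3 build stayed within it. The hypothetical helper below checks whether a byte offset fits in a JAL.

```cpp
#include <cstdint>

// JAL encodes imm[20:1]; bit 0 is implicitly zero.
constexpr std::int64_t kJalMin = -(std::int64_t{1} << 20);     // -1048576
constexpr std::int64_t kJalMax = (std::int64_t{1} << 20) - 2;  //  1048574
static_assert(kJalMax - kJalMin + 2 == (std::int64_t{1} << 21),
              "JAL covers a 21-bit signed, 2-byte-aligned range");

// True if `offset` (target address minus JAL address) is encodable.
bool fits_in_jal(std::int64_t offset) {
    return offset % 2 == 0 && offset >= kJalMin && offset <= kJalMax;
}
```

When an offset does not fit, the linker reports "relocation truncated to fit" instead of silently emitting a wrong jump.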
[Bug tree-optimization/108915] invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 --- Comment #6 from AK --- For reference, I had opened a related bug in clang: https://github.com/llvm/llvm-project/issues/60967
[Bug tree-optimization/109440] New: Missed optimization of vector::at when a function is called inside the loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440 Bug ID: 109440 Summary: Missed optimization of vector::at when a function is called inside the loop Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---

#include
#include

using namespace std;

bool bar();

using T = int;

T vat(std::vector v) {
    T s;
    for (auto i = 0; i < v.size(); ++i) {
        if (bar())
            s += v.at(i);
    }
    return s;
}

$ gcc -O2 -fexceptions -fno-unroll-loops

.LC0:
    .string "vector::_M_range_check: __n (which is %zu) >= this->size() (which is %zu)"
vat(std::vector >):
    mov rax, QWORD PTR [rdi]
    cmp QWORD PTR [rdi+8], rax
    je .L9
    push r12
    push rbp
    mov rbp, rdi
    push rbx
    xor ebx, ebx
    jmp .L6
.L14:
    mov rax, QWORD PTR [rbp+8]
    sub rax, QWORD PTR [rbp+0]
    add rbx, 1
    sar rax, 2
    cmp rbx, rax
    jnb .L13
.L6:
    call bar()
    test al, al
    je .L14
    mov rcx, QWORD PTR [rbp+0]
    mov rdx, QWORD PTR [rbp+8]
    sub rdx, rcx
    sar rdx, 2
    mov rax, rdx
    cmp rbx, rdx
    jnb .L15
    add r12d, DWORD PTR [rcx+rbx*4]
    add rbx, 1
    cmp rbx, rax
    jb .L6
.L13:
    mov eax, r12d
    pop rbx
    pop rbp
    pop r12
    ret
.L9:
    mov eax, r12d
    ret
.L15:
    mov rsi, rbx
    mov edi, OFFSET FLAT:.LC0
    xor eax, eax
    call std::__throw_out_of_range_fmt(char const*, ...)
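In the listing, GCC reloads `v`'s begin and end pointers after every call to `bar()` (at `.L14` and after `.L6`) and keeps the `at()` range check. If `bar()` cannot resize `v`, the size can be read once. Since `i < size()`, `at(i)` can then never throw, so it can become `operator[]`. The sketch below shows that hand-optimized form. Several parts are assumptions. The element type is `int`, because the template argument was lost in the archive. `bar` is passed as a function pointer only so the sketch is testable, and `vat_hoisted` is a hypothetical name. The sketch also initializes `s`: the report's `T s;` is read before it is written, which is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-hoisted variant of `vat`, valid only if bar() never resizes v.
int vat_hoisted(std::vector<int> v, bool (*bar)()) {
    assert(bar != nullptr);
    int s = 0;                          // the report leaves s uninitialized
    const std::size_t n = v.size();     // size read once, not after every bar()
    for (std::size_t i = 0; i < n; ++i) {
        if (bar())
            s += v[i];                  // i < n == v.size(), so at() could not throw
    }
    return s;
}
```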
[Bug tree-optimization/109441] New: missed optimization when all elements of vector are known
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441 Bug ID: 109441 Summary: missed optimization when all elements of vector are known Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---

Reference: https://godbolt.org/z/af4x6zhz9

When all elements of the vector are 0, the compiler should be able to remove the loop and just return 0.

Testcase:

#include

using namespace std;

using T = int;

T v() {
    T s;
    std::vector v;
    v.resize(1000, 0);
    for (auto i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

$ g++ -O3 -std=c++17

.LC0:
    .string "vector::_M_fill_insert"
v():
    push rbx
    pxor xmm0, xmm0
    mov edx, 1000
    xor esi, esi
    sub rsp, 48
    lea rcx, [rsp+12]
    lea rdi, [rsp+16]
    mov QWORD PTR [rsp+32], 0
    mov DWORD PTR [rsp+12], 0
    movaps XMMWORD PTR [rsp+16], xmm0
    call std::vector >::_M_fill_insert(__gnu_cxx::__normal_iterator > >, unsigned long, int const&)
    mov rdx, QWORD PTR [rsp+24]
    mov rdi, QWORD PTR [rsp+16]
    mov rax, rdx
    sub rax, rdi
    mov rsi, rax
    sar rsi, 2
    cmp rdx, rdi
    je .L99
    test rax, rax
    mov ecx, 1
    cmovne rcx, rsi
    cmp rax, 12
    jbe .L107
    mov rdx, rcx
    pxor xmm0, xmm0
    mov rax, rdi
    shr rdx, 2
    sal rdx, 4
    add rdx, rdi
.L101:
    movdqu xmm2, XMMWORD PTR [rax]
    add rax, 16
    paddd xmm0, xmm2
    cmp rdx, rax
    jne .L101
    movdqa xmm1, xmm0
    psrldq xmm1, 8
    paddd xmm0, xmm1
    movdqa xmm1, xmm0
    psrldq xmm1, 4
    paddd xmm0, xmm1
    movd ebx, xmm0
    test cl, 3
    je .L99
    and rcx, -4
    mov eax, ecx
.L100:
    lea edx, [rax+1]
    add ebx, DWORD PTR [rdi+rcx*4]
    movsx rdx, edx
    cmp rdx, rsi
    jnb .L99
    add eax, 2
    lea rcx, [0+rdx*4]
    add ebx, DWORD PTR [rdi+rdx*4]
    cdqe
    cmp rax, rsi
    jnb .L99
    add ebx, DWORD PTR [rdi+4+rcx]
.L99:
    test rdi, rdi
    je .L98
    mov rsi, QWORD PTR [rsp+32]
    sub rsi, rdi
    call operator delete(void*, unsigned long)
.L98:
    add rsp, 48
    mov eax, ebx
    pop rbx
    ret
.L107:
    xor eax, eax
    xor ecx, ecx
    jmp .L100
    mov rbx, rax
    jmp .L105
v() [clone .cold]:
[Bug tree-optimization/109441] missed optimization when all elements of vector are known
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441 --- Comment #1 from AK --- I guess a better test case is this:

#include

using namespace std;

using T = int;

T v(std::vector v) {
    T s;
    std::fill(v.begin(), v.end(), T());
    for (auto i = 0; i < v.size(); ++i) {
        s += v[i];
    }
    return s;
}

which has a similar effect.

$ g++ -O3 -std=c++17

v(std::vector >):
    push rbp
    push rbx
    sub rsp, 8
    mov rbp, QWORD PTR [rdi+8]
    mov rcx, QWORD PTR [rdi]
    cmp rcx, rbp
    je .L7
    sub rbp, rcx
    mov rdi, rcx
    xor esi, esi
    mov rbx, rcx
    mov rdx, rbp
    call memset
    mov rdi, rbp
    mov edx, 1
    mov rcx, rbx
    sar rdi, 2
    test rbp, rbp
    cmovne rdx, rdi
    cmp rbp, 12
    jbe .L8
    mov rax, rdx
    pxor xmm0, xmm0
    shr rax, 2
    sal rax, 4
    add rax, rbx
.L4:
    movdqu xmm2, XMMWORD PTR [rbx]
    add rbx, 16
    paddd xmm0, xmm2
    cmp rbx, rax
    jne .L4
    movdqa xmm1, xmm0
    psrldq xmm1, 8
    paddd xmm0, xmm1
    movdqa xmm1, xmm0
    psrldq xmm1, 4
    paddd xmm0, xmm1
    movd eax, xmm0
    test dl, 3
    je .L1
    and rdx, -4
    mov esi, edx
.L3:
    add eax, DWORD PTR [rcx+rdx*4]
    lea edx, [rsi+1]
    movsx rdx, edx
    cmp rdx, rdi
    jnb .L1
    add esi, 2
    lea r8, [0+rdx*4]
    add eax, DWORD PTR [rcx+rdx*4]
    movsx rsi, esi
    cmp rsi, rdi
    jnb .L1
    add eax, DWORD PTR [rcx+4+r8]
.L1:
    add rsp, 8
    pop rbx
    pop rbp
    ret
.L7:
    add rsp, 8
    xor eax, eax
    pop rbx
    pop rbp
    ret
.L8:
    xor eax, eax
    xor esi, esi
    xor edx, edx
    jmp .L3
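The report relies on one property: after `std::fill(..., T())` every element is 0, so the summing loop always yields 0, whatever the vector's size. The listing instead keeps the `memset` and then a full vectorized reduction over memory that is known to be zero. The sketch below states the source-level equivalence the report asks GCC to exploit. It assumes `T = int` as in the report and `std::vector<int>`, since the template argument was lost in the archive. It also initializes `s` to 0, because the report's `T s;` is read uninitialized, which is undefined behaviour. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The second test case with s initialized.
int sum_after_fill(std::vector<int> v) {
    std::fill(v.begin(), v.end(), int());   // every element becomes 0
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// The code the report would like GCC to produce for the function above.
int sum_after_fill_folded(std::vector<int> /*v*/) {
    return 0;
}
```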