[Bug target/98348] [10 Regression] GCC 10.2 AVX512 Mask regression from GCC 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98348

Dávid Bolvanský changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #20 from Dávid Bolvanský ---
Some small regression (missed opportunity to use vptestnmd):

Current trunk
compare(unsigned int __vector(16)):
        vpxor xmm1, xmm1, xmm1
        vpcmpd k0, zmm0, zmm1, 0
        vpmovm2d zmm0, k0
        ret

GCC 9.2
compare(unsigned int __vector(16)):
        vptestnmd k0, zmm0, zmm0
        vpmovm2d zmm0, k0
        ret

https://gcc.godbolt.org/z/5vK68jM3r
[Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

Dávid Bolvanský changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #7 from Dávid Bolvanský ---
Still bad for -O3 -march=skylake-avx512

https://godbolt.org/z/azb8aTG43
[Bug c/98658] New: Loop idiom recognization for memcpy/memmove
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

            Bug ID: 98658
           Summary: Loop idiom recognization for memcpy/memmove
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

void copy(int *__restrict__ d, int * s, __SIZE_TYPE__ sz) {
  __SIZE_TYPE__ i;
  for (i = 0; i < sz; i++) {
    *d++ = *s++;
  }
}

gcc emits call to memcpy.

void copy(int * d, int * s, __SIZE_TYPE__ sz) {
  __SIZE_TYPE__ i;
  for (i = 0; i < sz; i++) {
    *d++ = *s++;
  }
}

gcc could emit memmove, but currently does not:
https://godbolt.org/z/5n1rnh
[Bug c/98658] Loop idiom recognization for memcpy/memmove
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

--- Comment #1 from Dávid Bolvanský ---
ICC produces memcpy:
https://godbolt.org/z/oKxxTM
[Bug c/98658] Loop idiom recognization for memcpy/memmove
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98658

--- Comment #3 from Dávid Bolvanský ---
Yes, runtime check.
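For readers following the thread, the runtime check could look roughly like the sketch below. It is an illustration only, not what GCC emits: `copy_checked` is a hypothetical name, and the overlap test goes through `uintptr_t`, which assumes a flat address space. The forward loop matches `memmove` only when the destination starts at or before the source, or when the ranges do not overlap, so the sketch keeps the original loop for the remaining case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-written version of the check a compiler could insert. */
void copy_checked(int *d, int *s, size_t sz)
{
    assert(sz == 0 || (d != NULL && s != NULL));
    if (sz == 0)
        return;

    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;
    size_t bytes = sz * sizeof(int);

    /* Safe cases: destination at or below the source, or no overlap at all. */
    if (da <= sa || da - sa >= bytes) {
        memmove(d, s, bytes);
        return;
    }

    /* Destination starts inside the source range: a forward copy re-reads
       elements it has already overwritten, so keep the original loop. */
    for (size_t i = 0; i < sz; i++)
        *d++ = *s++;
}
```

With `restrict` on the destination the check is unnecessary, which is consistent with the first variant in the report already becoming a `memcpy` call.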
[Bug other/98663] gcc generates endless loop at -O2 or greater depending on order of testExpression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98663

Dávid Bolvanský changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #1 from Dávid Bolvanský ---
The compiler can do anything if there is UB in the code.
[Bug c/98713] New: Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

            Bug ID: 98713
           Summary: Failure to generate branch version of abs if user
                    requested it
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

int branch_abs(int v) {
  return __builtin_expect(v > 0, 1) ? v : -v;
}

GCC -O2 now:
branch_abs:
        mov eax, edi
        neg eax
        cmovs eax, edi
        ret

Expected:
branch_abs:
        mov eax, edi
        test edi, edi
        js .LBB0_1
        ret
.LBB0_1:
        neg eax
        ret

Same for min/max.
[Bug middle-end/98713] Failure to generate branch version of abs if user requested it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98713

--- Comment #5 from Dávid Bolvanský ---
The user knows the data better, so they may prefer abs with a branch.
PGO may also indicate, based on profile data, that a branch is the better
choice for abs.
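The two shapes under discussion can be written out by hand as below. This is only a sketch to make the difference concrete: `branchless_abs` is a hypothetical helper, and nothing here guarantees which instruction sequence a given compiler picks for either function, which is the point of the report.

```c
#include <assert.h>
#include <limits.h>

/* Branchy form from the report: the hint says v > 0 is the common case. */
int branch_abs(int v)
{
    assert(v != INT_MIN);   /* -INT_MIN would overflow */
    return __builtin_expect(v > 0, 1) ? v : -v;
}

/* Branch-free form, similar in spirit to the neg/cmovs sequence. */
int branchless_abs(int v)
{
    assert(v != INT_MIN);
    unsigned int mask = -(unsigned int)(v < 0);     /* all ones if v < 0 */
    return (int)(((unsigned int)v ^ mask) - mask);  /* conditional negate */
}
```

Both functions return the same value for every input other than `INT_MIN`; they differ only in whether the cost is a predictable branch or a data dependency.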
[Bug c/100260] New: DSE: join stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100260

            Bug ID: 100260
           Summary: DSE: join stores
           Product: gcc
           Version: tree-ssa
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#include <string.h>

struct pam {
  void *p1;
  void *p2;
#ifdef LONG
  unsigned long size;
#else
  unsigned int pad;
  unsigned int size;
#endif
};

extern int use(struct pam *param);

unsigned int foo(void) {
  struct pam s_pam;
  memset(&s_pam, 0, sizeof(struct pam));
  s_pam.size = 1;
  return use(&s_pam);
}

INT
foo():
        sub rsp, 40
        pxor xmm0, xmm0
        mov rdi, rsp
        mov DWORD PTR [rsp+16], 0
        mov DWORD PTR [rsp+20], 1
        movaps XMMWORD PTR [rsp], xmm0
        call use(pam*)
        add rsp, 40
        ret

LONG
foo():
        sub rsp, 40
        pxor xmm0, xmm0
        mov rdi, rsp
        movaps XMMWORD PTR [rsp], xmm0
        mov QWORD PTR [rsp+16], 1
        call use(pam*)
        add rsp, 40
        ret

Stores
        mov DWORD PTR [rsp+16], 0
        mov DWORD PTR [rsp+20], 1
can be replaced with one mov QWORD.
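To make the requested merge concrete, here is a hand-written sketch of the INT layout in which the two adjacent 32-bit stores are combined into one 8-byte store. The function names are hypothetical and the code only demonstrates that the resulting memory is identical; it says nothing about how a DSE or store-merging pass would implement this. On a little-endian target such as x86-64 the merged 8 bytes correspond to the 64-bit value 0x0000000100000000.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pam {
    void *p1;
    void *p2;
    unsigned int pad;
    unsigned int size;
};

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");
static_assert(offsetof(struct pam, size) == offsetof(struct pam, pad) + 4,
              "sketch assumes pad and size are adjacent");

/* Shape of the INT variant today: zero everything, then two 32-bit stores. */
void init_two_stores(struct pam *s)
{
    memset(s, 0, sizeof *s);
    s->pad = 0;
    s->size = 1;
}

/* Hand-merged: assemble the 8 bytes covering pad and size, store them once. */
void init_one_store(struct pam *s)
{
    const unsigned int pad = 0, size = 1;
    unsigned char merged[8];

    memset(s, 0, sizeof *s);
    memcpy(merged, &pad, 4);
    memcpy(merged + 4, &size, 4);
    memcpy((unsigned char *)s + offsetof(struct pam, pad), merged, 8);
}
```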
[Bug c/108593] New: No inlining after function cloning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108593

            Bug ID: 108593
           Summary: No inlining after function cloning
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

int __attribute__ ((noinline))
foo (int arg)
{
  return 2 * arg;
}

int
bar (int arg)
{
  return foo (5);
}

results in:

foo.constprop.0:
        mov eax, 10
        ret
foo:
        lea eax, [rdi+rdi]
        ret
bar:
        jmp foo.constprop.0

But why is foo.constprop.0 not fully inlined into bar? Maybe
foo.constprop.0 inherits the noinline attribute from foo? If so, gcc should
drop attributes from cloned functions.
[Bug ipa/104187] Call site specific attribute to control inliner
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

--- Comment #8 from Dávid Bolvanský ---
So this works in Clang now

int foo(int x, int y) { // any compiler will happily inline this function
    return x / y;
}

int test(int x, int y) {
    int r = 0;
    [[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
    return r;
}

foo(int, int):                          # @foo(int, int)
        mov eax, edi
        cdq
        idiv esi
        ret
test(int, int):                         # @test(int, int)
        jmp foo(int, int)               # TAILCALL
[Bug c/104187] New: Call site specific attribute to control inliner
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

            Bug ID: 104187
           Summary: Call site specific attribute to control inliner
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

It could be useful to have more control over inlining.

Use cases:

int foo();
void bar();
int g;

void test() {
    g = __builtin_always_inline(foo()); // force inlining of foo() here
    __builtin_noinline(bar());          // never inline bar to this function
}
[Bug ipa/104187] Call site specific attribute to control inliner
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187

--- Comment #5 from Dávid Bolvanský ---
So you prefer e.g.

g = a[i] - [[gnu::always_inline]] foo(x, y) + 2 * bar();

over

g = a[i] - __builtin_always_inline(foo(x, y)) + 2 * bar();

?

What is your proposed syntax?
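For comparison, one way to approximate a call-site `noinline` with attributes that already exist is to route that single call through a wrapper. The sketch below is only a workaround under that assumption: `foo_noinline` and `call_without_inlining` are hypothetical names, and the wrapper costs an extra function per call site, which is exactly the boilerplate the proposed syntax would avoid.

```c
#include <assert.h>

/* Small function that any compiler would normally inline. */
static int foo(int x, int y)
{
    assert(y != 0);
    return x / y;
}

/* Hypothetical wrapper: noinline applies to the wrapper, so callers that go
   through it keep a real call even if foo itself is inlined into it. */
static __attribute__((noinline)) int foo_noinline(int x, int y)
{
    return foo(x, y);
}

int call_without_inlining(int x, int y)
{
    int r = 0;
    r += foo_noinline(x, y);   /* this call site stays out of line */
    return r;
}

int call_with_default_policy(int x, int y)
{
    return foo(x, y);          /* the inliner decides as usual */
}
```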
[Bug tree-optimization/93150] (A&N) == CST1 &( ((A&M)==CST2) | ((A&O)==CST3) ) is not simplified
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93150

Dávid Bolvanský changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #2 from Dávid Bolvanský ---
Binary operations with constants are simplified by the compiler itself.
[Bug tree-optimization/102483] New: Regression in codegen of reduction of 4 chars
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483

            Bug ID: 102483
           Summary: Regression in codegen of reduction of 4 chars
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

char foo (char* p)
{
  char sum = 0;
  for (int i = 0; i != 4; i++)
    sum += p[i];
  return sum;
}

-O3 -march=x86-64

GCC trunk:
foo:
        mov edx, DWORD PTR [rdi]
        movzx eax, dh
        mov ecx, edx
        add eax, edx
        shr ecx, 16
        add eax, ecx
        shr edx, 24
        add eax, edx
        ret

GCC 11 (much better):
foo:
        movzx eax, BYTE PTR [rdi+1]
        add al, BYTE PTR [rdi]
        add al, BYTE PTR [rdi+2]
        add al, BYTE PTR [rdi+3]
        ret

Best? llvm-mca says so..
foo:                                    # @foo
        movd xmm0, dword ptr [rdi]      # xmm0 = mem[0],zero,zero,zero
        pxor xmm1, xmm1
        psadbw xmm1, xmm0
        movd eax, xmm1
        ret

https://godbolt.org/z/sT9svvj7W
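The psadbw variant works because the sum of absolute differences against an all-zero vector is simply the sum of the unsigned bytes, and the low 8 bits of that sum equal the wrapped `char` sum. A sketch with SSE2 intrinsics follows; `sum4` is a hypothetical name, and the scalar fallback is there only so the code still builds on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum four chars, wrapping modulo 256 like the original loop. */
char sum4(const char *p)
{
    assert(p != NULL);
#if defined(__SSE2__)
    int word;
    memcpy(&word, p, sizeof word);              /* alignment-safe 4-byte load */
    __m128i v = _mm_cvtsi32_si128(word);        /* movd: 4 bytes, rest zeroed */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128()); /* psadbw vs. zero */
    return (char)_mm_cvtsi128_si32(s);          /* keep the low 8 bits */
#else
    char sum = 0;
    for (int i = 0; i != 4; i++)
        sum += p[i];
    return sum;
#endif
}
```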
[Bug tree-optimization/102564] New: Missed loop vectorization with reduction and ptr load/store inside loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102564

            Bug ID: 102564
           Summary: Missed loop vectorization with reduction and ptr
                    load/store inside loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

void test1(int *p, int *t, int N) {
    for (int i = 0; i != N; i++) *t += p[i];
}

void test2(int *p, int *t, int N) {
    if (N > 1024) // hint, N is not small
        for (int i = 0; i != N; i++) *t += p[i];
}

void test3(int *p, int *t, int N) {
    if (N > 1024) { // hint, N is not small
        int s = 0;
        for (int i = 0; i != N; i++) s += p[i];
        *t += s;
    }
}

test3 is successfully vectorized with LLVM, GCC, ICC. Sadly, only ICC can
catch test1 and test2.

https://godbolt.org/z/PzoYd4eEK
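What separates test1 from test3 is that `*t` may alias some `p[i]`, so the store cannot simply be hoisted out of the loop. One way to get the test3 shape while keeping test1's semantics is to version the loop on a runtime alias check, sketched below by hand. `test1_versioned` is a hypothetical name, the address comparison assumes a flat address space, and this is not a claim about what ICC or any other compiler actually generates.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop from the report. */
void test1(int *p, int *t, int N)
{
    for (int i = 0; i != N; i++)
        *t += p[i];
}

/* Hand-versioned variant: local accumulator when *t cannot alias p[0..N). */
void test1_versioned(int *p, int *t, int N)
{
    assert(N >= 0);
    uintptr_t lo = (uintptr_t)p;
    uintptr_t hi = (uintptr_t)(p + N);
    uintptr_t ta = (uintptr_t)t;

    if (ta + sizeof(int) <= lo || ta >= hi) {
        /* No overlap: reduce into a local, store once. Unsigned arithmetic
           avoids introducing signed overflow the original did not have. */
        unsigned int s = 0;
        for (int i = 0; i != N; i++)
            s += (unsigned int)p[i];
        *t = (int)((unsigned int)*t + s);
    } else {
        /* Possible aliasing: keep the exact original loop. */
        for (int i = 0; i != N; i++)
            *t += p[i];
    }
}
```

The first branch is the form that the report says all three compilers already vectorize.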
[Bug tree-optimization/103002] New: Missed loop unrolling opportunity
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103002

            Bug ID: 103002
           Summary: Missed loop unrolling opportunity
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#define C 3

struct node {
    struct node *next;
    int payload;
};

static int count_nodes(const node* p) {
    int size = 0;
    while (p) {
        p = p->next;
        size++;
    }
    return size;
}

bool has_one_node(const node* p) {
    return count_nodes(p) == 1;
}

bool has_C_nodes(const node* p) {
    return count_nodes(p) == C;
}

has_one_node(node const*):              # @has_one_node(node const*)
        test rdi, rdi
        je .LBB0_1
        mov eax, 1
.LBB0_3:                                # =>This Inner Loop Header: Depth=1
        mov rdi, qword ptr [rdi]
        add eax, -1
        test rdi, rdi
        jne .LBB0_3
        test eax, eax
        sete al
        ret
.LBB0_1:
        xor eax, eax
        ret
has_C_nodes(node const*):               # @has_C_nodes(node const*)
        test rdi, rdi
        je .LBB1_1
        mov eax, 3
.LBB1_3:                                # =>This Inner Loop Header: Depth=1
        mov rdi, qword ptr [rdi]
        add eax, -1
        test rdi, rdi
        jne .LBB1_3
        test eax, eax
        sete al
        ret
.LBB1_1:
        xor eax, eax
        ret

has_C_nodes is simple with some kind of loop deletion pass, but generally,
these loops can be unrolled for some reasonable C values.

https://godbolt.org/z/do656c17b
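The result the report is asking for can be written by hand: comparing the length against a constant never requires walking more than that many links. The sketch below is a plain C rendering with a hypothetical helper `has_n_nodes`; it shows the bounded walk that unrolling or early exit would amount to, not how a compiler pass would derive it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *next;
    int payload;
};

/* True iff the list starting at p has exactly n nodes.
   Follows at most n links instead of traversing the whole list. */
bool has_n_nodes(const struct node *p, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (p == NULL)
            return false;   /* list is shorter than n */
        p = p->next;
    }
    return p == NULL;       /* anything left means longer than n */
}
```

For a small constant n, the loop body is a fixed chain of null checks and loads, so the cost no longer depends on the length of the list.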
[Bug rtl-optimization/7061] Access of bytes in struct parameters
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7061

Dávid Bolvanský changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.bolvansky at gmail dot com

--- Comment #10 from Dávid Bolvanský ---
llvm emits just:

im:                                     # @im
        shufps xmm0, xmm0, 85           # xmm0 = xmm0[1,1,1,1]
        ret
[Bug c++/117465] New: Disable -Wnonnull-compare in macros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117465

            Bug ID: 117465
           Summary: Disable -Wnonnull-compare in macros
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david.bolvansky at gmail dot com
  Target Milestone: ---

#include <stdio.h>

#define DEBUG(ptr) if (ptr) printf("%p", (void *)ptr);

class Clz {
    int data;
public:
    Clz(void) {
        DEBUG(this);
    }
};

int main(void) {
    int i;
    DEBUG(&i);
    Clz a;
    return 0;
}

g++ -Wall -Wextra code.cpp

warning: 'nonnull' argument 'this' compared to NULL [-Wnonnull-compare]
    3 | #define DEBUG(ptr) if (ptr) printf("%p", (void *)ptr);

This is quite annoying in macros. The example is derived from real code,
where the null check is needed because the macro is universal.