[Bug tree-optimization/56456] [meta-bug] bogus warning when inlining or unrolling: "array subscript is above array bounds"

2017-10-12 Thread slash.tmp at free dot fr
Added to CC: law at redhat dot com, rguenth at gcc dot gnu.org, slash.tmp at free dot fr --- Comment #2 from Mason --- A few more bugs should be added to this tracker: (It seems I

[Bug tree-optimization/56456] [meta-bug] bogus warning when inlining or unrolling: "array subscript is above array bounds"

2017-10-18 Thread slash.tmp at free dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456 --- Comment #5 from Mason --- Slightly smaller testcase, similar to bug 80907. extern int M[16]; void foo(int n) { for (int i = 0; i < n; ++i) for (int j = 0; j < i; ++j) M[i+j] = 0; } $ gcc-7 -O3

[Bug tree-optimization/65461] -Warray-bounds warnings in the linux kernel (free_area_init_nodes)

2017-10-18 Thread slash.tmp at free dot fr
Added to CC: slash.tmp at free dot fr --- Comment #3 from Mason --- Here is a reduced test case: extern void foo(int *p); extern int array[2]; void func(void) { int i; for (i = 1; i < 2; i++) { if (i == 1) continue; array[i-1] = 0; } foo

[Bug middle-end/66031] Spurious array bounds warning

2017-10-18 Thread slash.tmp at free dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66031 Mason added slash.tmp at free dot fr to CC --- Comment #2 from

[Bug rtl-optimization/83272] New: Unnecessary mask instruction generated

2017-12-04 Thread slash.tmp at free dot fr
Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: slash.tmp at free dot fr Target Milestone: --- Consider the following testcase: char foo(unsigned char n) { static const char map[16] = "wxyz"; return map[n / 16]; } gcc-7 -O2 -march=

[Bug rtl-optimization/83272] Unnecessary mask instruction generated

2017-12-04 Thread slash.tmp at free dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272 --- Comment #2 from Mason --- (In reply to Jakub Jelinek from comment #1) > I don't believe the andl is not needed after shrb, as that is an 8-bit > operand size, it should leave the upper 56 bits of the register unmodified. > And unsigned char

[Bug rtl-optimization/83272] Unnecessary mask instruction generated

2017-12-05 Thread slash.tmp at free dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272 --- Comment #3 from Mason --- I think Jakub is right about an interaction between movzbl and shrb. unsigned long long foo1(unsigned char *p) { return *p; } foo1: movzbl (%rdi), %eax ret I.e. gcc "knows" that movzbl clears the
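
At the C level, the mask discussed in this thread carries no information: for any 8-bit unsigned char n, n / 16 already lies in 0..15. A minimal sketch of that claim follows. The names index_plain, index_masked and mask_is_redundant are hypothetical, introduced here for illustration and not taken from the bug report, and the sketch says nothing about which instructions a given GCC version emits.

```c
#include <limits.h>

/* Hypothetical helpers, for illustration only (not from the bug report). */

/* Index as written in the testcase: n / 16. */
unsigned index_plain(unsigned char n)
{
    return n / 16;
}

/* Same index with an explicit 4-bit mask applied after the shift. */
unsigned index_masked(unsigned char n)
{
    return (n >> 4) & 0x0f;
}

/* Returns 1 if both forms agree for every unsigned char value.
   Assumes CHAR_BIT == 8; with wider chars n / 16 can exceed 15. */
int mask_is_redundant(void)
{
    for (unsigned n = 0; n <= UCHAR_MAX; ++n) {
        if (index_plain((unsigned char)n) != index_masked((unsigned char)n))
            return 0;
    }
    return 1;
}
```

The machine-level question in the thread is a separate one: whether the upper bits of the register are already known to be zero when the 8-bit shift executes, which is what the movzbl observation addresses.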

[Bug target/105617] [12/13/14 Regression] Slp is maybe too aggressive in some/many cases

2023-06-01 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #18 from Mason --- Hello Michael_S, As far as I can see, massaging the source helps GCC generate optimal code (in terms of instruction count, not convinced about scheduling). #include typedef unsigned long long u64; void add4i(u64

[Bug target/110104] New: gcc produces sub-optimal code for _addcarry_u64 chain

2023-06-03 Thread slash.tmp at free dot fr via Gcc-bugs
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: slash.tmp at free dot fr Target Milestone: --- Consider the following code: #include typedef unsigned long long u64; typedef unsigned __int128 u128; void testcase1(u64 *acc, u64 a, u64 b) { u128 res = (u128)a

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2023-06-03 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974 --- Comment #11 from Mason --- Here's umul_least_64() rewritten as mul_64x64_128() in C typedef unsigned int u32; typedef unsigned long long u64; /* u32 acc[3], a[1], b[1] */ static void mul_add_32x32(u32 *acc, const u32 *a, const u32 *b) {

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2023-06-05 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974 --- Comment #12 from Mason --- Actually, in this case, we don't need to propagate the carry over 3 limbs. typedef unsigned int u32; typedef unsigned long long u64; /* u32 acc[2], a[1], b[1] */ static void mul_add_32x32(u32 *acc, const u32 *a,
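
One way to see why a two-limb accumulator can be enough: a 32x32 product plus two 32-bit addends never overflows 64 bits, since (2^32-1)^2 + 2*(2^32-1) = 2^64-1. The sketch below builds a 64x64 -> 128 schoolbook multiply on that fact. It is not the code from the comment above, which is truncated here; mul_add_add and mul_64x64_128_sketch are hypothetical names, and the limbs are assumed to be stored least significant first.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical helper: a * b + x + y.
   Cannot overflow 64 bits: the maximum value is exactly 2^64 - 1. */
u64 mul_add_add(u32 a, u32 b, u32 x, u32 y)
{
    return (u64)a * b + x + y;
}

/* Hypothetical sketch: res[4] = a[2] * b[2], limbs least significant first. */
void mul_64x64_128_sketch(u32 res[4], const u32 a[2], const u32 b[2])
{
    u64 t;

    /* Row 0: a * b[0], written to res[0..2]. */
    t = mul_add_add(a[0], b[0], 0, 0);
    res[0] = (u32)t;
    t = mul_add_add(a[1], b[0], 0, (u32)(t >> 32));
    res[1] = (u32)t;
    res[2] = (u32)(t >> 32);

    /* Row 1: a * b[1], accumulated into res[1..3]. Each step adds the
       existing limb and the carry from the previous step. */
    t = mul_add_add(a[0], b[1], res[1], 0);
    res[1] = (u32)t;
    t = mul_add_add(a[1], b[1], res[2], (u32)(t >> 32));
    res[2] = (u32)t;
    res[3] = (u32)(t >> 32);
}
```

Each step carries at most one 32-bit limb forward, so no third limb is needed to absorb a carry inside a row.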

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2023-06-06 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974 --- Comment #16 from Mason --- For the record, the example I provided was intended to show that, with some help, GCC can generate good code for bigint multiplication. In this situation, "help" means a short asm template.

[Bug target/105617] [12/13/14 Regression] Slp is maybe too aggressive in some/many cases

2023-06-13 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617 --- Comment #20 from Mason --- Doh! You're right. I come from a background where overlapping/aliasing inputs are heresy, thus got blindsided :( This would be the optimal code, right? add4i: # rdi = dst, rsi = a, rdx = b movq 0(%rdx
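
The truncated listing above appears to add two four-limb integers. For comparison, here is a portable C sketch of such an addition with explicit carry propagation. It is not the add4i source from the thread; add4_portable is a hypothetical name, and the limbs are assumed to be stored least significant first.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical sketch: dst = a + b over four 64-bit limbs.
   Returns the carry out of the top limb (0 or 1). */
u64 add4_portable(u64 dst[4], const u64 a[4], const u64 b[4])
{
    u64 carry = 0;

    for (int i = 0; i < 4; ++i) {
        u64 s = a[i] + carry;   /* wraps only if a[i] is all ones and carry is 1 */
        u64 c1 = s < carry;     /* 1 if the first addition wrapped */
        u64 r = s + b[i];
        u64 c2 = r < s;         /* 1 if the second addition wrapped */

        dst[i] = r;             /* both inputs for limb i are read before this store */
        carry = c1 | c2;        /* the two wraps cannot both happen for one limb */
    }
    return carry;
}
```

Because each limb of a and b is read before the matching limb of dst is written, the sketch tolerates dst being exactly the same array as a or b. Partially overlapping arrays are not handled.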

[Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain

2023-06-14 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104 --- Comment #2 from Mason --- You meant PR79173 ;) Latest update: https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621554.html I didn't see my testcase specifically in Jakub's patch, but I'll test trunk on godbolt when/if the patch lands.

[Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain

2023-06-16 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104 --- Comment #4 from Mason --- I confirm that trunk now emits the same code for testcase1 and testcase2. Thanks Jakub and Roger, great work!

[Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain

2023-07-07 Thread slash.tmp at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104 --- Comment #5 from Mason --- FWIW, trunk (gcc14) translates testcase3 to the same code as the other testcases, while remaining portable across all architectures: $ gcc-trunk -O3 -march=bdver3 testcase3.c typedef unsigned long long u64; typede