||law at redhat dot com,
||rguenth at gcc dot gnu.org,
||slash.tmp at free dot fr
--- Comment #2 from Mason ---
A few more bugs should be added to this tracker:
(It seems I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56456
--- Comment #5 from Mason ---
Slightly smaller testcase, similar to bug 80907.
extern int M[16];
void foo(int n)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j)
            M[i+j] = 0;
}
$ gcc-7 -O3
||slash.tmp at free dot fr
--- Comment #3 from Mason ---
Here is a reduced test case:
extern void foo(int *p);
extern int array[2];
void func(void)
{
    int i;
    for (i = 1; i < 2; i++) {
        if (i == 1) continue;
        array[i-1] = 0;
    }
foo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66031
Mason changed:
What|Removed |Added
CC||slash.tmp at free dot fr
--- Comment #2 from
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: slash.tmp at free dot fr
Target Milestone: ---
Consider the following testcase:
char foo(unsigned char n)
{
    static const char map[16] = "wxyz";
    return map[n / 16];
}
gcc-7 -O2 -march=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272
--- Comment #2 from Mason ---
(In reply to Jakub Jelinek from comment #1)
> I don't believe the andl is not needed after shrb, as that is an 8-bit
> operand size, it should leave the upper 56 bits of the register unmodified.
> And unsigned char
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272
--- Comment #3 from Mason ---
I think Jakub is right about an interaction between movzbl and shrb.
unsigned long long foo1(unsigned char *p) { return *p; }
foo1:
    movzbl (%rdi), %eax
    ret
I.e. gcc "knows" that movzbl clears the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #18 from Mason ---
Hello Michael_S,
As far as I can see, massaging the source helps GCC generate optimal code
(in terms of instruction count, not convinced about scheduling).
#include
typedef unsigned long long u64;
void add4i(u64
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: slash.tmp at free dot fr
Target Milestone: ---
Consider the following code:
#include
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
u128 res = (u128)a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #11 from Mason ---
Here's umul_least_64() rewritten as mul_64x64_128() in C
typedef unsigned int u32;
typedef unsigned long long u64;
/* u32 acc[3], a[1], b[1] */
static void mul_add_32x32(u32 *acc, const u32 *a, const u32 *b)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #12 from Mason ---
Actually, in this case, we don't need to propagate the carry over 3 limbs.
typedef unsigned int u32;
typedef unsigned long long u64;
/* u32 acc[2], a[1], b[1] */
static void mul_add_32x32(u32 *acc, const u32 *a,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974
--- Comment #16 from Mason ---
For the record, the example I provided was intended to show that, with some
help, GCC can generate good code for bigint multiplication. In this situation,
"help" means a short asm template.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #20 from Mason ---
Doh! You're right.
I come from a background where overlapping/aliasing inputs are heresy,
thus got blindsided :(
This would be the optimal code, right?
add4i:
# rdi = dst, rsi = a, rdx = b
movq 0(%rdx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #2 from Mason ---
You meant PR79173 ;)
Latest update:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621554.html
I didn't see my testcase specifically in Jakub's patch,
but I'll test trunk on godbolt when/if the patch lands.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #4 from Mason ---
I confirm that trunk now emits the same code for testcase1 and testcase2.
Thanks Jakub and Roger, great work!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #5 from Mason ---
FWIW, trunk (gcc14) translates testcase3 to the same code as the other
testcases, while remaining portable across all architectures:
$ gcc-trunk -O3 -march=bdver3 testcase3.c
typedef unsigned long long u64;
typede