https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #30 from Evandro ---
The performance impact of always referring to constants as if they were far
away is significant on targets which do not fuse ADRP and LDR together. What's
the status of the solution that evaluates the function si
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #32 from Evandro ---
(In reply to Ramana Radhakrishnan from comment #31)
> (In reply to Evandro from comment #30)
> > The performance impact of always referring to constants as if they were far
> > away is significant on targets which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #34 from Evandro ---
(In reply to Wilco from comment #33)
> (In reply to Evandro from comment #32)
> ADRP latency to load-address should be zero on any OoO core - ADRP is
> basically a move-immediate, so can execute early and hide any
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #36 from Evandro ---
(In reply to Ramana Radhakrishnan from comment #35)
> (In reply to Evandro from comment #32)
> > Because of side effects of the Haifa scheduler, the loads now pile up, and
> > the ADRPs may affect the load issue
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #37 from Evandro ---
Here's what I had in mind:
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01787.html
Feedback is welcome.
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: e.menezes at samsung dot com
CC: spop at gcc dot gnu.org
Target: aarch64-*
Curious why Geekbench's {D,S}GEMM by GCC were 8-9% slower than by LLVM,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #3 from Evandro Menezes ---
(In reply to Andrew Pinski from comment #1)
> The other question here are there denormals happening? That might cause
> some performance differences between using fmadd and fmul/fadd.
Nope, no denormals.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #4 from Evandro Menezes ---
Here's simplified code to reproduce these results:
double sum(double *A, double *B, int n)
{
  int i;
  double res = 0;

  for (i = 0; i < n; i++)
    res += A[i] * B[i];

  return res;
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #8 from Evandro Menezes ---
(In reply to Ramana Radhakrishnan from comment #7)
> As Evandro doesn't mention flags it's hard to say whether there really is a
> problem here or not.
Both GCC and LLVM were given "-O3 -ffast-math".
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #9 from Evandro Menezes ---
(In reply to Wilco from comment #6)
> I ran the assembler examples on A57 hardware with identical input. The FMADD
> code is ~20% faster irrespectively of the size of the input. This is not a
> surprise giv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #12 from Evandro Menezes ---
Created attachment 33774
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33774&action=edit
Simple test-case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #14 from Evandro Menezes ---
Compiling the test-case above with just -O2, I can reproduce the code I
mentioned initially and easily measure the cycle count to run it on target
using perf.
The binary created by GCC runs in about 44700
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #16 from Evandro ---
(In reply to Wilco from comment #15)
> Using -Ofast is not any different from -O3 -ffast-math when compiling
> non-Fortran code. As comment 10 shows, both loops are vectorized, however
> LLVM unrolls twice and use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #17 from Evandro ---
Created attachment 33785
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33785&action=edit
Simple matrix multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
Evandro changed:
What|Removed |Added
Attachment #33774|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #11 from Evandro ---
(In reply to Wilco from comment #9)
> The performance cost is a much bigger issue than codesize. The problem is
> that when register pressure is high, the register allocator decides to
> allocate integer liverange
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #12 from Evandro ---
(In reply to Evandro from comment #11)
> Do you have an idea of the performance impact of this patch?
At least in Dhrystone, it improved by over 2% on A57.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #14 from Evandro ---
(In reply to Wilco from comment #10)
> Note currently it is not possible to use FP registers for spilling using the
> hooks - basically you still end up with int<->fp moves for every definition
> and use (even whe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #21 from Evandro ---
(In reply to ramana.radhakrish...@arm.com from comment #20)
> What's the kind of performance delta you see if you managed to unroll
> the loop just a wee bit ? Probably not much looking at the code produced
> he
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #23 from Evandro ---
(In reply to Wilco from comment #22)
> Unrolling alone isn't good enough in sum reductions. As I mentioned before,
> GCC doesn't enable any of the useful loop optimizations by default. So add
> -fvariable-expansio
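For readers following the reduction discussion, here is a minimal sketch of why unrolling alone does not help a sum reduction: every multiply-add still feeds the same accumulator, so the dependency chain stays the same length. Splitting the accumulator removes that chain. The function name `sum_unrolled2` and the unroll factor of two are assumptions made for illustration; this is hand-written C, not the output of GCC or LLVM, and not code from the bug report.

```c
/* Illustrative only: a hand-unrolled variant of the sum() test-case quoted
   earlier in this listing.  sum_unrolled2 is a hypothetical name.  */
double sum_unrolled2(const double *A, const double *B, int n)
{
    double res0 = 0.0, res1 = 0.0;  /* two independent accumulators */
    int i;

    /* Main loop: two multiply-adds per iteration with no dependency
       between them, so they can overlap in the FP pipeline.  */
    for (i = 0; i + 1 < n; i += 2) {
        res0 += A[i] * B[i];
        res1 += A[i + 1] * B[i + 1];
    }

    /* Remainder: at most one element is left when n is odd.  */
    if (i < n)
        res0 += A[i] * B[i];

    /* Combine the partial sums once, outside the loop.  */
    return res0 + res1;
}
```

Because this changes the order of the floating-point additions, a compiler may only perform the equivalent transformation on its own when reassociation is permitted, as it is under the "-O3 -ffast-math" flags mentioned in comment #8.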
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #20 from Evandro ---
(In reply to Ramana Radhakrishnan from comment #19)
> To my mind it seems like 407 fmoves is just a bit too berserk and regardless
> of how efficient your core is, there is no point in having so many moves
> back
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58623
--- Comment #6 from Evandro ---
What's the PR of the fwprop issue?
Thank you.
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: e.menezes at samsung dot com
The issue that I observed in code size due to the default use of the LRA
results in the spilling of the FP register used to spill variables into, which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #3 from Evandro Menezes ---
In Opteron, there was a path from FP to the GP registers, but not the other way
around. That path was eventually made symmetric in Barcelona only.
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: e.menezes at samsung dot com
Created attachment 33245
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33245&action=edit
This patch should fix this issue, though it needs a test-case.
In some cases, when
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
--- Comment #2 from Evandro Menezes ---
(In reply to Andrew Pinski from comment #1)
> + /* Do not spill into FP registers when "-mgeneral-regs-only" is
> specified. *
>
> You are missing a / in your comment.
Ermahgerd!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
Evandro Menezes changed:
What|Removed |Added
Attachment #33245|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #5 from Evandro Menezes ---
Created attachment 33249
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33249&action=edit
Dhrystone, part 2 of 3
I first observed this issue when looking into Dhrystone built with fairly
standard o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
--- Comment #9 from Evandro Menezes ---
It seems to me that it's the LRA which is forcing the use of FP registers, so,
even if the patterns are fixed, I believe that in the end the combiner would
just give up and ICE. With this assumption, which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
--- Comment #11 from Evandro Menezes ---
(In reply to ktkachov from comment #10)
> What we really need here is a preprocessed testcase showing the problem.
> It should be fairly easy to lock down on the problem then
I'm on it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
--- Comment #13 from Evandro Menezes ---
Created attachment 33253
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33253&action=edit
Test-case
This test-case is a stripped-down version of Dhrystone, where the issue was
first observed.
Built
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
Evandro Menezes changed:
What|Removed |Added
Attachment #33246|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014
Evandro Menezes changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915
--- Comment #7 from Evandro Menezes ---
(In reply to Vladimir Makarov from comment #6)
>
> Evandro, thanks for reporting this. Sorry, I am busy with other thing these
> days. I'll start to work on this PR in September to try to make some
> pro