[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-06 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #30 from Evandro --- The performance impact of always referring to constants as if they were far away is significant on targets which do not fuse ADRP and LDR together. What's the status of the solution that evaluates the function si

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-06 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #32 from Evandro --- (In reply to Ramana Radhakrishnan from comment #31) > (In reply to Evandro from comment #30) > > The performance impact of always referring to constants as if they were far > > away is significant on targets which

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-06 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #34 from Evandro --- (In reply to Wilco from comment #33) > (In reply to Evandro from comment #32) > ADRP latency to load-address should be zero on any OoO core - ADRP is > basically a move-immediate, so can execute early and hide any

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-06 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #36 from Evandro --- (In reply to Ramana Radhakrishnan from comment #35) > (In reply to Evandro from comment #32) > > Because of side effects of the Haifa scheduler, the loads now pile up, and > > the ADRPs may affect the load issue

[Bug target/63304] Aarch64 pc-relative load offset out of range

2015-11-13 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304 --- Comment #37 from Evandro --- Here's what I had in mind: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01787.html Feedback is welcome.

[Bug target/63503] New: [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-09 Thread e.menezes at samsung dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: e.menezes at samsung dot com CC: spop at gcc dot gnu.org Target: aarch64-* Curious why Geekbench's {D,S}GEMM by GCC were 8-9% slower than by LLVM,

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-09 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #3 from Evandro Menezes --- (In reply to Andrew Pinski from comment #1) > The other question here are there denormals happening? That might cause > some performance differences between using fmadd and fmul/fadd. Nope, no denormals.

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-09 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #4 from Evandro Menezes --- Here's a simplified code to reproduce these results:

    double sum(double *A, double *B, int n)
    {
      int i;
      double res = 0;

      for (i = 0; i < n; i++)
        res += A[i] * B[i];
      return res;
    }

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-14 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #8 from Evandro Menezes --- (In reply to Ramana Radhakrishnan from comment #7) > As Evandro doesn't mention flags it's hard to say whether there really is a > problem here or not. Both GCC and LLVM were given "-O3 -ffast-math".

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-14 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #9 from Evandro Menezes --- (In reply to Wilco from comment #6) > I ran the assembler examples on A57 hardware with identical input. The FMADD > code is ~20% faster irrespectively of the size of the input. This is not a > surprise giv

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-21 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #12 from Evandro Menezes --- Created attachment 33774 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33774&action=edit Simple test-case

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #14 from Evandro Menezes --- Compiling the test-case above with just -O2, I can reproduce the code I mentioned initially and easily measure the cycle count to run it on target using perf. The binary created by GCC runs in about 44700

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #16 from Evandro --- (In reply to Wilco from comment #15) > Using -Ofast is not any different from -O3 -ffast-math when compiling > non-Fortran code. As comment 10 shows, both loops are vectorized, however > LLVM unrolls twice and use

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #17 from Evandro --- Created attachment 33785 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33785&action=edit Simple matrix multiplication

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-22 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

Evandro changed:

           What    |Removed |Added
 Attachment #33774 |0       |1
        is obsolete|        |

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-24 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #11 from Evandro --- (In reply to Wilco from comment #9) > The performance cost is a much bigger issue than codesize. The problem is > that when register pressure is high, the register allocator decides to > allocate integer liverange

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-24 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #12 from Evandro --- (In reply to Evandro from comment #11) > Do you have an idea of the performance impact of this patch? At least in Dhrystone, it improved by over 2% on A57.

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-10-24 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #14 from Evandro --- (In reply to Wilco from comment #10) > Note currently it is not possible to use FP registers for spilling using the > hooks - basically you still end up with int<->fp moves for every definition > and use (even whe

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-28 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #21 from Evandro --- (In reply to ramana.radhakrish...@arm.com from comment #20) > What's the kind of performance delta you see if you managed to unroll > the loop just a wee bit ? Probably not much looking at the code produced > he

[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

2014-10-28 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503 --- Comment #23 from Evandro --- (In reply to Wilco from comment #22) > Unrolling alone isn't good enough in sum reductions. As I mentioned before, > GCC doesn't enable any of the useful loop optimizations by default. So add > -fvariable-expansio
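
The point about sum reductions in this thread can be illustrated with a minimal sketch. It is not taken from the bug report: `sum` mirrors the reproducer from comment #4, while `sum_two_acc` is a hypothetical hand-unrolled variant written only for illustration. With a single accumulator, each multiply-add depends on the previous one, so unrolling the loop body alone leaves the dependency chain intact; splitting the reduction across independent accumulators is what shortens it.

```c
/* Baseline reduction, as in the reproducer from comment #4:
   every iteration depends on the previous value of res. */
double sum(const double *A, const double *B, int n)
{
  double res = 0;
  for (int i = 0; i < n; i++)
    res += A[i] * B[i];
  return res;
}

/* Hypothetical variant (not from the bug report): unrolled by two,
   with two independent accumulators so the two multiply-adds in each
   iteration do not wait on each other. */
double sum_two_acc(const double *A, const double *B, int n)
{
  double acc0 = 0, acc1 = 0;
  int i;

  for (i = 0; i + 1 < n; i += 2) {
    acc0 += A[i] * B[i];
    acc1 += A[i + 1] * B[i + 1];
  }
  /* Handle the leftover element when n is odd. */
  if (i < n)
    acc0 += A[i] * B[i];
  return acc0 + acc1;
}
```

Because the second version adds the products in a different order, its result may differ from the baseline in the last bits for general inputs. That is why a compiler can only perform this transformation itself when floating-point reassociation is allowed, for example under the "-O3 -ffast-math" flags quoted in comment #8.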

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64 - Improve Generic register_move_cost and memory_move_cost

2014-10-31 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #20 from Evandro --- (In reply to Ramana Radhakrishnan from comment #19) > To my mind it seems like 407 fmoves is just a bit too berserk and regardless > of how efficient your core is, there is no point in having so many moves > back

[Bug target/58623] lack of ldp/stp optimization

2014-12-15 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58623 --- Comment #6 from Evandro --- What's the PR of the fwprop issue? Thank you.

[Bug target/61915] New: [AArch64] Default use of the LRA results in extra code size

2014-07-25 Thread e.menezes at samsung dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: e.menezes at samsung dot com The issue that I observed in code size due to the default use of the LRA results in the spilling of the FP register used to spill variables into, which

[Bug target/61915] [AArch64] Default use of the LRA results in extra code size

2014-07-25 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #3 from Evandro Menezes --- In Opteron, there was a path from FP to the GP registers, but not the other way around. That path was eventually made symmetric in Barcelona only.

[Bug target/62014] New: [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-04 Thread e.menezes at samsung dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: e.menezes at samsung dot com Created attachment 33245 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33245&action=edit This patch should fix this issue, though it needs a test-case. In some cases, when

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-04 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014 --- Comment #2 from Evandro Menezes --- (In reply to Andrew Pinski from comment #1) > + /* Do not spill into FP registers when "-mgeneral-regs-only" is > specified. * > > You are missing a / in your comment. Ermahgerd!

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-04 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014

Evandro Menezes changed:

           What    |Removed |Added
 Attachment #33245 |0       |1
        is obsolete|        |

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-08-05 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #5 from Evandro Menezes --- Created attachment 33249 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33249&action=edit Dhrystone, part 2 of 3 I first observed this issue when looking into Dhrystone built with fairly standard o

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-05 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014 --- Comment #9 from Evandro Menezes --- It seems to me that it's the LRA which is forcing the use of FP registers, so, even if the patterns are fixed, I believe that in the end the combiner would just give up and ICE. With this assumption, which

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-05 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014 --- Comment #11 from Evandro Menezes --- (In reply to ktkachov from comment #10) > What we really need here is a preprocessed testcase showing the problem. > It should be fairly easy to lock down on the problem then I'm on it.

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-05 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014 --- Comment #13 from Evandro Menezes --- Created attachment 33253 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33253&action=edit Test-case This test-case is a stripped-down version of Dhrystone, where the issue was first observed. Built

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-05 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014

Evandro Menezes changed:

           What    |Removed |Added
 Attachment #33246 |0       |1
        is obsolete|        |

[Bug target/62014] [AArch64] Using -mgeneral-regs-only may lead to ICE

2014-08-06 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62014 Evandro Menezes changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/61915] [AArch64] High amounts of GP to FP register moves using LRA on AArch64

2014-08-14 Thread e.menezes at samsung dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61915 --- Comment #7 from Evandro Menezes --- (In reply to Vladimir Makarov from comment #6) > > Evandro, thanks for reporting this. Sorry, I am busy with other thing these > days. I'll start to work on this PR in September to try to make some > pro