Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-03 Thread Evandro Menezes
Rebasing the patch... -- Evandro Menezes From d791090aae6a29fa94d8fc10894ee1053b05bcc2 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 4 Apr 2016 14:02:24 -0500 Subject: [PATCH 3/3] [AArch64] Emit division using the Newton series 2016-04-04 Evandro Menezes Wi

Re: [PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-06-03 Thread Evandro Menezes
On 06/01/16 03:35, James Greenhalgh wrote: On Fri, May 27, 2016 at 05:57:23PM -0500, Evandro Menezes wrote: From 86d7690632d03ec85fd69bfaef8e89c0542518ad Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 3 Mar 2016 18:13:46 -0600 Subject: [PATCH 1/3] [AArch64] Add more choices for the

Re: [PATCH][AArch64] Increase code alignment

2016-06-03 Thread Evandro Menezes
most comfortable with, but I also wonder if the -falign-labels shouldn't also be a parameter in tune_params. Thoughts? -- Evandro Menezes

Re: [PATCH][AArch64] Increase code alignment

2016-06-03 Thread Evandro Menezes
On 06/03/16 17:22, Evandro Menezes wrote: On 06/03/16 05:51, Wilco Dijkstra wrote: It looks like almost all AArch64 cores agree on alignment of 16 for functions, and 8 for loops and branches, so we should change -mcpu=generic as well if there is no disagreement - feedback welcome. I'll see

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-13 Thread Evandro Menezes
On 06/13/16 05:15, James Greenhalgh wrote: Thanks for your patience on this patch series. Just checked the series in. Thank y'all for your assistance and patience. Cheers, -- Evandro Menezes

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-14 Thread Evandro Menezes
On 06/14/16 03:28, Christophe Lyon wrote: On 13 June 2016 at 21:06, Evandro Menezes wrote: On 06/13/16 05:15, James Greenhalgh wrote: Thanks for your patience on this patch series. Just checked the series in. If I'm not mistaken, it looks like you forgot to update the ChangeLog fil

RE: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-15 Thread Evandro Menezes
significant improvements for me to be comfortable with -frename-registers being a generic default for AArch64. I'll run some larger benchmarks tonight, but I'm leaning towards having it as a target specific extra tuning option. Thank you, -- Evandro Menezes

RE: [PATCH][AArch64] Enable -frename-registers at -O2 and higher

2016-06-16 Thread Evandro Menezes
larger benchmarks tonight, but I'm leaning towards having it as a > target specific extra tuning option. The results are in and -frename-registers is not a good idea for Exynos M1. Thank you, -- Evandro Menezes Austin, TX

Re: [PATCH][AArch64] Increase code alignment

2016-06-29 Thread Evandro Menezes
-systems.com; Evandro Menezes Subject: [PATCH][AArch64] Increase code alignment Increase loop alignment on Cortex cores to 8 and set function alignment to 16. This makes things consistent across big.LITTLE cores, improves performance of benchmarks with tight loops and reduces performance

Re: [AArch64] Emit square root using the Newton series

2016-03-08 Thread Evandro Menezes
On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config

Re: [AArch64] Emit square root using the Newton series

2016-03-08 Thread Evandro Menezes
On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-09 Thread Evandro Menezes
On 03/01/16 13:08, Evandro Menezes wrote: On 03/01/16 13:02, Wilco Dijkstra wrote: Evandro Menezes wrote: The meaning of these attributes is not clear to me. Is there a reference somewhere about which insns are FP or SIMD or neither? The meaning should be clear, "fp" is a floa

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread Evandro Menezes
On 03/10/16 07:23, James Greenhalgh wrote: On Wed, Mar 09, 2016 at 03:35:43PM -0600, Evandro Menezes wrote: On 03/01/16 13:08, Evandro Menezes wrote: On 03/01/16 13:02, Wilco Dijkstra wrote: Evandro Menezes wrote: The meaning of these attributes is not clear to me. Is there a reference

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread Evandro Menezes
On 03/10/16 10:27, Evandro Menezes wrote: On 03/10/16 07:23, James Greenhalgh wrote: On Wed, Mar 09, 2016 at 03:35:43PM -0600, Evandro Menezes wrote: On 03/01/16 13:08, Evandro Menezes wrote: On 03/01/16 13:02, Wilco Dijkstra wrote: Evandro Menezes wrote: The meaning of these attributes is

Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Evandro Menezes
han it is today. Thanks for the pointer, Wilco. Will work it in the patch. -- Evandro Menezes

Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Evandro Menezes
fmul v2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmul v1.4s, v1.4s, v2.4s and v1.4s, v3.4s fmul v0.4s, v1.4s, v0.4s Thanks, -- Evandro Menezes
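For readers following the instruction sequence in this message, the arithmetic can be sketched in Python. This is an illustration only, not the patch's code: `rsqrt_step` and `approx_sqrt` are hypothetical names, the initial estimate is a crude stand-in for what FRSQRTE would supply, and the explicit zero check plays the role that the FCMNE/AND mask appears to play in the vector sequence.

```python
import math


def rsqrt_step(d, x):
    """One Newton step for 1/sqrt(d): x * (3 - d*x*x) / 2.

    The factor (3 - a*b) / 2 is what FRSQRTS computes in hardware.
    """
    return x * (3.0 - d * x * x) / 2.0


def approx_sqrt(d, iterations=3):
    """Approximate sqrt(d) for finite d >= 0 as d * (1/sqrt(d))."""
    if d == 0.0:
        # Without this guard, the estimate of 1/sqrt(0) is infinite and
        # 0 * inf would yield NaN; the asm masks that lane to zero instead.
        return 0.0
    # Assumption: a deliberately rough estimate (about 8 good bits),
    # standing in for the FRSQRTE instruction.
    x = (1.0 / math.sqrt(d)) * (1.0 + 2.0 ** -8)
    for _ in range(iterations):
        x = rsqrt_step(d, x)
    return d * x
```

Each step roughly squares the relative error, which is why more steps are needed for DF than for SF precision.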

Re: [AArch64] Emit square root using the Newton series

2016-03-14 Thread Evandro Menezes
On 03/10/16 19:06, Wilco Dijkstra wrote: Evandro Menezes wrote: That's what I had in mind too, but around the approximation for x^(-1/2) and using masks for vector cases thusly: fcmne v3.4s, v0.4s, #0.0 frsqrte v1.4s, v0.4s fmul v2.4s, v1.4s,

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2016-03-18 Thread Evandro Menezes
On 02/03/16 13:46, Evandro Menezes wrote: On 01/08/16 16:55, Evandro Menezes wrote: On 12/16/2015 02:11 PM, Evandro Menezes wrote: On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote: On 15/12/15 23:34, Evandro Menezes wrote: On 12/14/2015 05:26 AM, James Greenhalgh wrote: On Thu, Dec 03

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-18 Thread Evandro Menezes
On 03/18/16 17:20, Wilco Dijkstra wrote: Evandro Menezes wrote: On 03/18/16 10:21, Wilco Dijkstra wrote: Hi Evandro, For example, though this approximation improves the performance noticeably for DF on A57, for SF, not so much, if at all. I'm still skeptical that you ever can ge

Re: [AArch64] Emit square root using the Newton series

2016-03-18 Thread Evandro Menezes
On 03/10/16 19:06, Wilco Dijkstra wrote: Evandro Menezes wrote: That's what I had in mind too, but around the approximation for x^(-1/2) and using masks for vector cases thusly: fcmne v3.4s, v0.4s, #0.0 frsqrte v1.4s, v0.4s fmul v2.4s, v1.4s,

[COMMITTED][AArch64] Tweak the pipeline model for Exynos M1

2016-03-18 Thread Evandro Menezes
Tweak the pipeline model for Exynos M1 * gcc/config/aarch64/aarch64.c (exynosm1_tunings): Enable the weak prefetching model. Committed as r234307. -- Evandro Menezes From a75d875a3c64180c9d6c368e2d87036d70f66036 Mon Sep 17 00:00:00 2001 From: evandro D

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-18 Thread Evandro Menezes
tion patch makes the decision in the md file which does not seem a good idea). I agree. Will modify it. Thank you, -- Evandro Menezes

Re: [AArch64] Emit square root using the Newton series

2016-03-19 Thread Evandro Menezes
On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezes gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new

[AArch64] Add precision choices for the reciprocal square root approximation

2016-03-19 Thread Evandro Menezes
, not so much, if at all. Feedback appreciated. Thank you, -- Evandro Menezes

Emit square root using the Newton series

2016-03-19 Thread Evandro Menezes
2016-03-16 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with

[AArch64] Add precision choices for the reciprocal square root approximation

2016-03-19 Thread Evandro Menezes
, not so much, if at all. Feedback appreciated. Thank you, -- Evandro Menezes From 95581aefcf324233c3603f4d8232ee18c5836f8a Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 17 Mar 2016 17:00:03 -0500 Subject: [PATCH] Add precision choices for the reciprocal square root approximat

Re: [AArch64] Emit square root using the Newton series

2016-03-19 Thread Evandro Menezes
On 03/17/16 09:55, James Greenhalgh wrote: On Wed, Mar 16, 2016 at 02:45:37PM -0500, Evandro Menezes wrote: On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03

[AArch64] Emit division using the Newton series

2016-03-19 Thread Evandro Menezes
Emit division using the Newton series 2016-03-17 Evandro Menezes gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h

Re: [AArch64] Emit division using the Newton series

2016-03-23 Thread Evandro Menezes
On 03/17/16 15:09, Evandro Menezes wrote: This patch implements FP division by an approximation using the Newton series. With this patch, DF division is sped up by over 100% and SF division, zilch, both on A57 and on M1. gcc/ * config/aarch64/aarch64-tuning-flags.def
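As a rough illustration of what division by the Newton series means, here is a minimal Python sketch. It is not taken from the patch: `recip_step` and `approx_div` are hypothetical names, the starting estimate is a crude stand-in for a hardware reciprocal estimate such as FRECPE, and the sketch only covers finite, nonzero denominators.

```python
def recip_step(d, x):
    """One Newton step for 1/d: x * (2 - d*x).

    The factor (2 - a*b) is what the FRECPS instruction computes.
    """
    return x * (2.0 - d * x)


def approx_div(n, d, iterations=3):
    """Approximate n / d as n * (1/d), refining 1/d by Newton steps."""
    # Assumption: a deliberately rough estimate (about 8 good bits),
    # standing in for a hardware reciprocal estimate.
    x = (1.0 / d) * (1.0 + 2.0 ** -8)
    for _ in range(iterations):
        x = recip_step(d, x)
    return n * x
```

Starting from a relative error e, one step leaves an error of about e squared, so the number of steps needed grows with the target precision. Whether the sequence beats a hardware divide depends on the core, which fits the message's report of a large gain for DF and none for SF.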

Re: [AArch64] Emit square root using the Newton series

2016-03-24 Thread Evandro Menezes
On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. 2016-03-23 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning

[AArch64] Fix SIMD predicate

2016-03-30 Thread Evandro Menezes
Add scalar 0.0 to the aarch64_simd_reg_or_zero predicate. 2016-03-30 Evandro Menezes * gcc/config/aarch64/predicates.md (aarch64_simd_reg_or_zero predicate): Add the "const_double" constraint. It seems to me that the aarch64_simd_reg_or_zero should also

Re: [AArch64] Emit division using the Newton series

2016-03-31 Thread Evandro Menezes
On 03/23/16 11:24, Evandro Menezes wrote: On 03/17/16 15:09, Evandro Menezes wrote: This patch implements FP division by an approximation using the Newton series. With this patch, DF division is sped up by over 100% and SF division, zilch, both on A57 and on M1. gcc

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-03-31 Thread Evandro Menezes
On 03/18/16 18:00, Evandro Menezes wrote: On 03/18/16 17:20, Wilco Dijkstra wrote: Evandro Menezes wrote: On 03/18/16 10:21, Wilco Dijkstra wrote: Hi Evandro, For example, though this approximation improves the performance noticeably for DF on A57, for SF, not so much, if at all. I'

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2016-03-31 Thread Evandro Menezes
On 03/16/16 14:48, Evandro Menezes wrote: On 02/03/16 13:46, Evandro Menezes wrote: On 01/08/16 16:55, Evandro Menezes wrote: On 12/16/2015 02:11 PM, Evandro Menezes wrote: On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote: On 15/12/15 23:34, Evandro Menezes wrote: On 12/14/2015 05:26

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes
On 04/01/16 08:47, Wilco Dijkstra wrote: Evandro Menezes wrote: Ping^1 I haven't seen a newer version that incorporates my feedback. To recap what I'd like to see is a more general way to select approximations based on mode. I don't believe that looking at the inner mode works

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes
On 04/01/16 09:06, James Greenhalgh wrote: On Fri, Apr 01, 2016 at 02:47:05PM +0100, Wilco Dijkstra wrote: Evandro Menezes wrote: Ping^1 I haven't seen a newer version that incorporates my feedback. To recap what I'd like to see is a more general way to select approximations based

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes
ument for the mode. This patch allows a target to choose the mode of this operation when it is beneficial to use the approximate version. I hope that this gets in the ballpark of what's been discussed previously. Thank you, -- Evandro Menezes From 17ac33719bae8966a481cc833c9ac06

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes
On 04/01/16 08:58, Wilco Dijkstra wrote: Evandro Menezes wrote: On 03/23/16 11:24, Evandro Menezes wrote: On 03/17/16 15:09, Evandro Menezes wrote: This patch implements FP division by an approximation using the Newton series. With this patch, DF division is sped up by over 100% and SF

Re: [AArch64] Fix SIMD predicate

2016-04-01 Thread Evandro Menezes
On 03/31/16 04:52, James Greenhalgh wrote: On Wed, Mar 30, 2016 at 11:18:27AM -0500, Evandro Menezes wrote: Add scalar 0.0 to the aarch64_simd_reg_or_zero predicate. 2016-03-30 Evandro Menezes * gcc/config/aarch64/predicates.md (aarch64_simd_reg_or_zero predicate

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes
On 04/01/16 16:22, Wilco Dijkstra wrote: Evandro Menezes wrote: The division variant should use the same latency reduction trick I mentioned for sqrt. I don't think that it applies here, since it doesn't have to deal with special cases. No it applies as it's exactly the same

Re: [AArch64] Emit square root using the Newton series

2016-04-01 Thread Evandro Menezes
On 03/24/16 14:11, Evandro Menezes wrote: On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. This version of the patch cleans up the changes to the MD files and fixes some bugs

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes
On 04/01/16 17:45, Wilco Dijkstra wrote: Evandro Menezes wrote: However, I don't think that there's the need to handle any special case for division. The only case when the approximation differs from division is when the numerator is infinity and the denominator, zero, when the app

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-04 Thread Evandro Menezes
On 04/01/16 18:08, Wilco Dijkstra wrote: Evandro Menezes wrote: I hope that this gets in the ballpark of what's been discussed previously. Yes that's very close to what I had in mind. A minor issue is that the vector modes cannot work as they start at MAX_MODE_FLOAT (whi

Re: [AArch64] Emit square root using the Newton series

2016-04-04 Thread Evandro Menezes
On 04/01/16 17:45, Evandro Menezes wrote: On 03/24/16 14:11, Evandro Menezes wrote: On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. This version of the patch cleans up the

Re: [AArch64] Emit division using the Newton series

2016-04-04 Thread Evandro Menezes
On 04/01/16 17:52, Evandro Menezes wrote: On 04/01/16 17:45, Wilco Dijkstra wrote: Evandro Menezes wrote: However, I don't think that there's the need to handle any special case for division. The only case when the approximation differs from division is when the numerator is infini

Re: [AArch64] Emit square root using the Newton series

2016-04-05 Thread Evandro Menezes
ave a patchset that applies cleanly so I can try all approximation routines? Hi, Wilco. The original patches should be independent of each other, so indeed they duplicate code. This patch suite should be suitable for testing. HTH -- Evandro Menezes From cbc2b62f7df5c3e2fef2a24157b1bdd1a6de191b

Re: [AArch64] Emit division using the Newton series

2016-04-12 Thread Evandro Menezes
On 04/04/16 14:06, Evandro Menezes wrote: On 04/01/16 17:52, Evandro Menezes wrote: On 04/01/16 17:45, Wilco Dijkstra wrote: Evandro Menezes wrote: However, I don't think that there's the need to handle any special case for division. The only case when the approximation di

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-12 Thread Evandro Menezes
On 04/04/16 11:13, Evandro Menezes wrote: On 04/01/16 18:08, Wilco Dijkstra wrote: Evandro Menezes wrote: I hope that this gets in the ballpark of what's been discussed previously. Yes that's very close to what I had in mind. A minor issue is that the vector modes cannot work as

Re: [AArch64] Emit square root using the Newton series

2016-04-12 Thread Evandro Menezes
On 04/05/16 17:30, Evandro Menezes wrote: On 04/05/16 13:37, Wilco Dijkstra wrote: I can't get any of these to work... Not only do I get a large number of collisions and duplicated code between these patches, when I try to resolve them, all I get is crashes whenever I try to use sqrt

RE: [AArch64] Emit division using the Newton series

2016-04-21 Thread Evandro Menezes
> On 04/04/16 14:06, Evandro Menezes wrote: > > On 04/01/16 17:52, Evandro Menezes wrote: > >> On 04/01/16 17:45, Wilco Dijkstra wrote: > >>> Evandro Menezes wrote: > >>> > >>>> However, I don't think that there's the need to h

RE: [AArch64] Emit square root using the Newton series

2016-04-21 Thread Evandro Menezes
> On 04/05/16 17:30, Evandro Menezes wrote: > > On 04/05/16 13:37, Wilco Dijkstra wrote: > >> I can't get any of these to work... Not only do I get a large number > >> of collisions and duplicated code between these patches, when I try > >> to resolve the

RE: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-21 Thread Evandro Menezes
> On 04/04/16 11:13, Evandro Menezes wrote: > > On 04/01/16 18:08, Wilco Dijkstra wrote: > >> Evandro Menezes wrote: > >>> I hope that this gets in the ballpark of what's been discussed > >>> previously. > >> Yes that's very close to wh

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-25 Thread Evandro Menezes
assume that you mean that such improvements are true for -mcpu=generic, yes? On which target, A53 or A57 or other? Otherwise, it seems to be a sensible change, but I'm trying to understand how generally beneficial it is. Thank you, -- Evandro Menezes

Re: [PATCH][AArch64][wwwdocs] Summarise some more AArch64 changes for GCC6

2016-04-25 Thread Evandro Menezes
On 04/21/16 03:15, Kyrill Tkachov wrote: Ok to commit? LGTM -- Evandro Menezes

Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-04-25 Thread Evandro Menezes
On 04/22/16 10:35, Wilco Dijkstra wrote: OK for trunk? LGTM -- Evandro Menezes

Re: [PATCH][AArch64] Replace insn to zero up SIMD registers

2016-04-25 Thread Evandro Menezes
On 03/10/16 10:37, James Greenhalgh wrote: On Thu, Mar 10, 2016 at 10:32:15AM -0600, Evandro Menezes wrote: I agree to postpone until GCC 7. [AArch64] Replace insn to zero up SIMD registers gcc/ * config/aarch64/aarch64.md (*movhf_aarch64): Add "mo

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-25 Thread Evandro Menezes
On 04/25/16 14:21, Wilco Dijkstra wrote: Evandro Menezes wrote: I assume that you mean that such improvements are true for -mcpu=generic, yes? On which target, A53 or A57 or other? It's true for any CPU setting. The SPEC results are for Cortex-A57 however I wrote a microbenchmark that

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-25 Thread Evandro Menezes
On 04/25/16 14:58, Wilco Dijkstra wrote: Evandro Menezes wrote: I agree with your assessment, but I'm more curious to understand how this change affects code built with the default -mcpu=generic when run on both A53 and A57, the typical configuration of big.LITTLE machines. I wouldn

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-26 Thread Evandro Menezes
On 04/26/16 11:14, Wilco Dijkstra wrote: Evandro Menezes wrote: True, but the results when running on A53 could be quite different. GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no difference in perlbench. Looks good, then. Fine by me. Thanks for your patience

Re: [PATCH][AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

2016-07-13 Thread Evandro Menezes
stp x2, x3, [x0] ret whereas with this patch we generate: bar: ldp x2, x3, [x1, 8] stp x2, x3, [x0, 8] ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? LGTM -- Evandro Menezes
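The offsets in this example can be checked against the A64 encoding of LDP/STP for a pair of X registers, whose signed-offset form takes a 7-bit signed immediate scaled by 8. The Python sketch below encodes only that architectural rule; `ldp_x_offset_ok` is a hypothetical helper and is not the predicate GCC uses, which the snippet does not show.

```python
def ldp_x_offset_ok(offset):
    """Return True if offset fits LDP/STP of two X registers.

    The signed-offset form encodes a 7-bit signed immediate scaled by 8,
    so valid byte offsets are multiples of 8 from -512 to 504.
    """
    return offset % 8 == 0 and -512 <= offset <= 504
```

Under this rule the offset 8 used in the message is encodable, while an offset such as 4 or 512 is not.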

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread Evandro Menezes
ut so are users to use it through the command line option -mlow-precision-div. -- Evandro Menezes

Re: [AArch64] Emit square root using the Newton series

2016-04-27 Thread Evandro Menezes
On 04/27/16 09:23, James Greenhalgh wrote: On Tue, Apr 12, 2016 at 01:14:51PM -0500, Evandro Menezes wrote: On 04/05/16 17:30, Evandro Menezes wrote: On 04/05/16 13:37, Wilco Dijkstra wrote: I can't get any of these to work... Not only do I get a large number of collisions and duplicated

Re: [PATCH][AArch64] Simplify ashl3 expander for SHORT modes

2016-04-27 Thread Evandro Menezes
On 04/27/16 09:10, Kyrill Tkachov wrote: 2016-04-27 Kyrylo Tkachov * config/aarch64/aarch64.md (ashl3, SHORT modes): Use const_int_operand for operand 2 predicate. Simplify expand code as a result. LGTM -- Evandro Menezes

Re: [PATCH][AArch64] Replace insn to zero up SIMD registers

2016-04-27 Thread Evandro Menezes
On 04/26/16 08:25, Wilco Dijkstra wrote: Evandro Menezes wrote: On 03/10/16 10:37, James Greenhalgh wrote: Thanks for sticking with it. This is OK for GCC 7 when development opens. Remember to mention the most recent changes in your Changelog entry (Remove "fp" attribute from *mov

[PATCH 0/3][AArch64] Add infrastructure for more approximate FP operations

2016-04-27 Thread Evandro Menezes
approximation 2. [PATCH 2/3][AArch64] Emit square root using the Newton series 3. [PATCH 3/3][AArch64] Emit division using the Newton series Thank you, -- Evandro Menezes

[PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-04-27 Thread Evandro Menezes
(aarch64_optab_supported_p): New argument for the mode. * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description. -- Evandro Menezes From 2cb6c0f35bbdc3b4cc6f88c61a50f3fbb168ec99 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 3 Mar 2016 18:13:46 -0600 Subjec

[PATCH 2/3][AArch64] Emit square root using the Newton series

2016-04-27 Thread Evandro Menezes
n and insn definitions. * config/aarch64/aarch64.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. -- Evandro Menezes From 753115a8691afd7aed4a510d9e9cb0a8e859acf4 Mon Sep 1

[PATCH 3/3][AArch64] Emit division using the Newton series

2016-04-27 Thread Evandro Menezes
Define new function. * config/aarch64/aarch64.md ("div3"): New expansion. * config/aarch64/aarch64-simd.md ("div3"): Likewise. * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option. * doc/invoke.texi (-mlow-precision-div): Describe

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-23 Thread Evandro Menezes
n. Cheers, -- Evandro Menezes

Re: [PATCH 0/3][AArch64] Add infrastructure for more approximate FP operations

2016-05-23 Thread Evandro Menezes
On 04/27/16 16:13, Evandro Menezes wrote: This patch suite increases the granularity of target selections of approximate FP operations and adds the options of emitting approximate square root and division. The full suite is contained in the emails tagged: 1. [PATCH 1/3][AArch64] Add more

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Evandro Menezes
ue to code alignment or some other secondary effect. I always thought that this patch, that lays out the branch tree more optimally, deserved to be revisited: https://gcc.gnu.org/ml/gcc-patches/2008-04/msg02197.html Cheers, -- Evandro Menezes

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Evandro Menezes
On 05/23/16 15:32, Evandro Menezes wrote: I'm fine with this patch, as it achieves in part what I intended before: going beyond the default_case_values_threshold, too conservative for Exynos M1. My concern is particularly what happens to in-order targets, like the ubiquitous A53.

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2015-12-15 Thread Evandro Menezes
On 12/14/2015 05:26 AM, James Greenhalgh wrote: On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote: On 11/20/2015 05:53 AM, James Greenhalgh wrote: On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote: On 11/05/2015 02:51 PM, Evandro Menezes wrote: 2015-11-05 Evandro

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2015-12-16 Thread Evandro Menezes
On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote: On 15/12/15 23:34, Evandro Menezes wrote: On 12/14/2015 05:26 AM, James Greenhalgh wrote: On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote: On 11/20/2015 05:53 AM, James Greenhalgh wrote: On Thu, Nov 19, 2015 at 04:04

Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-12-16 Thread Evandro Menezes
On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: On 20 October 2015 at 00:40, Evandro Menezes wrote: In the existing targets, it seems that it's always faster to zero up a DF register with "movi %d0, #0" instead of "fmov %d0, xzr". This patch modifies the respect

Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-01-05 Thread Evandro Menezes
. (aarch64_gen_ccmp_next): Add FP support. gcc/testsuite/ * gcc.target/aarch64/ccmp_1.c: New testcase. Add support for the FCCMP insn types 2016-01-04 Evandro Menezes gcc/ * config/aarch64/aarch64.md (fccmp): Change insn type. (fccmpe): Likewise

Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-01-06 Thread Evandro Menezes
e new attributes look fine (I've got a similar outstanding change), however please don't add them to non-AArch64 cores. We only need it for thunderx.md, cortex-a53.md, cortex-a57.md, xgene1.md and exynos-m1.md. Add support for the FCCMP insn types 2016-01-04 Evandro Menezes

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

2016-01-08 Thread Evandro Menezes
On 12/16/2015 02:11 PM, Evandro Menezes wrote: On 12/16/2015 05:24 AM, Richard Earnshaw (lists) wrote: On 15/12/15 23:34, Evandro Menezes wrote: On 12/14/2015 05:26 AM, James Greenhalgh wrote: On Thu, Dec 03, 2015 at 03:07:43PM -0600, Evandro Menezes wrote: On 11/20/2015 05:53 AM, James

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-11 Thread Evandro Menezes
seful when specifying flags for specific functions, irrespective of the core. Thoughts? -- Evandro Menezes

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-01-12 Thread Evandro Menezes
On 12/16/2015 03:30 PM, Evandro Menezes wrote: On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: On 20 October 2015 at 00:40, Evandro Menezes wrote: In the existing targets, it seems that it's always faster to zero up a DF register with "movi %d0, #0" instead of "fmov %d

[PATCH] aarch64: Add SVE instruction types

2023-05-12 Thread Evandro Menezes via Gcc-patches
This patch adds the attribute `type` to most SVE1 instructions, as in the other instructions. -- Evandro Menezes 0002-aarch64-Add-SVE-instruction-types.patch Description: Binary data

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches
instructions in its group. Do you have specific instances in mind? Thank you, -- Evandro Menezes > On May 15, 2023, at 04:00, Richard Sandiford > wrote: > > Evandro Menezes via Gcc-patches writes: >> This patch adds the attribute `type` to most SVE1 instructions, a

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Evandro Menezes via Gcc-patches
mention with regard to granularity? Yes, my intent for this patch is to enable modeling the SVE instructions on N1. The patch that implements it brings some performance improvements, but it’s mostly flat, as expected. Thank you, -- Evandro Menezes > On May 15, 2023, at 04:49, Kyr

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-16 Thread Evandro Menezes via Gcc-patches
> I think that was more down to my rushed model rather than anything else > though. > > Thanks, > Kyrill > > From: Evandro Menezes > Sent: Monday, May 15, 2023 9:13 PM > To: Kyrylo Tkachov > Cc: Richard Sandiford ; Evandro Menezes via > Gcc-patches ; evandro+.

Re: [PATCH] aarch64: Add SVE instruction types

2023-09-12 Thread Evandro Menezes via Gcc-patches
of memory ops through TARGET_SCHED_ADJUST_PRIORITY, but it was ineffective. I’m a bit at a loss as to what’s likely going on with the RA at this point. Any pointers? Thank you, -- Evandro Menezes > On May 16, 2023, at 03:36, Kyrylo Tkachov > wrote: > > Hi Evandro, >

[PATCH] aarch64: Add the scheduling model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse N1, based on the information from the "Arm Neoverse N1 Software Optimization Guide". -- Evandro Menezes gcc/ChangeLog: * config/aarch64/aarch64-core

[PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-18 Thread Evandro Menezes via Gcc-patches
This patch adds the cost model for Neoverse N1, based on the information from the "Arm Neoverse N1 Software Optimization Guide". -- Evandro Menezes gcc/ChangeLog: * config/aarch64/aarch64-cores.def

Re: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-24 Thread Evandro Menezes via Gcc-patches
Hi, Tamar. Does this work? Thank you, -- Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus > On Apr 24, 2023, at 12:37, Tamar Christina > wrote: > > Hi Evandro, > > I wanted to give this patch a try, but the

Re: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-24 Thread Evandro Menezes via Gcc-patches
Sorry, but it seems that, before sending, the email client is stripping leading spaces. I’m attaching the file here. -- Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus > On Apr 24, 2023, at 17:48, Evandro Menezes > wrote: &

aarch64: Add scheduling model for Neoverse V1

2023-05-07 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse V1, based on the information from the “Arm Neoverse V1 Software Optimization Guide” and on static and dynamic analysis of internal and public benchmarks. Results are forthcoming. -- Evandro Menezes 0001-aarch64-Add-scheduling-model-for

[PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

2023-04-07 Thread Evandro Menezes via Gcc-patches
This patch adds the cost and scheduling models for Neoverse N1, based on the information from the "Arm Neoverse N1 Software Optimization Guide". -- Evandro Menezes ◊ evan...@yahoo.com [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1 gcc/ChangeLog: * config/aa

Re: [PATCH] aarch64: Add the cost and scheduling models for Neoverse N1

2023-04-17 Thread Evandro Menezes via Gcc-patches
Hi, Kyrylo. > On Apr 11, 2023, at 04:41, Kyrylo Tkachov > wrote: > >> -Original Message- >> From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org >> <mailto:bounces+kyrylo.tkachov=arm@gcc.gnu.org>> On Behalf Of Evandro >
