On Mon, Feb 15, 2016 at 11:24:53AM -0600, Evandro Menezes wrote:
> On 02/15/16 04:50, James Greenhalgh wrote:
> >On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote:
> >>On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote:
> >>>On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote:
> >>>>On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote:
> >>>>>Hi,
> >>>>>
> >>>>>I've seen a couple of large performance issues caused by expanding
> >>>>>the high-precision reciprocal square root for Cortex-A57, so I'd like
> >>>>>to turn it off by default.
> >>>>>
> >>>>>This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
> >>>>>Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) 
> >>>>>for
> >>>>>some private microbenchmark kernels which stress the divide/sqrt/multiply
> >>>>>units. It therefore seems to me to be the correct choice to make across
> >>>>>a number of workloads.
> >>>>>
> >>>>>Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
> >>>>>
> >>>>>OK?
> >>>>*Ping*
> >>>*pingx2*
> >>*ping^3*
> >*ping^4*
> >
> >Thanks,
> >James
> >
> >>>>>---
> >>>>>2015-12-11  James Greenhalgh  <james.greenha...@arm.com>
> >>>>>
> >>>>> * config/aarch64/aarch64.c (cortexa57_tunings): Remove
> >>>>> AARCH64_EXTRA_TUNE_RECIP_SQRT.
> >>>>>
> >>>>>diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >>>>>index 1d5d898..999c9fc 100644
> >>>>>--- a/gcc/config/aarch64/aarch64.c
> >>>>>+++ b/gcc/config/aarch64/aarch64.c
> >>>>>@@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
> >>>>>    0,   /* max_case_values.  */
> >>>>>    0,   /* cache_line_size.  */
> >>>>>    tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
> >>>>>-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
> >>>>>-   | AARCH64_EXTRA_TUNE_RECIP_SQRT)     /* tune_flags.  */
> >>>>>+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)  /* tune_flags.  */
> >>>>>  };
> >>>>>  static const struct tune_params cortexa72_tunings =
> >
> 
> James,
> 
> There seem to be SPEC CPU2000fp validation issues on A57 when this
> flag is present too.  Though I evaluated the algorithm with a huge
> random set of values, always delivering accuracy around 1ulp, which
> should be enough for CPU2000fp (wit x86-64), I expected the
> benchmarks to pass.
> 
> My suspicion is that the Newton series on AArch64 is probably good
> only for SP.  Then, DP might require an extra round, probably
> exacerbating the performance penalty.
> 
> I'd like to try to split this tuning option into one for SP and
> another for DP.  Thoughts?

I haven't seen validation issues with the default expansion, but with
-mlow-precision-recip-sqrt I do see failures. I think this is to be
expected. I don't support splitting the low-precision flag to
"-mlow-precision-float-recip-sqrt" and "-mlow-precision-double-recip-sqrt",
I think that is pushing a particular set of Spec tuning flags over any
meaningful use case.

I could imagine a case for splitting the internal tuning flag to give
AARCH64_EXTRA_TUNE_SF_RECIP_SQRT and AARCH64_EXTRA_TUNE_DF_RECIP_SQRT, but
I'm not sure I understand the benefits of this? Certainly, I think your goals
for performance (turn on for 64-bit divide/sqrt) would contradict your goals
for accuracy (turn off for 64-bit divide/sqrt).

I'm happy with these flags as they are, but I might be missing a subtelty in
your argument?

Thanks,
James

Reply via email to