RE: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

Kumar, Venkataramanan Mon, 11 Jan 2016 21:54:07 -0800

Hi James,

> -----Original Message-----
> From: James Greenhalgh [mailto:[email protected]]
> Sent: Monday, January 11, 2016 5:24 PM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; Kumar, Venkataramanan;
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
> 
> 
> Hi,
> 
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this: the first is consistency across -mcpu targets; the second is enabling
> more -mcpu targets to use the flag for peak tuning.
> 
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're under
> -ffast-math, but I take this flag to mean the user explicitly wants the
> low-precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the
> high-precision case. I'd prefer to keep things as predictable as possible,
> and here that means always emitting the low-precision expansion when asked.
> 
> Judging by the comments in the thread proposing the reciprocal square root
> optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any
> cores which do not explicitly ask for it. Currently that is Cortex-A57 and
> xgene, though I will be proposing a patch to remove Cortex-A57 from that
> list shortly.
> 
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an internal
> flag which also changes the performance characteristics where high-precision
> is needed. This conflates two decisions the target might want to make, and
> reduces the applicability of an option targets might want to enable for
> performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> continue to emit the cheaper, low-precision sequence for floats under
> Cortex-A57.
> 
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> 
> I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> no issues.
> 
> OK?
> 
> Thanks,
> James
>


Yes, I like enabling this optimization for all CPU targets via
-mlow-precision-recip-sqrt.
 
If my understanding is correct, for Cortex-A57 we now need only
-mlow-precision-recip-sqrt to emit the software sqrt expansion?
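The change in decision logic can be sketched as a small C model. This is a minimal, hypothetical illustration, not the actual use_rsqrt_p code: struct rsqrt_opts and the functions use_sw_rsqrt_old, use_sw_rsqrt_new and rsqrt_steps_new are invented for this example. The fast_math field collapses the -ffast-math preconditions into a single boolean, and core_wants_rsqrt stands for the internal per-core tuning flag James mentions. The step counts mirror the aarch64_emit_swrsqrt snippet quoted below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the options discussed in this thread; the field
   names are illustrative and are not GCC's internal identifiers.  */
struct rsqrt_opts
{
  bool fast_math;        /* Simplified stand-in for the -ffast-math preconditions.  */
  bool core_wants_rsqrt; /* The -mcpu tuning requests the expansion.  */
  bool low_precision;    /* -mlow-precision-recip-sqrt was given.  */
};

/* Old rule: only the per-core tuning can enable the expansion.  */
static bool
use_sw_rsqrt_old (struct rsqrt_opts o)
{
  return o.fast_math && o.core_wants_rsqrt;
}

/* New rule: the user flag alone also enables it, on every core.  */
static bool
use_sw_rsqrt_new (struct rsqrt_opts o)
{
  return o.fast_math && (o.core_wants_rsqrt || o.low_precision);
}

/* Number of Newton-Raphson steps under the new rule (mirroring the
   iteration logic in aarch64_emit_swrsqrt), or 0 when the software
   expansion is not used at all.  */
static int
rsqrt_steps_new (struct rsqrt_opts o, bool double_mode)
{
  if (!use_sw_rsqrt_new (o))
    return 0;
  int steps = double_mode ? 3 : 2;
  if (o.low_precision)
    steps--;
  assert (steps >= 1 && steps <= 3);
  return steps;
}
```

Under this model, a core that does not request the expansion now gets 1 step for float and 2 for double when -mlow-precision-recip-sqrt is given, while its behaviour without the flag is unchanged. A core that does request it keeps 2 and 3 steps unless the flag is given.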

In the code below:
---snip---
void
aarch64_emit_swrsqrt (rtx dst, rtx src)
{
............
............
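  /* Newton-Raphson refinement steps: 3 for double, 2 for float.  */
  int iterations = double_mode ? 3 : 2;

  /* -mlow-precision-recip-sqrt drops one step.  */
  if (flag_mrecip_low_precision_sqrt)
    iterations--;
---snip---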

Now, in the Cortex-A57 case, we will always do 2 and 1 steps for double and
float respectively, and 3 and 2 will never be used.
Should we make 2 and 1 the default? Or does any target still need 3 and 2?

PS: I remember that reducing the iterations benefited gromacs but caused some
VEs (verification errors) in other FP benchmarks.

Regards,
Venkat.



> ---
> 2015-12-10  James Greenhalgh  <[email protected]>
> 
>       * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
>       reciprocal sqrt for -mlow-precision-recip-sqrt.
