Hi James,
> -----Original Message-----
> From: James Greenhalgh [mailto:[email protected]]
> Sent: Monday, January 11, 2016 5:24 PM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; Kumar, Venkataramanan;
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [Patch AArch64] Use software sqrt expansion always for -mlow-
> precision-recip-sqrt
>
>
> Hi,
>
> I'd like to switch the logic around in aarch64.c such that -mlow-precision-
> recip-sqrt causes us to always emit the low-precision software expansion for
> reciprocal square root. I have two reasons to do this; first is consistency
> across -mcpu targets, second is enabling more -mcpu targets to use the flag
> for peak tuning.
>
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're under -
> ffast-math but I take this flag to mean the user explicitly wants the low-
> precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the high-precision
> case. I'd prefer to keep things as predictable as possible, and here that
> means always emitting the low-precision expansion when asked.
>
> Judging by the comments in the thread proposing the reciprocal square root
> optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any
> cores
> which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> though I will be proposing a patch to remove Cortex-A57 from that list
> shortly.
>
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an internal
> flag which also changes the performance characteristics where high-precision
> is needed. This conflates two decisions the target might want to make, and
> reduces the applicability of an option targets might want to enable for
> performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> continue to emit the cheaper, low-precision sequence for floats under
> Cortex-A57.
>
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
>
> I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> no issues.
>
> OK?
>
> Thanks,
> James
>
Yes, I like enabling this optimization for all CPU targets via
-mlow-precision-recip-sqrt.
If my understanding is correct, for Cortex-A57 we now only need to pass
-mlow-precision-recip-sqrt to get the software sqrt expansion?
In the code below:
---snip---
void
aarch64_emit_swrsqrt (rtx dst, rtx src)
{
  ............
  ............
  int iterations = double_mode ? 3 : 2;
  if (flag_mrecip_low_precision_sqrt)
    iterations--;
---snip---
Now, in the Cortex-A57 case, we will always do 2 and 1 steps for double and
float, and 3 and 2 will never be used.
Should we make 2 and 1 the default? Or does any target still need to use 3 and 2?
PS: I remember that reducing the iterations benefited gromacs but caused some VE
in other FP benchmarks.
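
To make the trade-off concrete, here is a minimal sketch in plain C of how the
step count feeds the refinement. It is only an illustration under my own
assumptions: the helper names (rsqrt_iterations, rsqrt_step, rsqrt_refine,
rsqrt_residual) are hypothetical and do not exist in aarch64.c, and the real
code emits RTL for the estimate and step instructions rather than computing
anything in C. The only part taken from the snippet above is the 3/2 default
with one step dropped for the low-precision flag.

```c
#include <assert.h>

/* Hypothetical helper: mirrors the step-count choice in the snippet above.  */
static int
rsqrt_iterations (int double_mode, int low_precision)
{
  int iterations = double_mode ? 3 : 2;
  if (low_precision)
    iterations--;
  return iterations;
}

/* One Newton-Raphson step for 1/sqrt(d): x' = x * (3 - d * x * x) / 2.  */
static double
rsqrt_step (double d, double x)
{
  return x * (3.0 - d * x * x) * 0.5;
}

/* Refine an initial estimate of 1/sqrt(d) with a fixed number of steps.  */
static double
rsqrt_refine (double d, double estimate, int iterations)
{
  assert (iterations >= 0);
  double x = estimate;
  for (int i = 0; i < iterations; i++)
    x = rsqrt_step (d, x);
  return x;
}

/* Residual |d * x * x - 1|; it is zero when x is exactly 1/sqrt(d).  */
static double
rsqrt_residual (double d, double x)
{
  double r = d * x * x - 1.0;
  return r < 0.0 ? -r : r;
}
```

For example, starting from a rough guess of 0.7 for d = 2.0, one step brings
the residual below 1e-3 and a second step brings it below 1e-6. Each dropped
step therefore costs a lot of accuracy, which is why the default count matters.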
Regards,
Venkat.
> ---
> 2015-12-10 James Greenhalgh <[email protected]>
>
> * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> reciprocal sqrt for -mlow-precision-recip-sqrt.