max when appropriate

Kyrill Tkachov Wed, 18 Jul 2018 02:51:02 -0700


On 18/07/18 10:44, Richard Biener wrote:

On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:

Hi Richard,

On 17/07/18 14:27, Richard Biener wrote:

On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
<kyrylo.tkac...@foss.arm.com> wrote:

Hi all,

This is my first Fortran patch, so apologies if I'm missing something.
The current expansion of the min and max intrinsics explicitly expands
the comparisons between each argument to calculate the global min/max.
Some targets, like aarch64, have instructions that can calculate the min/max
of two real (floating-point) numbers with the proper NaN-handling semantics
(if both inputs are NaN, return Nan. If one is NaN, return the other) and those
are the semantics provided by the __builtin_fmin/max family of functions that 
expand
to these instructions.

This patch makes the frontend emit __builtin_fmin/max directly to compare each
pair of numbers when the numbers are floating-point, and use MIN_EXPR/MAX_EXPR 
otherwise
(integral types and -ffast-math) which should hopefully be easier to recognise 
in the

What is Fortrans requirement on min/max intrinsics?  Doesn't it only
require things that
are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is

The current implementation expands to:
      mvar = a1;
      if (a2 .op. mvar || isnan (mvar))
        mvar = a2;
      if (a3 .op. mvar || isnan (mvar))
        mvar = a3;
      ...
      return mvar;

That is, if one of the operands is a NaN it will return the other argument.
If both (all) are NaNs, it will return NaN. This is the same as the semantics 
of fmin/max
as far as I can tell.

/* Minimum and maximum values.  When used with floating point, if both
     operands are zeros, or if either operand is NaN, then it is unspecified
     which of the two operands is returned as the result.  */

which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
zeros or NaNs.
Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if singed
zeros are significant.

True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded their 
use
on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) && 
!HONOR_NANS (type).

I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
is a good idea,
this may both generate bigger code and be slower.

The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
mathfn_built_in advertises
them as available (does that mean they'll have a fast inline implementation?)

This doesn't mean anything given you make them available with your
patch ;)  So I expect it may
cause issues for !c99_runtime targets (and long double at least).


Urgh, that can cause headaches...

If the above doesn't hold and we can't use either MIN/MAX_EXPR of fmin/fmax 
then the patch falls back
to the existing expansion.

As said I would not use fmin/fmax calls here at all.


... Given the comments from Thomas and Janne, maybe we should just emit 
MIN/MAX_EXPRs here
since there is no language requirement on NaN/signed zero handling on these 
intrinsics?
That should make it simpler and more portable.

FWIW, this patch does improve performance on 521.wrf from SPEC2017 on aarch64.

You said that, yes.  Even without -ffast-math?


It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt 
optimisation
is more aggressive and merges the conditionals into MIN/MAX_EXPRs 
(minmax_replacement in tree-ssa-phiopt.c)

Thanks,
Kyrill

Richard.

Thanks,
Kyrill

Richard.

midend and optimise. The previous approach of generating the open-coded version 
of that
is used when we don't have an appropriate __builtin_fmin/max available.
For example, for a configuration of x86_64-unknown-linux-gnu that I tested 
there was no
128-bit __built_fminl available.

With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions being 
generated at -O3
on aarch64 for 521.wrf from fprate SPEC2017 where none before were generated
(we were generating explicit comparisons and NaN checks). This gave a 2.4% 
improvement
in performance on a Cortex-A72.

Bootstrapped and tested on aarch64-none-linux-gnu and x86_64-unknown-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

2018-07-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

       * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
       __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
       __builtin_fmaxl.
       * trans-intrinsic.c: Include builtins.h.
       (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or MIN/MAX_EXPR
       functions to calculate the min/max.

2018-07-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

       * gfortran.dg/max_fmaxf.f90: New test.
       * gfortran.dg/min_fminf.f90: Likewise.
       * gfortran.dg/minmax_integer.f90: Likewise.
       * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
       * gfortran.dg/min_fminl_aarch64.f90: Likewise.

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Reply via email to