max when appropriate

Richard Biener Wed, 18 Jul 2018 03:07:00 -0700

On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov
<[email protected]> wrote:
>
>
> On 18/07/18 10:44, Richard Biener wrote:
> > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov
> > <[email protected]> wrote:
> >> Hi Richard,
> >>
> >> On 17/07/18 14:27, Richard Biener wrote:
> >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov
> >>> <[email protected]> wrote:
> >>>> Hi all,
> >>>>
> >>>> This is my first Fortran patch, so apologies if I'm missing something.
> >>>> The current expansion of the min and max intrinsics explicitly expands
> >>>> the comparisons between each argument to calculate the global min/max.
> >>>> Some targets, like aarch64, have instructions that can calculate the 
> >>>> min/max
> >>>> of two real (floating-point) numbers with the proper NaN-handling 
> >>>> semantics
> >>>> (if both inputs are NaN, return Nan. If one is NaN, return the other) 
> >>>> and those
> >>>> are the semantics provided by the __builtin_fmin/max family of functions 
> >>>> that expand
> >>>> to these instructions.
> >>>>
> >>>> This patch makes the frontend emit __builtin_fmin/max directly to 
> >>>> compare each
> >>>> pair of numbers when the numbers are floating-point, and use 
> >>>> MIN_EXPR/MAX_EXPR otherwise
> >>>> (integral types and -ffast-math) which should hopefully be easier to 
> >>>> recognise in the
> >>> What is Fortrans requirement on min/max intrinsics?  Doesn't it only
> >>> require things that
> >>> are guaranteed by MIN/MAX_EXPR anyways?  The only restriction here is
> >> The current implementation expands to:
> >>       mvar = a1;
> >>       if (a2 .op. mvar || isnan (mvar))
> >>         mvar = a2;
> >>       if (a3 .op. mvar || isnan (mvar))
> >>         mvar = a3;
> >>       ...
> >>       return mvar;
> >>
> >> That is, if one of the operands is a NaN it will return the other argument.
> >> If both (all) are NaNs, it will return NaN. This is the same as the 
> >> semantics of fmin/max
> >> as far as I can tell.
> >>
> >>> /* Minimum and maximum values.  When used with floating point, if both
> >>>      operands are zeros, or if either operand is NaN, then it is 
> >>> unspecified
> >>>      which of the two operands is returned as the result.  */
> >>>
> >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed
> >>> zeros or NaNs.
> >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if 
> >>> singed
> >>> zeros are significant.
> >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded 
> >> their use
> >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) 
> >> && !HONOR_NANS (type).
> >>
> >>
> >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR
> >>> is a good idea,
> >>> this may both generate bigger code and be slower.
> >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when 
> >> mathfn_built_in advertises
> >> them as available (does that mean they'll have a fast inline 
> >> implementation?)
> > This doesn't mean anything given you make them available with your
> > patch ;)  So I expect it may
> > cause issues for !c99_runtime targets (and long double at least).
>
> Urgh, that can cause headaches...
>
> >> If the above doesn't hold and we can't use either MIN/MAX_EXPR of 
> >> fmin/fmax then the patch falls back
> >> to the existing expansion.
> > As said I would not use fmin/fmax calls here at all.
>
> ... Given the comments from Thomas and Janne, maybe we should just emit 
> MIN/MAX_EXPRs here
> since there is no language requirement on NaN/signed zero handling on these 
> intrinsics?
> That should make it simpler and more portable.


That's fortran maintainers call.

> >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on 
> >> aarch64.
> > You said that, yes.  Even without -ffast-math?
>
> It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt 
> optimisation
> is more aggressive and merges the conditionals into MIN/MAX_EXPRs 
> (minmax_replacement in tree-ssa-phiopt.c)

The question is will it be slower without -ffast-math, that is, when
fmin/max() calls are emitted rather
than inline conditionals.

I think a patch just using MAX/MIN_EXPR within the existing
constraints and otherwise falling back to
the current code would be more obvious and other changes should be
mande independently.

Richard.

> Thanks,
> Kyrill
>
> > Richard.
> >
> >> Thanks,
> >> Kyrill
> >>
> >>> Richard.
> >>>
> >>>> midend and optimise. The previous approach of generating the open-coded 
> >>>> version of that
> >>>> is used when we don't have an appropriate __builtin_fmin/max available.
> >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I 
> >>>> tested there was no
> >>>> 128-bit __built_fminl available.
> >>>>
> >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions 
> >>>> being generated at -O3
> >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were 
> >>>> generated
> >>>> (we were generating explicit comparisons and NaN checks). This gave a 
> >>>> 2.4% improvement
> >>>> in performance on a Cortex-A72.
> >>>>
> >>>> Bootstrapped and tested on aarch64-none-linux-gnu and 
> >>>> x86_64-unknown-linux-gnu.
> >>>>
> >>>> Ok for trunk?
> >>>> Thanks,
> >>>> Kyrill
> >>>>
> >>>> 2018-07-17  Kyrylo Tkachov  <[email protected]>
> >>>>
> >>>>        * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin,
> >>>>        __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf,
> >>>>        __builtin_fmaxl.
> >>>>        * trans-intrinsic.c: Include builtins.h.
> >>>>        (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or 
> >>>> MIN/MAX_EXPR
> >>>>        functions to calculate the min/max.
> >>>>
> >>>> 2018-07-17  Kyrylo Tkachov  <[email protected]>
> >>>>
> >>>>        * gfortran.dg/max_fmaxf.f90: New test.
> >>>>        * gfortran.dg/min_fminf.f90: Likewise.
> >>>>        * gfortran.dg/minmax_integer.f90: Likewise.
> >>>>        * gfortran.dg/max_fmaxl_aarch64.f90: Likewise.
> >>>>        * gfortran.dg/min_fminl_aarch64.f90: Likewise.
>

Re: [PATCH][Fortran] Use MIN/MAX_EXPR for intrinsics or __builtin_fmin/max when appropriate

Reply via email to