On Wed, Jul 18, 2018 at 11:50 AM Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > > > On 18/07/18 10:44, Richard Biener wrote: > > On Tue, Jul 17, 2018 at 3:46 PM Kyrill Tkachov > > <kyrylo.tkac...@foss.arm.com> wrote: > >> Hi Richard, > >> > >> On 17/07/18 14:27, Richard Biener wrote: > >>> On Tue, Jul 17, 2018 at 2:35 PM Kyrill Tkachov > >>> <kyrylo.tkac...@foss.arm.com> wrote: > >>>> Hi all, > >>>> > >>>> This is my first Fortran patch, so apologies if I'm missing something. > >>>> The current expansion of the min and max intrinsics explicitly expands > >>>> the comparisons between each argument to calculate the global min/max. > >>>> Some targets, like aarch64, have instructions that can calculate the > >>>> min/max > >>>> of two real (floating-point) numbers with the proper NaN-handling > >>>> semantics > >>>> (if both inputs are NaN, return Nan. If one is NaN, return the other) > >>>> and those > >>>> are the semantics provided by the __builtin_fmin/max family of functions > >>>> that expand > >>>> to these instructions. > >>>> > >>>> This patch makes the frontend emit __builtin_fmin/max directly to > >>>> compare each > >>>> pair of numbers when the numbers are floating-point, and use > >>>> MIN_EXPR/MAX_EXPR otherwise > >>>> (integral types and -ffast-math) which should hopefully be easier to > >>>> recognise in the > >>> What is Fortrans requirement on min/max intrinsics? Doesn't it only > >>> require things that > >>> are guaranteed by MIN/MAX_EXPR anyways? The only restriction here is > >> The current implementation expands to: > >> mvar = a1; > >> if (a2 .op. mvar || isnan (mvar)) > >> mvar = a2; > >> if (a3 .op. mvar || isnan (mvar)) > >> mvar = a3; > >> ... > >> return mvar; > >> > >> That is, if one of the operands is a NaN it will return the other argument. > >> If both (all) are NaNs, it will return NaN. This is the same as the > >> semantics of fmin/max > >> as far as I can tell. > >> > >>> /* Minimum and maximum values. When used with floating point, if both > >>> operands are zeros, or if either operand is NaN, then it is > >>> unspecified > >>> which of the two operands is returned as the result. */ > >>> > >>> which means MIN/MAX_EXPR are not strictly IEEE compliant with signed > >>> zeros or NaNs. > >>> Thus the correct test would be !HONOR_SIGNED_ZEROS && !HONOR_NANS if > >>> singed > >>> zeros are significant. > >> True, MIN/MAX_EXPR would not be appropriate in that condition. I guarded > >> their use > >> on !HONOR_NANS (type) only. I'll update it to !HONOR_SIGNED_ZEROS (type) > >> && !HONOR_NANS (type). > >> > >> > >>> I'm not sure if using fmin/max calls when we cannot use MIN/MAX_EXPR > >>> is a good idea, > >>> this may both generate bigger code and be slower. > >> The patch will generate fmin/fmax calls (or the fminf,fminl variants) when > >> mathfn_built_in advertises > >> them as available (does that mean they'll have a fast inline > >> implementation?) > > This doesn't mean anything given you make them available with your > > patch ;) So I expect it may > > cause issues for !c99_runtime targets (and long double at least). > > Urgh, that can cause headaches... > > >> If the above doesn't hold and we can't use either MIN/MAX_EXPR of > >> fmin/fmax then the patch falls back > >> to the existing expansion. > > As said I would not use fmin/fmax calls here at all. > > ... Given the comments from Thomas and Janne, maybe we should just emit > MIN/MAX_EXPRs here > since there is no language requirement on NaN/signed zero handling on these > intrinsics? > That should make it simpler and more portable.
That's fortran maintainers call. > >> FWIW, this patch does improve performance on 521.wrf from SPEC2017 on > >> aarch64. > > You said that, yes. Even without -ffast-math? > > It improves at -O3 without -ffast-math in particular. With -ffast-math phiopt > optimisation > is more aggressive and merges the conditionals into MIN/MAX_EXPRs > (minmax_replacement in tree-ssa-phiopt.c) The question is will it be slower without -ffast-math, that is, when fmin/max() calls are emitted rather than inline conditionals. I think a patch just using MAX/MIN_EXPR within the existing constraints and otherwise falling back to the current code would be more obvious and other changes should be mande independently. Richard. > Thanks, > Kyrill > > > Richard. > > > >> Thanks, > >> Kyrill > >> > >>> Richard. > >>> > >>>> midend and optimise. The previous approach of generating the open-coded > >>>> version of that > >>>> is used when we don't have an appropriate __builtin_fmin/max available. > >>>> For example, for a configuration of x86_64-unknown-linux-gnu that I > >>>> tested there was no > >>>> 128-bit __built_fminl available. > >>>> > >>>> With this patch I'm seeing more than 7000 FMINNM/FMAXNM instructions > >>>> being generated at -O3 > >>>> on aarch64 for 521.wrf from fprate SPEC2017 where none before were > >>>> generated > >>>> (we were generating explicit comparisons and NaN checks). This gave a > >>>> 2.4% improvement > >>>> in performance on a Cortex-A72. > >>>> > >>>> Bootstrapped and tested on aarch64-none-linux-gnu and > >>>> x86_64-unknown-linux-gnu. > >>>> > >>>> Ok for trunk? > >>>> Thanks, > >>>> Kyrill > >>>> > >>>> 2018-07-17 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >>>> > >>>> * f95-lang.c (gfc_init_builtin_functions): Define __builtin_fmin, > >>>> __builtin_fminf, __builtin_fminl, __builtin_fmax, __builtin_fmaxf, > >>>> __builtin_fmaxl. > >>>> * trans-intrinsic.c: Include builtins.h. > >>>> (gfc_conv_intrinsic_minmax): Emit __builtin_fmin/max or > >>>> MIN/MAX_EXPR > >>>> functions to calculate the min/max. > >>>> > >>>> 2018-07-17 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > >>>> > >>>> * gfortran.dg/max_fmaxf.f90: New test. > >>>> * gfortran.dg/min_fminf.f90: Likewise. > >>>> * gfortran.dg/minmax_integer.f90: Likewise. > >>>> * gfortran.dg/max_fmaxl_aarch64.f90: Likewise. > >>>> * gfortran.dg/min_fminl_aarch64.f90: Likewise. >