https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326
--- Comment #9 from luoxhu at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #6)
> (In reply to luoxhu from comment #4)
> > float foo(float f, float x, float y) {
> > return (fabs(f)*x+y);
> > }
> > 
> > the input of fabs is float type, so use fabsf is enough here, drafted a
> > patch to avoid double promotion when generating gimple if fabs could be
> > replaced by fabsf as argument[0] is float type.
> 
> what about adding something to match.pd for:
> ABS<(float_convert)f> into (float_convert)ABS<f>
> This is only valid prompting and not reducing the precision.

Thanks, this is already implemented in fold-const.c, though not via match.pd
and not really with fabsf.  The front end always converts the argument of
fabs to double first.  There are three kinds of cases for this issue:

1) "return fabs(x);"

tree
fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
{
...
    case ABS_EXPR:
      /* Convert fabs((double)float) into (double)fabsf(float).  */
      if (TREE_CODE (arg0) == NOP_EXPR
          && TREE_CODE (type) == REAL_TYPE)
        {
          tree targ0 = strip_float_extensions (arg0);
          if (targ0 != arg0)
            return fold_convert_loc (loc, type,
                                     fold_build1_loc (loc, ABS_EXPR,
                                                      TREE_TYPE (targ0),
                                                      targ0));
        }
      return NULL_TREE;
...
}

This code converts "(float)fabs((double)x)" into
"(float)(double)(float)fabs(x)"; match.pd can then remove the useless
conversions.

2) "return fabs(x)*y;"

The front end first generates the expression
"(float) (fabs((double) x) * (double) y)".  Then
fold-const.c:fold_unary_loc converts fabs((double)float) into
(double)fabsf(float), giving "(float)((double)fabs(x) * (double)y)".
Finally, match.pd converts (outertype)((innertype0)a+(innertype1)b) into
((newtype)a+(newtype)b), which removes the double conversion.

3) "return fabs(x)*y + z;"

The front end produces:
(float) ((fabs((double) x) * (double) y) + (double) z)

So what we need here is to match the MUL and ADD in match.pd as follows.
Any comments?

+(simplify (convert (plus (mult (convert@3 (abs @0)) (convert@4 @1)) (convert@5 @2)))
+ (if (( flag_unsafe_math_optimizations
+       && types_match (type, float_type_node)
+       && types_match (TREE_TYPE(@0), float_type_node)
+       && types_match (TREE_TYPE(@1), float_type_node)
+       && types_match (TREE_TYPE(@2), float_type_node)
+       && element_precision (TREE_TYPE(@3)) > element_precision (TREE_TYPE (@0))
+       && element_precision (TREE_TYPE(@4)) > element_precision (TREE_TYPE (@1))
+       && element_precision (TREE_TYPE(@5)) > element_precision (TREE_TYPE (@2))
+       && ! HONOR_NANS (type)
+       && ! HONOR_INFINITIES (type)))
+  (plus (mult (abs @0) @1) @2) ))
+

Cases 1) and 2) do not generate a double conversion.  Only case 3) still has
an frsp in fast-math mode, and the pattern above could remove it.

PS: convert_to_real_1 does not seem closely related here.  It converts
(float)sqrt((double)x), where x is float, into sqrtf(x), but it does so
through a recursive call to convert_to_real_1 and a build_call_expr with a
new mathfn_built_in, so I suppose it would be a bit complicated to move to
match.pd?

The optimization should only apply in fast-math mode.  Is
flag_unsafe_math_optimizations enough to guard it?
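
For reference, the three cases can be written out as a small C sketch that compares the double-promoted form with a float-only form. This is only an illustration, not part of the patch: the function names are hypothetical, and the `volatile` temporary in `muladd_narrow` is an assumption added here to force the product to be rounded to float and to keep the compiler from fusing the multiply and add.

```c
#include <math.h>

/* Case 1: fabs() widens x to double, the return narrows it again. */
float abs_wide(float x)   { return fabs(x); }
float abs_narrow(float x) { return fabsf(x); }

/* Case 2: the product of two floats is exact in double, so rounding it
   once to float matches a float multiply. */
float mul_wide(float x, float y)   { return fabs(x) * y; }
float mul_narrow(float x, float y) { return fabsf(x) * y; }

/* Case 3: multiply and add are both done in double, then narrowed. */
float muladd_wide(float x, float y, float z)
{
    return fabs(x) * y + z;
}

/* Case 3 evaluated in float only.  The volatile temporary (an assumption
   made for this sketch) rounds the product to float before the add. */
float muladd_narrow(float x, float y, float z)
{
    volatile float p = fabsf(x) * y;
    return p + z;
}
```

With x = -0x1.001p0f, y = 0x1.001p0f and z = -1.0f, on an IEEE 754 target with the default rounding mode I would expect muladd_wide to return 0x1.0008p-11f and muladd_narrow to return 0x1p-11f, while the two variants of cases 1) and 2) agree. That difference is why case 3) changes results and needs some unsafe-math guard, whereas 1) and 2) are exact.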