On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote:
> Yes, there's a mismatch between scalar and vector code, I assume users
> may not care much about precision/NAN/INF/denormal behaviors for
> vector code.
> Just like we support
> #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
>  but turn off
> RECIP_MASK_DIV | RECIP_MASK_SQRT.

Users who don't care should be using -ffast-math.  Users who do care
should get proper behavior.

> > I don't know what exactly the hw instructions do, whether they perform
> > everything needed properly or just subset of it or none of it,
> 
> Subset of it, hw instruction doesn't raise exceptions and always round
> to nearest (even). Output denormals are always flushed to zero and
> input denormals are always treated as zero. MXCSR is not consulted nor
> updated.

Does it turn the sNaNs into infinities or qNaNs silently?
Given the rounding, flag_rounding_math should avoid the hw instructions,
and either HONOR_NANS or HONOR_SNANS should be used to predicate that.

> > but the permutation fallback IMHO definitely needs to be guarded with
> > the same flags as scalar code.
> > For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> > uses the libgcc fallback.  Otherwise, generic code has
> >           /* If we don't expect qNaNs nor sNaNs and can assume rounding
> >              to nearest, we can expand the conversion inline as
> >              (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> > and the backend has
> > TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> > shift (i.e. just the permutation).
> > Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> > is doable in vectors.
> 
> If you're concerned about that, I'll commit another patch to align the
> condition of the vector expander with scalar ones for both extendmn2
> and truncmn2.

For the fallback, with HONOR_NANS or flag_rounding_math we just shouldn't
use the fallback at all.  For flag_unsafe_math_optimizations, we can just
use the simple permutation, i.e. fromi >> 16; otherwise we can use
(fromi + 0x7fff + ((fromi >> 16) & 1)) followed by the permutation.
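
To make the two variants concrete, here is a minimal scalar sketch in C.
The helper names (float_bits, bf16_trunc_shift, bf16_trunc_rne) are
hypothetical and are not taken from GCC.  It assumes fromi holds the raw
IEEE binary32 bits, and it deliberately does nothing special for NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its raw 32-bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t fromi;
    memcpy(&fromi, &f, sizeof fromi);
    return fromi;
}

/* Plain truncation: keep the upper 16 bits.  This is the "just the
   permutation" variant (fromi >> 16).  */
static uint16_t bf16_trunc_shift(uint32_t fromi)
{
    return (uint16_t)(fromi >> 16);
}

/* Round to nearest, ties to even, before taking the upper 16 bits:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.
   Assumes no NaN inputs and the default rounding mode.  */
static uint16_t bf16_trunc_rne(uint32_t fromi)
{
    return (uint16_t)((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

For example, the tie 0x3f818000 gives 0x3f81 with the plain shift but
0x3f82 with the rounding variant, while the tie 0x3f808000 stays at the
even value 0x3f80.  The NaN pattern 0x7f800001 becomes 0x7f80 (infinity)
in both variants, which shows why neither should be used under
HONOR_NANS.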

        Jakub
