On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote:
> Yes, there's a mismatch between scalar and vector code, I assume users
> may not care much about precision/NAN/INF/denormal behaviors for
> vector code.
> Just like we support
> #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
> but turn off
> RECIP_MASK_DIV | RECIP_MASK_SQRT.
Users who don't care should be using -ffast-math. Users who do care
should get proper behavior.
> > I don't know what exactly the hw instructions do, whether they perform
> > everything needed properly or just subset of it or none of it,
>
> Subset of it, hw instruction doesn't raise exceptions and always round
> to nearest (even). Output denormals are always flushed to zero and
> input denormals are always treated as zero. MXCSR is not consulted nor
> updated.
Does it turn the sNaNs into infinities or qNaNs silently?
Given the rounding, flag_rounding_math should avoid the hw instructions,
and either HONOR_NANS or HONOR_SNANS should be used to predicate that.
> > but the permutation fallback IMHO definitely needs to be guarded with
> > the same flags as scalar code.
> > For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> > uses the libgcc fallback. Otherwise, generic code has
> > /* If we don't expect qNaNs nor sNaNs and can assume rounding
> > to nearest, we can expand the conversion inline as
> > (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16. */
> > and the backend has
> > TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> > shift (i.e. just the permutation).
> > Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> > is doable in vectors.
>
> If you're concerned about that, I'll commit another patch to align the
> condition of the vector expander with scalar ones for both extendmn2
> and truncmn2.
For the fallback, for HONOR_NANS or flag_rounding_math we just shouldn't
use the fallback at all. For flag_unsafe_math_optimizations, we can just
use the simple permutation, i.e. fromi >> 16, otherwise we can use
(fromi + 0x7fff + ((fromi >> 16) & 1)) followed by the permutation.
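
To make the two inline variants concrete, here is a minimal scalar sketch
in C of what each one computes on the raw float bits. The function names
(float_bits, bf16_trunc, bf16_round_nearest_even) are made up for
illustration and are not what expr.cc or the backend actually use; a vector
expander would presumably apply the same arithmetic per lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit manipulation below assumes a 32-bit IEEE single-precision float.  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Hypothetical helper: reinterpret a float as its raw bit pattern.  */
static uint32_t
float_bits (float f)
{
  uint32_t fromi;
  memcpy (&fromi, &f, sizeof fromi);
  return fromi;
}

/* flag_unsafe_math_optimizations case: keep the upper 16 bits and drop the
   rest, i.e. what the plain permutation does.  */
static uint16_t
bf16_trunc (uint32_t fromi)
{
  return (uint16_t) (fromi >> 16);
}

/* No NaNs expected and rounding to nearest assumed: add a bias of 0x7fff
   plus the lowest kept bit, so that exact ties round to even, then keep
   the upper 16 bits.  */
static uint16_t
bf16_round_nearest_even (uint32_t fromi)
{
  return (uint16_t) ((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

With these, the tie 0x3f808000 rounds down to 0x3f80 and the tie 0x3f818000
rounds up to 0x3f82, while truncation gives 0x3f80 and 0x3f81. The sNaN
0x7f800001 becomes 0x7f80 (+Inf) in both functions, which is why neither
inline form is suitable when HONOR_NANS holds.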
Jakub