On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote:
> Yes, there's a mismatch between scalar and vector code, I assume users
> may not care much about precision/NAN/INF/denormal behaviors for
> vector code.
> Just like we support
> #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
> but turn off
> RECIP_MASK_DIV | RECIP_MASK_SQRT.

Users who don't care should be using -ffast-math.
Users who do care should get proper behavior.

> > I don't know what exactly the hw instructions do, whether they perform
> > everything needed properly or just subset of it or none of it,
>
> Subset of it, hw instruction doesn't raise exceptions and always round
> to nearest (even). Output denormals are always flushed to zero and
> input denormals are always treated as zero. MXCSR is not consulted nor
> updated.

Does it turn the sNaNs into infinities or qNaNs silently?
Given the rounding, flag_rounding_math should avoid the hw instructions,
and either HONOR_NANS or HONOR_SNANS should be used to predicate that.

> > but the permutation fallback IMHO definitely needs to be guarded with
> > the same flags as scalar code.
> > For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> > uses the libgcc fallback. Otherwise, generic code has
> > /* If we don't expect qNaNs nor sNaNs and can assume rounding
> > to nearest, we can expand the conversion inline as
> > (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16. */
> > and the backend has
> > TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> > shift (i.e. just the permutation).
> > Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> > is doable in vectors.
>
> If you're concerned about that, I'll commit another patch to align the
> condition of the vector expander with scalar ones for both extendmn2
> and truncmn2.

For the fallback, for HONOR_NANS or flag_rounding_math we just shouldn't
use the fallback at all. For flag_unsafe_math_optimizations, we can just
use the simple permutation, i.e. fromi >> 16; otherwise we can use that
(fromi + 0x7fff + ((fromi >> 16) & 1)) followed by the permutation.

	Jakub
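
For reference, the two inline expansions discussed above can be sketched in
scalar C. This is only an illustration, not the GCC expander code: the
function names bf16_trunc_bits, bf16_rne_bits and float_bits are hypothetical,
and the first two operate on the raw 32-bit pattern of a float (fromi in the
quoted comment).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: reinterpret a float as its 32-bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t fromi;
    memcpy(&fromi, &f, sizeof fromi);
    return fromi;
}

/* "Just the permutation": keep the upper 16 bits, i.e. fromi >> 16.
   This rounds toward zero and can turn a NaN whose payload lives only
   in the low 16 bits into an infinity. */
static uint16_t bf16_trunc_bits(uint32_t fromi)
{
    return (uint16_t)(fromi >> 16);
}

/* Round to nearest, ties to even:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.
   Only valid when NaNs need not be honored and rounding to nearest
   can be assumed. */
static uint16_t bf16_rne_bits(uint32_t fromi)
{
    return (uint16_t)((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

With these, the tie 0x3f808000 rounds to the even result 0x3f80 and the tie
0x3f818000 rounds up to 0x3f82, while plain truncation of 0x3f808001 gives
0x3f80 instead of the nearest value 0x3f81. The sNaN pattern 0x7f800001 comes
out as 0x7f80 (+Inf) from either function, which is why neither form is
suitable when NaNs have to be honored.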