Re: [RFC PATCH 8/8] softfloat: implement addsub_floats128 using Uint128 and new style code

Richard Henderson Tue, 20 Oct 2020 11:50:19 -0700

On 10/20/20 9:37 AM, Alex Bennée wrote:
> +static inline FloatParts128 unpack128_raw(FloatFmt fmt, Uint128 raw)
> +{
> +    const int sign_pos = fmt.frac_size + fmt.exp_size;
> +
> +    return (FloatParts128) {
> +        .cls = float_class_unclassified,
> +        .sign = extract128(raw, sign_pos, 1),
> +        .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
> +        .frac = extract128(raw, 0, fmt.frac_size),
> +    };
> +}


This use of extract128 for sign and exp will not work for 32-bit.  You can't
just automatically truncate from __uint128_t to int in that case.

I don't think we should necessarily create this function, but rather leave it at

> +static inline FloatParts128 float128_unpack_raw(float128 f)
> +{
> +    return unpack128_raw(float128_params, uint128_make128(f.low, f.high));
> +}

... this one, and construct the FloatParts128 directly from the float128
components.  E.g.

    int f_size = float128_params.frac_size;
    int e_size = float128_params.exp_size;
    return (FloatParts128) {
       .sign = extract64(f.high, f_size + e_size - 64, 1);
       .exp = extract64(f.high, f_size - 64, e_size);
       .frac = extract128(int128_make128(f.low, f.high),
                          0, f_size);
    };

I don't want to over-generalize this just yet.


> +static inline Uint128 pack128_raw(FloatFmt fmt, FloatParts128 p)
> +{
> +    const int sign_pos = fmt.frac_size + fmt.exp_size;
> +    Uint128 ret = deposit128(p.frac, fmt.frac_size, fmt.exp_size, p.exp);
> +    return deposit128(ret, sign_pos, 1, p.sign);
> +}

Likewise, omit this and only have

> +static inline float128 float128_pack_raw(FloatParts128 p)
> +{
> +    Uint128 out = pack128_raw(float128_params, p);
> +    return make_float128(uint128_gethi(out), uint128_getlo(out));
> +}

this.


> +/* Almost exactly the same as sf_canonicalize except 128 bit */
> +static FloatParts128 sf128_canonicalize(FloatParts128 part, const FloatFmt 
> *parm,
> +                                        float_status *status)

I think we may have reached the point of diminishing returns on structure
returns.  This is a 196-bit struct, and will not be passed in registers
anymore.  It might be better to do

static void sf128_canonicalize(FloatParts128 *part,
                               const FloatFmt *parm,
                               float_status *status)

and modify the FloatParts128 in place.

> +    bool frac_is_zero = uint128_eq(part.frac, uint128_zero());

With Int128, we'd use !int128_nz().

> +/* As above but wider */
> +static FloatParts128 round128_canonical(FloatParts128 p, float_status *s,
> +                                        const FloatFmt *parm)
> +{
> +    /* Do these by hand rather than widening the FloatFmt structure */
> +    const Uint128 frac_lsb = uint128_lshift(1, DECOMPOSED128_BINARY_POINT - 
> parm->frac_size);

You can't pass constant 1 on 32-bit.
Maybe add a int128_makepow2(exp) function to make this easier?

> +        case float_round_nearest_even:
> +            overflow_norm = false;
> +            inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);

Can't use & or != on 32-bit.

> +            inc = frac & frac_lsb ? 0 : round_mask;
...
> +            if (frac & round_mask) {
...
> +                frac += inc;
> +                if (frac & DECOMPOSED128_OVERFLOW_BIT) {
> +                    frac >>= 1;
...
> +            frac >>= frac_shift;
...
> +                    frac = -1;
...
> +            if (frac & round_mask) {
> +                    inc = ((uint128_and(frac, roundeven_mask)) != frac_lsbm1
...
> +            if (exp == 0 && frac == 0) {
...
> +        frac = 0;
...
> +        frac = 0;

and more.  There are lots more later.

This is going to get ugly fast.  We need another solution.

> +static bool parts128_is_snan_frac(Uint128 frac, float_status *status)
> +{
> +    if (no_signaling_nans(status)) {
> +        return false;
> +    } else {
> +        bool msb = extract128(frac, DECOMPOSED128_BINARY_POINT - 1, 1);

Doesn't work for 32-bit.  Again, extract128 by itself is not the right
interface.  Do we in fact want to share code with the normal parts_is_snan_frac
by just passing in the high-part?


r~

Re: [RFC PATCH 8/8] softfloat: implement addsub_floats128 using Uint128 and new style code

Reply via email to