On 10/20/20 9:37 AM, Alex Bennée wrote:
> +static inline FloatParts128 unpack128_raw(FloatFmt fmt, Uint128 raw)
> +{
> + const int sign_pos = fmt.frac_size + fmt.exp_size;
> +
> + return (FloatParts128) {
> + .cls = float_class_unclassified,
> + .sign = extract128(raw, sign_pos, 1),
> + .exp = extract128(raw, fmt.frac_size, fmt.exp_size),
> + .frac = extract128(raw, 0, fmt.frac_size),
> + };
> +}
This use of extract128 for sign and exp will not work for 32-bit. You can't
just automatically truncate from __uint128_t to int in that case.
I don't think we should necessarily create this function, but rather leave it at
> +static inline FloatParts128 float128_unpack_raw(float128 f)
> +{
> + return unpack128_raw(float128_params, uint128_make128(f.low, f.high));
> +}
... this one, and construct the FloatParts128 directly from the float128
components. E.g.
    int f_size = float128_params.frac_size;
    int e_size = float128_params.exp_size;

    return (FloatParts128) {
        .sign = extract64(f.high, f_size + e_size - 64, 1),
        .exp = extract64(f.high, f_size - 64, e_size),
        .frac = extract128(int128_make128(f.low, f.high),
                           0, f_size),
    };
I don't want to over-generalize this just yet.
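To make the point about staying in 64-bit pieces concrete, here is a rough standalone sketch. The type names (f128_bits, u128_pair, Parts128) and the bits64() helper are stand-ins invented for this example, not the real softfloat definitions, and the binary128 field widths (112 and 15) are hardcoded rather than read from float128_params. The sign and exponent only ever touch the high word, so nothing has to be truncated from a 128-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the QEMU types; names and layouts are assumptions. */
typedef struct { uint64_t low, high; } f128_bits;  /* plays the role of float128 */
typedef struct { uint64_t lo, hi; } u128_pair;     /* struct-based 128-bit integer */

typedef struct {
    u128_pair frac;
    int32_t exp;
    bool sign;
} Parts128;

/* Local equivalent of extract64(): LENGTH bits of VALUE starting at START. */
static inline uint64_t bits64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

static inline Parts128 unpack_f128(f128_bits f)
{
    const int f_size = 112, e_size = 15;  /* binary128 field widths */

    return (Parts128) {
        /* Sign and exponent live entirely in the high 64 bits. */
        .sign = bits64(f.high, f_size + e_size - 64, 1),
        .exp = bits64(f.high, f_size - 64, e_size),
        /* The fraction is the whole low word plus the low 48 bits of high. */
        .frac = { .lo = f.low, .hi = bits64(f.high, 0, f_size - 64) },
    };
}
```

This builds the same way whether or not the host has a native 128-bit integer type.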
> +static inline Uint128 pack128_raw(FloatFmt fmt, FloatParts128 p)
> +{
> + const int sign_pos = fmt.frac_size + fmt.exp_size;
> + Uint128 ret = deposit128(p.frac, fmt.frac_size, fmt.exp_size, p.exp);
> + return deposit128(ret, sign_pos, 1, p.sign);
> +}
Likewise, omit this and only have
> +static inline float128 float128_pack_raw(FloatParts128 p)
> +{
> + Uint128 out = pack128_raw(float128_params, p);
> + return make_float128(uint128_gethi(out), uint128_getlo(out));
> +}
this.
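The packing direction can be written the same way, directly on the two 64-bit halves. Again a sketch only: f128_bits, u128_pair, Parts128 and put_bits64() are hypothetical stand-ins (put_bits64 mirrors what deposit64 does), and the field widths are hardcoded for binary128.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the QEMU types; names and layouts are assumptions. */
typedef struct { uint64_t low, high; } f128_bits;
typedef struct { uint64_t lo, hi; } u128_pair;

typedef struct {
    u128_pair frac;
    int32_t exp;
    bool sign;
} Parts128;

/* Local equivalent of deposit64(): overwrite LENGTH bits at START with FIELD. */
static inline uint64_t put_bits64(uint64_t value, int start, int length,
                                  uint64_t field)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

static inline f128_bits pack_f128(Parts128 p)
{
    const int f_size = 112, e_size = 15;  /* binary128 field widths */
    uint64_t high = p.frac.hi;

    /* Exponent and sign are deposited over the top of the high word. */
    high = put_bits64(high, f_size - 64, e_size, (uint64_t)p.exp);
    high = put_bits64(high, f_size + e_size - 64, 1, p.sign);

    return (f128_bits) { .low = p.frac.lo, .high = high };
}
```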
> +/* Almost exactly the same as sf_canonicalize except 128 bit */
> +static FloatParts128 sf128_canonicalize(FloatParts128 part, const FloatFmt *parm,
> + float_status *status)
I think we may have reached the point of diminishing returns on structure
returns. This is at least a 192-bit struct, and will not be passed in registers
anymore. It might be better to do
static void sf128_canonicalize(FloatParts128 *part,
const FloatFmt *parm,
float_status *status)
and modify the FloatParts128 in place.
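Roughly what I have in mind for the in-place form, as a cut-down sketch. Only the classification step is shown; denormal normalization and the fraction shift are left out. The types here (Cls, Parts128, Fmt) are simplified stand-ins made up for the example, not the real FloatParts128 or FloatFmt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real structures carry more fields. */
typedef enum { CLS_UNCLASSIFIED, CLS_ZERO, CLS_NORMAL, CLS_INF, CLS_NAN } Cls;

typedef struct {
    uint64_t frac_hi, frac_lo;
    int32_t exp;
    Cls cls;
    bool sign;
} Parts128;

typedef struct {
    int exp_bias;
    int exp_max;
} Fmt;

/* Classify *part in place; nothing is passed or returned by value. */
static void canonicalize128(Parts128 *part, const Fmt *parm)
{
    bool frac_is_zero = (part->frac_hi | part->frac_lo) == 0;

    assert(part->cls == CLS_UNCLASSIFIED);
    if (part->exp == parm->exp_max) {
        part->cls = frac_is_zero ? CLS_INF : CLS_NAN;
    } else if (part->exp == 0 && frac_is_zero) {
        part->cls = CLS_ZERO;
    } else {
        /* Denormal handling and the fraction shift are omitted here. */
        part->cls = CLS_NORMAL;
        part->exp -= parm->exp_bias;
    }
}
```

The caller keeps one FloatParts128 on its stack and hands out its address, so the struct is never copied through memory for the return.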
> + bool frac_is_zero = uint128_eq(part.frac, uint128_zero());
With Int128, we'd use !int128_nz().
> +/* As above but wider */
> +static FloatParts128 round128_canonical(FloatParts128 p, float_status *s,
> + const FloatFmt *parm)
> +{
> + /* Do these by hand rather than widening the FloatFmt structure */
> + const Uint128 frac_lsb = uint128_lshift(1, DECOMPOSED128_BINARY_POINT - parm->frac_size);
You can't pass a plain constant 1 on 32-bit.
Maybe add an int128_makepow2(exp) function to make this easier?
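Something along these lines, sketched against a struct-based 128-bit type. The helper does not exist today; u128_pair and u128_makepow2 are placeholder names for illustration, and a real version would live next to the other Int128 helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the struct-based 128-bit integer used on 32-bit hosts. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Return 2**exp as a 128-bit value, without needing a 128-bit constant. */
static inline u128_pair u128_makepow2(unsigned exp)
{
    assert(exp < 128);
    if (exp < 64) {
        return (u128_pair) { .lo = UINT64_C(1) << exp, .hi = 0 };
    }
    return (u128_pair) { .lo = 0, .hi = UINT64_C(1) << (exp - 64) };
}
```

With that, frac_lsb becomes a single call with the shift count, and no integer literal has to be converted to the 128-bit type.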
> + case float_round_nearest_even:
> + overflow_norm = false;
> + inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
Can't use & or != on 32-bit.
> + inc = frac & frac_lsb ? 0 : round_mask;
...
> + if (frac & round_mask) {
...
> + frac += inc;
> + if (frac & DECOMPOSED128_OVERFLOW_BIT) {
> + frac >>= 1;
...
> + frac >>= frac_shift;
...
> + frac = -1;
...
> + if (frac & round_mask) {
> + inc = ((uint128_and(frac, roundeven_mask)) != frac_lsbm1
...
> + if (exp == 0 && frac == 0) {
...
> + frac = 0;
...
> + frac = 0;
and more. There are lots more later.
This is going to get ugly fast. We need another solution.
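To show why it gets ugly: once the 128-bit type is a struct, every &, != and += above has to turn into a helper call. A small sketch of the round-to-nearest-even increment done that way follows. All names (u128_pair, u128_and, u128_eq, u128_add, nearest_even_inc) are invented for this example and are not the helpers from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the struct-based 128-bit integer used on 32-bit hosts. */
typedef struct { uint64_t lo, hi; } u128_pair;

static inline u128_pair u128_and(u128_pair a, u128_pair b)
{
    return (u128_pair) { .lo = a.lo & b.lo, .hi = a.hi & b.hi };
}

static inline bool u128_eq(u128_pair a, u128_pair b)
{
    return a.lo == b.lo && a.hi == b.hi;
}

static inline u128_pair u128_add(u128_pair a, u128_pair b)
{
    u128_pair r;

    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
    return r;
}

/*
 * Struct-safe form of:
 *   inc = ((frac & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
 */
static inline u128_pair nearest_even_inc(u128_pair frac,
                                         u128_pair roundeven_mask,
                                         u128_pair frac_lsbm1)
{
    const u128_pair zero = { 0, 0 };

    /* The half-LSB bit is expected to lie inside the round-even mask. */
    assert(u128_eq(u128_and(frac_lsbm1, roundeven_mask), frac_lsbm1));
    return u128_eq(u128_and(frac, roundeven_mask), frac_lsbm1)
           ? zero : frac_lsbm1;
}
```

That is one line of the original turned into three helpers, and the rest of the function needs the same treatment.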
> +static bool parts128_is_snan_frac(Uint128 frac, float_status *status)
> +{
> + if (no_signaling_nans(status)) {
> + return false;
> + } else {
> + bool msb = extract128(frac, DECOMPOSED128_BINARY_POINT - 1, 1);
Doesn't work for 32-bit. Again, extract128 by itself is not the right
interface. Do we in fact want to share code with the normal parts_is_snan_frac
by just passing in the high-part?
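A sketch of what sharing could look like, under the assumption that the 128-bit binary point sits exactly 64 bits above the 64-bit one, so the bit being tested lands at the same position in the high word. The constants, the Status struct and the function names below are placeholders for this example, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed binary point positions; the 128-bit one is 64 bits higher. */
#define BINARY_POINT64   62
#define BINARY_POINT128  126

static_assert(BINARY_POINT128 - 64 == BINARY_POINT64,
              "high word must line up with the 64-bit layout");

/* Stand-in for the two float_status queries the check needs. */
typedef struct {
    bool no_snans;
    bool snan_bit_is_one;
} Status;

/* The existing 64-bit style check: test the fraction's top bit. */
static bool is_snan_frac64(uint64_t frac, const Status *s)
{
    bool msb;

    if (s->no_snans) {
        return false;
    }
    msb = (frac >> (BINARY_POINT64 - 1)) & 1;
    return s->snan_bit_is_one ? msb : !msb;
}

/* 128-bit variant: only the high word matters, so reuse the code above. */
static bool is_snan_frac128(uint64_t frac_hi, const Status *s)
{
    return is_snan_frac64(frac_hi, s);
}
```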
r~