On 2023/08/15 12:38, Tsukasa OI wrote:
> On 2023/08/14 21:51, Jin Ma wrote:
>> Hi Tsukasa,
>> What a coincidence, I have also implemented the Zfa extension, which
>> also includes the fli-related instructions :)
>
> Hi, I'm glad to know that someone is working on this extension more
> comprehensively (especially when "someone" is an experienced GCC
> contributor).  I prefer your patch set in general and am glad to learn
> from your patch set and your response that my approach was not *that*
> bad as I expected.
>
> When a new extension becomes available, I will be more confident making
> a patch set for GCC (as I already do in GNU Binutils).
>
>>
>> Link: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627294.html
>>
>>> +  if (!TARGET_HARD_FLOAT || !TARGET_ZFA)
>>> +    return result;
>>> +  switch (GET_MODE (x))
>>> +    {
>>> +    case HFmode:
>>> +      /* Not only 'Zfhmin', either 'Zfh' or 'Zvfh' is required.  */
>>> +      if (!TARGET_ZFH && !TARGET_ZVFH)
>>
>> Since Zvfh means that Zfh is also on, there may be no need to test
>> TARGET_ZVFH here.  By the way, the formatting here seems wrong; maybe
>> a tab is needed for alignment?
>
> For indentation, I believe this is okay considering 3 indent levels
> (soft tabs) from the top (meaning 6 spaces).
>
> For the specification requirements, I think I'm correct.
>
> The spec says that 'Zvfh' depends on 'Zve32f' and 'Zfhmin'.  'Zfhmin'
> is a conversion-only subset of 'Zfh' ('Zve32f' doesn't require any
> FP16-related extensions).
>
> Note that "fli.h" requires 'Zfa' and ('Zfh' and/or 'Zvfh').
>
> So, 'Zfh' alone will not be sufficient to check the requirements for
> the "fli.h" instruction.  Checking TARGET_ZFH || TARGET_ZVFH (for the
> existence of "fli.h") should be correct, and I think your patch needs
> to be changed "in the long term".
>
> "In the long term" means that current GNU Binutils has a bug in which
> "fli.h" requires 'Zfa' and 'Zfh' ('Zfa' and 'Zvfh' does not work).
> My initial 'Zfa' proposal (improved by Christoph Müllner and upstreamed
> into master) intentionally ignored this case because I assumed that
> approval/ratification of 'Zvfh' would take some time and that we would
> have time to fix it before the Binutils release following approval of
> both 'Zfa' and 'Zvfh' (it turned out I was wrong).
>
> cf. <https://sourceware.org/pipermail/binutils/2023-August/129006.html>
>
> So, "fixing" this part (in your patch) alone will not make the program
> work (on the simulator) because the current buggy GNU Binutils won't
> accept it.  I'm working on it on the GNU Binutils side.
Okay, the bug is fixed in GNU Binutils (master) and is waiting for
approval from the release maintainer (for the binutils-2_41 branch).

Thanks,
Tsukasa

>
>>
>>> +	return result;
>>> +      break;
>>> +    case SFmode: break;
>>> +    case DFmode: break;
>>
>> Maybe we still have to check TARGET_DOUBLE_FLOAT?
>
> Indeed.  I just missed that.
>
>>
>>> +    default: return result;
>>> +    }
>>> +
>>> +  if (!CONST_DOUBLE_P (x))
>>> +    return result;
>>
>> I think it might be better to check whether x satisfies CONST_DOUBLE_P
>> before the switch (GET_MODE (x)) above.
>
> That's correct.  I think that's a leftover from when I was experimenting.
>
>>
>>> +
>>> +  r = *CONST_DOUBLE_REAL_VALUE (x);
>>> +
>>> +  if (REAL_VALUE_ISNAN (r))
>>> +    {
>>> +      long reprs[2] = { 0 };
>>> +      /* Compare with canonical NaN.  */
>>> +      switch (GET_MODE (x))
>>> +	{
>>> +	case HFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (16).require ());
>>> +	  /* 0x7e00: Canonical NaN for binary16.  */
>>> +	  if (reprs[0] != 0x7e00)
>>> +	    return result;
>>> +	  break;
>>> +	case SFmode:
>>> +	  reprs[0] = real_to_target (NULL, &r,
>>> +				     float_mode_for_size (32).require ());
>>> +	  /* 0x7fc00000: Canonical NaN for binary32.  */
>>> +	  if (reprs[0] != 0x7fc00000)
>>> +	    return result;
>>> +	  break;
>>> +	case DFmode:
>>> +	  real_to_target (reprs, &r, float_mode_for_size (64).require ());
>>> +	  if (FLOAT_WORDS_BIG_ENDIAN)
>>> +	    std::swap (reprs[0], reprs[1]);
>>> +	  /* 0x7ff80000_00000000: Canonical NaN for binary64.  */
>>> +	  if (reprs[0] != 0 || reprs[1] != 0x7ff80000)
>>> +	    return result;
>>> +	  break;
>>> +	default:
>>> +	  gcc_unreachable ();
>>> +	}
>>> +      result.type = RISCV_FLOAT_CONST_NAN;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +  else if (REAL_VALUE_ISINF (r))
>>> +    {
>>> +      if (REAL_VALUE_NEGATIVE (r))
>>> +	return result;
>>> +      result.type = RISCV_FLOAT_CONST_INF;
>>> +      result.valid = true;
>>> +      return result;
>>> +    }
>>> +
>>> +  bool sign = REAL_VALUE_NEGATIVE (r);
>>> +  result.sign = sign;
>>> +
>>> +  r = real_value_abs (&r);
>>> +  /* GCC internally does not use IEEE754-like encoding (where normalized
>>> +     significands are in the range [1, 2)).  GCC uses [0.5, 1)
>>> +     (see real.cc).  So, this exponent_p1 variable equals the IEEE754
>>> +     unbiased exponent + 1.  */
>>> +  int exponent_p1 = REAL_EXP (&r);
>>> +
>>> +  /* For the mantissa, we expand into two HOST_WIDE_INTs, apart from the
>>> +     highest (sign) bit, with a fixed binary point at bit point_pos.
>>> +     m1 holds the low part of the mantissa, m2 the high part.
>>> +     WARNING: If we ever have a representation using more than
>>> +     2 * H_W_I - 1 bits for the mantissa, this can fail (low bits will
>>> +     be lost).  */
>>> +  bool fail = false;
>>> +  real_ldexp (&m, &r, (2 * HOST_BITS_PER_WIDE_INT - 1) - exponent_p1);
>>> +  wide_int w = real_to_integer (&m, &fail, HOST_BITS_PER_WIDE_INT * 2);
>>> +  if (fail)
>>> +    return result;
>>> +
>>> +  /* If the low part of the mantissa has bits set we cannot represent
>>> +     the value.  */
>>> +  if (w.ulow () != 0)
>>> +    return result;
>>> +  /* We have rejected the lower HOST_WIDE_INT, so update our
>>> +     understanding of how many bits lie in the mantissa and
>>> +     look only at the high HOST_WIDE_INT.  */
>>> +  unsigned HOST_WIDE_INT mantissa = w.elt (1);
>>> +
>>> +  /* We cannot represent the value 0.0.  */
>>> +  if (mantissa == 0)
>>> +    return result;
>>> +
>>> +  /* We can only represent values with a mantissa of the form 1.xx.  */
>>> +  unsigned HOST_WIDE_INT mask
>>> +    = ((unsigned HOST_WIDE_INT) 1 << (HOST_BITS_PER_WIDE_INT - 4)) - 1;
>>> +  if ((mantissa & mask) != 0)
>>> +    return result;
>>> +  mantissa >>= HOST_BITS_PER_WIDE_INT - 4;
>>> +  /* Now the lowest 3 bits of mantissa should form (1.xx)b.  */
>>> +  gcc_assert (mantissa & (1u << 2));
>>> +  /* Mask out the highest bit.  */
>>> +  mantissa &= ~(1u << 2);
>>> +
>>> +  if (mantissa == 0)
>>> +    {
>>> +      /* We cannot represent any values but -1.0.  */
>>> +      if (exponent_p1 != 1 && sign)
>>> +	return result;
>>> +      switch (exponent_p1)
>>> +	{
>>> +	case -15:  /* 1.0 * 2^(-16) */
>>> +	case -14:  /* 1.0 * 2^(-15) */
>>> +	case  -7:  /* 1.0 * 2^(- 8) */
>>> +	case  -6:  /* 1.0 * 2^(- 7) */
>>> +	case   8:  /* 1.0 * 2^(+ 7) */
>>> +	case   9:  /* 1.0 * 2^(+ 8) */
>>> +	case  16:  /* 1.0 * 2^(+15) */
>>> +	case  17:  /* 1.0 * 2^(+16) */
>>> +	  break;
>>> +	default:
>>> +	  if (exponent_p1 >= -3 && exponent_p1 <= 5)
>>> +	    /* 1.0 * 2^[-4,4] */
>>> +	    break;
>>> +	  switch (GET_MODE (x))
>>> +	    {
>>> +	    case HFmode:  /* IEEE 754 binary16.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-14) */
>>> +	      if (exponent_p1 != -13) return result;
>>> +	      break;
>>> +	    case SFmode:  /* IEEE 754 binary32.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-126) */
>>> +	      if (exponent_p1 != -125) return result;
>>> +	      break;
>>> +	    case DFmode:  /* IEEE 754 binary64.  */
>>> +	      /* Minimum positive normal == 1.0 * 2^(-1022) */
>>> +	      if (exponent_p1 != -1021) return result;
>>> +	      break;
>>> +	    default:
>>> +	      gcc_unreachable ();
>>> +	    }
>>> +	  result.type = RISCV_FLOAT_CONST_MIN;
>>> +	  result.valid = true;
>>> +	  return result;
>>> +	}
>>> +    }
>>> +  else
>>> +    {
>>> +      if (sign)
>>> +	return result;
>>> +      if (exponent_p1 < -1 || exponent_p1 > 2)
>>> +	return result;
>>> +      /* The value is (+1.xx)b * 2^[-2,1].
>>> +	 But we cannot represent (+1.11)b * 2^1 (that is 3.5).  */
>>> +      if (exponent_p1 == 2 && mantissa == 3)
>>> +	return result;
>>> +    }
>>> +
>>> +  result.valid = true;
>>> +  result.mantissa_below_point = mantissa;
>>> +  result.biased_exponent = exponent_p1 + 15;
>>> +
>>> +  return result;
>>> +}
>>> +
>>
>> This code is great and completely different from the way I implemented
>> it.  I'm not sure which one is better, but my idea is that the fli
>> instruction corresponds to three tables (HF, SF and DF), all of which
>> represent specific values.  The library in GCC's real.h can very well
>> convert the corresponding values into the values in the table, so it
>> is only necessary to perform a simple binary search to look up the
>> tables.
>
> Yup.  My approach (based on AArch64's VMOV.F32 constraint checking code)
> is more generic, but I think constants loadable with a single FLI
> instruction don't need to be that generic.
>
> If a multi-instruction FLI sequence becomes realistic, this kind of
> generic approach (handling finite constants precisely) will be helpful
> (a multi-FLI sequence with addition might need some additional measures
> to avoid underflow, though).  But for now, I think your approach is
> better and simpler.
>
>>
>> @@ -1362,17 +1545,14 @@ riscv_const_insns (rtx x)
>> 	       constant incurs a literal-pool access.  Allow this in
>> 	       order to increase vectorization possibilities.  */
>> 	    int n = riscv_const_insns (elt);
>> -	    if (CONST_DOUBLE_P (elt))
>> -	      return 1 + 4; /* vfmv.v.f + memory access.  */
>>> +	    /* We need as many insns as it takes to load the constant
>>> +	       into a GPR and one vmv.v.x.  */
>>> +	    if (n != 0)
>>> +	      return 1 + n;
>>> +	    else if (CONST_DOUBLE_P (elt))
>>> +	      return 1 + 4; /* vfmv.v.f + memory access.  */
>> 	    else
>> -	      {
>> -		/* We need as many insns as it takes to load the constant
>> -		   into a GPR and one vmv.v.x.  */
>> -		if (n != 0)
>> -		  return 1 + n;
>> -		else
>> -		  return 1 + 4; /* vmv.v.x + memory access.  */
>> -	      }
>>> +	      return 1 + 4; /* vmv.v.x + memory access.  */
>> 	  }
>>       }
>>
>> I don't seem to understand here: if n == 0, do we always return 1 + 4?
>> If so, it could be
>>   if (n != 0)
>>     return 1 + n;
>>   else
>>     return 1 + 4;
>>
>> @@ -5117,6 +5313,36 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>> 	  output_address (mode, XEXP (op, 0));
>> 	  break;
>>
>>> +	case CONST_DOUBLE:
>>> +	  {
>>> +	    struct riscv_float_fli_const flt = riscv_get_float_fli_const (op);
>>> +	    if (flt.valid)
>>> +	      {
>>> +		switch (flt.type)
>>> +		  {
>>> +		  case RISCV_FLOAT_CONST_MIN:
>>> +		    fputs ("min", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_INF:
>>> +		    fputs ("inf", file);
>>> +		    break;
>>> +		  case RISCV_FLOAT_CONST_NAN:
>>> +		    fputs ("nan", file);
>>> +		    break;
>>> +		  default:
>>> +		    /* Use a simpler (and bit-perfect) printer.  */
>>> +		    if (flt.sign)
>>> +		      fputc ('-', file);
>>> +		    fprintf (file, "0x1.%cp%+d",
>>> +			     "048c"[flt.mantissa_below_point],
>>> +			     (int) flt.biased_exponent - 16);
>>> +		    break;
>>> +		  }
>>> +		break;
>>> +	      }
>>> +	  }
>>> +	  /* Fall through.  */
>>
>> For displaying floating-point values at the assembly level, you can
>> refer to LLVM: https://reviews.llvm.org/D145645.
>
> Thanks for the link.  I personally prefer hexfloats to avoid precision
> problems as much as possible, and that's how GNU Binutils prints the
> FLI constants.  But that makes sense (and I feel decimals are okay).
>
>>
>> It may also be necessary to deal with riscv_split_64bit_move_p
>> and riscv_legitimize_const_move for rv32; otherwise a DFmode mov
>> on rv32 will be split into a high 32-bit mov and a low 32-bit mov,
>> making it impossible to generate fli instructions.
>
> Thanks for letting me know.  I'm fighting against GCC's large code base
> for future contribution, and that's a lot of help for me.
>
> Thanks,
> Tsukasa
>
>>
>> Thanks,
>> Jin
>>