Tamar Christina <tamar.christ...@arm.com> writes: > Thanks for the review, > Will get started on it but one question... > >> -----Original Message----- >> From: Richard Sandiford <richard.sandif...@arm.com> >> Sent: Monday, September 30, 2024 6:33 PM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw >> <richard.earns...@arm.com>; Marcus Shawcroft >> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using >> floating point moves >> >> Tamar Christina <tamar.christ...@arm.com> writes: >> > Hi All, >> > >> > This patch extends our immediate SIMD generation cases to support >> > generating >> > integer immediates using floating point operation if the integer immediate >> > maps >> > to an exact FP value. >> > >> > As an example: >> > >> > uint32x4_t f1() { >> > return vdupq_n_u32(0x3f800000); >> > } >> > >> > currently generates: >> > >> > f1: >> > adrp x0, .LC0 >> > ldr q0, [x0, #:lo12:.LC0] >> > ret >> > >> > i.e. a load, but with this change: >> > >> > f1: >> > fmov v0.4s, 1.0e+0 >> > ret >> > >> > Such immediates are common in e.g. our Math routines in glibc because they >> > are >> > created to extract or mark part of an FP immediate as masks. >> >> I agree this is a good thing to do. The current code is too beholden >> to the original vector mode. This patch relaxes it so that it isn't >> beholden to the original mode's class (integer vs. float), but it would >> still be beholden to the original mode's element size. >> >> It looks like an alternative would be to remove: >> >> scalar_float_mode elt_float_mode; >> if (n_elts == 1 >> && is_a <scalar_float_mode> (elt_mode, &elt_float_mode)) >> { >> rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0); >> if (aarch64_float_const_zero_rtx_p (elt) >> || aarch64_float_const_representable_p (elt)) >> { >> if (info) >> *info = simd_immediate_info (elt_float_mode, elt); >> return true; >> } >> } >> >> and instead insert code: >> >> /* Get the repeating 8-byte value as an integer. No endian correction >> is needed here because bytes is already in lsb-first order. */ >> unsigned HOST_WIDE_INT val64 = 0; >> for (unsigned int i = 0; i < 8; i++) >> val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes] >> << (i * BITS_PER_UNIT)); >> >> ---> here >> >> if (vec_flags & VEC_SVE_DATA) >> return aarch64_sve_valid_immediate (val64, info); >> else >> return aarch64_advsimd_valid_immediate (val64, info, which); >> >> that tries to reduce val64 to the smallest repeating pattern, >> then tries to interpret that pattern as a float. The reduction step >> could reuse the first part of aarch64_sve_valid_immediate, which >> calculates the narrowest repeating integer mode: >> >> scalar_int_mode mode = DImode; >> unsigned int val32 = val64 & 0xffffffff; >> if (val32 == (val64 >> 32)) >> { >> mode = SImode; >> unsigned int val16 = val32 & 0xffff; >> if (val16 == (val32 >> 16)) >> { >> mode = HImode; >> unsigned int val8 = val16 & 0xff; >> if (val8 == (val16 >> 8)) >> mode = QImode; >> } >> } >> >> This would give us the candidate integer mode, to which we could >> apply float_mode_for_size (...).exists, as in the patch. >> > > I was doubting whether it's safe to use this or not. That's why I listed > the modes using a switch statement. Namely I'm concerned about the > multiple float 16 format. It looks like from looking at the source of > float_mode_for_size that it just returns the first float mode, so makes it > pretty sensitive to the order of definition in aarch64/aarch64-modes.def. > > Is it safe to assume that storage only formats like BF16 will always be > listed after general compute types?
Ah yeah, fair point. In that case, I agree it'd be better to be explicit. Thanks, Richard