Tamar Christina <tamar.christ...@arm.com> writes:
> Thanks for the review,
> Will get started on it but one question...
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Monday, September 30, 2024 6:33 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; Marcus Shawcroft
>> <marcus.shawcr...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using
>> floating point moves
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch extends our immediate SIMD generation cases to support 
>> > generating
>> > integer immediates using floating point operation if the integer immediate 
>> > maps
>> > to an exact FP value.
>> >
>> > As an example:
>> >
>> > uint32x4_t f1() {
>> >     return vdupq_n_u32(0x3f800000);
>> > }
>> >
>> > currently generates:
>> >
>> > f1:
>> >         adrp    x0, .LC0
>> >         ldr     q0, [x0, #:lo12:.LC0]
>> >         ret
>> >
>> > i.e. a load, but with this change:
>> >
>> > f1:
>> >         fmov    v0.4s, 1.0e+0
>> >         ret
>> >
>> > Such immediates are common in e.g. our Math routines in glibc because they 
>> > are
>> > created to extract or mark part of an FP immediate as masks.
>> 
>> I agree this is a good thing to do.  The current code is too beholden
>> to the original vector mode.  This patch relaxes it so that it isn't
>> beholden to the original mode's class (integer vs. float), but it would
>> still be beholden to the original mode's element size.
>> 
>> It looks like an alternative would be to remove:
>> 
>>   scalar_float_mode elt_float_mode;
>>   if (n_elts == 1
>>       && is_a <scalar_float_mode> (elt_mode, &elt_float_mode))
>>     {
>>       rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
>>       if (aarch64_float_const_zero_rtx_p (elt)
>>        || aarch64_float_const_representable_p (elt))
>>      {
>>        if (info)
>>          *info = simd_immediate_info (elt_float_mode, elt);
>>        return true;
>>      }
>>     }
>> 
>> and instead insert code:
>> 
>>   /* Get the repeating 8-byte value as an integer.  No endian correction
>>      is needed here because bytes is already in lsb-first order.  */
>>   unsigned HOST_WIDE_INT val64 = 0;
>>   for (unsigned int i = 0; i < 8; i++)
>>     val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
>>            << (i * BITS_PER_UNIT));
>> 
>> ---> here
>> 
>>   if (vec_flags & VEC_SVE_DATA)
>>     return aarch64_sve_valid_immediate (val64, info);
>>   else
>>     return aarch64_advsimd_valid_immediate (val64, info, which);
>> 
>> that tries to reduce val64 to the smallest repeating pattern,
>> then tries to interpret that pattern as a float.  The reduction step
>> could reuse the first part of aarch64_sve_valid_immediate, which
>> calculates the narrowest repeating integer mode:
>> 
>>   scalar_int_mode mode = DImode;
>>   unsigned int val32 = val64 & 0xffffffff;
>>   if (val32 == (val64 >> 32))
>>     {
>>       mode = SImode;
>>       unsigned int val16 = val32 & 0xffff;
>>       if (val16 == (val32 >> 16))
>>      {
>>        mode = HImode;
>>        unsigned int val8 = val16 & 0xff;
>>        if (val8 == (val16 >> 8))
>>          mode = QImode;
>>      }
>>     }
>> 
>> This would give us the candidate integer mode, to which we could
>> apply float_mode_for_size (...).exists, as in the patch.
>> 
>
> I was doubting whether it's safe to use this or not.  That's why I listed
> the modes using a switch statement.  Namely I'm concerned about the
> multiple float 16 format.  It looks like from looking at the source of
> float_mode_for_size that it just returns the first float mode, so makes it
> pretty sensitive to the order of definition in aarch64/aarch64-modes.def.
>
> Is it safe to assume that storage only formats like BF16 will always be
> listed after general compute types?

Ah yeah, fair point.  In that case, I agree it'd be better to be explicit.

Thanks,
Richard

Reply via email to