Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics

Ramana Radhakrishnan Wed, 08 Jul 2015 01:36:06 -0700

I haven't seen the patch yet, but here are my thoughts on where this should be 
going.


On 07/07/15 18:17, Alan Lawrence wrote:
> Kyrill Tkachov wrote:
>> On 07/07/15 17:34, Alan Lawrence wrote:
>>> Kyrill Tkachov wrote:
>>>> On 07/07/15 14:09, Kyrill Tkachov wrote:
>>>>> Hi Alan,
>>>>>
>>>>> On 07/07/15 13:34, Alan Lawrence wrote:
>>>>>> As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01335.html
>>>>> For some context, the reference for these is at:
>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
>>>>>
>>>>> This patch is ok once you and Charles decide on how to proceed with the 
>>>>> two prerequisites.
>>>> On second thought, the ACLE document at 
>>>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
>>>>
>>>> says in 12.2.1:
>>>> "float16 types are only available when the __fp16 type is defined, i.e. 
>>>> when supported by the hardware"
>>> However, we support __fp16 whenever the user specifies -mfp16-format=ieee or
>>> -mfp16-format=alternative, regardless of whether we have hardware support 
>>> or not.
>>>
>>> (Without hardware support, gcc generates calls to __gnu_f2h_ieee or
>>> __gnu_f2h_alternative instead of vcvtb.f16.f32, and __gnu_h2f_ieee or
>>> __gnu_h2f_alternative instead of vcvtb.f32.f16. However, there is no way to
>>> support __fp16 just using those hardware instructions without caring about
>>> which format is in use.)
>>
>> Hmmm... In my opinion intrinsics should aim to map to instructions rather
>> than go away and call library functions, but this is the existing
>> functionality that current users might depend on :(
> 
> Sorry - to clarify: currently we generate __gnu_f2h_ieee / __gnu_h2f_ieee, to 
> convert between single __fp16 and 'float' values, when there is no HW. 
> General operations on scalar __fp16 values are performed by converting to 
> float, performing operations on float, and converting back. The __fp16 type 
> is available and "usable" without HW support, but only when -mfp16-format is 
> specified.
> 
> (The existing) intrinsics operating on float16x[48] vectors (converting 
> to/from float32x4) are *not* available without hardware support; these 
> intrinsics *are* available without specifying -mfp16-format.
> 
> ACLE (4.1.2) allows toolchains to provide __fp16 when not implemented in HW, 
> even if this is not required.

The type should exist whenever the SIMD unit is present, and all the intrinsics 
that treat it as a bag of bits should just work (TM). The only intrinsics 
guarded by -mfpu=neon-fp16 should be those for the instructions that interpret 
the 16 bits as a float16 value.

> 
>> CC'ing the ARM maintainers and Tejas for an ACLE perspective.
>> I think that we'd want to gate the definition of __fp16 on hardware
>> availability as well (the -mfpu option) rather than just arm_fp16_format
>> but I'm not sure of the impact this will have on existing users.

In the scalar world this is just a storage format, and the ACLE allows folks to 
have fp16 support without hardware. The helper routines were put in for exactly 
this purpose.

> 
> Sure....but do we require -mfpu *and* -mfp16-format? s/and/or/ ?   Do we 
> require -mfp16-format for float16x[48] intrinsics, or allow format-agnostic 
> code (as HW support allows us to!)?
> 

I'd say we require the -mfpu option for the intrinsics that interpret the 
float16 type, but the choice of fp16 format has no bearing on this. The reason 
is that the instruction being emitted does the right thing according to the 
format selected by the AHP bit in the FPSCR. This is unlike the scalar case, 
where the compiler *needs* to know the fp16 format that the user intended to 
use in order to call the correct emulation function.
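
To illustrate why the scalar emulation path cannot be format-agnostic, here is 
a rough sketch of a software half-to-float decode. The function names 
(half_bits_to_float, scale2) are hypothetical and this is not the libgcc 
implementation of __gnu_h2f_ieee / __gnu_h2f_alternative; it only assumes the 
usual 1/5/10 bit layout and that the alternative format treats exponent 31 as 
an ordinary normal number rather than Inf/NaN.

```c
#include <math.h>    /* INFINITY, NAN */
#include <stdint.h>

/* Multiply x by 2^e using exact power-of-two steps (hypothetical helper). */
static float scale2(float x, int e)
{
    while (e > 0) { x *= 2.0f; e--; }
    while (e < 0) { x *= 0.5f; e++; }
    return x;
}

/* Hypothetical helper, not a GCC or libgcc API.
   Decode the 16 bits in h as a half-precision value.
   alternative != 0 selects the ARM alternative format (no Inf/NaN). */
static float half_bits_to_float(uint16_t h, int alternative)
{
    int sign = (h >> 15) & 1;
    int exp  = (h >> 10) & 0x1f;
    int frac = h & 0x3ff;
    float mag;

    if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24. */
        mag = scale2((float)frac, -24);
    } else if (exp == 31 && !alternative) {
        /* IEEE only: exponent 31 encodes infinity or NaN. */
        mag = frac ? NAN : INFINITY;
    } else {
        /* Normal number: (1024 + frac) * 2^(exp - 25). */
        mag = scale2((float)(1024 + frac), exp - 25);
    }
    return sign ? -mag : mag;
}
```

The same bit pattern 0x7C00 decodes to infinity in the IEEE case and to 65536 
in the alternative case, so the compiler has to be told which helper to call. 
The hardware conversion instructions make that choice at run time instead.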

Thus, in summary:

1. -mfpu=neon implies the presence of the float16x(4/8) types and all the 
intrinsics that treat these values as bags of bits.
2. -mfpu=neon-fp16 implies the presence of the vcvt* intrinsics that are needed 
for the float16 types.
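
From the user's side, the two tiers above could be probed roughly as follows. 
This is only a sketch: the function names are hypothetical, and it assumes the 
compiler exposes the split through the ACLE feature macros __ARM_NEON and 
__ARM_NEON_FP (with bit 1, value 0x2, indicating half-precision support), which 
is not something this patch series necessarily guarantees.

```c
/* Hypothetical helpers; only the __ARM_* macros come from the ACLE. */

/* Tier 1: float16x4_t / float16x8_t and the bag-of-bits intrinsics. */
static int have_f16_vector_types(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Tier 2: the vcvt* intrinsics that interpret the 16 bits as float16.
   Assumes bit 1 of __ARM_NEON_FP reports half-precision support. */
static int have_f16_vcvt(void)
{
#if defined(__ARM_NEON_FP) && (__ARM_NEON_FP & 0x2)
    return 1;
#else
    return 0;
#endif
}
```

Note that neither check looks at -mfp16-format, which matches the point that 
the vector conversions do not depend on the format chosen at compile time.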

Thoughts?

regards
Ramana





> 
> Cheers, Alan
>
