On Sat, Jan 11, 2025 at 01:21:13PM +0000, Iain Sandoe wrote:
> Hi,
> 
> I originally made this patch for the Darwin Arm64 development branch,
> however in discussions on IRC, it seems that it is also relevant to
> Linux - since there are implementations running on Apple hardware with
> the M1..3 CPUs.  It might also be helpful to the resolution of
> PR113257 - although it is not a solution on its own.
> 
> Bootstrapped and tested manually (that it gives the expected .arch lines)
> on aarch64-linux.
> 
> OK for trunk?
> thanks
> Iain
> 
> --- 8< ---
> 
> This covers the M1-M3 cores used in Apple desktop hardware that is also
> sometimes used with Linux as the OS.
> 
> It does not cover the wider range that might be used in iOS and other
> embedded platform versions.
> 
> Some of the content is estimates/best guesses - based on the following
> public sources of information:
>  * XNU (only for the Apple Implementer ID)
>  * sysctl -a | grep hw on various M1, M2 and machines
>  * AArch64.td from the Apple Open Source repo for LLVM.
>  * What XCode-14 clang passes to cc1.
> 
> Unfortunately, these sources are in conflict; in particular the clang-claimed
> feature set disagrees with the output of sysctl -a, and the base Arm revs.
> claimed in some cases miss features that ARM DDI 0487J.a lists as mandatory
> for the rev.
> 
> This latter point might not be actually significant - but for the sake of
> caution I've made the spec use the lower arch rev + the additional features
> that are consistently claimed by both sysctl and clang.
> 
> GCC does not seem to have a scheduler that is similar to the "Cyclone" one
> in LLVM - so I've guessed to use cortexa57 (but, maybe we miss 8-issue, it's
> not clear - and my experience with the scheduler is ≈ 0).
> 
> Likewise we do not (yet) have specific cost models, so choose the generic
> Armv8 one.
> 
> Thus, the choices here are intended to be conservative.
> 
>  * Currently, we do not seem to have any way to specify that M2/M3 has support
>   for FEAT_BTI, but because of missing features is not compliant with the Arm
>   base rev that implies this.

Since FEAT_BTI only adds hint instructions, I don't think any part of the
compiler actually checks for whether the feature is supported.  Whether or not
to emit FEAT_BTI instructions is controlled by a different compiler option.
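
As a rough illustration of that separation (a sketch, not part of the patch):
BTI landing pads live in the hint space, so they execute as NOPs on cores
without FEAT_BTI, and whether they are emitted depends on the branch-protection
setting rather than on -mcpu.  Assuming the option in question is
-mbranch-protection, and relying on the ACLE macro __ARM_FEATURE_BTI_DEFAULT
being defined when BTI protection is on, a translation unit can report how it
was built.  The function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with BTI branch
   protection enabled (the ACLE macro __ARM_FEATURE_BTI_DEFAULT is then
   defined), 0 otherwise.  The result reflects the protection option,
   not the core selected with -mcpu.  */
int
bti_enabled_at_compile_time (void)
{
#if defined (__ARM_FEATURE_BTI_DEFAULT)
  return 1;
#else
  return 0;
#endif
}

/* Human-readable summary of the compile-time setting.  */
const char *
bti_status (void)
{
  int on = bti_enabled_at_compile_time ();
  assert (on == 0 || on == 1);
  return on ? "BTI landing pads requested" : "BTI landing pads not requested";
}

/* Print the summary; intended to be called from a test driver.  */
void
print_bti_status (void)
{
  printf ("%s\n", bti_status ());
}
```

Building the same file with and without branch protection, while keeping -mcpu
fixed, should flip the reported status.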

>  * Proper version numbers are not readily available.
>  * Since we have FIRESTORM/ICESTORM and similar pairs for the performance and
>    efficiency cores on various machines, perhaps we should be using a
>    big.LITTLE configuration; OTOH currently, I have no idea if that is
>    usable in any way with the hardware as configured.
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add apple-a12,
>       apple-m1, apple-m2, apple-m3.
>       * config/aarch64/aarch64-tune.md: Regenerate.
> 
> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 12 ++++++++++++
>  gcc/config/aarch64/aarch64-tune.md   |  2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index caf61437d18..0bd3e80cf7f 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -173,6 +173,18 @@ AARCH64_CORE("cortex-a76.cortex-a55",  cortexa76cortexa55, cortexa53, V8_2A,  (F
>  AARCH64_CORE("cortex-r82", cortexr82, cortexa53, V8R, (), cortexa53, 0x41, 0xd15, -1)
>  AARCH64_CORE("cortex-r82ae", cortexr82ae, cortexa53, V8R, (), cortexa53, 0x41, 0xd14, -1)
>  
> +/* Apple (A12 and M) cores based on Armv8.
> +   Apple implementer ID from xnu,
> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for costs.
> +   A12 seems mostly 8.3,
> +   M1 seems to be 8.4 + extras (see comments in option-extensions about f16fml),
> +   M2 mostly 8.5 but with missing mandatory features.
> +   M3 is essentially the same as M2 for the features declared here.  */
> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (), generic_armv8_a, 0x61, 0x12, -1)
> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +

Compared to LLVM's AArch64Processors.td, this seems to be missing a few things:
- Crypto extensions (SHA2 and AES, and SHA3 from apple-m1);
- New flags I just added (FRINTTS and FLAGM2 from apple-m1);
- PREDRES (from apple-m1).

If that's accurate, then I think you could list apple-m1 as V8_5A (although
LLVM only specifies V8_4A), and apple-m2 and apple-m3 as V8_6A (same as LLVM).
The only other difference from the increased architecture version would be to
enable a few more sysreg names (and our system register gating is an
inconsistent mess anyway).

Which of these features are missing from which of your sources?  I think we
should ideally align our feature enablement choices with LLVM if possible,
though there may be good reasons to disagree with their current choice.
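
To make the effect of raising the architecture level concrete, here is a small
sketch in C.  It is a simplified model, not GCC's actual tables: the bit names
are hypothetical, V8_4A is treated as an empty baseline, and the sets implied
by V8_5A and V8_6A follow my understanding of which of the features discussed
here each level makes mandatory.  It compares the feature sets in the patch
with the ones suggested above for apple-m1 and apple-m2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  FEAT_F16 = 1u << 0,
  FEAT_SB = 1u << 1,
  FEAT_SSBS = 1u << 2,
  FEAT_PREDRES = 1u << 3,
  FEAT_FRINTTS = 1u << 4,
  FEAT_FLAGM2 = 1u << 5,
  FEAT_I8MM = 1u << 6,
  FEAT_BF16 = 1u << 7,
  FEAT_AES = 1u << 8,
  FEAT_SHA2 = 1u << 9,
  FEAT_SHA3 = 1u << 10
};

/* Assumed (simplified) architecture levels: each level is the previous
   one plus the features it is taken to imply in this model.  */
enum
{
  ARCH_V8_4A = 0,
  ARCH_V8_5A = ARCH_V8_4A | FEAT_SB | FEAT_SSBS | FEAT_PREDRES
	       | FEAT_FRINTTS | FEAT_FLAGM2,
  ARCH_V8_6A = ARCH_V8_5A | FEAT_I8MM | FEAT_BF16
};

/* Each modelled level must contain the one below it.  */
static_assert ((ARCH_V8_6A & ARCH_V8_5A) == ARCH_V8_5A,
	       "V8_6A must include V8_5A");

enum
{
  CRYPTO_M = FEAT_AES | FEAT_SHA2 | FEAT_SHA3,
  /* As written in the patch.  */
  PATCH_M1 = ARCH_V8_4A | FEAT_F16 | FEAT_SB | FEAT_SSBS,
  PATCH_M2 = ARCH_V8_4A | FEAT_I8MM | FEAT_BF16 | FEAT_F16 | FEAT_SB
	     | FEAT_SSBS,
  /* As suggested in this reply.  */
  SUGGESTED_M1 = ARCH_V8_5A | FEAT_F16 | CRYPTO_M,
  SUGGESTED_M2 = ARCH_V8_6A | FEAT_F16 | CRYPTO_M
};

/* Features present in NEW_SET but not in OLD_SET.  */
uint32_t
newly_enabled (uint32_t old_set, uint32_t new_set)
{
  return new_set & ~old_set;
}

/* Nonzero if every feature in SUB is also in SUPER.  */
int
is_superset (uint32_t super, uint32_t sub)
{
  return (super & sub) == sub;
}

/* Print the delta mask for one core; intended for a test driver.  */
void
print_delta (const char *name, uint32_t old_set, uint32_t new_set)
{
  printf ("%s: newly enabled mask 0x%x\n", name,
	  (unsigned) newly_enabled (old_set, new_set));
}
```

In this model the suggested entries are strict supersets of the patch's, and
the delta for both cores is the crypto bits plus PREDRES, FRINTTS and FLAGM2.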

>  /* Armv9.0-A Architecture Processors.  */
>  
>  /* Arm ('A') cores. */
> 
> -- 
> 2.39.2 (Apple Git-143)
> 
