On Sat, Jan 11, 2025 at 01:21:13PM +0000, Iain Sandoe wrote:
> Hi,
>
> I originally made this patch for the Darwin Arm64 development branch,
> however in discussions on IRC, it seems that it is also relevant to
> Linux - since there are implementations running on Apple hardware with
> the M1..3 CPUs.  It might also be helpful to the resolution of
> PR113257 - although it is not a solution on its own.
>
> Bootstrapped and tested manually (that it gives the expected .arch lines)
> on aarch64-linux.
>
> OK for trunk?
> thanks
> Iain
>
> --- 8< ---
>
> This covers the M1-M3 cores used in Apple desktop hardware that is also
> sometimes used with Linux as the OS.
>
> It does not cover the wider range that might be used in iOS and other
> embedded platform versions.
>
> Some of the content is estimates/best guesses - based on the following
> public sources of information:
>  * XNU (only for the Apple Implementer ID)
>  * sysctl -a | grep hw on various M1, M2 and machines
>  * AArch64.td from the Apple Open Source repo for LLVM.
>  * What XCode-14 clang passes to cc1.
>
> Unfortunately, these sources are in conflict; in particular the clang-claimed
> feature set disagrees with the output of sysctl -a, and the base Arm revs.
> claimed in some cases miss features that ARM DDI 0487J.a lists as mandatory
> for the rev.
>
> This latter point might not be actually significant - but for the sake of
> caution I've made the spec use the lower arch rev + the additional features
> that are consistently claimed by both sysctl and clang.
>
> GCC does not seem to have a scheduler that is similar to the "Cyclone" one
> in LLVM - so I've guessed to use cortex57 (but, maybe we miss 8-issue, it's
> not clear - and my experience with the scheduler is ≈ 0).
>
> Likewise we do not (yet) have specific cost models, so choose the generic
> Armv8 one.
>
> Thus, the choices here are intended to be conservative.
>
> * Currently, we do not seem to have any way to specify that M2/M3 has support
> for FEAT_BTI, but because of missing features is not compliant with the Arm
> base rev that implies this.

Since FEAT_BTI only adds hint instructions, I don't think any part of the
compiler actually checks for whether the feature is supported.  Whether or not
to emit FEAT_BTI instructions is controlled by a different compiler option.

> * Proper version numbers are not readily available.
> * Since we have FIRESTORM/ICESTORM and similar pairs for the performance and
> efficiency cores on various machines, perhaps we should be using a big.LITTLE
> configuration; OTOH currently, I have no idea if that is usable in any way
> with the hardware as configured.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add apple-a12,
>   apple-m1, apple-m2, apple-m3.
> * config/aarch64/aarch64-tune.md: Regenerate.
>
> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 12 ++++++++++++
>  gcc/config/aarch64/aarch64-tune.md | 2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index caf61437d18..0bd3e80cf7f 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -173,6 +173,18 @@ AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, V8_2A, (F
>  AARCH64_CORE("cortex-r82", cortexr82, cortexa53, V8R, (), cortexa53, 0x41, 0xd15, -1)
>  AARCH64_CORE("cortex-r82ae", cortexr82ae, cortexa53, V8R, (), cortexa53, 0x41, 0xd14, -1)
>
> +/* Apple (A12 and M) cores based on Armv8.
> +   Apple implementer ID from xnu,
> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for costs.
> +   A12 seems mostly 8.3,
> +   M1 seems to be 8.4 + extras (see comments in option-extensions about f16fml),
> +   M2 mostly 8.5 but with missing mandatory features.
> +   M3 is essentially the same as M2 for the features declared here.
*/
> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (), generic_armv8_a, 0x61, 0x12, -1)
> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> +

Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
things:
- Crypto extensions (SHA2 and AES, and SHA3 from apple-m1);
- New flags I just added (FRINTTS and FLAGM2 from apple-m1);
- PREDRES (from apple-m1)

If that's accurate, then I think you could list apple-m1 as V8_5A (although
LLVM only specifies V8_4A), and apple-m2 and apple-m3 as V8_6A (same as LLVM).
The only other difference from the increased architecture version would be to
enable a few more sysreg names (and our system register gating is an
inconsistent mess anyway).

Which of these features are missing from which of your sources?  I think we
should ideally align our feature enablement choices with LLVM if possible,
though there may be good reasons to disagree with their current choice.

>  /* Armv9.0-A Architecture Processors.  */
>
>  /* Arm ('A') cores.  */
>
> --
> 2.39.2 (Apple Git-143)
>
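
To make the suggestion above concrete, here is a rough, untested sketch of how the four entries might look with the architecture levels raised and the features listed above appended.  Everything around the entries is an assumption made so that the fragment compiles on its own: struct core_entry, the simplified AARCH64_CORE definition and find_core are hypothetical stand-ins, not GCC's real definitions.  The feature lists simply append the names mentioned above (some are probably redundant with the higher architecture level), and the scheduler, cost model and part numbers are copied unchanged from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified row type; GCC's real tables look different.  */
struct core_entry
{
  const char *name;      /* Core name as written in the entry.  */
  const char *arch;      /* Base architecture level, stringified.  */
  const char *features;  /* Extra feature list, stringified.  */
  unsigned implementer;  /* Implementer ID.  */
  unsigned part;         /* Part number (a guess in the patch).  */
};

/* Stand-in using the same argument order as the entries in the patch.
   The ident, scheduler, cost-model and variant arguments are accepted
   but ignored here.  */
#define AARCH64_CORE(NAME, IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, PART, VARIANT) \
  { NAME, #ARCH, #FLAGS, IMP, PART },

/* Assumed variant of the entries: A12 gains AES/SHA2, M1 moves to V8_5A,
   M2 and M3 move to V8_6A, and M1 onwards also gain SHA3, FRINTTS,
   FLAGM2 and PREDRES.  */
static const struct core_entry apple_cores[] = {
AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (AES, SHA2), generic_armv8_a, 0x61, 0x12, -1)
AARCH64_CORE("apple-m1", applem1, cortexa57, V8_5A, (F16, SB, SSBS, AES, SHA2, SHA3, FRINTTS, FLAGM2, PREDRES), generic_armv8_a, 0x61, 0x23, -1)
AARCH64_CORE("apple-m2", applem2, cortexa57, V8_6A, (I8MM, BF16, F16, SB, SSBS, AES, SHA2, SHA3, FRINTTS, FLAGM2, PREDRES), generic_armv8_a, 0x61, 0x23, -1)
AARCH64_CORE("apple-m3", applem3, cortexa57, V8_6A, (I8MM, BF16, F16, SB, SSBS, AES, SHA2, SHA3, FRINTTS, FLAGM2, PREDRES), generic_armv8_a, 0x61, 0x23, -1)
};

#undef AARCH64_CORE

/* Hypothetical helper: return the row for NAME, or NULL if there is none.  */
const struct core_entry *
find_core (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof apple_cores / sizeof apple_cores[0]; i++)
    if (strcmp (apple_cores[i].name, name) == 0)
      return &apple_cores[i];
  return NULL;
}
```

With this table, find_core ("apple-m1")->arch is "V8_5A" and an unknown name returns NULL.  Only the AARCH64_CORE rows are meant as a proposal; whether they are right depends on the answer to the question above about which source is missing which feature.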