All:

Thank you all for looking at this - there are a large number of moving parts
and I could easily be making incorrect assumptions.  FWIW the highest
weighting among the inputs I have is given to DDI0487L_a_a-profile and the
query output from the actual desktops.

-----

Please note that the Darwin assembler is Apple’s LLVM backend (invoked via
clang -cc1as), which means that whatever GCC deduces for the feature set and
emits in the .arch line of the asm has to mean the same to LLVM as it does
to GCC.

For example: I ran into a problem (that might or might not still exist) where
specifying crc on top of an 8.4 spec dropped the assumed base rev back to the
8.x rev where crc was introduced (that could be just a bug, but I have to
live with it).  So I need to check carefully when adding/subtracting features
(agreed, the two toolchains should do the same thing - but if they don’t then
I need a workaround).

It might be nice, at some point, to have a controlled assembler for GCC - but
at the moment 99.99+% of my downstream are using Xcode to provide the
‘binutils’.

=====

Kyrill:

>> Some of the content is estimates/best guesses - based on the following
>> public sources of information:
>>  * XNU (only for the Apple Implementer ID)
>>  * sysctl -a | grep hw on various M1, M2 and machines
>>  * AArch64.td from the Apple Open Source repo for LLVM.
>>  * What XCode-14 clang passes to cc1.
>>
>
> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the part
> numbers?
> I see it has different values for the M1,M2,M3 ones that you have in your
> patch.

Looking at a recent version of that, I see the host-side values that we use
when doing the native query.  These are obtained from the OS - not from the
chip directly (it’s a sysctl call).  What I do not see is the
manufacturer/chip pairs that you get in /proc/cpuinfo (of course I could be
blind :) ).

>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add apple-a12,
>>  apple-m1, apple-m2, apple-m3.
>>  * config/aarch64/aarch64-tune.md: Regenerate.
>
> These need entries in the documentation too.

Ack .. I will handle this once we settle things a bit.
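
For reference, the sysctl query output mentioned above can be boiled down to
a feature table with something like the sketch below.  It is only a sketch:
parse_feat_lines is a hypothetical helper (not anything in XNU, GCC or LLVM),
and the line format "hw.optional.arm.FEAT_AES: 1" is an assumption about what
the OS prints; the shortened "FEAT_AES: 1" form used in this mail is accepted
too.

```python
import re

# Assumption: each interesting line looks like "hw.optional.arm.FEAT_AES: 1".
# Any dotted prefix before "FEAT_" is ignored, so "FEAT_AES: 1" parses as well.
_FEAT_LINE = re.compile(r"^(?:[\w.]*\.)?(FEAT_\w+):\s*([01])\s*$")


def parse_feat_lines(text):
    """Return {feature_name: bool} for every FEAT_* line in sysctl-style text."""
    feats = {}
    for line in text.splitlines():
        match = _FEAT_LINE.match(line.strip())
        if match:
            feats[match.group(1)] = match.group(2) == "1"
    return feats


# The values quoted in this thread, in the shortened form.
SAMPLE = """\
FEAT_AES: 1
FEAT_SHA3: 1
FEAT_FRINTTS: 1
FEAT_FlagM2: 1
FEAT_SPECRES: 0
"""

if __name__ == "__main__":
    for name, present in sorted(parse_feat_lines(SAMPLE).items()):
        print(name, "yes" if present else "no")
```

Lines that are not FEAT_* entries (e.g. other hw.* keys) are skipped, so the
whole query output can be piped in unfiltered.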

=====

Andrew:

>>
>> * Currently, we do not seem to have any way to specify that M2/M3 has support
>>   for FEAT_BTI, but because of missing features is not compliant with the Arm
>>   base rev that implies this.
>
> Since FEAT_BTI only adds hint instructions, I don't think any part of the
> compiler actually checks for whether the feature is supported.  Whether or not
> to emit FEAT_BTI instructions is controlled by a different compiler option.

I guess the question then is how do we enable it for apple-m2+ and not for
the m1 (or are you saying it does not matter, since the lower revs would just
treat the hint as NOP)?

>> +/* Apple (A12 and M) cores based on Armv8.
>> +   Apple implementer ID from xnu,
>> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for costs.
>> +   A12 seems mostly 8.3,
>> +   M1 seems to be 8.4 + extras (see comments in option-extensions about f16fml),
>> +   M2 mostly 8.5 but with missing mandatory features.
>> +   M3 is essentially the same as M2 for the features declared here.  */
>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (), generic_armv8_a, 0x61, 0x12, -1)
>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
>> +
>
> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
> things:
> - Crypto extensions (SHA2 and AES, and SHA3 from apple-m1);

I do not see FEAT_SHA2 listed in either the Arm doc or the output from the
sysctl.

FEAT_AES: 1
FEAT_SHA3: 1

So I’ve added those to the three entries.

> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);

FEAT_FRINTTS: 1
FEAT_FlagM2: 1

So I’ve added those.

> - PREDRES (from apple-m1)

I cannot find FEAT_PREDRES …

… however we do have

FEAT_SPECRES: 0

AFAICT from the Arm doc DDI0487L_a_a-profile this is mandatory for 8.5, and
that is the reason I left it at 8.4.

> If that's accurate, then I think you could list apple-m1 as V8_5A (although
> LLVM only specifies V8_4A), and apple-m2 and apple-m3 as V8_6A (same as LLVM).
> The only other difference from the increased architecture version would be to
> enable a few more sysreg names (and our system register gating is an
> inconsistent mess anyway).

I am going to try a bootstrap and test cycle with the changes above (still
based on 8.4 for now) and see how the output looks.

Thanks again for looking at this.
Iain
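
PS: the reasoning for staying at 8.4 can be written down as a small check.
This is only a sketch under stated assumptions: highest_compliant_rev and the
MANDATORY_FROM table are hypothetical names, and the table is deliberately
incomplete.  The only entry filled in is FEAT_SPECRES for 8.5, following the
reading of DDI0487L_a_a-profile given above; a real table would need every
mandatory feature for each rev.

```python
def _rev_key(rev):
    # Compare "8.4", "8.10" etc. numerically rather than as strings.
    return tuple(int(part) for part in rev.split("."))


def highest_compliant_rev(reported, mandatory_from):
    """Return the highest rev whose listed mandatory features are all present.

    Revs are walked in ascending order and the walk stops at the first rev
    with a missing feature, since later revs build on earlier ones.
    Returns None if even the lowest rev in the table is not satisfied.
    """
    best = None
    for rev in sorted(mandatory_from, key=_rev_key):
        if all(reported.get(feat, False) for feat in mandatory_from[rev]):
            best = rev
        else:
            break
    return best


# ASSUMPTION: illustrative and incomplete.  An empty list means "nothing
# checked for this rev", not "nothing is mandatory".
MANDATORY_FROM = {
    "8.4": [],
    "8.5": ["FEAT_SPECRES"],
}

# The values quoted in this thread.
REPORTED = {
    "FEAT_AES": True,
    "FEAT_SHA3": True,
    "FEAT_FRINTTS": True,
    "FEAT_FlagM2": True,
    "FEAT_SPECRES": False,
}

if __name__ == "__main__":
    print(highest_compliant_rev(REPORTED, MANDATORY_FROM))
```

With the values quoted in this thread the walk stops at 8.5 because
FEAT_SPECRES is reported as 0, so the result is 8.4.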