On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> All:
> 
> Thank you all for looking at this - there are a large number of moving parts 
> and I could
> easily be making incorrect assumptions.  FWIW the highest weighting in the 
> inputs I have
> are given to DDI0487L_a_a-profile and the query output from the actual 
> desktops.
> 
> -----
> 
> Please note that the Darwin assembler is Apple’s LLVM backend (invoked via 
> clang -cc1as)
> and that means that whatever GCC deduces for the feature set and outputs into 
> the asm
> in the .arch line has to mean the same to LLVM as it does to GCC.
> 
> For example:
> I ran into a problem (that might or might not still exist) where specifying 
> crc on top of a
> 8.4 spec dropped the base rev assumed back to 8.x where crc was introduced 
> (that could
> be just a bug, but I have to live with it).

I remembered reading about this issue before; after some searching I discovered
that it was in another email from you in 2023:
https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html

So I did some more digging.  I believe the CRC workaround was added to handle
misspecified CPUs in Binutils.  In particular:

- +crc only existed from Feb 2013 (e60bb1dd3)
- cortex-a53 and cortex-a57 were missing CRC from Jan 2013 (95830fd17)
  to Nov 2014 (02c135512)
- thunderx was missing CRC from Oct 2014 (55fbd9927) to Apr 2015 (faade8513)
- armv8.1-a was missing CRC from Jun 2015 (88f0ea342) to Dec 2015 (af117b3cf)
- armv8.2-a was missing CRC from Nov 2015 (acb787b03) to Dec 2015 (af117b3cf)
- Binutils 2.25 was released Jan 2015, and Binutils 2.26 was released Jan 2016.

I think the misspecified CPUs existed in one or more Binutils releases, but the
misspecified architectures never made it into a released version.  This means
that we could tighten up the CRC workaround to only apply when the base
architecture version is armv8-a, which satisfies both buggy assemblers.
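
As a rough illustration of that tightening (a Python sketch, not GCC code): it
assumes the workaround amounts to spelling out +crc explicitly even when the
defaults already include it.  The function name and the way the defaults are
passed in are hypothetical.

```python
def extension_string(base_arch, enabled, defaults):
    """Return the '+feature' suffix to print after the base architecture name.

    base_arch: e.g. "armv8-a" or "armv8.4-a"
    enabled:   set of feature names that are switched on
    defaults:  set of feature names the assembler is expected to imply
    """
    parts = []
    for feature in sorted(enabled):
        implied = feature in defaults
        # Tightened workaround (assumption): only distrust the assembler's
        # CRC default when the base architecture is plain armv8-a.
        force_crc = feature == "crc" and base_arch == "armv8-a"
        if not implied or force_crc:
            parts.append("+" + feature)
    return "".join(parts)


# With the tightened check, CRC is spelled out for armv8-a only.
print(extension_string("armv8-a", {"crc", "simd"}, {"crc", "simd"}))
print(extension_string("armv8.4-a", {"crc", "simd"}, {"crc", "simd"}))
```

With this gating, +crc would no longer be appended on top of an 8.4 base.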

> 
> So I need to check carefully when adding/subtracting features.
> (agreed the two toolchains should do the same thing - but if they don’t then 
> I need a work-
>  around).
> 
> It might be nice, at some point, to have a controlled assembler for GCC - but 
> at the moment
> 99.99+% of my downstream are using xcode to provide the ‘binutils’.
> 
> =====
> 
> Kyrill:
> 
> >> Some of the content is estimates/best guesses - based on the following
> >> public sources of information:
> >> * XNU (only for the Apple Implementer ID)
> >> * sysctl -a | grep hw on various M1, M2 and machines
> >> * AArch64.td from the Apple Open Source repo for LLVM.
> >> * What XCode-14 clang passes to cc1.
> >> 
> > 
> > How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the part 
> > numbers?
> > I see it has different values for the M1,M2,M3 ones that you have in your 
> > patch.
> 
> Looking at a recent version of that I see the host-side values that we use 
> when 
> doing the native query.  These are obtained from the OS - not the chip 
> directly.
> (its a sysctl call).
> 
> What I do not see is the manufacturer/chip pairs that you get in 
> /proc/cpuinfo.
> (of course I could be blind :) )

I tried looking myself, and the only relevant stuff I found uses hw.cpufamily
instead.

> 
> >> gcc/ChangeLog:
> >> 
> >> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add apple-a12,
> >> apple-m1, apple-m2, apple-m3.
> >> * config/aarch64/aarch64-tune.md: Regenerate.
> > 
> > These need entries in the documentation too.
> 
> Ack .. I will handle this once we settle things a bit.
> 
> =====
> 
> Andrew:
> >> 
> >> * Currently, we do not seem to have any way to specify that M2/M3 has 
> >> support
> >>  for FEAT_BTI, but because of missing feaures is not compliant with the Arm
> >>  base rev that implies this.
> > 
> > Since FEAT_BTI only adds hint instructions, I don't think any part of the
> > compiler actually checks for whether the feature is supported.  Whether or 
> > not
> > to emit FEAT_BTI instructions is controlled by a different compiler option.
> 
> I guess the question then is how do we enable it for apple-m2+ and not for the
> m1 (or are you saying it does not matter, since the lower revs would just 
> treat
> the hint as NOP)?

I'm saying it doesn't matter.

> 
> >> +/* Apple (A12 and M) cores based on Armv8.
> >> +   Apple implementer ID from xnu,
> >> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for 
> >> costs.
> >> +   A12 seems mostly 8.3,
> >> +   M1 seems to be 8.4 + extras (see comments in option-extensions about 
> >> f16fml),
> >> +   M2 mostly 8.5 but with missing mandatory features.
> >> +   M3 is essentially the same as M2 for the features declared here.  */
> >> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (), 
> >> generic_armv8_a, 0x61, 0x12, -1)
> >> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (F16, SB, SSBS), 
> >> generic_armv8_a, 0x61, 0x23, -1)
> >> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16, F16, 
> >> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16, F16, 
> >> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >> +

Are the Apple CPUs actually big/little implementations, in which case there
will be two different part ids?  If so, then perhaps this should combine the
two part numbers using the AARCH64_BIG_LITTLE macro.  I'm not totally sure,
however, since I don't see other recent implementations handled in this manner.
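
To make the idea concrete, here is a small Python sketch of combining two part
numbers into one identifier.  It assumes the macro packs two 12-bit part
numbers into a single value (big in the upper field, little in the lower);
that layout is an assumption here, not something checked against the source.
The part numbers used and the matches() helper are hypothetical placeholders.

```python
PART_MASK = 0xFFF  # assumption: each part number occupies a 12-bit field


def big_little(big, little):
    """Pack two part numbers into one identifier (assumed layout)."""
    return ((big & PART_MASK) << 12) | (little & PART_MASK)


def matches(entry, seen_parts):
    """True if exactly two part numbers were seen and they match entry
    in either order."""
    if len(seen_parts) != 2:
        return False
    a, b = sorted(seen_parts)
    return entry in (big_little(a, b), big_little(b, a))


# Placeholder part numbers, purely for illustration.
entry = big_little(0x023, 0x022)
print(hex(entry))
print(matches(entry, {0x022, 0x023}))
```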

> > 
> > Comparing to LLVM's AArch64Processors.td, this seems to be missing a few 
> > things:
> > - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1);
> 
> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from the 
> sysctl.
> FEAT_AES: 1
> FEAT_SHA3: 1
> So I’ve added those to the three entries.

There are some architecture feature names that are effectively aliases in the
spec, although identifying this requires reading the restrictions of the id
register fields (and at least one version of the spec accidentally omitted one
of the dependencies).  In summary:
- +sha2 = FEAT_SHA1 and FEAT_SHA256
- +aes = FEAT_AES and FEAT_PMULL
- +sha3 = FEAT_SHA512 and FEAT_SHA3
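
A short Python sketch of how that mapping could be applied to "FEAT_X: 0/1"
lines like the ones quoted in this thread.  The helper names are hypothetical,
the sample input is made up for illustration (it is not real query output),
and dropping any dotted prefix before the FEAT_ name is an assumption about
the raw sysctl format.

```python
# Toolchain flag -> architecture features it requires (from the list above).
FLAG_REQUIRES = {
    "sha2": ("FEAT_SHA1", "FEAT_SHA256"),
    "aes": ("FEAT_AES", "FEAT_PMULL"),
    "sha3": ("FEAT_SHA512", "FEAT_SHA3"),
}


def parse_feature_lines(text):
    """Parse 'FEAT_X: 0' / 'FEAT_X: 1' lines into a {name: bool} dict."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Drop any dotted prefix in front of the FEAT_ name (assumption).
        name = name.strip().rsplit(".", 1)[-1]
        if sep and name.startswith("FEAT_"):
            features[name] = value.strip() == "1"
    return features


def supported_flags(features):
    """Return the flags whose required features are all reported as 1."""
    return sorted(
        flag
        for flag, needed in FLAG_REQUIRES.items()
        if all(features.get(name, False) for name in needed)
    )


# Made-up sample input, for illustration only.
sample = """\
FEAT_AES: 1
FEAT_PMULL: 1
FEAT_SHA1: 1
FEAT_SHA256: 0
FEAT_SHA3: 1
FEAT_SHA512: 1
"""
print(supported_flags(parse_feature_lines(sample)))
```

A flag is only reported when both of its features are present, so a missing
entry is treated the same as an explicit 0.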

> 
> > - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> FEAT_FRINTTS: 1
> FEAT_FlagM2: 1
> So I;ve added those.
> 
> > - PREDRES (from apple-m1)
> 
> I cannot find FEAT_PREDRES …
> … however we do have 
> FEAT_SPECRES: 0

FEAT_SPECRES in the architecture spec is the same as the +predres toolchain
flag.  LLVM seems to think this is supported from apple-m1.

> 
> AFAICT from the Arm doc DDI0487L_a_a-profile this is mandatory for 8.5 and is 
> the reason i left it at 8.4,
> 
> > If that's accurate, then I think you could list apple-m1 as V8_5A (although
> > LLVM only specifies V8_4A), and apple-m2 and apple-m3 as V8_6A (same as 
> > LLVM).
> > The only other difference from the increased architecture version would be 
> > to
> > enable a few more sysreg names (and our system register gating is an
> > inconsistent mess anyway).
> 
> I am going to try a bootstrap and test cycle with the changes above (still 
> based on 8.4 for now) and 
> see how the output looks.
> 
> thanks again for looking at this.
> Iain
> 
> 
