> -----Original Message-----
> From: Iain Sandoe <i...@sandoe.co.uk>
> Sent: Monday, January 20, 2025 6:15 PM
> To: Andrew Carlotti <andrew.carlo...@arm.com>
> Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; GCC Patches <gcc-
> patc...@gcc.gnu.org>; Tamar Christina <tamar.christ...@arm.com>; Richard
> Sandiford <richard.sandif...@arm.com>; Sam James <s...@gentoo.org>
> Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple CPU 
> cores.
> 
> 
> 
> > On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> >
> > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> >>
> 
> >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked via
> clang -cc1as)
> >> and that means that whatever GCC deduces for the feature set and outputs 
> >> into
> the asm
> >> in the .arch line has to mean the same to LLVM as it does to GCC.
> >>
> >> For example:
> >> I ran into a problem (that might or might not still exist) where 
> >> specifying crc on
> top of a
> >> 8.4 spec dropped the base rev assumed back to 8.x where crc was introduced
> (that could
> >> be just a bug, but I have to live with it).
> >
> > I remembered reading about this issue before; after some searching I 
> > discoved
> > that it was via another email from you in 2023:
> > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html
> >
> > So I did some more digging.  I believe the CRC workaround was added to 
> > handle
> > misspecified CPUs in Binutils.  In particular:
> >
> > - +crc only existed from Feb 2013 (e60bb1dd3)
> 
> This is a “solved problem” on the Darwni development branch ( I made a 
> configure
> check to back out of the fix if it isn’t needed) … but see below…
> 
> >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the part
> numbers?
> >>> I see it has different values for the M1,M2,M3 ones that you have in your
> patch.
> >>
> >> Looking at a recent version of that I see the host-side values that we use 
> >> when
> >> doing the native query.  These are obtained from the OS - not the chip 
> >> directly.
> >> (its a sysctl call).
> >>
> >> What I do not see is the manufacturer/chip pairs that you get in 
> >> /proc/cpuinfo.
> >> (of course I could be blind :) )
> >
> > I tried looking myself, and the only relevant stuff I found uses 
> > hw.cpufamily
> > instead.
> 
> I pulled an updated version and I see that we now have the opposite issue the
> Linux section now lists manufacturer=x61 (apple) and then several chip values
> that map onto the m1, 2 and 3.  I guess that’s a new use-case, I don’t see 
> anything
> in the def file that caters to many-chip-id => one core mappings
> 
> >>>>
> >>>> * Currently, we do not seem to have any way to specify that M2/M3 has
> support
> >>>> for FEAT_BTI, but because of missing feaures is not compliant with the 
> >>>> Arm
> >>>> base rev that implies this.
> >>>
> >>> Since FEAT_BTI only adds hint instructions, I don't think any part of the
> >>> compiler actually checks for whether the feature is supported.  Whether 
> >>> or not
> >>> to emit FEAT_BTI instructions is controlled by a different compiler 
> >>> option.
> >>
> >> I guess the question then is how do we enable it for apple-m2+ and not for 
> >> the
> >> m1 (or are you saying it does not matter, since the lower revs would just 
> >> treat
> >> the hint as NOP)?
> >
> > I'm saying it doesn't matter.
> 
> ack.
> 
> >>>> +/* Apple (A12 and M) cores based on Armv8.
> >>>> +   Apple implementer ID from xnu,
> >>>> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for 
> >>>> costs.
> >>>> +   A12 seems mostly 8.3,
> >>>> +   M1 seems to be 8.4 + extras (see comments in option-extensions about
> f16fml),
> >>>> +   M2 mostly 8.5 but with missing mandatory features.
> >>>> +   M3 is essentially the same as M2 for the features declared here.  */
> >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (),
> generic_armv8_a, 0x61, 0x12, -1)
> >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (F16, SB, SSBS),
> generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16,
> F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16,
> F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +
> >
> > Are the Apple cpus actually big/little implementations,
> 
> Yes
> 
> > in which case there will be two different part ids?  If so, then perhaps 
> > this should
> combine the
> > two part numbers using the AARCH64_BIG_LITTLE macro.  I'm not totally sure,
> > however, since I don't see other recent implementation handled in this 
> > manner.
> 

We've not really had any.  Many of our recent CPUs in this space are big medium 
Little
and there are many ways you can combine them and in which they are combined.

As such we've omitted them, mainly because at the end of the day you can't 
(easily)
Run gcc directly on these devices anyway and we would have to pick a core for 
the
codegen anyway.

These cores are somewhat different as you have Asahi, where GCC does run.
Without specifying AARCH64_BIG_LITTLE macro with both cores (see some examples 
in
the cores file) we'd simply not detect the system during -march=native.

> me either - and for the short-term I am happy to treat them as one core (the
> feature sets
> must match so it does not seem to make code-gen differences).

I'd agree, the only thing missing out here is that for native detection we'd 
loose
that they were Armv8.4-a, or Armv8.3-a cores, but that really has little 
material effect
anymore in GCC.

I would say if we can find both core IDs we should use them, otherwise this is 
already
an improvement on the situation.

Thanks,
Tamar

> 
> >>>
> >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
> things:
> >>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1);
> >>
> >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from the
> sysctl.
> >> FEAT_AES: 1
> >> FEAT_SHA3: 1
> >> So I’ve added those to the three entries.
> >
> > There some architecture feature names that are effectively aliases in the 
> > spec,
> > although identifying this requires reading the restrictions of the id 
> > register
> > fields (and at least one version of the spec accidentally omitted one of the
> > dependencies).  In summary:
> > - +sha2 = FEAT_SHA1 and FEAT_SHA256
> > - +aes = FEAT_AES and FEAT_PMULL
> > - +sha3 = FEAT_SHA512 and FEAT_SHA3
> 
> thanks - that was not obvious.
> 
> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least the 
> ones
> via xcode) drops the arch rev down and we fail to build libgcc because of 
> missing
> support for fp16.
> 
> This is likely a bug - but I don’t really know how to describe it at the 
> moment - and
> it won’t make any difference to the assemblers already in the wild - so I 
> will leave
> these out of the list for now.
> 
> >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> >> FEAT_FRINTTS: 1
> >> FEAT_FlagM2: 1
> >> So I;ve added those.
> 
> The build with these added succeeded with no change in test results.
> 
> >>
> >>> - PREDRES (from apple-m1)
> >>
> >> I cannot find FEAT_PREDRES …
> >> … however we do have
> >> FEAT_SPECRES: 0
> >
> > FEAT_SPECRES in the architecture spec is the same as the +predres toolchain
> > flag.  LLVM seems to think the is supported from apple-m1.
> 
> The Linux (cfarm103) for M1 says:
> 
> Features      : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm
> dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
> 
> I do not see mention of predres or specres there.
> (which seems to agree with the output of sysctl under XNU).
> 
> Advice welcome - I guess we could say “well the apple toolchains are sending 
> v8.5
> to llvm regardless of this, so it does not matter if GCC does the same”.  
> OTOH - one
> reason for posting this patch is for Linux hosted on the same h.w and that 
> will,
> presumably, be using GNU binutils …
> 
> thanks again.
> Iain
> 

Reply via email to