On Mon, Jan 20, 2025 at 06:14:57PM +0000, Iain Sandoe wrote:
> 
> 
> > On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> > 
> > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> >> 
> 
> >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked via 
> >> clang -cc1as)
> >> and that means that whatever GCC deduces for the feature set and outputs 
> >> into the asm
> >> in the .arch line has to mean the same to LLVM as it does to GCC.
> >> 
> >> For example:
> >> I ran into a problem (that might or might not still exist) where 
> >> specifying crc on top of a
> >> 8.4 spec dropped the base rev assumed back to 8.x where crc was introduced 
> >> (that could
> >> be just a bug, but I have to live with it).
> > 
> > I remembered reading about this issue before; after some searching I
> > discovered that it was via another email from you in 2023:
> > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html
> > 
> > So I did some more digging.  I believe the CRC workaround was added to 
> > handle
> > misspecified CPUs in Binutils.  In particular:
> > 
> > - +crc only existed from Feb 2013 (e60bb1dd3)
> 
> This is a “solved problem” on the Darwin development branch (I made a
> configure check to back out of the fix if it isn’t needed) … but see below…
> 
> >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the 
> >>> part numbers?
> >>> I see it has different values for the M1,M2,M3 ones that you have in your 
> >>> patch.
> >> 
> >> Looking at a recent version of that I see the host-side values that we use 
> >> when 
> >> doing the native query.  These are obtained from the OS - not the chip 
> >> directly.
> >> (it’s a sysctl call).
> >> 
> >> What I do not see is the manufacturer/chip pairs that you get in 
> >> /proc/cpuinfo.
> >> (of course I could be blind :) )
> > 
> > I tried looking myself, and the only relevant stuff I found uses 
> > hw.cpufamily
> > instead.
> 
> I pulled an updated version and I see that we now have the opposite issue:
> the Linux section now lists manufacturer=x61 (apple) and then several chip
> values that map onto the m1, 2 and 3.  I guess that’s a new use-case; I
> don’t see anything in the def file that caters to many-chip-id => one core
> mappings.

I think this is exactly what the BIG_LITTLE handling is for.
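
To make that concrete, here is a rough sketch of the idea as I understand it:
two 12-bit part numbers are packed into one key, and a machine matches when
the set of part numbers it reports is exactly that pair.  The helper names and
the part numbers below are hypothetical, chosen only for illustration; this is
not a copy of the GCC macro or of the driver code.

```python
# Sketch only: pack a big/little pair of part numbers into a single key.
PART_MASK = 0xFFF  # the MIDR part number field is 12 bits wide


def big_little(big, little):
    """Pack two 12-bit part numbers into one 24-bit value (big in the high half)."""
    return ((big & PART_MASK) << 12) | (little & PART_MASK)


def matches(core_key, seen_parts):
    """True if the part numbers seen on a machine are exactly the packed pair."""
    big = (core_key >> 12) & PART_MASK
    little = core_key & PART_MASK
    return set(seen_parts) == {big, little}


# Hypothetical part numbers, for illustration only.
EXAMPLE_BIG, EXAMPLE_LITTLE = 0x023, 0x022
key = big_little(EXAMPLE_BIG, EXAMPLE_LITTLE)
```

With a key like this, a single core entry can cover a machine that reports two
different part ids, which is the many-chip-id => one core case.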

> 
> >>>> 
> >>>> * Currently, we do not seem to have any way to specify that M2/M3 has 
> >>>> support
> >>>> for FEAT_BTI, but because of missing features is not compliant with the 
> >>>> Arm
> >>>> base rev that implies this.
> >>> 
> >>> Since FEAT_BTI only adds hint instructions, I don't think any part of the
> >>> compiler actually checks for whether the feature is supported.  Whether 
> >>> or not
> >>> to emit FEAT_BTI instructions is controlled by a different compiler 
> >>> option.
> >> 
> >> I guess the question then is how do we enable it for apple-m2+ and not for 
> >> the
> >> m1 (or are you saying it does not matter, since the lower revs would just 
> >> treat
> >> the hint as NOP)?
> > 
> > I'm saying it doesn't matter.
> 
> ack.
> 
> >>>> +/* Apple (A12 and M) cores based on Armv8.
> >>>> +   Apple implementer ID from xnu,
> >>>> +   Guesses for part # and suitable scheduler ident, generic_armv8_a for 
> >>>> costs.
> >>>> +   A12 seems mostly 8.3,
> >>>> +   M1 seems to be 8.4 + extras (see comments in option-extensions about 
> >>>> f16fml),
> >>>> +   M2 mostly 8.5 but with missing mandatory features.
> >>>> +   M3 is essentially the same as M2 for the features declared here.  */
> >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (), 
> >>>> generic_armv8_a, 0x61, 0x12, -1)
> >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (F16, SB, SSBS), 
> >>>> generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16, F16, 
> >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16, F16, 
> >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +
> > 
> > Are the Apple cpus actually big/little implementations,
> 
> Yes
> 
> > in which case there will be two different part ids?  If so, then perhaps 
> > this should combine the
> > two part numbers using the AARCH64_BIG_LITTLE macro.  I'm not totally sure,
> > however, since I don't see other recent implementations handled in this 
> > manner.
> 
> me either - and for the short-term I am happy to treat them as one core (the 
> feature sets
> must match so it does not seem to make code-gen differences).
> 
> >>> 
> >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few 
> >>> things:
> >>> - Crypto extensions (SHA2 and AES, and SHA3 from apple-m1);
> >> 
> >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from 
> >> the sysctl.
> >> FEAT_AES: 1
> >> FEAT_SHA3: 1
> >> So I’ve added those to the three entries.
> > 
> > There are some architecture feature names that are effectively aliases in the 
> > spec,
> > although identifying this requires reading the restrictions of the id 
> > register
> > fields (and at least one version of the spec accidentally omitted one of the
> > dependencies).  In summary:
> > - +sha2 = FEAT_SHA1 and FEAT_SHA256
> > - +aes = FEAT_AES and FEAT_PMULL
> > - +sha3 = FEAT_SHA512 and FEAT_SHA3
> 
> thanks - that was not obvious.
> 
> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least the 
> ones
> via xcode) drops the arch rev down and we fail to build libgcc because of 
> missing
> support for fp16.
> 
> This is likely a bug - but I don’t really know how to describe it at the 
> moment - and
> it won’t make any difference to the assemblers already in the wild - so I 
> will leave
> these out of the list for now.

Ah - I thought the bug only affected features that were mandatory (or at least
"default") at a given architecture version.  So either xcode/LLVM is assuming
the crypto features are always enabled (from some architecture version), or the
bug has a wider scope for other reasons.  Since the crypto features aren't
included in any base architecture version in GCC/Binutils, then this would be
much harder to work around.
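
For reference, the flag aliases I listed earlier can be written down as a small
table and checked against a report in the "FEAT_X: 1" style quoted in this
thread.  This is only a sketch: the function names and the sample report are
made up for illustration, and a flag is treated as supported only when every
feature it stands for is reported as present.

```python
# Toolchain flag -> architecture features it stands for (per the alias list).
FLAG_ALIASES = {
    "sha2": ("FEAT_SHA1", "FEAT_SHA256"),
    "aes": ("FEAT_AES", "FEAT_PMULL"),
    "sha3": ("FEAT_SHA512", "FEAT_SHA3"),
}


def parse_feature_report(text):
    """Parse 'FEAT_NAME: 0|1' lines into a dict of name -> bool."""
    report = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().startswith("FEAT_"):
            report[name.strip()] = value.strip() == "1"
    return report


def supported_flags(report):
    """Return the flags whose constituent features are all present."""
    return sorted(
        flag
        for flag, feats in FLAG_ALIASES.items()
        if all(report.get(feat, False) for feat in feats)
    )


# Made-up report, for illustration only (not real sysctl output).
SAMPLE = """\
FEAT_AES: 1
FEAT_PMULL: 1
FEAT_SHA1: 1
FEAT_SHA256: 1
FEAT_SHA3: 1
FEAT_SHA512: 0
"""
flags = supported_flags(parse_feature_report(SAMPLE))
```

Note that there is no FEAT_SHA2 key anywhere in the table, which is why
searching a feature report for that name finds nothing.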

> 
> >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> >> FEAT_FRINTTS: 1
> >> FEAT_FlagM2: 1
> >> So I’ve added those.
> 
> The build with these added succeeded with no change in test results.
> 
> >> 
> >>> - PREDRES (from apple-m1)
> >> 
> >> I cannot find FEAT_PREDRES …
> >> … however we do have 
> >> FEAT_SPECRES: 0
> > 
> > FEAT_SPECRES in the architecture spec is the same as the +predres toolchain
> > flag.  LLVM seems to think this is supported from apple-m1.
> 
> The Linux (cfarm103) for M1 says:
> 
> Features      : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp 
> asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm 
> dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
> 
> I do not see mention of predres or specres there.
> (which seems to agree with the output of sysctl under XNU).

There's no hwcap or cpuinfo name for FEAT_SPECRES, so it wouldn't show up
there.  I don't know why this feature isn't advertised by the kernel.
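
In other words, a missing token in the Features line only tells you something
when the feature has a cpuinfo name in the first place.  A small sketch of
that three-way answer, using the Features line you quoted; the flag-to-name
table is my own assumption for illustration (with predres deliberately left
without a name), and probe() is a hypothetical helper.

```python
# Features line as quoted from the M1 Linux machine in this thread.
FEATURES_LINE = (
    "fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm "
    "dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint"
)

# Assumed mapping from toolchain flag to cpuinfo token; None means the
# feature has no cpuinfo name, so the Features line cannot answer.
CPUINFO_NAME = {
    "sb": "sb",
    "ssbs": "ssbs",
    "flagm2": "flagm2",
    "frintts": "frint",
    "predres": None,
}


def probe(flag, features_line):
    """Return True/False if cpuinfo can answer, or None if it cannot."""
    name = CPUINFO_NAME[flag]
    if name is None:
        return None
    return name in features_line.split()


results = {flag: probe(flag, FEATURES_LINE) for flag in CPUINFO_NAME}
```

Here predres comes back as None rather than False, which is the distinction
that matters when deciding whether to list it for apple-m1.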

> 
> Advice welcome - I guess we could say “well the apple toolchains are sending 
> v8.5 to llvm regardless of this, so it does not matter if GCC does the same”. 
>  OTOH - one reason for posting this patch is for Linux hosted on the same h.w 
> and that will, presumably, be using GNU binutils … 
> 
> thanks again.
> Iain
> 
> 
