On Mon, Jan 20, 2025 at 06:14:57PM +0000, Iain Sandoe wrote:
>
>
> > On 20 Jan 2025, at 17:38, Andrew Carlotti <[email protected]> wrote:
> >
> > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> >>
>
> >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked via
> >> clang -cc1as)
> >> and that means that whatever GCC deduces for the feature set and outputs
> >> into the asm
> >> in the .arch line has to mean the same to LLVM as it does to GCC.
> >>
> >> For example:
> >> I ran into a problem (that might or might not still exist) where
> >> specifying crc on top of an
> >> 8.4 spec dropped the base rev assumed back to 8.x where crc was introduced
> >> (that could
> >> be just a bug, but I have to live with it).
> >
> > I remembered reading about this issue before; after some searching I
> > discovered
> > that it was via another email from you in 2023:
> > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html
> >
> > So I did some more digging. I believe the CRC workaround was added to
> > handle
> > misspecified CPUs in Binutils. In particular:
> >
> > - +crc only existed from Feb 2013 (e60bb1dd3)
>
> This is a “solved problem” on the Darwin development branch (I made a
> configure
> check to back out of the fix if it isn’t needed) … but see below…
>
> >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the
> >>> part numbers?
> >>> I see it has different values for the M1, M2, M3 ones than you have in your
> >>> patch.
> >>
> >> Looking at a recent version of that I see the host-side values that we use
> >> when
> >> doing the native query. These are obtained from the OS - not the chip
> >> directly.
> >> (it's a sysctl call).
> >>
> >> What I do not see is the manufacturer/chip pairs that you get in
> >> /proc/cpuinfo.
> >> (of course I could be blind :) )
> >
> > I tried looking myself, and the only relevant stuff I found uses
> > hw.cpufamily
> > instead.
>
> I pulled an updated version and I see that we now have the opposite issue: the
> Linux section now lists manufacturer=0x61 (apple) and then several chip values
> that map onto the m1, 2 and 3. I guess that’s a new use-case, I don’t see
> anything
> in the def file that caters to many-chip-id => one core mappings.
I think this is exactly what the BIG_LITTLE handling is for.
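
To make that concrete, here is a rough sketch (in Python, for brevity) of
packing two part numbers into one entry, which is what I understand the
AARCH64_BIG_LITTLE macro to do. The 12-bit layout is my recollection of the
driver code and has not been checked against the tree. The helper names and
both part numbers are made up for illustration; they are not real Apple ids.

```python
# Sketch of packing a big/little pair of part numbers into one key.
# ASSUMPTION: each part number fits in 12 bits and the "big" id sits in
# the upper 12 bits; this is recalled, not verified, AARCH64_BIG_LITTLE
# behaviour.

PART_MASK = 0xFFF  # width of one part number field


def big_little(big, little):
    """Combine two part numbers into a single 24-bit key."""
    return ((big & PART_MASK) << 12) | (little & PART_MASK)


def split_big_little(key):
    """Recover the (big, little) pair from a packed key."""
    return (key >> 12) & PART_MASK, key & PART_MASK


def matches(key, parts_seen):
    """Hypothetical matcher: both halves must have been seen on the host."""
    big, little = split_big_little(key)
    return big in parts_seen and little in parts_seen


# HYPOTHETICAL part numbers, for illustration only.
HYPOTHETICAL_BIG = 0x0AA
HYPOTHETICAL_LITTLE = 0x0BB

key = big_little(HYPOTHETICAL_BIG, HYPOTHETICAL_LITTLE)
print(hex(key))
print(matches(key, {HYPOTHETICAL_BIG, HYPOTHETICAL_LITTLE}))
```

Note that this pairs exactly two ids per core name, so it would cover a
big/little pair for each of m1, m2 and m3, but not an arbitrary number of chip
ids mapping onto one core.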
>
> >>>>
> >>>> * Currently, we do not seem to have any way to specify that M2/M3 has
> >>>> support
> >>>> for FEAT_BTI, but because of missing features is not compliant with the
> >>>> Arm
> >>>> base rev that implies this.
> >>>
> >>> Since FEAT_BTI only adds hint instructions, I don't think any part of the
> >>> compiler actually checks for whether the feature is supported. Whether
> >>> or not
> >>> to emit FEAT_BTI instructions is controlled by a different compiler
> >>> option.
> >>
> >> I guess the question then is how do we enable it for apple-m2+ and not for
> >> the
> >> m1 (or are you saying it does not matter, since the lower revs would just
> >> treat
> >> the hint as NOP)?
> >
> > I'm saying it doesn't matter.
>
> ack.
>
> >>>> +/* Apple (A12 and M) cores based on Armv8.
> >>>> + Apple implementer ID from xnu,
> >>>> + Guesses for part # and suitable scheduler ident, generic_armv8_a for
> >>>> costs.
> >>>> + A12 seems mostly 8.3,
> >>>> + M1 seems to be 8.4 + extras (see comments in option-extensions about
> >>>> f16fml),
> >>>> + M2 mostly 8.5 but with missing mandatory features.
> >>>> + M3 is essentially the same as M2 for the features declared here. */
> >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (),
> >>>> generic_armv8_a, 0x61, 0x12, -1)
> >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (F16, SB, SSBS),
> >>>> generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, F16,
> >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, F16,
> >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> >>>> +
> >
> > Are the Apple cpus actually big/little implementations,
>
> Yes
>
> > in which case there will be two different part ids? If so, then perhaps
> > this should combine the
> > two part numbers using the AARCH64_BIG_LITTLE macro. I'm not totally sure,
> > however, since I don't see other recent implementations handled in this
> > manner.
>
> me either - and for the short-term I am happy to treat them as one core (the
> feature sets
> must match so it does not seem to make code-gen differences).
>
> >>>
> >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
> >>> things:
> >>> - Crypto extensions (SHA2 and AES, and SHA3 from apple-m1);
> >>
> >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from
> >> the sysctl.
> >> FEAT_AES: 1
> >> FEAT_SHA3: 1
> >> So I’ve added those to the three entries.
> >
> > There are some architecture feature names that are effectively aliases in the
> > spec,
> > although identifying this requires reading the restrictions of the id
> > register
> > fields (and at least one version of the spec accidentally omitted one of the
> > dependencies). In summary:
> > - +sha2 = FEAT_SHA1 and FEAT_SHA256
> > - +aes = FEAT_AES and FEAT_PMULL
> > - +sha3 = FEAT_SHA512 and FEAT_SHA3
>
> thanks - that was not obvious.
>
> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least the
> ones
> via xcode) drops the arch rev down and we fail to build libgcc because of
> missing
> support for fp16.
>
> This is likely a bug - but I don’t really know how to describe it at the
> moment - and
> it won’t make any difference to the assemblers already in the wild - so I
> will leave
> these out of the list for now.
Ah - I thought the bug only affected features that were mandatory (or at least
"default") at a given architecture version. So either xcode/LLVM is assuming
the crypto features are always enabled (from some architecture version), or the
bug has a wider scope for other reasons. The crypto features aren't included
in any base architecture version in GCC/Binutils, so this would be much harder
to work around.
>
> >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> >> FEAT_FRINTTS: 1
> >> FEAT_FlagM2: 1
> >> So I’ve added those.
>
> The build with these added succeeded with no change in test results.
>
> >>
> >>> - PREDRES (from apple-m1)
> >>
> >> I cannot find FEAT_PREDRES …
> >> … however we do have
> >> FEAT_SPECRES: 0
> >
> > FEAT_SPECRES in the architecture spec is the same as the +predres toolchain
> > flag. LLVM seems to think this is supported from apple-m1.
>
> The Linux (cfarm103) for M1 says:
>
> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm
> dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
>
> I do not see mention of predres or specres there.
> (which seems to agree with the output of sysctl under XNU).
There's no hwcap or cpuinfo name for FEAT_SPECRES, so it wouldn't show up
there. I don't know why this feature isn't advertised by the kernel.
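
For what it's worth, here is a small sketch of checking the Features line you
quoted for particular names. It only shows that neither name appears as a
token. Since there is no cpuinfo name for FEAT_SPECRES, its absence says
nothing about whether the hardware supports it. The helper name is made up for
illustration.

```python
# Features line copied from the cfarm103 (M1) output quoted in this thread.
FEATURES_LINE = (
    "Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 "
    "asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint"
)


def cpuinfo_features(line):
    """Return the set of tokens from a /proc/cpuinfo "Features" line."""
    _, _, names = line.partition(":")
    return set(names.split())


features = cpuinfo_features(FEATURES_LINE)
for name in ("flagm2", "frint", "predres", "specres"):
    # "not reported" is not the same as "not supported": a feature with
    # no cpuinfo name can never show up in this list.
    print(name, "reported" if name in features else "not reported")
```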
>
> Advice welcome - I guess we could say “well the apple toolchains are sending
> v8.5 to llvm regardless of this, so it does not matter if GCC does the same”.
> OTOH - one reason for posting this patch is for Linux hosted on the same h.w
> and that will, presumably, be using GNU binutils …
>
> thanks again.
> Iain
>
>