On Mon, Jan 20, 2025 at 06:14:57PM +0000, Iain Sandoe wrote: > > > > On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote: > > > > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote: > >> > > >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked via > >> clang -cc1as) > >> and that means that whatever GCC deduces for the feature set and outputs > >> into the asm > >> in the .arch line has to mean the same to LLVM as it does to GCC. > >> > >> For example: > >> I ran into a problem (that might or might not still exist) where > >> specifying crc on top of a > >> 8.4 spec dropped the base rev assumed back to 8.x where crc was introduced > >> (that could > >> be just a bug, but I have to live with it). > > > > I remembered reading about this issue before; after some searching I > > discoved > > that it was via another email from you in 2023: > > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html > > > > So I did some more digging. I believe the CRC workaround was added to > > handle > > misspecified CPUs in Binutils. In particular: > > > > - +crc only existed from Feb 2013 (e60bb1dd3) > > This is a “solved problem” on the Darwni development branch ( I made a > configure > check to back out of the fix if it isn’t needed) … but see below… > > >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the > >>> part numbers? > >>> I see it has different values for the M1,M2,M3 ones that you have in your > >>> patch. > >> > >> Looking at a recent version of that I see the host-side values that we use > >> when > >> doing the native query. These are obtained from the OS - not the chip > >> directly. > >> (its a sysctl call). > >> > >> What I do not see is the manufacturer/chip pairs that you get in > >> /proc/cpuinfo. > >> (of course I could be blind :) ) > > > > I tried looking myself, and the only relevant stuff I found uses > > hw.cpufamily > > instead. > > I pulled an updated version and I see that we now have the opposite issue the > Linux section now lists manufacturer=x61 (apple) and then several chip values > that map onto the m1, 2 and 3. I guess that’s a new use-case, I don’t see > anything > in the def file that caters to many-chip-id => one core mappings
I think this is exactly what the BIG_LITTLE handling is for. > > >>>> > >>>> * Currently, we do not seem to have any way to specify that M2/M3 has > >>>> support > >>>> for FEAT_BTI, but because of missing feaures is not compliant with the > >>>> Arm > >>>> base rev that implies this. > >>> > >>> Since FEAT_BTI only adds hint instructions, I don't think any part of the > >>> compiler actually checks for whether the feature is supported. Whether > >>> or not > >>> to emit FEAT_BTI instructions is controlled by a different compiler > >>> option. > >> > >> I guess the question then is how do we enable it for apple-m2+ and not for > >> the > >> m1 (or are you saying it does not matter, since the lower revs would just > >> treat > >> the hint as NOP)? > > > > I'm saying it doesn't matter. > > ack. > > >>>> +/* Apple (A12 and M) cores based on Armv8. > >>>> + Apple implementer ID from xnu, > >>>> + Guesses for part # and suitable scheduler ident, generic_armv8_a for > >>>> costs. > >>>> + A12 seems mostly 8.3, > >>>> + M1 seems to be 8.4 + extras (see comments in option-extensions about > >>>> f16fml), > >>>> + M2 mostly 8.5 but with missing mandatory features. > >>>> + M3 is essentially the same as M2 for the features declared here. */ > >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (), > >>>> generic_armv8_a, 0x61, 0x12, -1) > >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (F16, SB, SSBS), > >>>> generic_armv8_a, 0x61, 0x23, -1) > >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, F16, > >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1) > >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, F16, > >>>> SB, SSBS), generic_armv8_a, 0x61, 0x23, -1) > >>>> + > > > > Are the Apple cpus actually big/little implementations, > > Yes > > > in which case there will be two different part ids? If so, then perhaps > > this should combine the > > two part numbers using the AARCH64_BIG_LITTLE macro. I'm not totally sure, > > however, since I don't see other recent implementation handled in this > > manner. > > me either - and for the short-term I am happy to treat them as one core (the > feature sets > must match so it does not seem to make code-gen differences). > > >>> > >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few > >>> things: > >>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1); > >> > >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from > >> the sysctl. > >> FEAT_AES: 1 > >> FEAT_SHA3: 1 > >> So I’ve added those to the three entries. > > > > There some architecture feature names that are effectively aliases in the > > spec, > > although identifying this requires reading the restrictions of the id > > register > > fields (and at least one version of the spec accidentally omitted one of the > > dependencies). In summary: > > - +sha2 = FEAT_SHA1 and FEAT_SHA256 > > - +aes = FEAT_AES and FEAT_PMULL > > - +sha3 = FEAT_SHA512 and FEAT_SHA3 > > thanks - that was not obvious. > > However, if I add any of these to the 8.4 spec, LLVM’s back end (at least the > ones > via xcode) drops the arch rev down and we fail to build libgcc because of > missing > support for fp16. > > This is likely a bug - but I don’t really know how to describe it at the > moment - and > it won’t make any difference to the assemblers already in the wild - so I > will leave > these out of the list for now. Ah - I thought the bug only affected features that were mandatory (or at least "default") at a given architecture version. So either xcode/LLVM is assuming the crypto features are always enabled (from some architecture version), or the bug has a wider scope for other reasons. Since the crypto features aren't included in any base architecture version in GCC/Binutils, then this would be much harder to work around. > > >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1); > >> FEAT_FRINTTS: 1 > >> FEAT_FlagM2: 1 > >> So I;ve added those. > > The build with these added succeeded with no change in test results. > > >> > >>> - PREDRES (from apple-m1) > >> > >> I cannot find FEAT_PREDRES … > >> … however we do have > >> FEAT_SPECRES: 0 > > > > FEAT_SPECRES in the architecture spec is the same as the +predres toolchain > > flag. LLVM seems to think the is supported from apple-m1. > > The Linux (cfarm103) for M1 says: > > Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp > asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm > dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint > > I do not see mention of predres or specres there. > (which seems to agree with the output of sysctl under XNU). There's no hwcap or cpuinfo name for FEAT_SPECRES, so it wouldn't show up there. I don't know why this feature isn't advertised by the kernel. > > Advice welcome - I guess we could say “well the apple toolchains are sending > v8.5 to llvm regardless of this, so it does not matter if GCC does the same”. > OTOH - one reason for posting this patch is for Linux hosted on the same h.w > and that will, presumably, be using GNU binutils … > > thanks again. > Iain > >