> On 7 Apr 2025, at 10:21, Tamar Christina <tamar.christ...@arm.com> wrote:
>
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Monday, March 31, 2025 1:43 PM
>> To: i...@sandoe.co.uk
>> Cc: Tamar Christina <tamar.christ...@arm.com>; GCC Patches <gcc-patc...@gcc.gnu.org>; Alice Carlotti <alice.carlo...@arm.com>; Richard Sandiford <richard.sandif...@arm.com>; s...@gentoo.org
>> Subject: Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].
>>
>> Hi Iain,
>>
>>> On 22 Mar 2025, at 15:31, Iain Sandoe <iains....@gmail.com> wrote:
>>>
>>> 0. Sorry this has taken some time to close off; partly because of waiting
>>>    for input, but mostly that I've been stretched with other work.
>>> 1. As per the commit message, the apparent non-conformance with 8.5/6
>>>    because FEAT_SPECRES returns 0, is a result of the query operating
>>>    at user priv. The cores are confirmed to support this for priv.
>>>    code.
>>> 2. I added entries for the apple-m1,2,3 cores in invoke.texi.
>>> 3. Following Andrew's suggestion and with some measurements by Tamar
>>>    and me, figured out the LITTLE.big chip ids (at least for a subset).
>>>
>>> This has been in use for a while on aarch64-darwin branches and I've
>>> checked manually that it gives the right .arch lines on cfarm185.
>>>
>>> OK for trunk? (if so, when?)
>>> thanks
>>> Iain
>>>
>>> --- 8< ---
>>>
>>> After discussion with the open source support team at Apple, we have
>>> established that the cores conform to the 8.5 and 8.6 requirements.
>>> One of the mandatory features (FEAT_SPECRES) is not exposed (or
>>> available) in user-space code but is supported for privileged code.
>>>
>>> The values for chip IDs and the LITTLE.big variants have been taken
>>> from lists in the XNU and LLVM sources.
>>>
>>> PR target/113257
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Apple-a12,
>>> Apple-M1, Apple-M2, Apple-M3 with expanded names to allow for the
>>> LITTLE.big versions.
>>> * config/aarch64/aarch64-tune.md: Regenerate.
>>> * doc/invoke.texi: Add apple-m1,2 and 3 cores to the ones listed
>>> for arch and tune selections.
>>>
>>> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
>>> ---
>>> gcc/config/aarch64/aarch64-cores.def | 16 ++++++++++++++++
>>> gcc/config/aarch64/aarch64-tune.md   |  2 +-
>>> gcc/doc/invoke.texi                  |  5 +++--
>>> 3 files changed, 20 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
>>> index 0e22d72976e..7f204fd0ac9 100644
>>> --- a/gcc/config/aarch64/aarch64-cores.def
>>> +++ b/gcc/config/aarch64/aarch64-cores.def
>>> @@ -173,6 +173,22 @@ AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, V8_2A, (F
>>> AARCH64_CORE("cortex-r82", cortexr82, cortexa53, V8R, (), cortexa53, 0x41, 0xd15, -1)
>>> AARCH64_CORE("cortex-r82ae", cortexr82ae, cortexa53, V8R, (), cortexa53, 0x41, 0xd14, -1)
>>>
>>> +/* Apple (A12 and M) cores.
>>> +   Known part numbers as listed in other public sources.
>>> +   Placeholders for schedulers, generic_armv8_a for costs.
>>> +   A12 seems mostly 8.3, M1 is 8.5 without BTI, M2 and M3 are 8.6
>>> +   From measurements made so far the odd-number core IDs are performance. */
>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (), generic_armv8_a, 0x61, 0x12, -1)
>>> +AARCH64_CORE("apple-m1", applem1_0, cortexa57, V8_5A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x21, 0x20), -1)
>>> +AARCH64_CORE("apple-m1", applem1_1, cortexa57, V8_5A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x23, 0x22), -1)
>>> +AARCH64_CORE("apple-m1", applem1_2, cortexa57, V8_5A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x25, 0x24), -1)
>>> +AARCH64_CORE("apple-m1", applem1_3, cortexa57, V8_5A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x29, 0x28), -1)
>>> +AARCH64_CORE("apple-m2", applem2_0, cortexa57, V8_6A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x31, 0x30), -1)
>>> +AARCH64_CORE("apple-m2", applem2_1, cortexa57, V8_6A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x33, 0x32), -1)
>>> +AARCH64_CORE("apple-m2", applem2_2, cortexa57, V8_6A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x35, 0x34), -1)
>>> +AARCH64_CORE("apple-m2", applem2_3, cortexa57, V8_6A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x39, 0x38), -1)
>>> +AARCH64_CORE("apple-m3", applem3_0, cortexa57, V8_6A, (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x49, 0x48), -1)
>>
>> I don’t think we have precedent of different MIDR part numbers resolving to the
>> same -mcpu string, but I think it should all work as expected.
>
> Indeed, I think for the current usage it should work fine.
>
>> As long as you and Tamar are happy with the feature set here no objections from
>> me.
>
> FWIW no objections from me. This should unblock folks 😊
>
> Thanks,
> Tamar
>
>> Looks ok to me for GCC 15 with a documentation comment below…
>>
>>> +
>>> /* Armv9.0-A Architecture Processors. */
>>>
>>> /* Arm ('A') cores. */
>>> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
>>> index 56a914f12b9..982074c2c21 100644
>>> --- a/gcc/config/aarch64/aarch64-tune.md
>>> +++ b/gcc/config/aarch64/aarch64-tune.md
>>> @@ -1,5 +1,5 @@
>>> ;; -*- buffer-read-only: t -*-
>>> ;; Generated automatically by gentune.sh from aarch64-cores.def
>>> (define_attr "tune"
>>> - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,demeter,olympus,generic,generic_armv8_a,generic_armv9_a"
>>> + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,applem1_2,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_0,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,demeter,olympus,generic,generic_armv8_a,generic_armv9_a"
>>> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 515d91ac2e3..f8f712d1877 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -21763,7 +21763,8 @@ performance of the code. Permissible values for this option are:
>>> @samp{cortex-x2}, @samp{cortex-x3}, @samp{cortex-x4}, @samp{cortex-a510},
>>> @samp{cortex-a520}, @samp{cortex-a520ae}, @samp{cortex-a710}, @samp{cortex-a715},
>>> @samp{cortex-a720}, @samp{cortex-a720ae}, @samp{ampere1}, @samp{ampere1a},
>>> -@samp{ampere1b}, @samp{cobalt-100} and @samp{native}.
>>> +@samp{ampere1b}, @samp{cobalt-100}, @samp{apple-m1}, @samp{apple-m2},
>>> +@samp{apple-m3} and @samp{native}.
>>>
>>> The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
>>> @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
>>> @@ -23842,7 +23843,7 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
>>> @samp{neoverse-n1}, @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{xscale},
>>> @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626},
>>> @samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{star-mc1},
>>> -@samp{xgene1}.
>>> +@samp{xgene1}, @samp{apple-m1}, @samp{apple-m2}, @samp{apple-m3}.
>>
>> This looks like the section for (32-bit) arm rather than aarch64.
>>
>>>
>>> Additionally, this option can specify that GCC should tune the performance
>>> of the code for a big.LITTLE system. Permissible names are:
>>
>> There is a similar section about big.LITTLE in the aarch64 section where the b.L
>> options you add in the patch should be listed.

Ok, then ok for trunk with the documentation point fixed.
Thanks,
Kyrill

>>
>>> --
>>> 2.39.2 (Apple Git-143)
>>>
>
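
For readers following the point about several MIDR part numbers resolving to the same -mcpu string, here is a minimal sketch of how such a lookup could behave. It is an illustration only, not GCC's driver code: `PACK_BIG_LITTLE`, `struct core_entry`, `core_table` and `lookup_core_name` are hypothetical names, and the 12-bit packing of the (big, LITTLE) pair is an assumption made for the example. The implementer value 0x61 and the part-number pairs are copied from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack a (big, LITTLE) part-number pair into one key.
   The 12-bit field width is an assumption made for this sketch.  */
#define PACK_BIG_LITTLE(big, little) \
  ((((unsigned) (big) & 0xfffu) << 12) | ((unsigned) (little) & 0xfffu))

/* Hypothetical table row: one -mcpu name per (implementer, part key).  */
struct core_entry
{
  const char *name;
  unsigned implementer;
  unsigned part_key;
};

/* Several rows share a name, mirroring the apple-m1/apple-m2 entries
   in the patch.  Values are taken from the quoted diff.  */
static const struct core_entry core_table[] = {
  { "apple-a12", 0x61, 0x12 },
  { "apple-m1", 0x61, PACK_BIG_LITTLE (0x21, 0x20) },
  { "apple-m1", 0x61, PACK_BIG_LITTLE (0x23, 0x22) },
  { "apple-m1", 0x61, PACK_BIG_LITTLE (0x25, 0x24) },
  { "apple-m1", 0x61, PACK_BIG_LITTLE (0x29, 0x28) },
  { "apple-m2", 0x61, PACK_BIG_LITTLE (0x31, 0x30) },
  { "apple-m2", 0x61, PACK_BIG_LITTLE (0x33, 0x32) },
  { "apple-m2", 0x61, PACK_BIG_LITTLE (0x35, 0x34) },
  { "apple-m2", 0x61, PACK_BIG_LITTLE (0x39, 0x38) },
  { "apple-m3", 0x61, PACK_BIG_LITTLE (0x49, 0x48) },
};

/* Return the -mcpu name for an (implementer, part key) pair, or NULL
   when the pair is not in the table.  A linear scan is enough here
   because each key appears at most once, even though names repeat.  */
static const char *
lookup_core_name (unsigned implementer, unsigned part_key)
{
  /* The MIDR implementer field is 8 bits wide.  */
  assert (implementer <= 0xffu);
  for (size_t i = 0; i < sizeof core_table / sizeof core_table[0]; i++)
    if (core_table[i].implementer == implementer
        && core_table[i].part_key == part_key)
      return core_table[i].name;
  return NULL;
}
```

In this sketch the ID-to-name direction stays unambiguous because every key is unique. Only the reverse direction (name to part number) becomes one-to-many, which is the case the review notes has no precedent.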