================
@@ -0,0 +1,1664 @@
+//=- AArch64SchedOryon.td - Nuvia Inc Oryon CPU 001 ---*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the scheduling model for Nuvia Inc Oryon
+// family of processors.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// Pipeline Description.
+
+def OryonModel : SchedMachineModel {
+  let IssueWidth            =  14; // 14 micro-ops dispatched at a time. IXU=6, LSU=4, VXU=4
+  let MicroOpBufferSize     = 376; // 192 (48x4) entries in micro-op re-order buffer in VXU
+                                   // 120 ((20+20)x3) entries in micro-op re-order buffer in IXU
+                                   // 64  ((16+16)x2) entries in micro-op re-order buffer in LSU
+                                   // total 376
+  let LoadLatency           =   4; // 4 cycle Load-to-use from L1D$
+                                   // LSU=5 NEON load
+  let MispredictPenalty     =  13; // 13 cycles for mispredicted branch.
+  // Determined via a mix of micro-arch details and experimentation.
+  let LoopMicroOpBufferSize =   0; // Do not have a LoopMicroOpBuffer
----------------
asb wrote:

Thanks Joel. FWIW there's been some development in this setting since I posted 
this. 54e52aa5ebe68de122a3fe6064e0abef97f6b8e0 reduced the 
LoopMicroOpBufferSize for Zen to 96 (Zen3) and 108 (Zen4) so they're no longer 
outliers with much larger values than anyone else.

I totally agree that in-order cores will be more sensitive to scheduling
changes in general. I'll note that, since runtime and partial loop unrolling
are disabled unless this is set, I'd see the impact as slightly more than just
scheduling. In some cases further optimisations trigger on the (partially)
unrolled IR, which can lead to better code: for example, setting this reduced
the dynamic instruction count on some SPEC benchmarks for a RISC-V OoO
scheduling model. If you do experiment with the value, I'd be interested in
your findings.
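To make the suggestion concrete, opting in is a one-line change to the model
definition. A minimal sketch follows; `LoopMicroOpBufferSize` is a real
`SchedMachineModel` field, but the value 96 here is purely illustrative
(borrowed from the Zen3 figure above, not a tuned Oryon number), and
`ExampleModel` is a hypothetical name:

```tablegen
// Hypothetical sketch: a non-zero LoopMicroOpBufferSize lets the runtime
// and partial loop-unrolling heuristics consider this core.
def ExampleModel : SchedMachineModel {
  let IssueWidth            =  14;
  let MicroOpBufferSize     = 376;
  let LoadLatency           =   4;
  let MispredictPenalty     =  13;
  let LoopMicroOpBufferSize =  96; // illustrative value only; measure before committing
}
```

Since the right value is workload-dependent, it would be worth comparing
dynamic instruction counts with and without it on representative benchmarks.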

https://github.com/llvm/llvm-project/pull/91022
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
