[clang] [llvm] [AArch64] Add support for Qualcomm Oryon processor (PR #91022)

Alex Bradbury via cfe-commits Wed, 15 May 2024 21:53:45 -0700

================
@@ -0,0 +1,1664 @@
+//=- AArch64SchedOryon.td - Nuvia Inc Oryon CPU 001 ---*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the scheduling model for Nuvia Inc Oryon
+// family of processors.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// Pipeline Description.
+
+def OryonModel : SchedMachineModel {
+  let IssueWidth            =  14; // 14 micro-ops dispatched at a time. IXU=6, LSU=4, VXU=4
+  let MicroOpBufferSize     = 376; // 192 (48x4) entries in micro-op re-order buffer in VXU.
+                                   // 120 ((20+20)x3) entries in micro-op re-order buffer in IXU
+                                   // 64  (16+16)x2 re-order buffer in LSU
+                                   // total 373
+  let LoadLatency           =   4; // 4 cycle Load-to-use from L1D$
+                                   // LSU=5 NEON load
+  let MispredictPenalty     =  13; // 13 cycles for mispredicted branch.
+  // Determined via a mix of micro-arch details and experimentation.
+  let LoopMicroOpBufferSize =   0; // Do not have a LoopMicroOpBuffer
----------------
asb wrote:


Although it may not be microarchitecturally accurate, I wonder if you've 
benchmarked setting LoopMicroOpBufferSize to a non-zero value. Unless a target 
overrides the default, partial and runtime loop unrolling aren't enabled unless 
LoopMicroOpBufferSize is non-zero (and this is the only way it's currently 
queried and used). In AArch64's case, that decision is only overridden for 
in-order scheduling models. If you look at other in-tree models that set this 
value, you'll see it's become somewhat divorced from microarchitectural reality 
- e.g. a number of the AArch64 models set it based on instruction queue size 
or note they just copied the value from the A57 model. On the X86 side, it's 
set to 50-72 for the modern Intel X86 models and even up to 512 for Zen (with 
a note that it should be higher, but the compile time impact is too high).
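
To make the gating concrete, here is a rough standalone sketch of the default 
behaviour as I understand it. It is a simplified model, not the actual LLVM 
code: the names SchedModelSketch, UnrollPrefsSketch and defaultUnrollPrefs are 
made up for illustration, and it ignores command-line overrides, target 
overrides and the other checks the real implementation performs.

```cpp
// Hypothetical, simplified model of the default unrolling gate.
// None of these types exist in LLVM; they only mirror the idea that
// LoopMicroOpBufferSize == 0 leaves partial/runtime unrolling disabled.
struct SchedModelSketch {
  unsigned LoopMicroOpBufferSize; // value taken from the scheduling model
};

struct UnrollPrefsSketch {
  bool Partial = false;          // partial unrolling allowed?
  bool Runtime = false;          // runtime unrolling allowed?
  unsigned PartialThreshold = 0; // size budget for a partially unrolled loop
};

UnrollPrefsSketch defaultUnrollPrefs(const SchedModelSketch &SM) {
  UnrollPrefsSketch UP;
  // Zero means "no loop buffer": bail out and leave both modes disabled.
  if (SM.LoopMicroOpBufferSize == 0)
    return UP;
  // Non-zero: enable both modes and reuse the value as the size budget.
  UP.Partial = true;
  UP.Runtime = true;
  UP.PartialThreshold = SM.LoopMicroOpBufferSize;
  return UP;
}
```

In this model, defaultUnrollPrefs({0}) leaves both flags off, while any 
non-zero value turns them on and doubles as the threshold, which is why the 
number ends up acting more as a tuning knob than as a hardware description.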

https://github.com/llvm/llvm-project/pull/91022
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
