Issue | 146564
Summary | Zen3 scheduler model for the latency of VEXTRACTF128rri is probably incorrect
Labels | new issue
Assignees |
Reporter | TiborGY
See also discussion at https://discourse.llvm.org/t/are-the-latencies-of-vextractf128-correct-for-zen2-3-in-mca/86422
llvm-mca relies on LLVM's scheduler models to predict cycle counts. This is the timeline it predicts for a small snippet on Zen3:
```
[0,0] DeeeeeeeeER . . vmovapd (%rdi), %ymm0
[0,1] D=eeeeeeeeeeER . . vsubpd (%rsi), %ymm0, %ymm0
[0,2] D===========eeeER . vmulpd %ymm0, %ymm0, %ymm0
[0,3] D==============eeeeER vextractf128 $1, %ymm0, %xmm1
[0,4] D==============eE---R vmovhlps %xmm0, %xmm0, %xmm2
```
As you can see, `vextractf128` is predicted to have a latency of 4 cycles. This, however, is inconsistent with both Agner Fog's latency tables (which list 3 cycles) and my own measurements with llvm-exegesis:
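In llvm-mca's timeline view each `e` cell marks a cycle in which the instruction is executing, so counting the `e` cells in a row gives the latency the scheduler model assigned to it (for instructions with a memory operand this includes the load). The sketch below does that count for the rows above. The helper name `execution_cycles` is hypothetical, and the token-based parsing assumes the whitespace-separated layout shown here rather than llvm-mca's exact column alignment.

```python
# llvm-mca timeline legend: D = dispatched, = = waiting to execute,
# e = executing, E = executed, - = waiting to retire, R = retired.
# "." is only column padding in the timeline.
TIMELINE_CHARS = set("DeE=-R.")

# Rows copied from the llvm-mca timeline view above.
TIMELINE = """\
[0,0] DeeeeeeeeER . . vmovapd (%rdi), %ymm0
[0,1] D=eeeeeeeeeeER . . vsubpd (%rsi), %ymm0, %ymm0
[0,2] D===========eeeER . vmulpd %ymm0, %ymm0, %ymm0
[0,3] D==============eeeeER vextractf128 $1, %ymm0, %xmm1
[0,4] D==============eE---R vmovhlps %xmm0, %xmm0, %xmm2
"""


def execution_cycles(timeline: str) -> list[tuple[str, int]]:
    """Return (instruction text, number of 'e' cycles) for each timeline row."""
    rows = []
    for line in timeline.strip().splitlines():
        tokens = line.split()
        # tokens[0] is the "[iteration,index]" tag; skip it.
        i = 1
        cycles = 0
        # Consume tokens made only of timeline characters; the first token
        # that is not (the mnemonic) starts the instruction text.
        while i < len(tokens) and set(tokens[i]) <= TIMELINE_CHARS:
            cycles += tokens[i].count("e")
            i += 1
        rows.append((" ".join(tokens[i:]), cycles))
    return rows


for insn, cycles in execution_cycles(TIMELINE):
    print(f"{cycles:2d}  {insn}")
```

For the `vextractf128` row this yields 4 executing cycles, while `vmulpd` yields 3, matching the latencies the Zen3 model assigns to them.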
```
./llvm-exegesis -mode=latency -opcode-name=VEXTRACTF128rri -mcpu=znver3 --benchmark-repeat-count=100000 -min-instructions=1000 --repetition-mode=loop
---
mode: latency
key:
instructions:
- 'VEXTRACTF128rri XMM0 YMM0 i_0x1'
config: ''
register_initial_values:
- 'YMM0=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
min_instructions: 1000
measurements:
- { key: latency, value: 3.15, per_snippet_value: 3.15, validation_counters: {} }
error: ''
info: Repeating a single explicitly serial instruction
assembled_snippet: 4883EC20C7042400000000C744240400000000C744240800000000C744240C00000000C744241000000000C744241400000000C744241800000000C744241C00000000C5FE6F04244883C42049B80200000000000000662E0F1F840000000000C4E37D19C001C4E37D19C0014983C0FF75EEC3
...
```
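The `assembled_snippet` field is raw x86-64 machine code. Locating its loop by hand shows that the measured loop body consists of two back-to-back copies of `C4 E3 7D 19 C0 01` (`vextractf128 $1, %ymm0, %xmm0`), each reading the register written by the previous one, followed by the loop-counter update `add $-1, %r8` and a short backward `jne`. The counter update and branch are not on the `ymm0` dependency chain. The sketch below extracts the loop body from the hex string; it hard-codes the byte patterns of this one snippet (the helper `loop_body` and the constant names are hypothetical) and is not a general x86 decoder.

```python
# assembled_snippet from the llvm-exegesis output above.
SNIPPET_HEX = (
    "4883EC20C7042400000000C744240400000000C744240800000000"
    "C744240C00000000C744241000000000C744241400000000"
    "C744241800000000C744241C00000000C5FE6F04244883C420"
    "49B80200000000000000662E0F1F840000000000"
    "C4E37D19C001C4E37D19C0014983C0FF75EEC3"
)

VEXTRACT_YMM0_XMM0 = bytes.fromhex("C4E37D19C001")  # vextractf128 $1, %ymm0, %xmm0
LOOP_DEC = bytes.fromhex("4983C0FF")  # add $-1, %r8 (loop counter update)


def loop_body(snippet_hex: str) -> bytes:
    """Return the bytes from the backward-jump target up to the counter update."""
    code = bytes.fromhex(snippet_hex)
    dec = code.index(LOOP_DEC)
    jne = dec + len(LOOP_DEC)
    if code[jne] != 0x75:  # 0x75 = jne rel8
        raise ValueError("expected a short jne after the loop-counter update")
    rel8 = int.from_bytes(code[jne + 1:jne + 2], "little", signed=True)
    target = jne + 2 + rel8  # rel8 is relative to the end of the 2-byte jne
    return code[target:dec]


body = loop_body(SNIPPET_HEX)
copies = body.count(VEXTRACT_YMM0_XMM0)
print(f"loop body: {body.hex().upper()} ({copies} x vextractf128)")
```

Here the jump displacement `0xEE` (-18) lands exactly on the first `vextractf128`, so the loop body printed is the two dependent extracts and nothing else.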
Confusingly, AMD's official instruction latency table for Zen3 (Family_19h_Instruction_Latencies_version_1-00.xlsx, AMD Publication No. 56665 Revision 3.00, November 2020) lists `vextractf128` with a latency of 4 cycles. Perhaps I am misinterpreting my measurement results, but I cannot see how that figure could be correct. My confidence in the accuracy of the official latency table is further eroded by the fact that both `vextractf128` variants are listed with empty operand fields.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs