On 12/23/2025 6:03 AM, Milan Tripkovic wrote:
This patch adjusts the instruction scheduling and cost model for the Zbc
(CLMUL) extension on the Spacemit X60 core.
The tuning was evaluated using three configurations (CLMUL2, CLMUL3,
and the baseline CLMUL5) across a variety of hashing and encryption kernels.

| Test Case    | Met |    CLMUL2   |    CLMUL3   |  v5 (Patch) | v5/v2 | v5/v3 |
|--------------|-----|-------------|-------------|-------------|-------|-------|
| AES Hash     | ms  |     1,676.94|     1,614.35|     1,614.71| -3.71%| +0.02%|
| [5bsv8qnh1]  | cyc |2,683,008,256|2,582,866,539|2,583,483,082| -3.71%| +0.02%|
|--------------|-----|-------------|-------------|-------------|-------|-------|
| Sweet Spot   | ms  |     2,631.45|     2,631.35|     2,743.37| +4.25%| +4.26%|
| [xe9sncK87]  | cyc |4,210,294,394|4,210,083,291|4,389,331,992| +4.25%| +4.26%|
|--------------|-----|-------------|-------------|-------------|-------|-------|
| 128b Mult    | ms  |     1,639.32|     1,727.10|     1,754.90| +7.05%| +1.61%|
| [aYezqcx4n]  | cyc |2,622,877,789|2,763,259,605|2,807,789,209| +7.05%| +1.61%|
|--------------|-----|-------------|-------------|-------------|-------|-------|
| CRC32 Fold   | ms  |     2,056.78|     1,947.46|     2,120.03| +3.07%| +8.86%|
| [bdWoW9ezv]  | cyc |3,290,804,189|3,115,896,432|3,391,980,303| +3.07%| +8.86%|
|--------------|-----|-------------|-------------|-------------|-------|-------|
| Wegman Hash  | ms  |     2,160.13|     2,161.12|     2,204.77| +2.07%| +2.02%|
| [aa8aGerbe]  | cyc |3,456,154,927|3,457,644,543|3,527,588,691| +2.07%| +2.02%|
|--------------|-----|-------------|-------------|-------------|-------|-------|
Links:
[5bsv8qnh1] https://godbolt.org/z/5bsv8qnh1
[xe9sncK87] https://godbolt.org/z/xe9sncK87
[aYezqcx4n] https://godbolt.org/z/aYezqcx4n
[bdWoW9ezv] https://godbolt.org/z/bdWoW9ezv
[aa8aGerbe] https://godbolt.org/z/aa8aGerbe
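
For readers checking the table: the v5/v2 and v5/v3 columns appear to be the relative change of the CLMUL5 timing against the CLMUL2 and CLMUL3 timings. The patch does not state the formula, so the one below is an assumption inferred from the numbers. A minimal sketch that recomputes the columns from the ms values in the table:

```python
# Relative change of a "new" measurement against a reference, in percent.
# Assumption: this is how the v5/v2 and v5/v3 columns were derived; the
# patch description does not state the formula.
def rel_change(new, ref):
    return (new / ref - 1.0) * 100.0

# Timings in ms copied from the table, ordered (CLMUL2, CLMUL3, CLMUL5).
timings_ms = {
    "AES Hash":    (1676.94, 1614.35, 1614.71),
    "Sweet Spot":  (2631.45, 2631.35, 2743.37),
    "128b Mult":   (1639.32, 1727.10, 1754.90),
    "CRC32 Fold":  (2056.78, 1947.46, 2120.03),
    "Wegman Hash": (2160.13, 2161.12, 2204.77),
}

for name, (v2, v3, v5) in timings_ms.items():
    print(f"{name:12s} v5/v2 {rel_change(v5, v2):+.2f}%  "
          f"v5/v3 {rel_change(v5, v3):+.2f}%")
```

A positive value means the CLMUL5 configuration took longer than the configuration it is compared against.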

Based on the benchmark results, CLMUL3 is proposed as the new default
tuning for this core.

2025-12-23  Milan Tripkovic  <[email protected]>

gcc/ChangeLog:

         * config/riscv/spacemit-x60.md (spacemit_x60_clmul): clmul value changed

Seems sensible. We don't have any good documentation on this design, but based on everything I've seen I would expect the latency to be 3c-5c. A carryless multiply is simpler than a standard multiply and thus could run a bit faster.
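
To illustrate the "simpler" point: a carryless multiply combines the shifted partial products with XOR instead of addition, so no carries propagate between bit positions. A minimal software sketch follows. The function name and the `xlen` parameter are illustrative only; it models the low-half result as the Zbc `clmul` instruction defines it and says nothing about how the X60 implements it in hardware.

```python
def clmul(a, b, xlen=64):
    """Low xlen bits of the carryless product of a and b (software model)."""
    mask = (1 << xlen) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(xlen):
        if (b >> i) & 1:
            # XOR in the shifted partial product; unlike integer
            # multiplication there is no carry into higher bits.
            result ^= a << i
    return result & mask

# 0b11 * 0b11 is 9 as an integer product, but the carryless product is
# 0b11 ^ 0b110 = 0b101.
print(bin(clmul(0b11, 0b11)))
```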


Presumably these numbers were reasonably stable -- we've seen pretty large run-to-run jitter on this design. Anyway, I've pushed this to the trunk under the agreement we had in the patchwork meeting a month or so ago WRT the spacemit scheduler/cost model.
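
One way to sanity-check stability is to repeat each benchmark several times and compare the observed delta against the run-to-run spread. A rough sketch, assuming a simple (max - min) / median spread metric; the helper names are hypothetical and the sample timings are placeholders, not measurements from the X60:

```python
import statistics

def relative_spread(samples):
    """Run-to-run spread as (max - min) / median, in percent."""
    med = statistics.median(samples)
    return (max(samples) - min(samples)) / med * 100.0

def exceeds_noise(delta_pct, samples):
    """True if a measured delta is larger than the observed spread."""
    return abs(delta_pct) > relative_spread(samples)

# Placeholder timings in ms for repeated runs of one kernel (made up).
runs_ms = [100.0, 101.0, 99.0, 103.0, 100.0]

print(f"spread: {relative_spread(runs_ms):.2f}%")
print("8.86% delta above noise:", exceeds_noise(8.86, runs_ms))
print("2.02% delta above noise:", exceeds_noise(2.02, runs_ms))
```

With a spread of this size, only the larger deltas in the table would stand out from the noise; the real spread on this design would need to be measured.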

jeff
