On Sun, 22 Feb 2026 16:29:54 +0100 Daniel Gregory <[email protected]> wrote:
> The RISC-V Zbc extension adds instructions for carry-less multiplication
> we can use to implement CRC in hardware. This patch set contains two new
> implementations:
>
> - one in lib/hash/rte_crc_riscv64.h that uses a Barrett reduction to
>   implement the four rte_hash_crc_* functions
> - one in lib/net/net_crc_zbc.c that uses repeated single-folds to reduce
>   the buffer until it is small enough for a Barrett reduction to
>   implement rte_crc16_ccitt_zbc_handler and rte_crc32_eth_zbc_handler
>
> My approach is largely based on Intel's "Fast CRC Computation Using
> PCLMULQDQ Instruction" white paper
> https://www.researchgate.net/publication/263424619_Fast_CRC_computation
> and a post about "Optimizing CRC32 for small payload sizes on x86"
> https://mary.rs/lab/crc32/
>
> Whether these new implementations are enabled is controlled by new
> build-time and run-time detection of the RISC-V extensions present in
> the compiler and on the target system.
>
> I have carried out some performance comparisons between the generic
> table implementations and the new hardware implementations. Listed below
> is the number of cycles it takes to compute the CRC hash for buffers of
> various sizes (as reported by rte_get_timer_cycles()). These results
> were collected on a Kendryte K230 and averaged over 20 samples:
>
> |Buffer    | CRC32-ETH (lib/net) | CRC32C (lib/hash)   |
> |Size (MB) | Table    | Hardware | Table    | Hardware |
> |----------|----------|----------|----------|----------|
> | 1        | 155168   | 11610    | 73026    | 18385    |
> | 2        | 311203   | 22998    | 145586   | 35886    |
> | 3        | 466744   | 34370    | 218536   | 53939    |
> | 4        | 621843   | 45536    | 291574   | 71944    |
> | 5        | 777908   | 56989    | 364152   | 89706    |
> | 6        | 932736   | 68023    | 437016   | 107726   |
> | 7        | 1088756  | 79236    | 510197   | 125426   |
> | 8        | 1243794  | 90467    | 583231   | 143614   |
>
> These results suggest a speed-up of lib/net by thirteen times, and of
> lib/hash by four times.
>
> I have also run the hash_functions_autotest benchmark in dpdk_test,
> which measures the performance of the lib/hash implementation on small
> buffers, getting the following times:
>
> | Key Length | Time (ticks/op)     |
> | (bytes)    | Table    | Hardware |
> |------------|----------|----------|
> | 1          | 0.47     | 0.85     |
> | 2          | 0.57     | 0.87     |
> | 4          | 0.99     | 0.88     |
> | 8          | 1.35     | 0.88     |
> | 9          | 1.20     | 1.09     |
> | 13         | 1.76     | 1.35     |
> | 16         | 1.87     | 1.02     |
> | 32         | 2.96     | 0.98     |
> | 37         | 3.35     | 1.45     |
> | 40         | 3.49     | 1.12     |
> | 48         | 4.02     | 1.25     |
> | 64         | 5.08     | 1.54     |
>
> v4:
> - rebase on 26.03-rc1
> - RISC64 -> RISCV64 in test_hash.c (Stephen Hemminger)
> - Added section to release notes (Stephen Hemminger)
> - SPDX-License_Identifier -> SPDX-License-Identifier in
>   rte_crc_riscv64.h (Stephen Hemminger)
> - Fix header guard in rte_crc_riscv64.h (Stephen Hemminger)
> - assert -> RTE_ASSERT in rte_crc_riscv64.h (Stephen Hemminger)
> - Fix copyright statement in net_crc_zbc.c (Stephen Hemminger)
> - Make crc context structs static in net_crc_zbc.c (Stephen Hemminger)
> - prefer the optimised crc when zbc present over jhash in rte_fbk_hash.c
> v3:
> - rebase on 24.07
> - replace crc with CRC in commits (check-git-log.sh)
> v2:
> - replace compile flag with build-time (riscv extension macros) and
>   run-time detection (linux hwprobe syscall) (Stephen Hemminger)
> - add qemu target that supports zbc (Stanislaw Kardach)
> - fix spelling error in commit message
> - fix a bug in the net/ implementation that would cause segfaults on
>   small unaligned buffers
> - refactor net/ implementation to move variable declarations to top of
>   functions
> - enable the optimisation in a couple other places optimised crc is
>   preferred to jhash
>   - l3fwd-power
>   - cuckoo-hash
>
> Daniel Gregory (10):
>   config/riscv: detect presence of Zbc extension
>   hash: implement CRC using riscv carryless multiply
>   net: implement CRC using riscv carryless multiply
>   config/riscv: add qemu crossbuild target
>   examples/l3fwd: use accelerated CRC on riscv
>   ipfrag: use accelerated CRC on riscv
>   examples/l3fwd-power: use accelerated CRC on riscv
>   hash: use accelerated CRC on riscv
>   member: use accelerated CRC on riscv
>   doc: implement CRC using riscv carryless multiply
>
>  .mailmap                                     |   2 +-
>  MAINTAINERS                                  |   2 +
>  app/test/test_crc.c                          |  10 +
>  app/test/test_hash.c                         |   7 +
>  config/riscv/meson.build                     |  33 +++
>  config/riscv/riscv64_qemu_linux_gcc          |  17 ++
>  .../linux_gsg/cross_build_dpdk_for_riscv.rst |   5 +
>  doc/guides/rel_notes/release_26_03.rst       |   8 +
>  examples/l3fwd-power/main.c                  |   2 +-
>  examples/l3fwd/l3fwd_em.c                    |   2 +-
>  lib/eal/riscv/include/rte_cpuflags.h         |   2 +
>  lib/eal/riscv/rte_cpuflags.c                 | 112 +++++++---
>  lib/hash/meson.build                         |   1 +
>  lib/hash/rte_crc_riscv64.h                   |  90 ++++++++
>  lib/hash/rte_cuckoo_hash.c                   |   3 +
>  lib/hash/rte_fbk_hash.c                      |   3 +
>  lib/hash/rte_hash_crc.c                      |  13 +-
>  lib/hash/rte_hash_crc.h                      |   6 +-
>  lib/ip_frag/ip_frag_internal.c               |   6 +-
>  lib/member/member.h                          |   2 +-
>  lib/net/meson.build                          |   4 +
>  lib/net/net_crc.h                            |  11 +
>  lib/net/net_crc_zbc.c                        | 194 ++++++++++++++++++
>  lib/net/rte_net_crc.c                        |  30 ++-
>  lib/net/rte_net_crc.h                        |   3 +
>  25 files changed, 526 insertions(+), 42 deletions(-)
>  create mode 100644 config/riscv/riscv64_qemu_linux_gcc
>  create mode 100644 lib/hash/rte_crc_riscv64.h
>  create mode 100644 lib/net/net_crc_zbc.c
>

Since I don't have RISC-V hardware or detailed CPU knowledge, I turned to AI review for help. It looks like more work is needed. Not all AI comments are correct; please look at them in detail.

Summary of Findings

The patch series correctly introduces RISC-V Zbc (carry-less multiplication) support for CRC acceleration in both the hash and net libraries, including runtime detection via the Linux hwprobe interface. However, I identified a significant alignment bug in the buffer processing logic and a missing header that will likely break the build on some systems.

---

Correctness Bugs (High Priority)

1. Alignment Logic Error in crc32_eth_calc_zbc (lib/net/net_crc_zbc.c)

The code attempts to align the input buffer to an 8-byte boundary before performing 64-bit wide operations, but the calculation is incorrect:

    + /* Barrett reduce until buffer aligned to 8-byte word */
    + uint32_t misalign = (size_t)data & 7;
    + if (misalign != 0 && misalign <= data_len) {
    +         crc = crc32_repeated_barrett_zbc(data, misalign, crc, params);
    +         data += misalign;
    +         data_len -= misalign;
    + }

Issue: If data is 0x...001 (1-byte misaligned), misalign is 1. After processing 1 byte and incrementing data, the new address is 0x...002, which is still not 8-byte aligned. This leads to misaligned 64-bit loads (*(const uint64_t *)data) later in the function, which can cause performance degradation or traps depending on the RISC-V implementation.

Recommendation: The number of bytes to process to reach alignment should be (8 - ((uintptr_t)data & 7)) & 7, while keeping the existing check against data_len.

2. Missing Header for cpu_set_t (lib/eal/riscv/rte_cpuflags.c)

In Patch 01/10, the function rte_cpu_hwprobe_ima_ext introduces the use of cpu_set_t:

    + /* empty set of cpus returns extensions present on all cpus */
    + cpu_set_t *cpus = NULL;

Issue: lib/eal/riscv/rte_cpuflags.c does not include <sched.h>, which is required for the cpu_set_t definition. This can cause a compilation error unless another included header already pulls in the definition.

Recommendation: Add #include <sched.h> to the EAL CPU flags implementation.

3. Algorithm Overriding in rte_hash_crc_set_alg (lib/hash/rte_hash_crc.c)

The implementation overrides the user's requested algorithm if the Zbc extension is detected:

    + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RISCV_EXT_ZBC))
    +         rte_hash_crc32_alg = CRC32_RISCV64;

Issue: If a user explicitly calls rte_hash_crc_set_alg(CRC32_SW), the code will still force the use of CRC32_RISCV64. While this matches the existing (and arguably flawed) pattern for ARM64, it violates the expectation that set_alg respects the user's choice.

---

Style and Process Compliance

4. Inconsistent Assertion Macros

- Patch 02/10 uses RTE_ASSERT in lib/hash/rte_crc_riscv64.h.
- Patch 03/10 uses standard C assert() in lib/net/net_crc_zbc.c.

Recommendation: Use RTE_ASSERT consistently across the codebase per DPDK standards.

5. Implicit Pointer Comparison Nits

While most comparisons are explicit, the project style generally favors if (ptr == NULL) over if (!ptr). There are a few instances in Patch 01 (Meson logic) and Patch 02 (intrinsics checks) that could be more explicit, though they mostly follow the local conventions of those specific files.

6. Barrett Reduction Constants

The constants mu and p in lib/hash/rte_crc_riscv64.h are lowercase. While these are static const variables, global/static constants in DPDK are often written in ALL_UPPERCASE.

---

Additional Observations

- Toolchain Requirements: The Meson logic correctly identifies that GCC 14.1+ or Clang 18.1+ is required for the Zbc intrinsics.
- Cross-compilation: The new QEMU cross-build target is helpful for testing these extensions in virtualized environments.
- Hardware Probing: The use of the Linux hwprobe syscall is the correct modern approach for RISC-V extension discovery.
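To make the fix in item 1 concrete, here is a minimal standalone sketch of the prefix computation it recommends. The helpers bytes_to_align8() and split_for_u64() and the struct crc_split are hypothetical names, not code from the patch. The sketch only works out how many leading bytes would go through the byte-wise Barrett path, how many bytes form whole 64-bit words, and how many bytes are left over. It clamps the prefix to the buffer length, in the same spirit as the existing data_len check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes to consume so that p becomes 8-byte aligned.
 * (uintptr_t)p & 7 is the misalignment; the distance to the next boundary
 * is 8 minus that, and the final "& 7" maps an aligned pointer to 0.
 */
static size_t bytes_to_align8(const void *p)
{
	return (8 - ((uintptr_t)p & 7)) & 7;
}

/* Hypothetical split: head (byte-wise), body (whole 64-bit words), tail. */
struct crc_split {
	size_t head;
	size_t body;
	size_t tail;
};

static struct crc_split split_for_u64(const uint8_t *data, size_t data_len)
{
	struct crc_split s;

	s.head = bytes_to_align8(data);
	if (s.head > data_len)
		s.head = data_len;	/* buffer ends before the next boundary */
	s.body = (data_len - s.head) & ~(size_t)7;
	s.tail = data_len - s.head - s.body;
	assert(s.head + s.body + s.tail == data_len);
	return s;
}
```

For a pointer one byte past an 8-byte boundary, bytes_to_align8() yields 7 rather than 1. After consuming those 7 bytes, the 64-bit loads start on an aligned address.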

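Item 3 states the problem but offers no recommendation. One possible approach is to switch to the carry-less multiply path only when the caller's request includes it and Zbc was detected, and otherwise fall back to software. The sketch below illustrates this with hypothetical flag names (ALG_SW, ALG_RISCV64) and a hypothetical select_crc_alg(). These are not the real DPDK constants or the internals of rte_hash_crc_set_alg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical algorithm flags, for illustration only. */
enum {
	ALG_SW      = 1u << 0,
	ALG_RISCV64 = 1u << 1,
};

/*
 * Honour the caller's request: use the Zbc path only if it was asked for
 * and the extension is present; otherwise use the software table path.
 */
static uint8_t select_crc_alg(uint8_t requested, bool zbc_present)
{
	if ((requested & ALG_RISCV64) != 0 && zbc_present)
		return ALG_RISCV64;
	return ALG_SW;
}

/* Quick self-check of the intended behaviour. */
static void select_crc_alg_selftest(void)
{
	assert(select_crc_alg(ALG_SW, true) == ALG_SW);	/* explicit SW wins */
	assert(select_crc_alg(ALG_RISCV64, false) == ALG_SW);	/* no Zbc */
	assert(select_crc_alg(ALG_RISCV64, true) == ALG_RISCV64);
}
```

The trade-off is that a caller who never asks for the hardware path never gets it. Whether that is preferable to the ARM64 behaviour is a maintainer decision.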

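The lib/hash path described in the cover letter rests on Barrett reduction with carry-less multiplication. The following portable sketch may help reviewers without Zbc hardware check that arithmetic. It is not the patch's code.

To keep the bit order simple, it uses the non-reflected CRC-32/MPEG-2 parameters: polynomial 0x04C11DB7, initial value 0xFFFFFFFF, and no final XOR. CRC32C and the Ethernet CRC, by contrast, are bit-reflected, and real implementations use reflected constants.

The helper clmul_lo() is a hypothetical software stand-in for the low half of a carry-less product, which is what the Zbc clmul instruction returns. The constant mu = floor(x^64 / P) is computed rather than hard-coded. Over GF(2), this Barrett step is exact when the product fits in 64 bits, so no correction step is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC_POLY_FULL 0x104C11DB7ULL	/* P(x) including the x^32 term */
#define CRC_POLY_LOW  0x04C11DB7u	/* P(x) without the x^32 term */

/* Low 64 bits of the carry-less product a * b (software stand-in). */
static uint64_t clmul_lo(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;
	return r;
}

/* mu = floor(x^64 / P) by bit-serial polynomial long division. */
static uint64_t barrett_mu(void)
{
	uint64_t q = 0, rem = 0;

	for (int j = 64; j >= 0; j--) {
		rem = (rem << 1) | (j == 64);	/* bring down dividend bit j */
		q <<= 1;
		if (rem & (1ULL << 32)) {	/* degree 32 reached: subtract P */
			rem ^= CRC_POLY_FULL;
			q |= 1;
		}
	}
	assert(q >> 32 == 1);	/* the quotient has degree exactly 32 */
	return q;
}

/* Reference: one bit at a time, most significant bit first. */
static uint32_t crc32_bitwise(const uint8_t *buf, size_t len, uint32_t crc)
{
	while (len--) {
		crc ^= (uint32_t)*buf++ << 24;
		for (int k = 0; k < 8; k++)
			crc = (crc & 0x80000000u) ? (crc << 1) ^ CRC_POLY_LOW
						  : crc << 1;
	}
	return crc;
}

/* Four bytes per step: crc' = ((crc ^ word) * x^32) mod P via Barrett. */
static uint32_t crc32_barrett(const uint8_t *buf, size_t len, uint32_t crc)
{
	const uint64_t mu = barrett_mu();	/* recomputed per call for brevity */

	for (; len >= 4; buf += 4, len -= 4) {
		uint32_t word = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
				(uint32_t)buf[2] << 8 | buf[3];
		uint64_t a = crc ^ word;		/* degree < 32 */
		uint64_t q = clmul_lo(a, mu) >> 32;	/* floor(a * x^32 / P) */

		/* a * x^32 has no bits below 32, and neither does q * x^32,
		 * so the remainder is the low 32 bits of q * (P - x^32). */
		crc = (uint32_t)clmul_lo(q, CRC_POLY_LOW);
	}
	return crc32_bitwise(buf, len, crc);	/* 0-3 trailing bytes */
}
```

In a Zbc build, clmul_lo() would be replaced by the clmul instruction, with clmulh available for the high half. The reflected CRCs handled by the patch shift in the opposite direction and use bit-reversed constants, as described in the Intel white paper cited in the cover letter.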