On 5/24/24 2:41 AM, Mariam Arutunian wrote:
Hello!
This patch set detects bitwise CRC implementation loops (with branches)
in the GIMPLE optimizers and replaces them with more optimal CRC
implementations in RTL. These patches introduce new internal functions,
built-in functions, and expanders for CRC generation, leveraging
hardware instructions where available. Additionally, various tests are
included to check CRC detection and generation.
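
For readers unfamiliar with the pattern, a "bitwise CRC loop with a
branch" looks roughly like the sketch below. This is an illustration
only, not code from the patch set: it assumes CRC-16/ARC (reflected
polynomial 0xA001, initial value 0), and the function name
crc16_bitwise is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: bit-at-a-time CRC-16/ARC.
   Assumes reflected polynomial 0xA001 and initial value 0.  */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                 /* fold the next byte into the low bits */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)                /* the branch on the bit shifted out */
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc >>= 1;
        }
    }
    return crc;
}
```

The inner loop runs eight dependent, branchy iterations per input byte,
which is why replacing it with a table lookup or a hardware-assisted
sequence can pay off.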
Thanks so much for getting this process started. It's a bit sooner
than I was ready for, but no worries.
2. Architecture-Specific Expanders:
* Expanders are added for RISC-V, aarch64, and i386 architectures.
* These expanders generate CRCs using either carry-less
multiplication instructions or direct CRC instructions, based on
the target architecture's capabilities.
Also note for the wider audience: this work can generate table-lookup
based CRC implementations as well. This has proven exceedingly helpful
during the testing phase, as we were able to run this code on a wide
variety of embedded targets to shake out target dependencies.
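
As a rough idea of what a table-lookup CRC looks like, here is a minimal
sketch. It is not the code the compiler generates: it assumes
CRC-16/ARC (reflected polynomial 0xA001, initial value 0), the names
crc16_table, crc16_table_init and crc16_table_lookup are hypothetical,
and the table is filled at run time for brevity where a compiler would
presumably emit it as constant data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table: entry n is the CRC of the single byte n.  */
static uint16_t crc16_table[256];

/* Fill the table using the bit-at-a-time recurrence.  Must be called once
   before crc16_table_lookup.  */
void crc16_table_init(void)
{
    for (unsigned n = 0; n < 256; n++) {
        uint16_t crc = (uint16_t)n;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
        crc16_table[n] = crc;
    }
}

/* One table load per input byte replaces eight shift/xor/branch steps.  */
uint16_t crc16_table_lookup(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc >> 8) ^ crc16_table[(crc ^ data[i]) & 0xFF]);
    return crc;
}
```

This form needs only loads, shifts and xors, so it is usable on targets
with neither carry-less multiply nor CRC instructions.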
On Ventana's V1 design the clmul variant was a small, but clear winner
over the table lookup. Obviously the bitwise implementation found in
coremark was the worst performing.
On our V2 design, clmul outperforms the table lookup by a wide margin,
largely due to the reduced latency of clmul.
Jeff