On 5/24/24 2:41 AM, Mariam Arutunian wrote:
Hello!

This patch set detects bitwise CRC implementation loops (with branches) in the GIMPLE optimizers and replaces them with more optimal CRC implementations in RTL. These patches introduce new internal functions, built-in functions, and expanders for CRC generation, leveraging hardware instructions where available. Additionally, various tests are included to check CRC detection and generation.
Thanks so much for getting this process started. It's a bit sooner than I was ready for, but no worries.



 2. Architecture-Specific Expanders:

      * Expanders are added for RISC-V, aarch64, and i386 architectures.
      * These expanders generate CRCs using either carry-less
        multiplication instructions or direct CRC instructions, based on
        the target architecture's capabilities.
Also note for the wider audience: this work can generate table-lookup-based CRC implementations as well. That has proven exceedingly helpful during the testing phase, as we were able to run this code on a wide variety of embedded targets to shake out target dependencies.
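
For readers less familiar with the two forms, here is a rough sketch of a bitwise CRC loop with a branch per bit, next to a table-lookup equivalent. It is illustrative only and is not taken from the patch set or from what GCC emits. The reflected CRC-32 polynomial and all function names are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: the reflected CRC-32 polynomial. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Bitwise form: one conditional XOR per input bit. */
uint32_t crc32_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED;
            else
                crc >>= 1;
        }
    }
    return crc;
}

/* Table form: the eight inner steps are precomputed for every byte value. */
static uint32_t crc32_table[256];

void crc32_table_init(void)
{
    for (uint32_t byte = 0; byte < 256; byte++) {
        uint32_t crc = byte;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED : crc >> 1;
        crc32_table[byte] = crc;
    }
}

/* One table load, one shift and two XORs per input byte. */
uint32_t crc32_table_lookup(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc32_table[(crc ^ data[i]) & 0xFFu];
    return crc;
}
```

Both functions return the same value for the same inputs once crc32_table_init() has been called. The table form needs no special instructions, which is why it can run on essentially any target.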

On Ventana's V1 design the clmul variant was a small but clear winner over the table lookup. Unsurprisingly, the bitwise implementation found in coremark was the worst performing.

On our V2 design clmul outperforms the table lookup by a wide margin, largely due to reduced latency of clmul.


Jeff
