Jacob Navia via Gcc <gcc@gcc.gnu.org> writes:

> We have 2 loads, and 1 operation + a store. 4 instructions compared to
> 46 operations for the « gcc way » (16 loads of a byte, 14 x 2 OR
> operations and 8 shifts to split the result and 8 stores of a byte
> each).

The sample code seems to have a couple of errors; I fixed it up and put
it on godbolt: https://godbolt.org/z/obbr7K7dx

Let me know if the fixups were wrong. The issue should probably be
reported on Bugzilla as a missed-optimization bug.
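
For readers who cannot open the link, here is a minimal sketch of the
pattern under discussion. It is not the code behind the godbolt link:
the function names, the little-endian byte order, and the choice of
addition as the single operation are all assumptions made for
illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a 64-bit value from eight byte loads
   (little-endian order assumed), using shifts and ORs. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical helper: split a 64-bit value into eight byte stores. */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Byte-at-a-time form: 16 byte loads, ORs to assemble both operands,
   one operation (addition is an assumption), then shifts and 8 byte
   stores to write the result back. */
void combine_bytewise(unsigned char *dst,
                      const unsigned char *a, const unsigned char *b)
{
    store_le64(dst, load_le64(a) + load_le64(b));
}

/* Word-at-a-time form: 2 loads, 1 operation, 1 store. memcpy avoids
   alignment and aliasing problems; it matches the bytewise form only
   on a little-endian target. */
void combine_wordwise(unsigned char *dst,
                      const unsigned char *a, const unsigned char *b)
{
    uint64_t x, y, r;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    r = x + y;
    memcpy(dst, &r, sizeof r);
}
```

On a little-endian target the two functions compute the same bytes, so
a report can compare what the compiler emits for each of them at the
same optimization level.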


/Benny

