https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50481
Alexander Kleinsorge <aleks at physik dot tu-berlin.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aleks at physik dot tu-berlin.de

--- Comment #13 from Alexander Kleinsorge <aleks at physik dot tu-berlin.de> ---
For single bytes (uint8), there could be a faster way (x86 + x64).
There are only logical ops and shifts, nothing else.

static inline uint8 byte_rev(uint8 v) {
    const uint64 BREV64 = ~0x084c2a6e195d3b7fLLu; // verify this number (LUT like)
    uint8 a = (BREV64) >> ((v % 16u) * 4u); // from low
    uint8 b = (BREV64) >> ((v / 16u) * 4u); // from high
    return (a * 16u) | (b % 16u);
}
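
The "verify this number" note can be settled by exhaustive comparison, since there are only 256 inputs. Below is a minimal sketch of such a check. It assumes the <stdint.h> types uint8_t/uint64_t in place of uint8/uint64 (which the comment leaves undefined), and the helpers byte_rev_ref and byte_rev_mismatches are hypothetical names introduced here for illustration. The explicit "& 15u" masks are an addition for readability; the original gets the same result through truncation to 8 bits.

```c
#include <stdint.h>

/* 16-entry nibble-reversal table packed into 64 bits:
 * nibble n (counted from the low end) holds the bit-reversed value of n.
 * The complement of 0x084c2a6e195d3b7f is 0xF7B3D591E6A2C480. */
static const uint64_t BREV64 = ~UINT64_C(0x084c2a6e195d3b7f);

/* Table-based byte reversal using only shifts and logical ops. */
static inline uint8_t byte_rev(uint8_t v)
{
    uint8_t lo = (uint8_t)((BREV64 >> ((v & 15u) * 4u)) & 15u); /* reversed low nibble  */
    uint8_t hi = (uint8_t)((BREV64 >> ((v >> 4) * 4u)) & 15u);  /* reversed high nibble */
    return (uint8_t)((lo << 4) | hi); /* swap the nibble positions */
}

/* Reference implementation (hypothetical helper): reverse bit by bit. */
static uint8_t byte_rev_ref(uint8_t v)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | ((v >> i) & 1u));
    }
    return r;
}

/* Hypothetical helper: count inputs where the two versions disagree. */
static int byte_rev_mismatches(void)
{
    int bad = 0;
    for (unsigned v = 0; v < 256; v++) {
        if (byte_rev((uint8_t)v) != byte_rev_ref((uint8_t)v)) {
            bad++;
        }
    }
    return bad;
}
```

If byte_rev_mismatches() returns 0, the constant is a correct nibble-reversal table. Whether this beats the code GCC currently generates on x86/x64 is a separate question that would need measuring.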