https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84547
            Bug ID: 84547
           Summary: Suboptimal code for masked shifts (ARM64)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nruslan_devel at yahoo dot com
  Target Milestone: ---

Partially related to Bug 84431 (see the description of the problem there), but
observed on ARM64 instead of x86/x86-64. (Not sure about ARM32.)

Test example:

__uint128_t func(__uint128_t a, unsigned shift)
{
    return a << (shift & 63);
}

aarch64-linux-gnu-gcc-7 -Wall -O2 -S test.c

GCC generates:

func:
        and     w2, w2, 63
        mov     w4, 63
        sub     w5, w4, w2
        lsr     x4, x0, 1
        sub     w3, w2, #64
        lsl     x1, x1, x2
        cmp     w3, 0
        lsr     x4, x4, x5
        orr     x1, x4, x1
        lsl     x4, x0, x3
        lsl     x0, x0, x2
        csel    x1, x4, x1, ge
        csel    x0, x0, xzr, lt
        ret

while clang/llvm generates better code:

func:                                   // @func
// BB#0:
        and     w8, w2, #0x3f
        lsr     x9, x0, #1
        eor     x11, x8, #0x3f
        lsl     x10, x1, x8
        lsr     x9, x9, x11
        orr     x1, x10, x9
        lsl     x0, x0, x8
        ret

Another interesting case is when __builtin_unreachable() is used:

__uint128_t func(__uint128_t a, unsigned shift)
{
    if (shift > 63)
        __builtin_unreachable();
    return a << shift;
}

In this case, however, neither clang/llvm nor gcc seems to be able to optimize
the code well.
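For reference, here is a C sketch of the branch-free lowering that the clang
output above corresponds to. This is only an illustration, not GCC's or LLVM's
actual implementation; it assumes the shift count is already masked to 0..63,
and the function name is made up for the example:

#include <stdint.h>

/* Illustrative sketch of a branch-free 128-bit left shift for a shift
   count known to be in 0..63 (name and structure are hypothetical).
   It mirrors the clang sequence above: 63 - s is computed as s ^ 63,
   and lo >> (64 - s) is rewritten as (lo >> 1) >> (s ^ 63) so the
   right-shift amount never reaches 64, avoiding the compare/csel pair. */
static inline __uint128_t shl128_masked(__uint128_t a, unsigned shift)
{
    unsigned s  = shift & 63;
    uint64_t lo = (uint64_t)a;
    uint64_t hi = (uint64_t)(a >> 64);

    uint64_t new_hi = (hi << s) | ((lo >> 1) >> (s ^ 63));
    uint64_t new_lo = lo << s;

    return ((__uint128_t)new_hi << 64) | new_lo;
}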