https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83272
--- Comment #3 from Mason <slash.tmp at free dot fr> ---
I think Jakub is right about an interaction between movzbl and shrb.

unsigned long long foo1(unsigned char *p) { return *p; }

foo1:
        movzbl  (%rdi), %eax
        ret

I.e. gcc "knows" that movzbl clears the upper bits of RAX.

However...

unsigned long long foo2(unsigned char *p) { return *p / 2; }

foo2:
        movzbl  (%rdi), %eax
        shrb    %al
        movzbl  %al, %eax
        ret

gcc seems to think that shrb might "mess up" some bits in RAX, which then
need to be cleared again through an extra movzbl.

"shr %al" is encoded as "d0 e8" while "shr %eax" is "d1 e8".
The former is not more efficient than the latter.

As Marc Glisse pointed out, a solution is convincing gcc to store the
intermediate results in a register:

unsigned long long foo3(unsigned char *p) { unsigned int temp = *p; return temp / 2; }

foo3:
        movzbl  (%rdi), %eax
        shrl    %eax
        ret

gcc "knows" that shrl does not "disturb" RAX, so no extra movzbl required.

According to the "integer promotions" rules, *p is promoted to int in the
expression (*p / 2) used in foo2.

However...

unsigned long long foo4(unsigned char *p) { int temp = *p; return temp / 2; }

foo4:
        movzbl  (%rdi), %eax
        sarl    %eax
        cltq
        ret

An 8-bit unsigned char promoted to int or unsigned int has the same value.
I expect the same code generated for foo2, foo3, foo4.
Yet we get 3 different code snippets.
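
To back up the claim that foo2, foo3 and foo4 are interchangeable, here is a
minimal sketch that compares them over every possible byte value. The three
functions are copied from above; count_mismatches is a hypothetical helper
added only for this check and is not part of the bug report. It assumes the
usual 8-bit unsigned char, as on x86-64.

```c
/* The three variants from the comment, unchanged. */
unsigned long long foo2(unsigned char *p) { return *p / 2; }
unsigned long long foo3(unsigned char *p) { unsigned int temp = *p; return temp / 2; }
unsigned long long foo4(unsigned char *p) { int temp = *p; return temp / 2; }

/* Hypothetical helper (not from the bug report): counts the byte values
   for which any variant differs from the plain integer result v / 2. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v <= 255; v++) {
        unsigned char b = (unsigned char)v;
        unsigned long long expected = (unsigned long long)(v / 2);
        if (foo2(&b) != expected || foo3(&b) != expected || foo4(&b) != expected)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main should return 0. This only shows
that the three functions compute the same values, so the differences in the
generated code are purely a code generation matter.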