https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95566
            Bug ID: 95566
           Summary: x86 instruction selection --- some REX prefixes unnecessary
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zero at smallinteger dot com
  Target Milestone: ---

Created attachment 48696
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48696&action=edit
sample code

Consider the code attached, compiled with

gcc -O3 sample.c -o sample

Gcc produces unrolled loop code that follows the pattern below.

        movzx   ecx, WORD PTR [rsp-62]
        cmp     rdx, rcx

Here, rdx has the value of k >> 48.  The top 32 bits of rdx are zero after the
shift, so the entirety of k >> 48 is in edx.  Thus, the cmp instructions could
be

        cmp     edx, ecx

instead.  This difference avoids the REX prefix, and thus the instructions are
shorter.  After sufficient unrolling (or with e.g. more complex comparisons
that depend on k >> 48), shorter instructions without the REX prefix will be
better even accounting for the partial register dependency (or an instruction
to break the dependency).  The Intel optimization manual says shorter
instructions are better.

The attachment is the entirety of sample.c.  I did not include other files
because this attachment appears to qualify for that exemption due to excuse
(ii): the attached test case is small and does not include any other file.

I originally found this behavior looking at the disassembly of gcc (Gentoo
9.2.0-r2 p3) 9.2.0.  I verified the same behavior with gcc 10.1 and gcc trunk
at godbolt.
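
For readers without the attachment, the equivalence argued above can be sketched in C. This is not the attached sample.c, and it does not claim to reproduce GCC's output. The function names, the choice of an equality test, and the sample values of k are all assumptions made for illustration. It only shows that comparing k >> 48 against a zero-extended 16-bit value gives the same answer at 64-bit and at 32-bit width, which is what makes the narrower cmp legal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compare at 64-bit width, mirroring "cmp rdx, rcx". */
int cmp_top16_wide(uint64_t k, uint16_t v)
{
    uint64_t top = k >> 48;          /* value held in rdx */
    return top == (uint64_t)v;       /* v zero-extended, as movzx does */
}

/* Hypothetical helper: compare at 32-bit width, mirroring "cmp edx, ecx". */
int cmp_top16_narrow(uint64_t k, uint16_t v)
{
    /* After the shift only the low 16 bits can be set, so truncating
       to 32 bits loses nothing. */
    assert((k >> 48) <= 0xFFFFu);
    uint32_t top = (uint32_t)(k >> 48);   /* value held in edx */
    return top == (uint32_t)v;
}

/* Count disagreements between the two widths for a few arbitrary k
   values and every possible 16-bit v. */
unsigned count_mismatches(void)
{
    static const uint64_t ks[] = {
        0u,
        1ULL << 48,
        0x1234ULL << 48 | 0xFFFFFFFFFFFFULL,
        0xFFFFULL << 48,
        UINT64_MAX,
    };
    unsigned mismatches = 0;

    for (unsigned i = 0; i < sizeof ks / sizeof ks[0]; i++) {
        for (uint32_t v = 0; v <= 0xFFFFu; v++) {
            if (cmp_top16_wide(ks[i], (uint16_t)v) !=
                cmp_top16_narrow(ks[i], (uint16_t)v))
                mismatches++;
        }
    }
    return mismatches;
}
```

Compiling the two helpers with gcc -O3 -S -masm=intel and comparing the emitted cmp operands is one way to check which width a given GCC version selects; the result may differ from the unrolled loop in the attachment.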