https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81614
Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
There are two flags (which I believe were introduced by me):

 - partial-reg-stall, which models the behavior of PentiumPro, where partial
   writes to a register were cheap as long as the partially written register
   was never subsequently used in a wider mode, since that use is what
   stalls.
 - partial-reg-dependency, which models later CPUs where partial writes are
   handled as read-modify-write instructions and are thus slow even if the
   result is used only in the same width as the write. This was the design
   of Athlons.

The first flag avoids random optimizations which replace a full-sized
instruction by a part-sized one (for example, xorl $1, %eax is not changed to
xorb to save size). Still, we could generate partial register stalls out of
combine. The second tries to make sure we always read the full register (by
movzx).

We set those as:

DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
          m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL
          | m_KNL | m_AMD_MULTIPLE | m_GENERIC)

I would say that it makes no sense to have both X86_TUNE_PARTIAL_REG_STALL and
X86_TUNE_PARTIAL_REG_DEPENDENCY set on one chip. According to Fog's manual,
Core and later chips can indeed rename partial registers again, so they should
be moved to the X86_TUNE_PARTIAL_REG_STALL category and we should try to fix
possible regressions.

In the testcase given, for X86_TUNE_PARTIAL_REG_DEPENDENCY we ought to emit
the dependency-breaking instruction to clear the full register before the
partial write when optimizing for speed.

All AMD chips since Athlon are, however, of the X86_TUNE_PARTIAL_REG_DEPENDENCY
design, so for generic we will need to check what the tradeoffs are.

I would say that X86_TUNE_PARTIAL_REG_DEPENDENCY is in general more
conservative and works well for X86_TUNE_PARTIAL_REG_STALL chips, as the cases
where we produce a partial write (sete) are relatively rare.
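
To illustrate the read-modify-write point, here is a rough C model of a partial
register write. It is only a sketch: the function names are made up for this
example and nothing here is GCC code. A byte write that keeps the old upper
bits has to wait for the old register value, while clearing the full register
first (the analogue of xorl %eax, %eax before sete %al) removes that input.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit register, for illustration only. */

/* Partial write handled as read-modify-write: the new full value is built
   from the old upper 24 bits, so it depends on old_full. */
static uint32_t write_low8_merge(uint32_t old_full, uint8_t value)
{
    return (old_full & 0xFFFFFF00u) | (uint32_t)value;
}

/* Dependency-breaking form: start from a zeroed register, then do the
   partial write. The previous contents no longer influence the result. */
static uint32_t write_low8_after_clear(uint32_t old_full, uint8_t value)
{
    uint32_t cleared = 0;      /* stands in for clearing the full register */
    (void)old_full;            /* previous contents are deliberately ignored */
    return write_low8_merge(cleared, value);
}

/* A comparison result like this is typically materialized with sete,
   which is the kind of partial write discussed above. */
static int is_equal(int a, int b)
{
    return a == b;
}
```

In the first function the result changes with old_full, which is the false
dependency; in the second it does not, at the cost of one extra instruction.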