https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
--- Comment #3 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Andrew Pinski from comment #2)
> Which core is showing the difference here?
> Because some cores I know of, loading/storing using the FP registers is
> actually one cycle slower than using GPRs.

Yes, you're right. In this submission I carelessly posted the assembly code in the wrong places. Performance is better with the X0 register, which is what is used before the modification. The question, however, is why this modification causes D0 to be selected instead, leading to the performance degradation. I will continue to follow up in the new submission and look forward to your reply.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026