https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99434
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to cqwrteur from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > This is just a register allocation issue dealing with mulx and TImode.
> >
> > If mulq was used instead (that is without -march=native), all of the
> > functions are done correctly.
>
> I do not think so. I think GCC generally did things like this wrong. I have
> even found out how to produce different wrong results deterministically.
>
> For example like this
> https://godbolt.org/z/PbobYG
>
> Any time it deals with things like >>32 or >>64, it produces a slower result.
> This even compiles without -march=native.

This is still a register allocation issue, this time dealing with DImode on
32bit. GCC has a known issue with register allocation when dealing with values
stored into two registers. See PR 21150, PR 43644, PR 50339, etc.
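The contents of the godbolt link are not reproduced here, so the functions below are a hypothetical illustration of the pattern under discussion, not the reporter's actual test case. Each one computes a double-width product and keeps the high half with `>>64` or `>>32`. On x86-64 the `unsigned __int128` product is a TImode value that GCC holds in a pair of 64-bit registers; on 32-bit x86 the `uint64_t` product is a DImode value split across two 32-bit registers, which is the "values stored into two registers" case the reply refers to. The names `mul_hi_u64` and `mul_hi_u32` are made up for this sketch, and `unsigned __int128` is a GCC/Clang extension that is only available on 64-bit targets, hence the guard.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SIZEOF_INT128__
/* 64x64 -> 128-bit multiply, returning the high 64 bits.
   The 128-bit product (TImode) occupies two 64-bit registers on x86-64;
   with BMI2 enabled GCC may select mulx instead of mulq. */
static inline uint64_t mul_hi_u64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    return (uint64_t)(p >> 64);
}
#endif

/* 32x32 -> 64-bit multiply, returning the high 32 bits.
   When compiled with -m32, the 64-bit product (DImode) is split
   across two 32-bit registers. */
static inline uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    return (uint32_t)(p >> 32);
}
```

One way to look at the register allocation behaviour is to compare the assembly from `gcc -O2 -S`, `gcc -O2 -mbmi2 -S` (BMI2 provides mulx) and `gcc -O2 -m32 -S` on a file containing these functions; extra register-to-register moves around the multiply are the kind of inefficiency the PRs cited above describe.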