https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86609
Bug ID: 86609
Summary: Reassociate (int) round sequences
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---

Not entirely sure what to call this optimisation. Consider:

    int f(double x, double *p)
    {
      double r = __builtin_round (x);
      *p = r;
      return (int) r;
    }

For aarch64, GCC with -O2 -fno-math-errno generates:

    f:
            frinta  d0, d0      // 7 [c=8 l=4]  rounddf2
            str     d0, [x0]    // 8 [c=4 l=4]  *movdf_aarch64/7
            fcvtzs  w0, d0      // 14 [c=8 l=4]  fix_truncdfsi2
            ret                 // 29 [c=0 l=4]  *do_return

The problem here is that the two expensive operations, FRINTA and FCVTZS, cannot be done in parallel: FCVTZS consumes the result of FRINTA, so it has to wait for it.

Clang can break the chain like so:

    f:                          // @f
    // %bb.0:                   // %entry
            frinta  d1, d0
            fcvtas  w8, d0
            str     d1, [x0]
            mov     w0, w8
            ret

Note how the two expensive operations are now independent: both read the original input in d0.

I think in C terms this means transforming the above to:

    int f2 (double x, double *p)
    {
      double r = __builtin_round (x);
      *p = r;
      return (int) __builtin_iround (x);
    }

Would this be something for the reassociation pass to do?