https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63671
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---

According to perf, the culprit is the inlining of
_ZNK22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEE12cellPositionERK3LocILi3EE,
which gets only partially inlined with devirtualization on, because we hit the inline unit growth limit.

Inline summary for UniformRectilinearMesh<MeshTraits>::PointType_t UniformRectilinearMesh<MeshTraits>::cellPosition(const Loc_t&) const [with MeshTraits = MeshTraits<3, double, UniformRec
  self time:       66
  global time:     0
  self size:       23
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:17.500000, time:56.275000, predicate:(true)
    size:4.500000, time:6.275000, predicate:(not inlined)
    size:0.500000, time:1.500000, predicate:(op0[ref offset: 0] changed) && (not inlined)
    size:0.500000, time:1.500000, predicate:(op0[ref offset: 0] changed)
  calls:

So this is a leaf function with quite a small body. It fails to inline as:

  Estimated badness is -96240, frequency 79.20.
    Badness calculation for void Adv5::Z::MomentumfluxZ<Dim>::operator()(const F1&, const F2&, const F3&, const Loc<Dim>&) const [with F1 = Field<UniformRectilinearMesh<MeshTraits<3, doub
      size growth 12, time 58 inline hints: declared_inline
      -96240: guessed profile. frequency 79.200000, benefit 8.713335%, time w/o inlining 20899, time w inlining 19078 overall growth 138 (current) 162 (original)

This is quite a small badness, but still enough to make the inline candidate appear late in the queue.
The function body is as follows:

{
  int i;
  int i;
  int _7;
  double _8;
  double _9;
  struct UniformRectilinearMeshData * _12;
  const struct Domain * _13;
  int _15;
  double _16;
  double _17;
  double _18;
  double _19;
  double * _22;
  const Element_t _23;

  <bb 2>:
  goto <bb 6>;

  <bb 3>:
  _22 = &MEM[(struct VectorEngine *)point_5(D)].x_m[i_3];
  if (_22 != 0B)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  MEM[(This_t *)point_5(D)].x_m[i_3] = 0.0;

  <bb 5>:
  i_14 = i_3 + 1;

  <bb 6>:
  # i_3 = PHI <0(2), i_14(5)>
  if (i_3 <= 2)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 7>:
  # i_11 = PHI <0(6)>
  goto <bb 9>;

  <bb 8>:
  _12 = MEM[(const struct RefCountedPtr *)this_6(D)].ptr_m;
  _8 = MEM[(const double &)_12 + 104].x_m[i_1];
  _9 = MEM[(const double &)_12 + 128].x_m[i_1];
  _7 = MEM[(const struct Domain *)loc_10(D)].D.78652.domain_m[i_1].D.77613.D.46542.D.46423.domain_m;
  _13 = &MEM[(const struct OneDomain_t *)_12 + 32B].D.120542.domain_m[i_1].D.115812.D.45237;
  _23 = MEM[(int *)_13];
  _15 = _7 - _23;
  _16 = (double) _15;
  _17 = _16 + 5.0e-1;
  _18 = _9 * _17;
  _19 = _8 + _18;
  MEM[(double &)point_5(D)].x_m[i_1] = _19;
  i_21 = i_1 + 1;

  <bb 9>:
  # i_1 = PHI <i_11(7), i_21(8)>
  if (i_1 <= 2)
    goto <bb 8>;
  else
    goto <bb 10>;

  <bb 10>:
  return point_5(D);
}

One obvious piece of nonsense is:

  <bb 3>:
  _22 = &MEM[(struct VectorEngine *)point_5(D)].x_m[i_3];
  if (_22 != 0B)
    goto <bb 4>;
  else
    goto <bb 5>;

Redundant non-zero checks seem to be very common in C++ code - I ran across many of them, either autogenerated for multiple inheritance or brought in by inlining. Scheduling VRP early makes the function fully inlined again, but does not fully solve the regression: