[Bug ipa/63671] [5 Regression] 21% tramp3d-v4 performance hit due to -fdevirtualize

hubicka at gcc dot gnu.org Tue, 11 Nov 2014 17:22:41 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63671


--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
According to perf, the culprit is the inlining of
_ZNK22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEE12cellPositionERK3LocILi3EE
which gets only partially inlined with devirtualization on, because we hit
the inline unit growth limit.

Inline summary for UniformRectilinearMesh<MeshTraits>::PointType_t
UniformRectilinearMesh<MeshTraits>::cellPosition(const Loc_t&) const [with
MeshTraits = MeshTraits<3, double, UniformRec
  self time:       66
  global time:     0
  self size:       23
  global size:     0
  min size:       0
  self stack:      0
  global stack:    0
    size:17.500000, time:56.275000, predicate:(true)
    size:4.500000, time:6.275000, predicate:(not inlined)
    size:0.500000, time:1.500000, predicate:(op0[ref offset: 0] changed) && (not inlined)
    size:0.500000, time:1.500000, predicate:(op0[ref offset: 0] changed)
  calls:

so this is a leaf function with quite a small body.  It fails to inline as:

 Estimated badness is -96240, frequency 79.20.
    Badness calculation for void Adv5::Z::MomentumfluxZ<Dim>::operator()(const
F1&, const F2&, const F3&, const Loc<Dim>&) const [with F1 =
Field<UniformRectilinearMesh<MeshTraits<3, doub
      size growth 12, time 58 inline hints: declared_inline
      -96240: guessed profile. frequency 79.200000, benefit 8.713335%, time w/o inlining 20899, time w inlining 19078 overall growth 138 (current) 162 (original)

This is quite a small badness, but still enough to make the inline appear late.
The function body is as follows:

{
  int i;
  int i;
  int _7;
  double _8;
  double _9;
  struct UniformRectilinearMeshData * _12;
  const struct Domain * _13;
  int _15;
  double _16;
  double _17;
  double _18;
  double _19;
  double * _22;
  const Element_t _23;

  <bb 2>:
  goto <bb 6>;

  <bb 3>:
  _22 = &MEM[(struct VectorEngine *)point_5(D)].x_m[i_3];
  if (_22 != 0B)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  MEM[(This_t *)point_5(D)].x_m[i_3] = 0.0;

  <bb 5>:
  i_14 = i_3 + 1;

  <bb 6>:
  # i_3 = PHI <0(2), i_14(5)>
  if (i_3 <= 2)
    goto <bb 3>;
  else
    goto <bb 7>;

  <bb 7>:
  # i_11 = PHI <0(6)>
  goto <bb 9>;

  <bb 8>:
  _12 = MEM[(const struct RefCountedPtr *)this_6(D)].ptr_m;
  _8 = MEM[(const double &)_12 + 104].x_m[i_1];
  _9 = MEM[(const double &)_12 + 128].x_m[i_1];
  _7 = MEM[(const struct Domain *)loc_10(D)].D.78652.domain_m[i_1].D.77613.D.46542.D.46423.domain_m;
  _13 = &MEM[(const struct OneDomain_t *)_12 + 32B].D.120542.domain_m[i_1].D.115812.D.45237;
  _23 = MEM[(int *)_13];
  _15 = _7 - _23;
  _16 = (double) _15;
  _17 = _16 + 5.0e-1;
  _18 = _9 * _17;
  _19 = _8 + _18;
  MEM[(double &)point_5(D)].x_m[i_1] = _19;
  i_21 = i_1 + 1;

  <bb 9>:
  # i_1 = PHI <i_11(7), i_21(8)>
  if (i_1 <= 2)
    goto <bb 8>;
  else
    goto <bb 10>;

  <bb 10>:
  return point_5(D);

}
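
Read at source level, the dump above is two three-iteration loops: one that zeroes the result and one that fills it in. The following is a rough C++ reading of it, not the tramp3d source. The type MeshSketch and the field names origin, spacing and first are assumptions chosen for illustration; the dump only shows loads at offsets 104, 128 and 32 from the mesh data pointer.

```cpp
#include <array>

// Hypothetical stand-in for the data reached through this->ptr_m.
struct MeshSketch {
    std::array<double, 3> origin;   // loaded as _8 (offset 104 in the dump)
    std::array<double, 3> spacing;  // loaded as _9 (offset 128 in the dump)
    std::array<int, 3> first;       // loaded as _23 (via offset 32 in the dump)
};

std::array<double, 3> cell_position(const MeshSketch& mesh,
                                    const std::array<int, 3>& loc) {
    std::array<double, 3> point;
    // bb 3-6: zero-initialize the result; this is the loop that carries
    // the pointer check against 0 in the dump.
    for (int i = 0; i <= 2; ++i)
        point[i] = 0.0;
    // bb 8-9: point[i] = _8 + _9 * ((double)(_7 - _23) + 0.5)
    for (int i = 0; i <= 2; ++i) {
        int cells = loc[i] - mesh.first[i];
        point[i] = mesh.origin[i]
                   + mesh.spacing[i] * (static_cast<double>(cells) + 0.5);
    }
    return point;
}
```

With the hypothetical values origin {10, 20, 30}, spacing {1, 2, 4}, first {1, 1, 1} and loc {1, 2, 3}, each component is the origin plus the spacing times the cell index offset by one half.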

One obvious piece of nonsense is:

 <bb 3>:
  _22 = &MEM[(struct VectorEngine *)point_5(D)].x_m[i_3];
  if (_22 != 0B)
    goto <bb 4>;
  else
    goto <bb 5>;

Redundant non-zero checks seem very common in C++ code - I ran across many of
those autogenerated by multiple inheritance or brought in by inlining.
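
As an illustration of the multiple-inheritance case, converting a pointer to a second base class has to keep null pointers null, so the compiler emits a non-zero test around the pointer adjustment. The types below are hypothetical and unrelated to tramp3d; this is a minimal sketch of where such checks come from, not the code behind the dump above.

```cpp
struct Base1 { int a = 1; };
struct Base2 { int b = 2; };
struct Derived : Base1, Base2 { int c = 3; };

// Base2 typically sits at a nonzero offset inside Derived, so this
// conversion behaves like "d ? adjust(d) : nullptr".
Base2* to_second_base(Derived* d) {
    return d;
}

// Here the pointer comes from a reference and cannot be null, so once
// to_second_base is inlined its null test is redundant and is left for
// later passes to remove.
int read_b(Derived& d) {
    return to_second_base(&d)->b;
}
```

A pass that knows the pointer is non-null can fold the test away, which shrinks the size estimate the inliner works with.
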
Scheduling VRP early makes the function fully inlined again, but does not
fully solve the regression:
