https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474

--- Comment #99 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just a short brain-dump for the PTA issue:

--param max-fields-for-field-sensitive=1 helps, so some magic limit and
auto-degrading might be a good idea.

Solver stats are not so bad:

 Total vars:               21507
Non-pointer vars:          16
Statically unified vars:  6800
Dynamically unified vars: 0
Iterations:               4
Number of edges:          43380
Number of implicit edges: 23056

but varmap "compression" happens before unifying those 6800 vars which
means bitmaps are less dense than possible.  That there's nothing
dynamically unified also says that likely iteration order is sub-optimal.
We don't have entries of the forward graph and so likely do multile
DFSs starting from somewhere inside for example.  Given we add both
succ and pred edges during solving itself makes itegrating the DFS
into the iteration itself look attractive eventually.

More stats are needed to judge iteration order tweaks.

We have IL like

  D.335748 = __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp ={v} {CLOBBER};
  D.76250 = D.335748;
  D.335748 ={v} {CLOBBER};
...
  mpfr_add (&__result_mpfr_addition_mp_mp, &D.76250, &D.76256, 0);

that just generates a lot of initial constraints and variables.  D.335748
becomes live here, so does D.76250.  This happens early also, like with

__attribute__((fn spec (". r r ")))
struct mpfr_type mpfr_division_mp_mp (struct mpfr_type & restrict a1, struct
mpfr_type & restrict a2)
{
  struct mpfr_type __result_mpfr_division_mp_mp;
  integer(kind=4) retval;

  <bb 2> :
  mpfr_init (&__result_mpfr_division_mp_mp);
  retval_6 = mpfr_div (&__result_mpfr_division_mp_mp, a1_3(D), a2_4(D), 0);
  <retval> = __result_mpfr_division_mp_mp;
  __result_mpfr_division_mp_mp ={v} {CLOBBER};
  return <retval>;

having some early pass after inlining clean up the result would be nice
[simply renaming and eliding 1:1 copies here].  It takes until SRA to fix this
and the PTA pass after it (run as part of PRE) is fast:

 tree PTA                           :  20.26 (  7%)

another thing to notice would be not splitting vars that just occur
"bare" in the IL but that would require some pre-scanning of the IL
to note interesting (and uninteresting) variables.  That's something
we might need anyway though for improved allocated array handling for
example.

Reply via email to