On Sat, 15 Nov 2025, Jeff Law wrote:

> 
> 
> On 11/15/25 5:02 AM, Richard Biener wrote:
> 
> > 
> > In principle the dependence analysis part of LIM is quadratic, but I have
> > not seen actual cases in a long time where it would outcompete FRE or PRE in
> > terms of compile time (but IIRC we do not put hard limits on the max work it
> > does).
> > 
> > I will look at the actual test fallout then.
> But it's only working on loops and likely only on a subset of the SSA_NAMEs in
> the loops to prove properties.  So even if it's quadratic, it's probably a lot
> cheaper than PRE.

So there's one issue with scheduling LIM early.  LIM has code to
prevent expensive operations to be hoisted out of "cold" inner loops
into hotter code regions.  gcc.dg/tree-ssa/ssa-lim-19.c has a testcase
for this:

  for (i = 0; i < m; i++) // Loop 1
    {
      if (__builtin_expect (x, 0))
        for (j = 0; j < n; j++) // Loop 2
          for (k = 0; k < n; k++) // Loop 3
            {
              bar (s / 5, "one", "two");
              a[t] = s;
            }
      a[t] = t;
    }

now since I schedule the early LIM before profile estimation (which
runs very late during early opts), all profile counters are still
"invalid", so the code does not trigger and we end up hoisting the
division 's / 5' outside of the __builtin_expect conditional.

Citing the relevant pipeline area with LIM inserted:

          NEXT_PASS (pass_early_thread_jumps, /*first=*/true);
          NEXT_PASS (pass_sra_early);
          /* pass_build_ealias is a dummy pass that ensures that we
             execute TODO_rebuild_alias at this point.  */
          NEXT_PASS (pass_build_ealias);
          /* Do phiprop before FRE so we optimize std::min and std::max 
well.  */
          NEXT_PASS (pass_phiprop);
          NEXT_PASS (pass_fre, true /* may_iterate */);
          NEXT_PASS (pass_early_vrp);
          NEXT_PASS (pass_merge_phi);
          NEXT_PASS (pass_dse);
          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */, true 
/* remove_unused_locals */);
          NEXT_PASS (pass_phiopt, true /* early_p */);
          NEXT_PASS (pass_lim, true /* early_p */);
          /* Cleanup eh is done before tail recusision to remove empty 
(only clobbers)
             finally blocks which would block tail recursion.  */
          NEXT_PASS (pass_cleanup_eh);
          NEXT_PASS (pass_tail_recursion);
          NEXT_PASS (pass_if_to_switch);
          NEXT_PASS (pass_convert_switch);
          NEXT_PASS (pass_sccopy);
          NEXT_PASS (pass_profile);
          NEXT_PASS (pass_local_pure_const);
          NEXT_PASS (pass_modref);

I suppose all passes that eventually care about profile (jump threading
does, in its cost metrics, via maybe_hot/optimize_edge_for_speed,
falling back to global defaults) would benefit from some kind of
reasonable local profile.

Now, I can simply move early LIM later, right after pass_profile
for example.  I'd expect if-to-switch / switch-convert disrupting
the profile.

Any ideas here?

Thanks,
Richard.

Reply via email to