[Bug tree-optimization/114151] [14 Regression] weird and inefficient codegen and addressing modes since r14-9193

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 07 Mar 2024 00:05:02 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151


--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Macleod from comment #13)
> Created attachment 57638 [details]
> patch
> 
> Ok, there were 2 issues with simply invoking range_of_stmt, which this new
> patch resolves.  IF we aren't looking to fix this in GCC 14 right now
> anyway, this is the way to go.
> 
> 1) The cache has always tried to provide a global range by pre-folding a
> stmt for an estimate using global values.  This is a bad idea for PHIs when
> SCEV is invoked AND SCEV is calling ranger. This changes it to not
> pre-evaluate PHIs, which also saves time when functions have a lot of edges.
> It's mostly pointless for PHIs anyway since we're about to do a real
> evaluation.
> 
> 2) The cache's entry range propagator was not re-entrant.  We didn't
> previously need this, but with SCEV (and possibly other places) invoking
> range_of_expr without context and having range_of_stmt being called, we can
> occasionally get layered calls for cache filling (of different ssa-names).
> 
> With those 2 changes, we can now safely invoke range_of_stmt from a
> contextless range_of_expr call.
> 
> We would have tripped over this earlier if SCEV or one of those other places
> using range_of_expr without context had instead invoked range_of_stmt.  That
> would have been perfectly reasonable, and would have resulted in these same
> issues.  We never tripped over it because range_of_stmt is not used much
> outside of ranger.  That is the primary reason I wanted to track this down. 
> There were alternative paths to the same end result that would have
> triggered these issues.

It sounds like this part is a bugfix?

> Give this patch a try. It also bootstraps with no regressions.  I will queue
> it up for stage 1 instead assuming all is good.

It seems to work well: it now computes a lot of additional ranges and
causes a minor code generation change on the testcase (it doesn't fix the
observed regression, though).

Thanks for working on this.

One thing still unexplored is whether, with better range info, we can relax
the constraint on the folding some more.  We're turning (A + i * B) * C into
(A * C + i * (B * C)) and need to avoid introducing any additional
intermediate undefined overflow with this reassociation for i in [0, n]
(with n being the number of iterations of the loop in which i varies).
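
To make the overflow constraint concrete, here is a rough sketch (not what
GCC implements) that brute-forces the question for given constants.  It
assumes a 32-bit signed type with undefined overflow, and the helper names
(fits, original_ok, reassociated_ok, safe_to_reassociate) are made up for
illustration only.

```python
# Assumption: 32-bit signed arithmetic where overflow is undefined.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def fits(v):
    """True if v is representable in the assumed 32-bit signed type."""
    return INT_MIN <= v <= INT_MAX

def original_ok(a, b, c, i):
    """(A + i * B) * C: check i*B, A + i*B and the final product."""
    t1 = i * b
    t2 = a + t1
    return fits(t1) and fits(t2) and fits(t2 * c)

def reassociated_ok(a, b, c, i):
    """A * C + i * (B * C): check A*C, B*C, i*(B*C) and the final sum."""
    ac = a * c
    bc = b * c
    t = i * bc
    return fits(ac) and fits(bc) and fits(t) and fits(ac + t)

def safe_to_reassociate(a, b, c, n):
    """True if, for every i in [0, n] where the original form is free of
    overflow, the reassociated form is free of overflow as well."""
    return all(reassociated_ok(a, b, c, i)
               for i in range(n + 1)
               if original_ok(a, b, c, i))

# Small values: both forms stay in range.
print(safe_to_reassociate(1, 1, 2, 10))
# A = -2^30, B = 2^30, C = 2, n = 1: the original is fine for i = 0 and
# i = 1, but B * C = 2^31 overflows in the reassociated form.
print(safe_to_reassociate(-2**30, 2**30, 2, 1))
```

The second call shows the kind of case the constraint is there for: B * C
overflows on its own even though every intermediate of the original
expression stays in range.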

As said, if the regression is too important to ignore, we could choose to
leave the bug unfixed for all but the case with A, B and C constant, which
was the case for the testcase in the original PR.
