On 6/4/24 7:41 AM, Mariam Arutunian wrote:
/Mariam, your thoughts on whether or not those two phases could handle a
loop with two CRC calculations inside, essentially creating two calls to
our new builtins? /
/
/
It is feasible, but it would likely demand considerable effort and
additional work to implement effectively.
Thanks for the confirmation. I suspect it likely doesn't come up often
in practice either.
The key would be to only simulate the use-def cycle from the loop-closed PHI
(plus the loop control of course, but miter/SCEV should be enough there) and
just replace that LC PHI, leaving loop DCE to DCE.
Thank you, this is a good idea to just replace the PHI and leave the loop to
DCE to remove only single CRC parts.
It does seem like replacing the PHI when we have an optimizable case
might simplify that aspect of the implementation.
The current pass only verifies cases where a single CRC calculation is
performed within the loop. During the verification phase,
I ensure that there are no other calculations aside from those necessary for
the considered CRC computation.
Also, when I was investigating the bitwise CRC implementations used in
different software, in all cases the loop was calculating just one CRC and no
other calculations were done.
Thus, in almost all cases, the first phase will filter out non-CRCs, and during
the second phase, only real CRCs with no other calculations will be executed.
This ensures that unnecessary statements won't be executed in most cases.
But we may have had a degree of sampling bias here. If I remember
correctly I used the initial filtering pass as the "trigger" to report a
potential CRC case. If that initial filtering pass rejected cases with
other calculations in the loop, then we never would have seen those.
Leaving the loop to DCE will simplify the process of removing parts connected
to a single CRC calculation.
However, since now we detect a loop that only calculates a single CRC, we can
entirely remove it at this stage without additional checks.
Let's evaluate this option as we get to the later patches in the series.
What I like about Richard's suggestion is that it "just works" and it
will continue to work, even as the overall infrastructure changes. In
contrast a bespoke loop removal implementation in a specific pass may
need adjustment if other aspects of our infrastructure change.
If we really want a separate pass (or utility to work on a single
loop) then we might consider moving some of the final value replacement
code that doesn’t work with only SCEV there as well. There’s also
special code in loop distribution for strlen recognition now, not
exactly fitting in. >
Note I had patches to do final value replacement on demand from CD-DCE when it
figures a loop has no side effects besides of its reduction outputs (still want
to pick this up at some point again).
Oh, this could provide useful insights for our implementation.
Are you thinking of reusing that on-demand analysis to reduce the set of
loops we analyze?
Jeff