On 6/4/24 7:41 AM, Mariam Arutunian wrote:
Mariam, your thoughts on whether or not those two phases could handle a loop with two CRC calculations inside, essentially creating two calls to our new builtins?


It is feasible, but it would likely demand considerable effort and additional work to implement effectively.
Thanks for the confirmation. I suspect it likely doesn't come up often in practice either.
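
For reference, a loop of the kind being asked about might look roughly like the sketch below. This is a made-up example, not taken from the patch series or from any real code base: the function name, the struct, and the choice of polynomials (0x07 and 0x1021) are assumptions for illustration only. It has two independent use-def cycles in one loop body, which is why recognizing it would mean emitting two builtin calls.

```c
#include <stdint.h>

/* Hypothetical example: a single loop that updates two independent
   CRCs.  The names and polynomials are illustrative assumptions.  */
struct crc_pair
{
  uint8_t crc8;
  uint16_t crc16;
};

struct crc_pair
two_crcs (uint8_t data)
{
  uint8_t c8 = data;
  uint16_t c16 = (uint16_t) (data << 8);

  for (int i = 0; i < 8; i++)
    {
      /* First use-def cycle: 8-bit CRC, polynomial 0x07.  */
      c8 = (c8 & 0x80) ? (uint8_t) ((c8 << 1) ^ 0x07)
                       : (uint8_t) (c8 << 1);
      /* Second, independent use-def cycle: 16-bit CRC, polynomial 0x1021.  */
      c16 = (c16 & 0x8000) ? (uint16_t) ((c16 << 1) ^ 0x1021)
                           : (uint16_t) (c16 << 1);
    }

  return (struct crc_pair) { c8, c16 };
}
```

Each of the two values is live after the loop, so neither cycle could be dropped unless both were replaced.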



The key would be to only simulate the use-def cycle from the loop-closed PHI 
(plus the loop control of course, but niter/SCEV should be enough there) and 
just replace that LC PHI, leaving loop DCE to DCE.

Thank you, it is a good idea to replace just the PHI and leave it to DCE to 
remove only the parts belonging to a single CRC.
It does seem like replacing the PHI when we have an optimizable case might simplify that aspect of the implementation.
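
A source-level sketch of that idea, under stated assumptions: the real transformation would happen on GIMPLE, and `crc8_direct` below is a hypothetical stand-in for whatever the pass would emit (a builtin call in practice), written as a small lookup only so the example is self-contained. The point is that once the value leaving the loop is replaced, nothing uses the loop's result any more, and ordinary dead code elimination can delete the loop without pass-specific removal code.

```c
#include <stdint.h>

/* Before: the loop's only live-out value is `crc`.  */
uint8_t
crc8_before (uint8_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ 0x07)
                       : (uint8_t) (crc << 1);
  return crc;
}

/* Hypothetical stand-in for the replacement value.  CRC is linear over
   GF(2), so the result is the XOR of one constant per set input bit
   (constants are for polynomial 0x07).  */
static uint8_t
crc8_direct (uint8_t x)
{
  static const uint8_t bit_crc[8]
    = { 0x07, 0x0E, 0x1C, 0x38, 0x70, 0xE0, 0xC7, 0x89 };
  uint8_t r = 0;
  for (int b = 0; b < 8; b++)
    if (x & (1u << b))
      r ^= bit_crc[b];
  return r;
}

/* After: only the live-out value was replaced.  The original loop is
   still present but its result is unused, so DCE can remove it.  */
uint8_t
crc8_after (uint8_t crc, uint8_t data)
{
  uint8_t t = crc ^ data;
  for (int i = 0; i < 8; i++)   /* dead: `t` feeds no later use */
    t = (t & 0x80) ? (uint8_t) ((t << 1) ^ 0x07)
                   : (uint8_t) (t << 1);
  (void) t;
  return crc8_direct ((uint8_t) (crc ^ data));
}
```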



The current pass only verifies cases where a single CRC calculation is 
performed within the loop. During the verification phase,
I ensure that there are no other calculations aside from those necessary for 
the considered CRC computation.

Also, when I was investigating the bitwise CRC implementations used in 
different software, in all cases the loop was calculating just one CRC and no 
other calculations were done.
Thus, in almost all cases, the first phase will filter out non-CRCs, and during 
the second phase, only real CRCs with no other calculations will be executed.
This ensures that unnecessary statements won't be executed in most cases.
But we may have had a degree of sampling bias here. If I remember correctly I used the initial filtering pass as the "trigger" to report a potential CRC case. If that initial filtering pass rejected cases with other calculations in the loop, then we never would have seen those.
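
To make the distinction concrete, here is a sketch of the two shapes being discussed. The first function is a typical bitwise CRC-32 (reflected polynomial 0xEDB88320) in which the loop computes the CRC and nothing else. The second is a hypothetical variant with an unrelated calculation mixed into the same loop; whether the initial filter actually rejects this exact shape is an assumption on my part, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Single CRC, no other work in the loop.  */
uint32_t
crc32_bitwise (const uint8_t *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t n = 0; n < len; n++)
    {
      crc ^= buf[n];
      for (int i = 0; i < 8; i++)
        crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
  return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical variant: the same CRC, plus a count of set bits in the
   input that has nothing to do with the CRC.  */
uint32_t
crc32_with_extra (const uint8_t *buf, size_t len, unsigned *ones)
{
  uint32_t crc = 0xFFFFFFFFu;
  unsigned count = 0;
  for (size_t n = 0; n < len; n++)
    {
      crc ^= buf[n];
      for (int i = 0; i < 8; i++)
        {
          count += (buf[n] >> i) & 1;   /* unrelated to the CRC */
          crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
  *ones = count;
  return crc ^ 0xFFFFFFFFu;
}
```

If the survey only reported loops of the first shape, loops of the second shape would not have shown up in it at all.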


Leaving the loop to DCE will simplify the process of removing parts connected 
to a single CRC calculation.
However, since now we detect a loop that only calculates a single CRC, we can 
entirely remove it at this stage without additional checks.
Let's evaluate this option as we get to the later patches in the series. What I like about Richard's suggestion is that it "just works" and it will continue to work, even as the overall infrastructure changes. In contrast a bespoke loop removal implementation in a specific pass may need adjustment if other aspects of our infrastructure change.





If we really want a separate pass (or utility to work on a single loop) then we might consider moving some of the final value replacement code that doesn’t work with only SCEV there as well. There’s also special code in loop distribution for strlen recognition now, not exactly fitting in.

Note I had patches to do final value replacement on demand from CD-DCE when it 
figures a loop has no side effects besides its reduction outputs (still want 
to pick this up again at some point).

Oh, this could provide useful insights for our implementation.
Are you thinking of reusing that on-demand analysis to reduce the set of loops we analyze?

Jeff
