On Thu, Dec 05, 2024 at 04:16:22PM +0000, Andrew Carlotti wrote:
> On Sun, Dec 01, 2024 at 03:54:25PM -0700, Jeff Law wrote:
> >
> > On 11/13/24 12:03 PM, Richard Sandiford wrote:
> > > Andrew Carlotti <andrew.carlo...@arm.com> writes:
> > > > >
> > > > > I think this is mostly my ignorance of the code, and would be obvious
> > > > > if I tried it out locally, but: why do we need to do this after
> > > > > computing the kills bitmap?  For mode-switching, the kills bitmap
> > > > > is the inverse of the transparency bitmap, but it sounds like here
> > > > > you want the kills bitmap to be more selective.
To add a bit more context to the mode-switching comparison - I think we can
view the code in that pass as doing the following:

1. Add a mode switch (assignment) to every instruction that needs (uses) a
   specific mode.
2. Remove redundant assignments within a basic block (this is equivalent to
   grouping together instructions with the same mode into blocks that can be
   treated as a single instruction for subsequent analysis).
3. Do LCM analysis, and add/remove mode switches according to the LCM output.

The actual code is slightly different - I think 1. and 2. effectively run at
the same time, and produce a list of necessary mode switches, but these aren't
added until after the analysis in 3. has run, and the switches marked for
deletion by LCM have been removed from that list.

The result of this flow is that mode uses and mode assignments are always
(effectively) in the same place when running LCM, so there are no differences
between the transparency and kills bitmaps arising from preexisting uses
without defs.

The difference for hardreg switching is that the mode register assignments are
created during expand, and have already been subject to some inter-BB
optimisation by this point, so we now have some mode register uses that don't
have corresponding mode register assignments in the same BB.

A concrete example that will hopefully help is the following (all edges are
fallthrough edges):

BB2:
  fpmr = 1
  USE fpmr
BB3:
  USE fpmr
BB4:
  fpmr = 1
  USE fpmr
BB5:
  USE fpmr
BB6:
  fpmr = 2
  USE fpmr

If we marked `fpmr = 2` as transparent in BB5, then LCM would be allowed** to
move the assignment to the edge (4 -> 5), which would change the value of fpmr
observed in BB5.  So `fpmr = 2` is not transparent when a block contains a use
of fpmr.

If we marked `fpmr = 1` as killed in BB3, then LCM would be unable to see that
this value is still available in BB4, so it would be unable to remove the
assignment in that block.
The value is still available, so we should mark `fpmr = 1` as not killed in
BB3.

So in a block that looks like BB3 or BB5, we should mark the hardreg
assignment as neither transparent nor killed.

** In practice LCM doesn't do this, because after finding the earliest
possible insertion points for the computations it moves, it then tries to move
them a bit later again.  I think this means that, for any RTL that we will
generate for fpmr (ignoring weird unsupported stuff with inline asm), LCM
would give the same result even if we incorrectly claimed that the assignment
was transparent in blocks like BB3 and BB5.  But that would be playing with
fire for no benefit other than allowing us to remove a couple of blocks of
code, and it would make things harder for anyone trying to use this pass to
optimise special hardregs in a different context.

> > > >
> > > > I had to work through the entire LCM algorithm before I understood how
> > > > these bitmaps were being used (and I intend to update the documentation
> > > > to make this more obvious).  In summary, the kills and avail bitmaps
> > > > indicate whether the result of an earlier expression is still available
> > > > and up-to-date, whereas the transparent and anticipatable bitmaps
> > > > indicate whether a later assignment can be moved earlier.
> > >
> > > Right.  That part is pretty standard.
> > >
> > > > For the existing hoist/PRE passes these are the same - this is because
> > > > new pseudoregs are used to hold the result of relocated computations,
> > > > so the only obstruction is if the values of the inputs to the
> > > > expression are changed.
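To make the asymmetry concrete: here is a small sketch (mine, in Python rather
than the pass's actual implementation, with made-up event representations) of
the per-block transparency/kill properties argued for in the BB2-BB6 example
above:

```python
# Hypothetical sketch of the local properties discussed above - not GCC
# code.  A block is summarised as a list of events on the fpmr hardreg:
# ("set", value) or ("use",).

def local_props(block, expr_value):
    """Return (transparent, killed) for the expression `fpmr = expr_value`."""
    transparent = True   # can the assignment be moved up through this block?
    killed = False       # is an earlier identical assignment invalidated?
    for event in block:
        if event[0] == "use":
            # A use blocks movement of a different fpmr assignment past it,
            # but does not invalidate the value an earlier assignment set.
            transparent = False
        elif event[0] == "set":
            transparent = False
            # Only setting a *different* value kills availability; repeating
            # the same assignment leaves the earlier result intact.
            killed = event[1] != expr_value
    return transparent, killed

# BB3 / BB5 contain only "USE fpmr":
use_only = [("use",)]
# `fpmr = 1` in BB3: neither transparent nor killed, as argued above.
print(local_props(use_only, 1))                  # -> (False, False)
# BB6's `fpmr = 2` seen against BB4 ("fpmr = 1; USE fpmr"): not
# transparent, and killed, because the value changes.
print(local_props([("set", 1), ("use",)], 2))    # -> (False, True)
```

For an ordinary pseudo-reg expression the "use" case would not exist, which is
exactly why transparency and ~kills coincide for the existing passes.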
> > > >
> > > > For the new hardreg PRE pass the bitmaps are different in one case -
> > > > if the content of the hardreg is used, then the result of the
> > > > expression remains available after the use, but it isn't possible to
> > > > anticipate a future assignment by moving that assignment before the
> > > > earlier use.
> > >
> > > But what I meant was: doesn't an assignment to the hard register block
> > > movement/reuse in both directions?  We can't move R:=X up through a
> > > block B that requires R==Y (so X is not transparent in B).  We also
> > > can't reuse R:=X after a block that requires R==Y (because B kills X).
> > >
> > > That's why I was expecting the kill set to be updated too, not just the
> > > transparency set.
> >
> > In general, yes, I would expect transparency and kill to be inverses of
> > each other.
>
> Kills and transparency are genuinely different bitmaps in the context of
> LCM.  In the context of GCC terminology (I don't know whether the naming of
> our bitmaps is standard elsewhere) we have:
>
> - Transparency is the extension of anticipatability to entire basic blocks.
> - Kills is the bitwise inverse of the extension of availability to entire
>   basic blocks.
>
> For the existing GCSE passes, where we use a brand new pseudoreg to convey
> computation results between original and new locations, it turns out that
> the local conditions for anticipatability and availability (and hence
> transparency and ~kills) are the same (namely that we don't change any of
> the inputs to the computation).  The only difference is that
> anticipatability ranges must end at an existing instance of the
> computation, and availability ranges start at an existing instance of the
> computation.
>
> For hardreg PRE the endpoint conditions are essentially the same, but the
> local conditions for anticipatability and availability surviving an
> instruction are now different.
> This is because we are directly extending the range over which the hardreg
> values are live, so we need extra checks to make sure these don't conflict.
> For availability, we only need to check that the destination hardreg hasn't
> been modified, but for anticipatability we also need to check that the
> destination hardreg hasn't been used.  This is where the asymmetry between
> kills and transparency arises for hardreg PRE.
>
> > I suspect (but would have to do a fair amount of archaeology to be sure)
> > that we probably had kills computed for some other problem (classic gcse
> > or const/copy propagation perhaps) and we just inverted it to work with
> > the LCM algorithm which wants to query transparency.  Flipping kills once
> > into transparency seems better than using kills and having to flip it
> > every time we visit a block during the global propagation step.
> >
> > jeff
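The asymmetric per-instruction survival conditions described above could be
sketched roughly as follows (my sketch with hypothetical helper names, in
Python rather than the pass's actual implementation):

```python
# Rough sketch of the per-insn survival conditions for hardreg PRE as
# described above - hypothetical names, not the actual GCC implementation.
from collections import namedtuple

# An instruction summarised by the hard registers it sets and uses.
Insn = namedtuple("Insn", ["sets", "uses"])

def avail_survives(insn, hardreg):
    # Availability of `hardreg = X` survives unless the destination
    # hardreg is modified again.
    return hardreg not in insn.sets

def antic_survives(insn, hardreg):
    # Anticipatability additionally requires that the hardreg is not used:
    # hoisting the assignment above a use would change the value it sees.
    return hardreg not in insn.sets and hardreg not in insn.uses

# A plain use of fpmr: availability survives, anticipatability does not.
use_fpmr = Insn(sets=set(), uses={"fpmr"})
print(avail_survives(use_fpmr, "fpmr"))   # -> True
print(antic_survives(use_fpmr, "fpmr"))   # -> False
```

This is the single place where the two local conditions diverge; for an
expression whose result goes to a fresh pseudoreg, the `uses` check would
never fire, and the two functions would collapse into one.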