On Wed, Aug 06, 2025 at 12:29:09PM +0000, Thomas de Bock wrote: > I've looked at the pattern LLVM recognizes and there is indeed a lot of > different > ways we could recognize the chains and generalize the optimization. > The way I do the pattern recognition now is by changing the conditions to ==,
You can't rely on just == even for equality, it needs to handle both == and != and based on which edge has EDGE_TRUE_VALUE vs. EDGE_FALSE_VALUE and whether it was EQ_EXPR or NE_EXPR decide what edge is for equality and what is for non-equality. Whether the comparisons are EQ_EXPR or NE_EXPR can be quite random. > redirecting the edges if needed. Since memcmp will only give us whether all > fields > were equal or not, a mix of these operators or their outgoing edges wouldn't > work. That is not true. "The memcmp() function returns an integer less than, equal to, or greater than zero if the first n bytes of s1 is found, respectively, to be less than, to match, or be greater than the first n bytes of s2. For a nonzero return value, the sign is determined by the sign of the difference between the first pair of bytes (interpreted as unsigned char) that differ in s1 and s2." So, for equality only you can pattern match eq/ne comparison of integral types with no padding (TYPE_SIZE in multiples of BITS_PER_UNIT, TYPE_PRECISION equal to TYPE_SIZE), but you can also handle the <=> cases if the comparisons are unsigned char or little endian above mentioned unsigned larger types. For <=> it would need to canonicalize the memcmp result (which can return any negative, zero or any positive, to -1/0/1 or whatever other constants are used for the PHIs for the <, == and > cases). Even the equality doesn't have to have 0/1 in PHIs, it can be 1/0 or 42/53 etc., just look at return *a == *b ? 42 : 53; > Then discovering chains where: > 1. All blocks in the chain perform 2 loads, then a condition of their > equality. > The PHI argument value corresponding to each block in the chain should be > the > same, except for possibly a TRUE edge from the final block in the chain > (possibly indirectly, by a path of empty blocks with fallthru edges) > 2. If the condition returns false, an edge is followed into the PHI node > (possibly > indirectly). If the condition returns true, an edge is taken into the next > block in the chain, which we follow to attempt to keep expanding the known > chain this way. > 3. The final comparison block in the chain does not need to lead to PHI in its > TRUE case, I believe this is the case in LLVM, but I don't see how it's > needed. Since either way we can store the edge, merge the comparisons, > then reattach the edge to the new merged comparison block, since the > condition for the edge to be taken will be that all compared fields > were equal either way, so control flow is not affected. > This helps with recognizing partial chains of comparisons that precede e.g. > an array comparison that loops (which it does not recognize currently). > Since there is no clear start or end to any chain initially, I pick them in > the order they were listed in the PHI node, then discover the successors and > predecessors. This doesn't recognize the pattern generated by <=>, but > I tried to implement it to be as general as possible and recognize the chains > in any case where it does not affect the program's logic (in the case that > all sizes are known and sufficient memory is allocated). > I think it's best to keep the recognizing implementation general, before > adding clear restrictions around its use (like being applied to defaulted > equality operators for example, or at least distinguishing the pattern > restrictions > from the general case). I think limiting its application by just > adding restricting logic to the pattern recognizer logic could prove > unpredictable. There might not be any PHI node at all, there could be just branches (see the two functions which just guard some function call on the structure comparisons), it should handle that as well (so either PHI or all non-equal edges with the same destinations). E.g. reassoc maybe_optimize_range_tests can start from any stmt in the sequence and just walks it backwards and forward from there to find the first and last stmt (or bb) in the chain. That one is about optimizing say if (a && b && c && d && e && f && g && h) where the a-h expressions could have some cases optimized. Normally reassoc will handle just the single bb a & b & c & d & e & f & g & h cases, but expansion of the && on some targets is e.g. one expression per bb or two expressions per bb. I'd really suggest you look through what that code does because this pattern recognition is similar (but not the same). Also note that even in the PHI case, the last bb could be different from the others. Jakub