Re: Help with comparison-merging optimization pass

Thomas de Bock via Gcc Wed, 06 Aug 2025 06:33:26 -0700

> On Wed, Aug 06, 2025 at 12:29:09PM +0000, Thomas de Bock wrote:
> > I've looked at the pattern LLVM recognizes and there is indeed a lot of 
> > different
> > ways we could recognize the chains and generalize the optimization.
> > The way I do the pattern recognition now is by changing the conditions to 
> > ==,

> You can't rely on just == even for equality, it needs to handle both == and
> != and based on which edge has EDGE_TRUE_VALUE vs. EDGE_FALSE_VALUE and
> whether it was EQ_EXPR or NE_EXPR decide what edge is for equality and what
> is for non-equality.  Whether the comparisons are EQ_EXPR or NE_EXPR can be
> quite random.

Apologies if I was unclear or misunderstand, I believe that's exactly what I am
doing right now. I change the !='s' to =='s' and switch their true with their 
false
edge, from there we can simply find the equality edge by finding the TRUE edge.
If this is an unwelcome change however, it can easily be done without changing
the operator.

> > redirecting the edges if needed. Since memcmp will only give us whether all 
> > fields
> > were equal or not, a mix of these operators or their outgoing edges 
> > wouldn't work.

> That is not true.

> "The memcmp() function returns an integer less than, equal to, or greater
> than zero if the first n bytes of s1 is found, respectively, to be less
> than, to match, or be greater than the first n bytes of s2.

> For a nonzero return value, the sign is determined by the sign of the
> difference between the first pair of bytes (interpreted as unsigned char)
> that differ in s1 and s2."

Sorry for my wording, that is not the only thing memcmp gives us, but it
does not give us information regarding which fields mismatched. So we cannot
deal with (after !='s are converted to =='s) blocks where not all blocks in
a chain have the same node as their (eventual, possibly through fallthru's)
FALSE edge's destination, is what I was trying to say.

> So, for equality only you can pattern match eq/ne comparison of integral
> types with no padding (TYPE_SIZE in multiples of BITS_PER_UNIT,
> TYPE_PRECISION equal to TYPE_SIZE), but you can also handle the <=> cases
> if the comparisons are unsigned char or little endian above mentioned
> unsigned larger types.
> For <=> it would need to canonicalize the memcmp result (which can return
> any negative, zero or any positive, to -1/0/1 or whatever other constants
> are used for the PHIs for the <, == and > cases).
> Even the equality doesn't have to have 0/1 in PHIs, it can be 1/0 or
> 42/53 etc., just look at return *a == *b ? 42 : 53;

I agree, currently the pass just makes sure all blocks in a chain have the same
value in their PHI arg, only the TRUE edge of the final block in the chain
may differ, since control flow taking that edge is equivalent to the memcmp
we replace the blocks with returning 0 (since all fields must have been
equal for it to get there).

> > Then discovering chains where:
> > 1. All blocks in the chain perform 2 loads, then a condition of their 
> > equality.
> >   The PHI argument value corresponding to each block in the chain should be 
> > the
> >   same, except for possibly a TRUE edge from the final block in the chain
> >   (possibly indirectly, by a path of empty blocks with fallthru edges)
> > 2. If the condition returns false, an edge is followed into the PHI node 
> > (possibly
> >   indirectly). If the condition returns true, an edge is taken into the next
> >   block in the chain, which we follow to attempt to keep expanding the known
> >   chain this way.
> > 3. The final comparison block in the chain does not need to lead to PHI in 
> > its
> >   TRUE case, I believe this is the case in LLVM, but I don't see how it's
> >   needed. Since either way we can store the edge, merge the comparisons,
> >   then reattach the edge to the new merged comparison block, since the
> >   condition for the edge to be taken will be that all compared fields
> >   were equal either way, so control flow is not affected.
> >   This helps with recognizing partial chains of comparisons that precede 
> > e.g.
> >   an array comparison that loops (which it does not recognize currently).
> > Since there is no clear start or end to any chain initially, I pick them in
> > the order they were listed in the PHI node, then discover the successors and
> > predecessors. This doesn't recognize the pattern generated by <=>, but
> > I tried to implement it to be as general as possible and recognize the 
> > chains
> > in any case where it does not affect the program's logic (in the case that
> > all sizes are known and sufficient memory is allocated).
> > I think it's best to keep the recognizing implementation general, before
> > adding clear restrictions around its use (like being applied to defaulted
> > equality operators for example, or at least distinguishing the pattern 
> > restrictions
> > from the general case). I think limiting its application by just
> > adding restricting logic to the pattern recognizer logic could prove 
> > unpredictable.

> There might not be any PHI node at all, there could be just branches (see
> the two functions which just guard some function call on the structure
> comparisons), it should handle that as well (so either PHI or all non-equal
> edges with the same destinations).
> E.g. reassoc maybe_optimize_range_tests can start from any stmt in the
> sequence and just walks it backwards and forward from there to find the
> first and last stmt (or bb) in the chain.  That one is about optimizing
> say if (a && b && c && d && e && f && g && h)
> where the a-h expressions could have some cases optimized.  Normally reassoc
> will handle just the single bb a & b & c & d & e & f & g & h cases, but
> expansion of the && on some targets is e.g. one expression per bb or two
> expressions per bb.
> I'd really suggest you look through what that code does because this
> pattern recognition is similar (but not the same).
> Also note that even in the PHI case, the last bb could be different from the
> others.

I started implementing the optimization with mostly the default equality
operator in mind, but you are right, ideally, it does not depend on the
presence of a PHI node, I'll look into that pass thank you.

>         Jakub

________________________________
From: Jakub Jelinek <[email protected]>
Sent: 06 August 2025 12:50:36
To: Thomas de Bock
Cc: Richard Biener; [email protected]
Subject: [ext] Re: Help with comparison-merging optimization pass

On Wed, Aug 06, 2025 at 12:29:09PM +0000, Thomas de Bock wrote:
> I've looked at the pattern LLVM recognizes and there is indeed a lot of 
> different
> ways we could recognize the chains and generalize the optimization.
> The way I do the pattern recognition now is by changing the conditions to ==,

You can't rely on just == even for equality, it needs to handle both == and
!= and based on which edge has EDGE_TRUE_VALUE vs. EDGE_FALSE_VALUE and
whether it was EQ_EXPR or NE_EXPR decide what edge is for equality and what
is for non-equality.  Whether the comparisons are EQ_EXPR or NE_EXPR can be
quite random.

> redirecting the edges if needed. Since memcmp will only give us whether all 
> fields
> were equal or not, a mix of these operators or their outgoing edges wouldn't 
> work.

That is not true.

"The memcmp() function returns an integer less than, equal to, or greater
than zero if the first n bytes of s1 is found, respectively, to be less
than, to match, or be greater than the first n bytes of s2.

For a nonzero return value, the sign is determined by the sign of the
difference between the first pair of bytes (interpreted as unsigned char)
that differ in s1 and s2."

So, for equality only you can pattern match eq/ne comparison of integral
types with no padding (TYPE_SIZE in multiples of BITS_PER_UNIT,
TYPE_PRECISION equal to TYPE_SIZE), but you can also handle the <=> cases
if the comparisons are unsigned char or little endian above mentioned
unsigned larger types.
For <=> it would need to canonicalize the memcmp result (which can return
any negative, zero or any positive, to -1/0/1 or whatever other constants
are used for the PHIs for the <, == and > cases).
Even the equality doesn't have to have 0/1 in PHIs, it can be 1/0 or
42/53 etc., just look at return *a == *b ? 42 : 53;

> Then discovering chains where:
> 1. All blocks in the chain perform 2 loads, then a condition of their 
> equality.
>   The PHI argument value corresponding to each block in the chain should be 
> the
>   same, except for possibly a TRUE edge from the final block in the chain
>   (possibly indirectly, by a path of empty blocks with fallthru edges)
> 2. If the condition returns false, an edge is followed into the PHI node 
> (possibly
>   indirectly). If the condition returns true, an edge is taken into the next
>   block in the chain, which we follow to attempt to keep expanding the known
>   chain this way.
> 3. The final comparison block in the chain does not need to lead to PHI in its
>   TRUE case, I believe this is the case in LLVM, but I don't see how it's
>   needed. Since either way we can store the edge, merge the comparisons,
>   then reattach the edge to the new merged comparison block, since the
>   condition for the edge to be taken will be that all compared fields
>   were equal either way, so control flow is not affected.
>   This helps with recognizing partial chains of comparisons that precede e.g.
>   an array comparison that loops (which it does not recognize currently).
> Since there is no clear start or end to any chain initially, I pick them in
> the order they were listed in the PHI node, then discover the successors and
> predecessors. This doesn't recognize the pattern generated by <=>, but
> I tried to implement it to be as general as possible and recognize the chains
> in any case where it does not affect the program's logic (in the case that
> all sizes are known and sufficient memory is allocated).
> I think it's best to keep the recognizing implementation general, before
> adding clear restrictions around its use (like being applied to defaulted
> equality operators for example, or at least distinguishing the pattern 
> restrictions
> from the general case). I think limiting its application by just
> adding restricting logic to the pattern recognizer logic could prove 
> unpredictable.

There might not be any PHI node at all, there could be just branches (see
the two functions which just guard some function call on the structure
comparisons), it should handle that as well (so either PHI or all non-equal
edges with the same destinations).
E.g. reassoc maybe_optimize_range_tests can start from any stmt in the
sequence and just walks it backwards and forward from there to find the
first and last stmt (or bb) in the chain.  That one is about optimizing
say if (a && b && c && d && e && f && g && h)
where the a-h expressions could have some cases optimized.  Normally reassoc
will handle just the single bb a & b & c & d & e & f & g & h cases, but
expansion of the && on some targets is e.g. one expression per bb or two
expressions per bb.
I'd really suggest you look through what that code does because this
pattern recognition is similar (but not the same).
Also note that even in the PHI case, the last bb could be different from the
others.

        Jakub

This e-mail and any attachments may contain information that is confidential 
and proprietary and otherwise protected from disclosure. If you are not the 
intended recipient of this e-mail, do not read, duplicate or redistribute it by 
any means. Please immediately delete it and any attachments and notify the 
sender that you have received it by mistake. Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail or any 
attachments. The DRW Companies make no representations that this e-mail or any 
attachments are free of computer viruses or other defects.

Re: Help with comparison-merging optimization pass

Reply via email to