https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109901

            Bug ID: 109901
           Summary: Optimization opportunity: ((((a) > (b)) - ((a) < (b)))
                    < 0) -> ((a) < (b))
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: richard.yao at alumni dot stonybrook.edu
  Target Milestone: ---

LLVM/Clang also misses this optimization opportunity, so I also filed an issue
with them:

https://github.com/llvm/llvm-project/issues/62790

The following transformations can be done as an optimization:

((((a) > (b)) - ((a) < (b))) < 0) -> ((a) < (b))

((((a) > (b)) - ((a) < (b))) <= 0) -> ((a) <= (b))

((((a) > (b)) - ((a) < (b))) == -1) -> ((a) < (b))

((((a) > (b)) - ((a) < (b))) == 1) -> ((a) > (b))

((((a) > (b)) - ((a) < (b))) == 0) -> ((a) == (b))

((((a) > (b)) - ((a) < (b))) > 0) -> ((a) > (b))

((((a) > (b)) - ((a) < (b))) >= 0) -> ((a) >= (b))

((((a) >= (b)) - ((a) <= (b))) < 0) -> ((a) < (b))

((((a) >= (b)) - ((a) <= (b))) <= 0) -> ((a) <= (b))

((((a) >= (b)) - ((a) <= (b))) == -1) -> ((a) < (b))

((((a) >= (b)) - ((a) <= (b))) == 1) -> ((a) > (b))

((((a) >= (b)) - ((a) <= (b))) == 0) -> ((a) == (b))

((((a) >= (b)) - ((a) <= (b))) > 0) -> ((a) > (b))

((((a) >= (b)) - ((a) <= (b))) >= 0) -> ((a) >= (b))


Both (((a) > (b)) - ((a) < (b))) and (((a) >= (b)) - ((a) <= (b))) will
generate -1, 0 or 1 when comparing two integers (signed or unsigned). When
comparators using this trick are inlined into the caller, the above
transformations become applicable.

I noticed that neither compiler exploits this high level optimization
opportunity when I was working on using a faster binary search implementation
for the OpenZFS b-tree code. It relied on a macro to achieve C++-style inlining
of the comparator into the implementation by making different versions of the
same function.

The following example at godbolt does not use a macro to make it easier to see
which lines of assembly correspond to which lines of high level C:

https://gcc.godbolt.org/z/cdedccxae

On amd64, GCC generates 15 instructions for the loop. If you comment out line
35 and uncomment line 37, GCC will generate 11 instructions for the loop. This
is the output GCC would produce if it supported that high level optimization.

Had the comparator returned 1 for less than and 0 for greater than or equal to,
we would have had the 11-instruction version of the loop without any need for
this optimization. Changing the semantics because our compilers lack this
optimization would be painful in part because the entire code base expects the
-1, 0 or 1 return value semantics and other code depends on these comparators.

It would be nice if GCC implemented this optimization.

Reply via email to