https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109901
Bug ID: 109901 Summary: Optimization opportunity: ((((a) > (b)) - ((a) < (b))) < 0) -> ((a) < (b)) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: richard.yao at alumni dot stonybrook.edu Target Milestone: --- LLVM/Clang also misses this optimization opportunity, so I also filed an issue with them: https://github.com/llvm/llvm-project/issues/62790 The following transformations can be done as an optimization: ((((a) > (b)) - ((a) < (b))) < 0) -> ((a) < (b)) ((((a) > (b)) - ((a) < (b))) <= 0) -> ((a) <= (b)) ((((a) > (b)) - ((a) < (b))) == -1) -> ((a) < (b)) ((((a) > (b)) - ((a) < (b))) == 1) -> ((a) > (b)) ((((a) > (b)) - ((a) < (b))) == 0) -> ((a) == (b)) ((((a) > (b)) - ((a) < (b))) > 0) -> ((a) > (b)) ((((a) > (b)) - ((a) < (b))) >= 0) -> ((a) >= (b)) ((((a) >= (b)) - ((a) <= (b))) < 0) -> ((a) < (b)) ((((a) >= (b)) - ((a) <= (b))) <= 0) -> ((a) <= (b)) ((((a) >= (b)) - ((a) <= (b))) == -1) -> ((a) < (b)) ((((a) >= (b)) - ((a) <= (b))) == 1) -> ((a) > (b)) ((((a) >= (b)) - ((a) <= (b))) == 0) -> ((a) == (b)) ((((a) >= (b)) - ((a) <= (b))) > 0) -> ((a) > (b)) ((((a) >= (b)) - ((a) <= (b))) >= 0) -> ((a) >= (b)) Both (((a) > (b)) - ((a) < (b))) and (((a) >= (b)) - ((a) <= (b))) will generate -1, 0 or 1 when comparing two integers (signed or unsigned). When comparators using this trick are inlined into the caller, the above transformations become applicable. I noticed that neither compiler exploits this high level optimization opportunity when I was working on using a faster binary search implementation for the OpenZFS b-tree code. It relied on a macro to achieve C++-style inlining of the comparator into the implementation by making different versions of the same function. The following example at godbolt does not use a macro to make it easier to see which lines of assembly correspond to which lines of high level C: https://gcc.godbolt.org/z/cdedccxae On amd64, GCC generates 15 instructions for the loop. If you comment out line 35 and uncomment line 37, GCC will generate 11 instructions for the loop. This is the output GCC would produce if it supported that high level optimization. Had the comparator returned 1 for less than and 0 for greater than or equal to, we would have had the 11-instruction version of the loop without any need for this optimization. Changing the semantics because our compilers lack this optimization would be painful in part because the entire code base expects the -1, 0 or 1 return value semantics and other code depends on these comparators. It would be nice if GCC implemented this optimization.