https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120022

            Bug ID: 120022
           Summary: [Optimization opportunity] Related with Bug 119917 and
                    120020
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: a1343922569 at outlook dot com
  Target Milestone: ---

Related to Bug 119917 and Bug 120020.
In Bug 120020 I posted the wrong code at first. I wanted to correct it, but the
administrator replied very quickly and marked the bug invalid immediately. Here
I show the corrected code again, together with a corrected explanation of this
optimization opportunity.

Godbolt links (GCC generates suboptimal assembly code for myDivMod1, which is
not the same as the code for myDivMod2):
64-bit variable: https://gcc.godbolt.org/z/7j9MsKsMr
32-bit variable: https://gcc.godbolt.org/z/EEKzWWx14
16-bit variable: https://gcc.godbolt.org/z/5j9zoxrro
8-bit variable: https://gcc.godbolt.org/z/3sWh9K6r5


Godbolt links (Clang generates optimal assembly code for myDivMod1, which is
the same as the code for myDivMod2):
64-bit variable: https://gcc.godbolt.org/z/cneMq3eYT
32-bit variable: https://gcc.godbolt.org/z/E1e9x65hj
16-bit variable: https://gcc.godbolt.org/z/3aq9dsa3M
8-bit variable: https://gcc.godbolt.org/z/8EG8TEzGq

Optimization suggestion
I suggest enhancing GCC to recognize situations where multiple non-volatile
inline assembly blocks across function calls share identical or highly similar
operations, and optimize them by merging the operations when semantically safe.
For the example above, the assembly code generated for myDivMod1 should be no
more complex than that of myDivMod2 and contain no more than one div
instruction.
Another strong reason to support this suggestion is that Clang generates
identical code for both myDivMod1 and myDivMod2 (see the Godbolt links above
for details). This shows that Clang already optimizes the myDivMod1 case, while
GCC does not.
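
For readers who do not want to open the Godbolt links, the shape of the test
case can be sketched roughly as follows. This is only an illustrative
reconstruction, not the exact code behind the links: the names myDivMod1 and
myDivMod2 come from this report, while the helpers myDiv and myMod, the
divmod_t struct, check_equivalence and the exact asm constraints are
assumptions made for the sketch. It shows the 64-bit case and assumes
x86-64 with GNU-style inline asm; on other targets it falls back to plain C
so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder together. */
typedef struct { uint64_t quot, rem; } divmod_t;

#if defined(__x86_64__) && defined(__GNUC__)
/* Non-volatile asm: one DIV yields the quotient in RAX and the remainder
   in RDX. Callers must pass b != 0. */
static inline uint64_t myDiv(uint64_t a, uint64_t b) {
    uint64_t q, r;
    __asm__("divq %[b]"
            : "=a"(q), "=d"(r)
            : "0"(a), "1"((uint64_t)0), [b] "r"(b)
            : "cc");
    (void)r;              /* remainder is discarded here */
    return q;
}

/* Same asm template and inputs as myDiv, but returns the remainder. */
static inline uint64_t myMod(uint64_t a, uint64_t b) {
    uint64_t q, r;
    __asm__("divq %[b]"
            : "=a"(q), "=d"(r)
            : "0"(a), "1"((uint64_t)0), [b] "r"(b)
            : "cc");
    (void)q;              /* quotient is discarded here */
    return r;
}

/* Reference form: a single asm block produces both results. */
static inline divmod_t myDivMod2(uint64_t a, uint64_t b) {
    divmod_t res;
    __asm__("divq %[b]"
            : "=a"(res.quot), "=d"(res.rem)
            : "0"(a), "1"((uint64_t)0), [b] "r"(b)
            : "cc");
    return res;
}
#else
/* Portable fallback so the sketch builds on non-x86-64 targets. */
static inline uint64_t myDiv(uint64_t a, uint64_t b) { return a / b; }
static inline uint64_t myMod(uint64_t a, uint64_t b) { return a % b; }
static inline divmod_t myDivMod2(uint64_t a, uint64_t b) {
    divmod_t res = { a / b, a % b };
    return res;
}
#endif

/* Form under discussion: two calls whose asm blocks are identical once
   inlined, so ideally a single DIV would remain. */
static inline divmod_t myDivMod1(uint64_t a, uint64_t b) {
    divmod_t res = { myDiv(a, b), myMod(a, b) };
    return res;
}

/* Both forms must agree with each other and with the C operators. */
static void check_equivalence(uint64_t a, uint64_t b) {
    divmod_t x = myDivMod1(a, b);
    divmod_t y = myDivMod2(a, b);
    assert(x.quot == y.quot && x.rem == y.rem);
    assert(x.quot == a / b && x.rem == a % b);
}
```

Both functions return the same values for any nonzero divisor, so the only
difference being reported is in the generated code: the request is that the
two identical asm blocks in myDivMod1 be merged so that it costs no more than
myDivMod2.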

Bug 119917 actually contains the correct code, and the latest Clang (trunk
version) already applies this optimization in the 8-, 16-, 32- and 64-bit
cases. A similar issue was mentioned in Bug 117529 about five months ago, but
it has still not been resolved and has received no further replies.

If my description contains mistakes, let's take the time to discuss them, but
please don't label the report as "invalid" too early. Thank you for your
understanding!
