On Thu, Nov 7, 2024 at 5:50 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Wed, Nov 6, 2024 at 6:01 PM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Wed, Nov 6, 2024 at 10:52 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Wed, Nov 6, 2024 at 4:29 PM Richard Biener > > > <richard.guent...@gmail.com> wrote: > > > > > > > > On Tue, Nov 5, 2024 at 10:50 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > On Tue, Nov 5, 2024 at 5:27 PM Richard Biener > > > > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > > > On Tue, Nov 5, 2024 at 10:09 AM Richard Biener > > > > > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, Nov 5, 2024 at 5:23 AM Jeff Law <jeffreya...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 11/4/24 8:13 PM, H.J. Lu wrote: > > > > > > > > > On Tue, Nov 5, 2024 at 10:57 AM Jeff Law > > > > > > > > > <jeffreya...@gmail.com> wrote: > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> On 11/4/24 7:52 PM, H.J. Lu wrote: > > > > > > > > >>> On Tue, Nov 5, 2024 at 8:48 AM Jeff Law > > > > > > > > >>> <jeffreya...@gmail.com> wrote: > > > > > > > > >>>> > > > > > > > > >>>> > > > > > > > > >>>> > > > > > > > > >>>> On 11/4/24 5:42 PM, H.J. Lu wrote: > > > > > > > > >>>>> On Tue, Nov 5, 2024 at 8:07 AM Jeff Law > > > > > > > > >>>>> <jeffreya...@gmail.com> wrote: > > > > > > > > >>>>>> > > > > > > > > >>>>>> > > > > > > > > >>>>>> > > > > > > > > >>>>>> On 11/1/24 4:32 PM, H.J. Lu wrote: > > > > > > > > >>>>>>> For targets, like x86, which define > > > > > > > > >>>>>>> TARGET_PROMOTE_PROTOTYPES to return > > > > > > > > >>>>>>> true, all integer arguments smaller than int are passed > > > > > > > > >>>>>>> as int: > > > > > > > > >>>>>>> > > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.c > > > > > > > > >>>>>>> extern int baz (char c1); > > > > > > > > >>>>>>> > > > > > > > > >>>>>>> int > > > > > > > > >>>>>>> foo (char c1) > > > > > > > > >>>>>>> { > > > > > > > > >>>>>>> return baz (c1); > > > > > > > > >>>>>>> } > > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ gcc -S -O2 -m32 x.c > > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ cat x.s > > > > > > > > >>>>>>> .file "x.c" > > > > > > > > >>>>>>> .text > > > > > > > > >>>>>>> .p2align 4 > > > > > > > > >>>>>>> .globl foo > > > > > > > > >>>>>>> .type foo, @function > > > > > > > > >>>>>>> foo: > > > > > > > > >>>>>>> .LFB0: > > > > > > > > >>>>>>> .cfi_startproc > > > > > > > > >>>>>>> movsbl 4(%esp), %eax > > > > > > > > >>>>>>> movl %eax, 4(%esp) > > > > > > > > >>>>>>> jmp baz > > > > > > > > >>>>>>> .cfi_endproc > > > > > > > > >>>>>>> .LFE0: > > > > > > > > >>>>>>> .size foo, .-foo > > > > > > > > >>>>>>> .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat > > > > > > > > >>>>>>> 14.2.1-3)" > > > > > > > > >>>>>>> .section .note.GNU-stack,"",@progbits > > > > > > > > >>>>>>> [hjl@gnu-tgl-3 pr14907]$ > > > > > > > > >>>>>>> > > > > > > > > >>>>>>> But integer promotion: > > > > > > > > >>>>>>> > > > > > > > > >>>>>>> movsbl 4(%esp), %eax > > > > > > > > >>>>>>> movl %eax, 4(%esp) > > > > > > > > >>>>>>> > > > > > > > > >>>>>>> isn't necessary if incoming arguments and outgoing > > > > > > > > >>>>>>> arguments are the > > > > > > > > >>>>>>> same. Use unpromoted incoming integer arguments as > > > > > > > > >>>>>>> outgoing arguments > > > > > > > > >>>>>>> if incoming integer arguments are the same as outgoing > > > > > > > > >>>>>>> arguments to > > > > > > > > >>>>>>> avoid unnecessary integer promotion. > > > > > > > > >>>>>> Is there a particular reason x86 can't use the same > > > > > > > > >>>>>> mechanisms that > > > > > > > > >>>>> > > > > > > > > >>>>> Other targets define TARGET_PROMOTE_PROTOTYPES to return > > > > > > > > >>>>> false > > > > > > > > >>>>> to avoid this issue. Changing x86 > > > > > > > > >>>>> TARGET_PROMOTE_PROTOTYPES > > > > > > > > >>>>> to return false will break LLVM which assumes that > > > > > > > > >>>>> incoming char/short > > > > > > > > >>>>> arguments on x86 are always extended to int. The > > > > > > > > >>>>> following targets > > > > > > > > >>>> Then my suggestion would be to cover this in REE somehow. > > > > > > > > >>>> We started > > > > > > > > >>>> looking at that a couple years ago and set it aside. But > > > > > > > > >>>> the basic > > > > > > > > >>>> idea was to expose the ABI guarantees to REE, then let REE > > > > > > > > >>>> do its thing. > > > > > > > > >>>> > > > > > > > > >>>> Jeff > > > > > > > > >>>> > > > > > > > > >>> > > > > > > > > >>> For > > > > > > > > >>> > > > > > > > > >>> extern int baz (char c1); > > > > > > > > >>> > > > > > > > > >>> int > > > > > > > > >>> foo (char c1) > > > > > > > > >>> { > > > > > > > > >>> return baz (c1); > > > > > > > > >>> } > > > > > > > > >>> > > > > > > > > >>> on i386, we get these > > > > > > > > >>> > > > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > > > > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > > > > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 > > > > > > > > >>> A32]))) "x.c":6:10 > > > > > > > > >>> 190 {extendqisi2} > > > > > > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 > > > > > > > > >>> S4 A32]) > > > > > > > > >>> (nil))) > > > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > > > > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > > > > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > > > > > > >>> {*movsi_internal} > > > > > > > > >>> (nil)) > > > > > > > > >>> (call_insn/j 9 8 10 2 (set (reg:SI 0 ax) > > > > > > > > >>> (call (mem:QI (symbol_ref:SI ("baz") [flags 0x41] > > > > > > > > >>> <function_decl 0x7f27347aae00 > > > > > > > > >>> baz>) [0 baz S1 A8]) > > > > > > > > >>> (const_int 4 [0x4]))) "x.c":6:10 1420 > > > > > > > > >>> {*sibcall_value} > > > > > > > > >>> (expr_list:REG_CALL_DECL (symbol_ref:SI ("baz") > > > > > > > > >>> [flags 0x41] > > > > > > > > >>> <function_decl 0x7f273 > > > > > > > > >>> 47aae00 baz>) > > > > > > > > >>> (nil)) > > > > > > > > >>> (expr_list:SI (use (mem:SI (reg/f:SI 16 argp) [0 S4 > > > > > > > > >>> A32])) > > > > > > > > >>> (nil))) > > > > > > > > >>> > > > > > > > > >>> before REE. How should REE work to elimate > > > > > > > > >>> > > > > > > > > >>> (insn 7 4 8 2 (set (reg:SI 0 ax [orig:102 c1 ] [102]) > > > > > > > > >>> (sign_extend:SI (mem/c:QI (plus:SI (reg/f:SI 7 sp) > > > > > > > > >>> (const_int 4 [0x4])) [0 c1+0 S1 > > > > > > > > >>> A32]))) "x.c":6:10 > > > > > > > > >>> 190 {extendqisi2} > > > > > > > > >>> (expr_list:REG_EQUIV (mem:SI (reg/f:SI 16 argp) [0 > > > > > > > > >>> S4 A32]) > > > > > > > > >>> (nil))) > > > > > > > > >>> (insn 8 7 9 2 (set (mem:SI (plus:SI (reg/f:SI 7 sp) > > > > > > > > >>> (const_int 4 [0x4])) [0 S4 A32]) > > > > > > > > >>> (reg:SI 0 ax [orig:102 c1 ] [102])) "x.c":6:10 95 > > > > > > > > >>> {*movsi_internal} > > > > > > > > >>> (nil)) > > > > > > > > >> You'll have to write code to describe the ABI how the values > > > > > > > > >> in question > > > > > > > > >> are already extended to REE. It's not going to "just work", > > > > > > > > >> you'll have > > > > > > > > >> to do some development. I'm not inclined to ACK the > > > > > > > > >> expansion patch at > > > > > > > > >> this time given we've got multiple ways to handle extension > > > > > > > > >> removal. > > > > > > > > >> > > > > > > > > > > > > > > > > > > For char and short parameters, x86 ABI leaves the upper bits > > > > > > > > > in > > > > > > > > > 32 bit fields undefined. If the outgoing arguments are the > > > > > > > > > same > > > > > > > > > as the incoming arguments, there is no need to extend outgoing > > > > > > > > > arguments. Also ABI info isn't available in REE. I am not > > > > > > > > > sure if > > > > > > > > > REE is the best fit here. > > > > > > > > The whole point is to make it available and utilize it. > > > > > > > > > > > > > > I think most of the complication is due to the promotion being > > > > > > > performed > > > > > > > on GIMPLE but GIMPLE not taking the incoming argument as promoted. > > > > > > > > > > > > > > On my wishlist is to defer "applying" promote_prototypes to call > > > > > > > RTL > > > > > > > expansion and keeping GIMPLE "type correct", aka _not_ promoting > > > > > > > arguments in the C/C++/Ada frontends. I'll note that not all > > > > > > > frontends > > > > > > > perform this promotion (sic!), and thus HJs pattern matching of > > > > > > > the > > > > > > > sibling call site is required for correctness even! > > > > > > > > > > > > > > I'll also note that there is DECL_ARG_TYPE on PARM_DECLs which > > > > > > > should reflect the incoming promotion state - using the target > > > > > > > hook in > > > > > > > RTL expansion is "wrong" (see above - not applied consistently), > > > > > > > but > > > > > > > DECL_ARG_TYPE should tell you where it was applied and where not. > > > > > > > With LTO and calling C from fortran this optimization might go > > > > > > > wrong > > > > > > > (fortran doesn't promote, C sets DECL_ARG_TYPE to promoted type). > > > > > > > > > > My patch only uses DECL_ARG_TYPE to compare against the incoming > > > > > argument to check if the incoming argument is used as outgoing > > > > > argument > > > > > unchanged. My patch doesn't change any functionality whether the > > > > > caller > > > > > does promotion or not since GCC always assumes the upper bits are > > > > > undefined. My patch keeps the incoming argument unchanged only > > > > > when there is no change in source code. > > > > > > > > > > > > > > > > > > > I'd say we should either > > > > > > > > > > > > > > a) drop targetm.promote_prototypes (and hope nobody uses it to > > > > > > > implement an ABI requirement) > > > > > > > b) apply targetm.promote_prototypes during RTL expansion > > > > > > > > > > > > > > So the patch as-is isn't good I think (if you don't succeed with > > > > > > > Fortran due to lack of small integer types you can also try D > > > > > > > or Modula for the LTO wrong-code testcase). > > > > > > > > > > > > > > ISTR we do rely on DECL_ARG_TYPE in combine but wrongly so > > > > > > > as said. > > > > > > > > > > > > program test > > > > > > use iso_c_binding, only: c_short > > > > > > interface > > > > > > subroutine foo(a) bind(c) > > > > > > import c_short > > > > > > integer(kind=c_short), intent(in), value :: a > > > > > > end subroutine foo > > > > > > end interface > > > > > > integer(kind=c_short) a(5); > > > > > > call foo (a(3)) > > > > > > end > > > > > > > > > > > > shows > > > > > > > > > > > > ;; foo (_7); > > > > > > > > > > > > (insn 13 12 14 (set (reg:HI 102) > > > > > > (mem/c:HI (plus:DI (reg/f:DI 93 virtual-stack-vars) > > > > > > (const_int -6 [0xfffffffffffffffa])) [1 a[2]+0 S2 > > > > > > A16])) "t.f90":10:19 -1 > > > > > > (nil)) > > > > > > > > > > > > (insn 14 13 15 (set (reg:HI 5 di) > > > > > > (reg:HI 102)) "t.f90":10:19 -1 > > > > > > (nil)) > > > > > > > > > > > > (call_insn 15 14 0 (call (mem:QI (symbol_ref:DI ("foo") [flags 0x41] > > > > > > <function_decl 0x7f7fcc430800 foo>) [0 foo S1 A8]) > > > > > > (const_int 0 [0])) "t.f90":10:19 -1 > > > > > > (expr_list:REG_CALL_DECL (symbol_ref:DI ("foo") [flags 0x41] > > > > > > <function_decl 0x7f7fcc430800 foo>) > > > > > > (nil)) > > > > > > (expr_list:HI (use (reg:HI 5 di)) > > > > > > (nil))) > > > > > > > > > > > > and with -Os generates > > > > > > > > > > > > movw 10(%rsp), %di > > > > > > call foo > > > > > > > > > > > > but a C TU would assume 'foo' receives promoted arguments. > > > > > > > > > > GCC follows x86 psABI and assumes the upper bits are undefined. > > > > > But LLVM assumes the upper bits are extended: > > > > > > > > I'm saying your patch increases the likelyhood we run into problems with > > > > out internal inconsistency of handling promote_prototypes - it adds to > > > > the > > > > number of places we wrongly assume all callers honor it when the > > > > callee was compiled in a way to honor the target hook. > > > > > > > > So IMO the patch is wrong (in the sense of wrong-code), and as given > > > > promote_prototypes is only useful for TU local binding functions. > > > > > > TARGET_PROMOTE_PROTOTYPES isn't defined for psABI purpose. > > > x86 psABI doesn't require it. GCC uses only the lower bits of incoming > > > arguments. But it isn't the GCC's job to promote incoming arguments > > > and copy them to outgoing arguments since not usages of such functions > > > are compiled by GCC. > > > > But if you now elide the promotion for the call to foo in > > > > static void foo (char c) { ... } > > Only calling local functions, like foo, is an issue. There is no > issue if foo is global. Optimization for calling global functions > is OK. Am I correct?
I think it's OK to elide the promotion for calls to global functions, but as you can't rely on the incoming promotion, this would eventually just undo promote_prototypes. Like when you LTO the Fortran TU I gave with the C implementation the Fortran FE will not have applied promote_prototypes to the call of the C implementation but the C implementation will have DECL_ARG_TYPE in a way to suggest it has. And both functions likely appear to be local due to using LTO. So whether it's safe to elide promote_prototypes depends on whether somebody else (in this case combine) would still rely on it being performed (I think combine is wrong today, at least when LTO is involved and thus it's "all callers visible" argument falls down). Richard. > > > void bar (char c) { foo (c); } > > > > then the existing combine optimization breaks since when all calls are > > from the local TU (as for foo()) we assume that the promotion happened > > but after your optimization it now depends on the callers of bar which > > we do not control. > > > > Richard. > > > > > > I showed a way to rectify this by applying promote_prototypes at > > > > call RTL expansion time (instead of or in addition to frontends). Iff > > > > we > > > > want to keep doing this "optimization" - given we perform this > > > > optimization > > > > in combine it got effectively part of the de-factor x86 ABI as we assume > > > > our callers perform the promotion, no? > > > > > > > > Richard. > > > > > > > > > $ cat x1.c > > > > > extern int baz (int); > > > > > > > > > > int > > > > > foo (char c1) > > > > > { > > > > > return baz (c1) + 1; > > > > > } > > > > > $ gcc -S -O2 x1.c > > > > > $ cat x1.s > > > > > .file "x1.c" > > > > > .text > > > > > .p2align 4 > > > > > .globl foo > > > > > .type foo, @function > > > > > foo: > > > > > .LFB0: > > > > > .cfi_startproc > > > > > subq $8, %rsp > > > > > .cfi_def_cfa_offset 16 > > > > > movsbl %dil, %edi > > > > > call baz > > > > > addq $8, %rsp > > > > > .cfi_def_cfa_offset 8 > > > > > addl $1, %eax > > > > > ret > > > > > .cfi_endproc > > > > > .LFE0: > > > > > .size foo, .-foo > > > > > .ident "GCC: (GNU) 14.2.1 20240912 (Red Hat 14.2.1-3)" > > > > > .section .note.GNU-stack,"",@progbits > > > > > $ clang -S -O2 x1.c > > > > > $ cat x1.s > > > > > .text > > > > > .file "x1.c" > > > > > .globl foo # -- Begin function foo > > > > > .p2align 4, 0x90 > > > > > .type foo,@function > > > > > foo: # @foo > > > > > .cfi_startproc > > > > > # %bb.0: > > > > > pushq %rax > > > > > .cfi_def_cfa_offset 16 > > > > > callq baz > > > > > incl %eax > > > > > popq %rcx > > > > > .cfi_def_cfa_offset 8 > > > > > retq > > > > > .Lfunc_end0: > > > > > .size foo, .Lfunc_end0-foo > > > > > .cfi_endproc > > > > > # -- End function > > > > > .ident "clang version 19.1.0 (Fedora 19.1.0-1.fc41)" > > > > > .section ".note.GNU-stack","",@progbits > > > > > .addrsig > > > > > $ > > > > > > > > > > -- > > > > > H.J. > > > > > > > > > > > > -- > > > H.J. > > > > -- > H.J.