On Thu, Feb 13, 2020 at 1:42 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Thu, Feb 13, 2020 at 01:28:43PM +0100, Uros Bizjak wrote: > > On Thu, Feb 13, 2020 at 1:06 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Thu, Feb 13, 2020 at 09:29:32AM +0100, Uros Bizjak wrote: > > > > On Wed, Feb 12, 2020 at 1:21 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > On Mon, Feb 10, 2020 at 12:01 PM Uros Bizjak <ubiz...@gmail.com> > > > > > wrote: > > > > > > > > > > > > On Mon, Feb 10, 2020 at 8:53 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > > > > > On Mon, Feb 10, 2020 at 11:40 AM Uros Bizjak <ubiz...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu <hjl.to...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Since nested function isn't only called directly, there is > > > > > > > > > ENDBR32 at > > > > > > > > > function entry and we need to skip it for direct jump in > > > > > > > > > trampoline. > > > > > > > > > > > > > > > > Hm, I'm afraid I don't understand this comment. Can you perhaps > > > > > > > > rephrase it? > > > > > > > > > > > > > > > > > > > > > > ix86_trampoline_init has > > > > > > > > > > > > > > /* Compute offset from the end of the jmp to the target > > > > > > > function. > > > > > > > In the case in which the trampoline stores the static > > > > > > > chain on > > > > > > > the stack, we need to skip the first insn which pushes > > > > > > > the > > > > > > > (call-saved) register static chain; this push is 1 byte. > > > > > > > */ > > > > > > > offset += 5; > > > > > > > disp = expand_binop (SImode, sub_optab, fnaddr, > > > > > > > plus_constant (Pmode, XEXP (m_tramp, > > > > > > > 0), > > > > > > > offset - (MEM_P (chain) > > > > > > > ? 1 : 0)), > > > > > > > NULL_RTX, 1, OPTAB_DIRECT); > > > > > > > emit_move_insn (mem, disp); > > > > > > > > > > > > > > Without CET, we got > > > > > > > > > > > > > > 0000011 <bar.1878>: > > > > > > > 11: 56 push %esi > > > > > > > 12: 55 push %ebp <<<<<< trampoline jumps > > > > > > > here. > > > > > > > 13: 89 e5 mov %esp,%ebp > > > > > > > 15: 83 ec 08 sub $0x8,%esp > > > > > > > > > > > > > > With CET, if bar isn't only called directly, we got > > > > > > > > > > > > > > 00000015 <bar.1878>: > > > > > > > 15: f3 0f 1e fb endbr32 > > > > > > > 19: 56 push %esi > > > > > > > 1a: 55 push %ebp <<<<<<<< trampoline > > > > > > > jumps here. > > > > > > > 1b: 89 e5 mov %esp,%ebp > > > > > > > 1d: 83 ec 08 sub $0x8,%esp > > > > > > > > > > > > > > We need to add 4 bytes for trampoline to skip endbr32. > > > > > > > > > > > > > > Here is the updated patch to check if nested function isn't only > > > > > > > called directly, > > > > > > > > > > > > Please figure out the final patch. I don't want to waste my time > > > > > > reviewing different version every half hour. Ping me in a couple of > > > > > > days. > > > > > > > > > > This is the final version: > > > > > > > > > > https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00586.html > > > > > > > > > > You can try the testcase in the patch on any machine with CET binutils > > > > > since ENDBR32 is nop on none-CET machines. Without this patch, > > > > > the test will fail. > > > > > > > > Please rephrase the comment. I don't understand what it tries to say. > > > > > > > > > > Here is the patch with updated comments. OK for master and 8/9 branches? > > > > > > Thanks. > > > > > > H.J. > > > --- > > > Since pass_insert_endbranch inserts ENDBR32 at entry of the target > > > function if it may be called indirectly, we also need to skip the > > > 4-byte ENDBR32 for direct jump in trampoline if it is the case. > > > > " > > Skip ENDBR32 at the entry if called indirectly. > > " > > pass_insert_endbranch inserts ENDBR at the entry of the target > > function. We need to skip > > > > > > Tested on Linux/x86-64 CET machine with and without -m32. > > > > > > gcc/ > > > > > > PR target/93656 > > > * config/i386/i386.c (ix86_trampoline_init): Skip ENDBR32 at > > > the target function entry. > > > > > > gcc/testsuite/ > > > > > > PR target/93656 > > > * gcc.target/i386/pr93656.c: New test. > > > --- > > > gcc/config/i386/i386.c | 9 ++++++++- > > > gcc/testsuite/gcc.target/i386/pr93656.c | 4 ++++ > > > 2 files changed, 12 insertions(+), 1 deletion(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr93656.c > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > > > index 44bc0e0176a..52640b74cc8 100644 > > > --- a/gcc/config/i386/i386.c > > > +++ b/gcc/config/i386/i386.c > > > @@ -16839,9 +16839,16 @@ ix86_trampoline_init (rtx m_tramp, tree fndecl, > > > rtx chain_value) > > > the stack, we need to skip the first insn which pushes the > > > (call-saved) register static chain; this push is 1 byte. */ > > > offset += 5; > > > + int skip = MEM_P (chain) ? 1 : 0; > > > > offset += 5 - MEM_P (chain) ? 1: 0; > > This is done on purpose sinc there is > > gcc_assert (offset <= TRAMPOLINE_SIZE); > > after it. We shouldn't update offset. > > > > > > + /* Since pass_insert_endbranch inserts ENDBR32 at entry of the > > > + target function if it may be called indirectly, we also need > > > + to skip the 4-byte ENDBR32 if it is the case. */ > > > > /* Skip ENDBR32 at the entry of the target function when called indirectly. > > */ > > Fixed. > > > > > > + if (need_endbr > > > + && !cgraph_node::get (fndecl)->only_called_directly_p ()) > > > + skip += 4; > > > > offset += 4; > > > > > disp = expand_binop (SImode, sub_optab, fnaddr, > > > plus_constant (Pmode, XEXP (m_tramp, 0), > > > - offset - (MEM_P (chain) ? 1 : > > > 0)), > > > + offset - skip), > > > > plus_constant (Pmode, XEXP (m_tramp, 0), offset), > > > > Uros. > > > > Here is the updated patch. OK for master and 8/9 branches? > > Thanks. > > > H.J. > -- > Skip ENDBR32 at the entry of the target function when called indirectly.
Hm, this was sent too early, I was trying to say: "Skip ENDBR32 at the entry of target function when initializing trampoline." > > Tested on Linux/x86-64 CET machine with and without -m32. > > gcc/ > > PR target/93656 > * config/i386/i386.c (ix86_trampoline_init): Skip ENDBR32 at > the target function entry. > > gcc/testsuite/ > > PR target/93656 > * gcc.target/i386/pr93656.c: New test. LGTM for mainline, please wait a week before backporting. Uros. > --- > gcc/config/i386/i386.c | 8 +++++++- > gcc/testsuite/gcc.target/i386/pr93656.c | 4 ++++ > 2 files changed, 11 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr93656.c > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 44bc0e0176a..3f3dcd53eb2 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -16839,9 +16839,15 @@ ix86_trampoline_init (rtx m_tramp, tree fndecl, rtx > chain_value) > the stack, we need to skip the first insn which pushes the > (call-saved) register static chain; this push is 1 byte. */ > offset += 5; > + int skip = MEM_P (chain) ? 1 : 0; > + /* Skip ENDBR32 at the entry of the target function when called > + indirectly. */ Just say "Skip ENDBR32 at the entry of the target function." The condition below is clear, the purpose is not. > + if (need_endbr > + && !cgraph_node::get (fndecl)->only_called_directly_p ()) > + skip += 4; > disp = expand_binop (SImode, sub_optab, fnaddr, > plus_constant (Pmode, XEXP (m_tramp, 0), > - offset - (MEM_P (chain) ? 1 : 0)), > + offset - skip), > NULL_RTX, 1, OPTAB_DIRECT); > emit_move_insn (mem, disp); > } > diff --git a/gcc/testsuite/gcc.target/i386/pr93656.c > b/gcc/testsuite/gcc.target/i386/pr93656.c > new file mode 100644 > index 00000000000..f0ac8c8edaa > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr93656.c > @@ -0,0 +1,4 @@ > +/* { dg-do run { target { ia32 && cet } } } */ > +/* { dg-options "-O2 -fcf-protection" } */ > + > +#include "pr67770.c" > -- > 2.24.1 >