On Thu, Jun 9, 2011 at 11:17 AM, Ian Lance Taylor <i...@google.com> wrote: > asharif tools <asharif.to...@gmail.com> writes: > >> On Wed, Jun 8, 2011 at 10:32 PM, Ian Lance Taylor <i...@google.com> wrote: >>> asharif tools <asharif.to...@gmail.com> writes: >>> >>>> function: >>>> call __i686.get_pc_thunk.bx >>>> addl $_GLOBAL_OFFSET_TABLE_, %ebx >>>> movl %gs:20, %eax # Stack-guard init >>>> movl %eax, -12(%ebp) # Stack-guard init >>> >>>> Now, what I want to do is move stack guard initialization part >>>> (consisting of the two instructions I have commented as "Stack-guard >>>> init" into get_pc_thunk.bx for those functions that have both the >>>> stack guard and a call to get_pc_thunk.bx. The compiler should >>>> generate a "stack_guarded_get_pc_thunk.bx" that will do move the >>>> %gs:20 value to the correction location on the stack instead of >>>> executing nops. In this way some useful work can be done instead of >>>> nops. >>> >>> I don't understand how you can do that. The offset from %ebp will be >>> different in different functions. When optimizing, it is likely to be >>> an offset from %esp instead. The scratch register used may also be >>> different; consider functions with __attribute__ ((regparm (2))), or the >>> use of -mregparm=2. >> >> I see. >> >> Would it be possible for the caller of stack_protected_get_pc_thunk to >> pass in this offset from gs in the return register (ebx in this case) >> in all the cases you described? > > You mean the offset from %esp or %ebp. This would require an leal > instruction, so now you are only saving one instruction. And that by > itself would not be enough, because __stack_protected_get_pc_thunk would > not know which register it could use to move the value. But you could > have different variants of the function, or it could preserve the > register. With those conditions, yes, I think it would be possible. > But the savings seems fairly minimal to me, and it only matters on the > Atom. Not that I want to stop you if you are interested.
Ian, I got this to work with -O0 and a patch is attached for those who want to take a peek (It's a big hack right now and needs a lot of clean-up). This is what it does: 1. When gcc decides to add a call to get_pc_thunk for accessing globals with -fPIE, it checks if the stack guard is present in the current function. If so, it notes the base register, the offset and the scratch register used to move the stack guard from gs:0x14 to the base of the stack. 2. During the emission of get_pc_thunk, it generates extra get_pc_thunk()-like functions that use the base register, offset and scratch register noted in step (1). I learnt several things from implementing this and I want to improve on this implementation (of course a final clean-up would be required like changing the static array of get_pc_thunks to a VEC() or GTY(), etc. before I put this patch up for review). But before that I want some input from you. Here are some drawbacks of this current implementation: a. The one of immediate concern is that -O2 doesn't work with it. The reason is that between the call to get_pc_thunk() and the assembly to move the stack guard to the stack, there could be a write to the base register that was noted in step (1) above. So I'd have to note the def of that register and make sure that the call to get_pc_thunk() as well as all uses of the return register is after that def. b. It is too specific. I was thinking of scanning RTL instructions just before and after the get_pc_thunk() call and moving them to unique get_pc_thunk() functions instead of the nops that currently reside there. I could have a knob to control how many instructions to move there. For this transformation to be safe, I'd have to make sure offsets to esp are moved by 4 and the return register is not used in any of those instructions (because I want to fill up nops before the def of that return register). For (b), I'd like to save the RTL of the instructions around the call to get_pc_thunk and delete them from the function. Then, in ix86_code_end(), I want to be able to re-emit that RTL in assembly form. Do you think that is feasible? Is there a utility function to print RTL in assembly form easily so I can just use output_asm_insn in ix86_code_end()? > > Ian >
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 16d977e..1be797c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -8768,11 +8768,30 @@ ix86_setup_frame_addresses (void) static int pic_labels_used; +typedef struct +{ + int stack_reg; + int stack_offset; + int scratch_reg; +} stack_guard_code; + + +/* TODO: Do this using a VEC */ +/* + DEF_VEC_P(stack_guard_code); +DEF_VEC_ALLOC_P(stack_guard_code, gc); +* + * static VEC(stack_guard_code, gc) stack_guard_codes; */ +static int stack_guard_codes_size; +static stack_guard_code stack_guard_codes[0x100]; + +int GET_PC_THUNK_NAME_SIZE = 0x100; + /* Fills in the label name that should be used for a pc thunk for the given register. */ static void -get_pc_thunk_name (char name[32], unsigned int regno) +get_pc_thunk_name (char name[GET_PC_THUNK_NAME_SIZE], unsigned int regno) { gcc_assert (!TARGET_64BIT); @@ -8782,6 +8801,11 @@ get_pc_thunk_name (char name[32], unsigned int regno) ASM_GENERATE_INTERNAL_LABEL (name, "LPR", regno); } +void get_hardened_thunk_suffix(stack_guard_code * c, char * buffer) +{ + sprintf(buffer, "_hardened_%d_%u_%d", c->stack_reg, c->stack_offset, c->scratch_reg); +} + /* This function generates code for -fpic that loads %ebx with the return address of the caller and then returns. */ @@ -8791,16 +8815,26 @@ ix86_code_end (void) { rtx xops[2]; int regno; + int i; + + for (i = 0; i < 1+stack_guard_codes_size; i++) + { for (regno = AX_REG; regno <= SP_REG; regno++) { - char name[32]; + char name[GET_PC_THUNK_NAME_SIZE]; tree decl; if (!(pic_labels_used & (1 << regno))) continue; get_pc_thunk_name (name, regno); + if (i != 0) + { + char suffix[0x100]; + get_hardened_thunk_suffix(&stack_guard_codes[i-1], suffix); + strcat(name, suffix); + } decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL, get_identifier (name), @@ -8850,9 +8884,23 @@ ix86_code_end (void) /* Make sure unwind info is emitted for the thunk if needed. */ final_start_function (emit_barrier (), asm_out_file, 1); + if (i != 0) + { + rtx hardened_xops[2]; + hardened_xops[0] = gen_rtx_REG (Pmode, stack_guard_codes[i-1].scratch_reg); + hardened_xops[1] = gen_rtx_REG (Pmode, stack_guard_codes[i-1].stack_reg); + + output_asm_insn("movl\t%%gs:20, %0", hardened_xops); + char mov_insn_fmt[0x100]; + sprintf(mov_insn_fmt, "movl\t%%0, %d(%%1)", stack_guard_codes[i-1].stack_offset); + output_asm_insn(mov_insn_fmt, hardened_xops); + output_asm_insn("xor\t%0, %0", hardened_xops); + } + + /* Pad stack IP move with 4 instructions (two NOPs count as one instruction). */ - if (TARGET_PAD_SHORT_FUNCTION) + if (TARGET_PAD_SHORT_FUNCTION && i == 0) { int i = 8; @@ -8870,11 +8918,47 @@ ix86_code_end (void) set_cfun (NULL); current_function_decl = NULL; } + } if (flag_split_stack) file_end_indicate_split_stack (); } +void add_stack_guard_code(stack_guard_code * c) +{ + int i; + for( i=0 ; i<stack_guard_codes_size ; i++ ) + { + if(c->stack_reg == stack_guard_codes[i].stack_reg && + c->stack_offset == stack_guard_codes[i].stack_offset && + c->scratch_reg == stack_guard_codes[i].scratch_reg) + return; + } + stack_guard_codes[i] = *c; + stack_guard_codes_size++; +} + +// TODO: Get it to work again. +void remove_stack_guard() +{ + basic_block bb; + rtx insn; + FOR_EACH_BB (bb) + { + FOR_BB_INSNS(bb, insn) + { + if (NONJUMP_INSN_P(insn)) + { + int regno = REGNO(insn); + if (crtl->stack_guard_prologue == insn) + { + delete_insn(insn); + } + } + } + } +} + /* Emit code for the SET_GOT patterns. */ const char * @@ -8963,8 +9047,33 @@ output_set_got (rtx dest, rtx label ATTRIBUTE_UNUSED) } else { - char name[32]; + char name[GET_PC_THUNK_NAME_SIZE]; get_pc_thunk_name (name, REGNO (dest)); + if (crtl->stack_protect_guard) + { + rtx temp1, temp2; + temp1 = XEXP(crtl->stack_guard_prologue, 4); + temp2 = XEXP(XEXP(XVECEXP(temp1, 0, 0), 0), 0); + int stack_reg = REGNO(XEXP(temp2, 0)); + int stack_offset = XWINT(XEXP(temp2, 1), 0); + int scratch_reg = REGNO(XEXP(XVECEXP(temp1, 0, 1), 0)); + if (stack_reg == SP_REG) + { + stack_offset += 4; + } + stack_guard_code current_code; + current_code.stack_reg = stack_reg; + current_code.scratch_reg = scratch_reg; + current_code.stack_offset = stack_offset; + char suffix[0x100]; + get_hardened_thunk_suffix(¤t_code, suffix); + strcat(name, suffix); + +/// printf("stack_reg: %d stack_offset: %d scratch_reg: %d\n", stack_reg, stack_offset, scratch_reg); + add_stack_guard_code(¤t_code); + + remove_stack_guard(); + } pic_labels_used |= 1 << REGNO (dest); #ifdef DWARF2_UNWIND_INFO diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 20932b8..bc745e5 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -171,5 +171,7 @@ union _dont_use_tree_here_; #endif -#endif /* coretypes.h */ +/* Ahmad added for debugging. */ +#define myprintf(format, args...) printf("%s +%d" format "\n", __FILE__, __LINE__, ##args); +#endif /* coretypes.h */ diff --git a/gcc/function.c b/gcc/function.c index 67f0098..554d9e9 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -4584,6 +4584,13 @@ stack_protect_prologue (void) rtx insn = gen_stack_protect_set (x, y); if (insn) { +/// emit_call_insn(GEN_CALL_VALUE (get_rtx_REG(VOIDmode, 0), const0_rtx, const0_rtx, NULL_RTX, const0_rtx)); +/// gen_call(const0_rtx, const0_rtx, const0_rtx); +/// myprintf(""); +/// emit_call_insn(gen_call(const0_rtx, NULL, NULL)); +/// emit_call_insn(gen_rtx_CALL(VOIDmode, gen_rtx_REG(QImode, AX_REG), NULL)); +/// emit_call_insn(gen_rtx_PARALLEL(VOIDmode, NULL)); + crtl->stack_guard_prologue = insn; emit_insn (insn); return; } diff --git a/gcc/function.h b/gcc/function.h index 7563ba1..5a0701d 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -250,6 +250,9 @@ struct GTY(()) rtl_data { /* For function.c */ + /* ahmad added */ + rtx stack_guard_prologue; + /* # of bytes of outgoing arguments. If ACCUMULATE_OUTGOING_ARGS is defined, the needed space is pushed by the prologue. */ int outgoing_args_size;