An unusual x86_64 code model
The vSphere Hypervisor (ESXi) kernel runs on x86_64 and loads all text and data sections (for the kernel itself and for modules) within a 2GiB window that lives around virtual address 0x418000000000 (65.5 TiB). Thus, 32-bit absolute addresses won't work, but %rip-relative addressing is fine. Additionally, because this is a kernel, the usual issues of "shared text" that discourage text relocations are inapplicable.

What this means in terms of GCC is: the usual small code model won't work, nor -mcmodel=kernel, because they assume signed 32-bit addresses. The large code model probably will work, but that turns everything into movabs and indirect calls, which is unnecessarily inefficient. The closest approximation is -fPIC or -fPIE, but that assumes we want to implement the PLT/GOT machinery in our loader, which we don't; it imposes overhead for no benefit.

The existing workaround, which predates my personal involvement, is to use -fPIE together with a -include'd file that uses a #pragma to set the default symbol visibility to hidden, which suppresses the PLTness. That works on GCC 4.1, but with newer versions that no longer affects implicitly declared functions (which turn up occasionally in third-party drivers), or coverage instrumentation's calls to __gcov_init, or probably other things that have not yet been discovered. Also, it was never an ideal solution, except in that it didn't require modifying the compiler (at the time).

Thus, I'm trying to find the right solution. My current attempt is to add an -mno-plt flag in i386.opt, and add it to the list of reasons not to print "@PLT" after symbol names. This seems to work, although I've only done minimal testing so far.

But is that the right way to do that, do people think? Or should I look into making this its own -mcmodel option? (Which would raise the question of what to call it -- medsmall? smallhigh? altkernel?) Or is there some other way that this ought to be done?

Thanks,
--Jed
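A minimal sketch of the address arithmetic in the message above, assuming the 65.5 TiB figure means a load address of 0x4180 << 32. The macro and function names are hypothetical and exist only for illustration; the range checks paraphrase the small and kernel code models rather than quoting GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example load address: 65.5 TiB, i.e. 0x4180 << 32 (hypothetical name). */
#define EXAMPLE_BASE UINT64_C(0x418000000000)

/* 65.5 TiB is 131 * 2^39 bytes; check that the constant matches at compile time. */
static_assert(EXAMPLE_BASE == (UINT64_C(131) << 39), "EXAMPLE_BASE should be 65.5 TiB");

/* Small model: absolute addresses must lie in the low 2 GiB. */
static bool fits_small_model(uint64_t addr)
{
    return addr < (UINT64_C(1) << 31);
}

/* Kernel model: absolute addresses must lie in the top 2 GiB, so that they
   are sign-extended negative 32-bit values. */
static bool fits_kernel_model(uint64_t addr)
{
    return addr >= UINT64_C(0xffffffff80000000);
}

/* %rip-relative: the target must be within a signed 32-bit displacement of
   the address of the next instruction. */
static bool fits_rip_relative(uint64_t next_insn, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With these definitions, EXAMPLE_BASE fails both absolute-address checks, while any two addresses less than 2 GiB apart pass the %rip-relative check regardless of where the window sits.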
Re: An unusual x86_64 code model
On Tue, Aug 09, 2011 at 04:58:01PM -0700, Andrew Pinski wrote:
> On Tue, Aug 9, 2011 at 4:26 PM, Jed Davis wrote:
> > The existing workaround, which predates my personal involvement, is to
> > use -fPIE together with a -include'd file that uses a #pragma to set the
> > default symbol visibility to hidden, which suppresses the PLTness.
> > That works on GCC 4.1, but with newer versions that no longer affects
> > implicitly declared functions (which turn up occasionally in third-party
> > drivers), or coverage instrumentation's calls to __gcov_init, or probably
> > other things that have not yet been discovered. Also, it was never an
> > ideal solution, except in that it didn't require modifying the compiler
> > (at the time).
>
> Have you tried -fvisibility=hidden option ?

Sadly, that doesn't work:

$ cat test.c
int baz();
int quux() { return baz(); }
int foo() { return bar() + quux(); }
$ gcc -fprofile-arcs -fvisibility=hidden -fPIE -S test.c
$ grep call test.s
        call    baz@PLT
        call    bar@PLT
        call    quux
        call    __gcov_init@PLT

The fine manual states that "extern declarations are not affected by -fvisibility", so that result is expected. Adding "#pragma GCC visibility push(hidden)" also takes care of baz (declared and extern), but not bar (implicit) or __gcov_init (very implicit).

--Jed
Re: An unusual x86_64 code model
On Tue, Aug 09, 2011 at 04:26:06PM -0700, Jed Davis wrote:
> Thus, I'm trying to find the right solution. My current attempt is to
> add an -mno-plt flag in i386.opt, and add it to the list of reasons not
> to print "@PLT" after symbol names. This seems to work, although I've
> only done minimal testing so far.

Emphasis on "minimal"; a reference to the address of an extern variable yields a @GOTPCREL. So that wasn't going to work in any case.

The more of i386.c I read, the more I realize that the resemblance to CM_SMALL_PIC was mostly coincidental.

--Jed
Re: An unusual x86_64 code model
Second attempt: I now have a modified GCC 4.4.3 which recognizes -mcmodel=smallhigh; in CM_SMALLHIGH, pic_32bit_operand acts as it does for PIC (to get lea instead of movabs), and legitimate_address_p accepts SYMBOLIC_CONSTs with no indexing (for anything with a memory constraint).

Beyond that, the defaults (i.e., what happens if the code model isn't any of the previously defined ones) happen to be more or less what I want. In particular, the operand printer chooses %rip-relative mode over plain disp32 for any symbolic displacement in all the smallish modes; this is commented as if it were just a space optimization (avoiding the SIB byte), but in fact it's necessary for -fPIC to work, so I think I can depend on it.

One thing I'm not so sure about is accepting any SYMBOLIC_CONST as a legitimate address. That allows, for example, a symbol address cast to uintptr_t and added to (6ULL << 32), which will never fit. On the other hand, -fPIC allows offsets of up to +/- 16MiB for some unexplained reason, meaning that I can break it by pushing the code+data size almost to 2GiB with a large .bss and evaluating (uintptr_t)&_end+0xff. I think I could try to fix that by interrogating the SYMBOL_REF_DECL for the object's size, but given that -fPIC doesn't go that far, it's not clear that I need to. Thoughts?

Also, it may actually work now. I've successfully bootstrapped with BOOT_CFLAGS='-mcmodel=smallhigh -O2 -g', after which the only _32 or _32S relocations in the gcc/ subdirectory of the objdir were either .debug references that I assume are safe, or in crtstuff that's part of the libgcc build and not affected by BOOT_CFLAGS. (It also successfully builds ESXi with kernel coverage enabled, but that's less informative for people on this list.)

Once our lawyers approve, I can also send the actual diff; by my count, it makes nontrivial changes to a whole seven lines of code.

--Jed
Re: An unusual x86_64 code model
On Thu, Aug 18, 2011 at 03:37:15PM +0200, Michael Matz wrote:
> On Wed, 17 Aug 2011, Jed Davis wrote:
>
> > One thing I'm not so sure about is accepting any SYMBOLIC_CONST as a
> > legitimate address. That allows, for example, a symbol address cast
> > to uintptr_t and added to (6ULL << 32), which will never fit. On the
> > other hand, -fPIC allows offsets of up to +/- 16MiB for some unexplained
> > reason,
>
> The x86-64 ABI specifies this. All symbols have to be located between 0x0
> and 2^31-2^24-1, and that is so that everything in memory objects of
> length less than 2^24 can be addressed directly.

Oh, of course. For some reason I went through the ELF spec, but didn't think to see what the x86_64 ABI had to say about code models. Everything makes much more sense now. Thanks for the pointer.

> Otherwise only the base address of symbols would be addressable
> directly and any offsetted variant would have to be calculated
> explicitly.

Right; that's what I was trying to avoid doing.

It looks like I can reuse legitimate_pic_address_disp_p for this; this is not quite PIC, but the same set of non-immediate displacement-only addresses is usable in general operands.

Except that then pic_32bit_operand does the wrong thing, because actual PIC has hooks in the MI recog.c that affect the constraints (I think?), and I don't. But... what "pic_32bit_operand" actually means is "can I use LEA to obtain this value?", and anything that's a legitimate address in strict RTL can be LEA'ed. So that takes care of that.

Back to testing, I guess.

--Jed
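The margin described in the quoted reply can be checked with a little arithmetic. This is a sketch under stated assumptions: symbols lie in [0, 2^31 - 2^24 - 1] as quoted, offsets are taken to be strictly within +/- 2^24, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits, paraphrasing the quoted ABI rule (hypothetical names). */
#define SYMBOL_LIMIT ((INT64_C(1) << 31) - (INT64_C(1) << 24)) /* symbols in [0, SYMBOL_LIMIT) */
#define OFFSET_LIMIT (INT64_C(1) << 24)                        /* |offset| < OFFSET_LIMIT */

/* The highest symbol plus the largest offset still fits in a signed 32-bit field. */
static_assert((SYMBOL_LIMIT - 1) + (OFFSET_LIMIT - 1) <= INT32_MAX,
              "symbol + offset must fit in disp32");

/* True if value can be encoded as a signed 32-bit displacement or immediate. */
static bool fits_disp32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* True if sym and off respect the assumed limits and their sum is encodable. */
static bool symbol_plus_offset_ok(int64_t sym, int64_t off)
{
    if (sym < 0 || sym >= SYMBOL_LIMIT)
        return false;
    if (off <= -OFFSET_LIMIT || off >= OFFSET_LIMIT)
        return false;
    return fits_disp32(sym + off);
}
```

Under these limits the sum always fits, which is the point of reserving the top 2^24 bytes. A symbol placed just below 2^31 plus 0xff, or any symbol plus (6ULL << 32), falls outside them.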
x86_64 -mcmodel=smallhigh, cont'd
I posted about this a few months ago, but I've been busy with higher-priority work until recently and have only just picked it back up. I think I've fixed the last bug.

To review: my goal is to give the x86 backend a code model where code and data reside within an arbitrary 2GiB of the address space, and where the loader/runtime is not required to support the ELF PIC facilities.

The motivating example for this is the vSphere Hypervisor (ESXi) kernel and its modules, which are loaded outside the range of both the "small" and "kernel" models of the x86_64 ABI for technical reasons which, for the sake of brevity, I will not attempt to explain here. This being a kernel environment and not a "shared text" user program, text relocations are free; thus, the overhead of PLT/GOT indirection is unnecessary and, indeed, unwelcome.

Implementing this code model appears to be relatively simple: legitimate addresses are computed as for CM_SMALL_PIC but treating every symbol as local, and any constant_address_p immediate can be loaded with lea instead of movabs. As before, if it sounds like I'm still doing this wrong, that would be nice to know; this project is more or less my first nontrivial exposure to GCC internals.

Our legal department appears to be fine with contributing this back, although I want to wait until I get specific confirmation before mailing any diffs. At the moment I have diffs against 4.4.3 (yes, I know it's old) and HEAD, but they still need documentation changes and testcases before they'd be useful.

But my other question is: is there interest in having this change contributed? It's not inherently vendor-specific, but I'm having trouble thinking of anyone else who'd want to use it, and I don't know what GCC's policy is on features like that.

--Jed
Re: How to GTYize a struct properly?
"Laurynas Biveinis" <[EMAIL PROTECTED]> writes:
> After my best effort so far:
>
> struct histogram_value_t GTY(())
> {
>   struct
>   {    /* <--- line 48, error below occurs here */
>     tree value;          /* The value to profile. */
>     tree stmt;           /* Insn containing the value. */
>     gcov_type *counters; /* Pointer to first counter. */
>     struct histogram_value_t GTY((chain_next("%h.next")) *next;
>                          /* Linked list pointer. */

At the risk of stating the obvious, those parentheses after "GTY" look unbalanced to me.

-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))'((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)
Re: Merging identical functions in GCC
Mike Stump <[EMAIL PROTECTED]> writes:
> On Sep 15, 2006, at 2:32 PM, Ross Ridge wrote:
>> Also, I don't think it's safe if you merge only functions in COMDAT
>> sections.
>
> Sure it is, one just needs to merge them as:
>
> variant1: nop
> variant2: nop
> variant3: nop
> [ ... ]
>
> this way important inequalities still work.

As a convenient side-effect, setting breakpoints on only one variant will also still work.

-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))'((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)
ARM, stack unwinding, and Firefox OS
I've been working on profiling tools for Firefox OS, and one of the central problems is getting stack traces for sample-based profiling. The old APCS frame pointer variant (where r11/fp heads a linked list of {fp, sp, lr, pc} frames) is convenient -- it's compatible with the Linux kernel profiler as-is, it's simple to work with in general, and in particular it's relatively easy to inject dynamically generated pseudo-frames into the profile.

Of course, this doesn't work on Thumb, where the full stmdb/ldmia don't exist. But it almost works on Thumb2 -- the sp and pc can't be stored, and the sp can't be loaded, but it's possible to save {r11, r12, r14} (i.e., {fp, ip, lr}) along with the other saved registers, and obtain a frame with the fp and lr fields at the same offsets as for -mapcs-frame. This is, conveniently, enough for the Linux kernel profiler's user stack walker. It makes it possible to lose the second-last stack frame if sampled between a call and committing the new frame to r11 -- I assume this is what the saved PC is for? -- but this is the same situation that, e.g., x86 frame pointer walking is in; and this is for profiling, so full correctness isn't an absolute requirement.

(At some point I should mention -mtpcs-frame, which as of GCC 4.4.3 emits a nontrivial number of instructions to put the entire {fp, sp, lr, pc} in the expected places... on Thumb1, and is silently ignored on Thumb2, and seems to have no test coverage, and seems to have bit-rotted in more recent versions.)

I've attached a patch (against GCC 4.4.3, because that's what we're currently using) for comment. The option probably needs a more serious name than -mthumb2-fake-apcs-frame, and how it interacts with related options may not be ideal. But, more importantly, I'm essentially inventing a vendor-specific ABI here, and I don't know if that's the kind of thing that would be accepted.

--Jed
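To illustrate why only the fp and lr slots matter to a frame-pointer walker, here is a minimal sketch over simulated memory. It assumes the usual APCS layout in which fp points at the saved pc word, with lr, sp and fp stored in the three words below it. The function names and the word-array stand-in for the user stack are hypothetical; this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit word at byte address addr in the simulated stack.
   Returns 1 on success, 0 if addr is misaligned or out of range. */
static int load_word(const uint32_t *mem, size_t mem_words, uint32_t addr,
                     uint32_t *out)
{
    if (addr % 4 != 0 || addr / 4 >= mem_words)
        return 0;
    *out = mem[addr / 4];
    return 1;
}

/* Follow the saved-fp chain starting at fp, recording each saved lr.
   Assumed record layout relative to fp: fp-12 = saved fp, fp-8 = saved sp,
   fp-4 = saved lr, fp = saved pc. Only fp-12 and fp-4 are read, which is
   why a Thumb2 frame that omits sp and pc is still walkable. */
static size_t walk_frames(const uint32_t *mem, size_t mem_words, uint32_t fp,
                          uint32_t *lrs, size_t max_frames)
{
    size_t n = 0;

    assert(mem != NULL && lrs != NULL);
    while (fp != 0 && n < max_frames) {
        uint32_t saved_fp, saved_lr;

        if (fp < 12)
            break;
        if (!load_word(mem, mem_words, fp - 12, &saved_fp)
            || !load_word(mem, mem_words, fp - 4, &saved_lr))
            break;
        lrs[n++] = saved_lr;
        /* The stack grows down, so the caller's record must sit at a higher
           address; anything else would risk looping forever. */
        if (saved_fp != 0 && saved_fp <= fp)
            break;
        fp = saved_fp;
    }
    return n;
}
```

A walker of this shape never looks at the sp or pc slots, so filling them with other saved registers (as the {fp, ip, lr} scheme does) leaves the chain intact.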
Re: ARM, stack unwinding, and Firefox OS
On Thu, Apr 25, 2013 at 07:25:42PM -0700, Jed Davis wrote:
> I've attached a patch

Let's try that again

--Jed

diff --git a/gcc-4.4.3/gcc/config/arm/arm.c b/gcc-4.4.3/gcc/config/arm/arm.c
index bef07e3..ce6acf1 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.c
+++ b/gcc-4.4.3/gcc/config/arm/arm.c
@@ -1381,6 +1381,21 @@ arm_override_options (void)
       target_flags &= ~MASK_APCS_FRAME;
     }
 
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && !(insn_flags & FL_THUMB2))
+    {
+      warning (0, "ignoring -mthumb2-fake-apcs-frame for non-Thumb2 target");
+      target_flags &= ~MASK_THUMB2_FAKE_APCS_FRAME;
+    }
+
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && TARGET_ARM)
+    {
+      target_flags &= ~MASK_THUMB2_FAKE_APCS_FRAME;
+      if (!TARGET_APCS_FRAME)
+        {
+          warning (0, "-mthumb2-fake-apcs-frame but not -mapcs-frame specified when compiling for ARM");
+        }
+    }
+
   /* Callee super interworking implies thumb interworking.  Adding
      this to the flags here simplifies the logic elsewhere.  */
   if (TARGET_THUMB && TARGET_CALLEE_INTERWORKING)
@@ -12696,6 +12711,11 @@ arm_compute_save_reg_mask (void)
   if (cfun->machine->lr_save_eliminated)
     save_reg_mask &= ~ (1 << LR_REGNUM);
 
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && (save_reg_mask & (1 << LR_REGNUM)))
+    save_reg_mask |=
+      (1 << ARM_HARD_FRAME_POINTER_REGNUM)
+      | (1 << IP_REGNUM);
+
   if (TARGET_REALLY_IWMMXT
       && ((bit_count (save_reg_mask)
            + ARM_NUM_INTS (crtl->args.pretend_args_size +
@@ -14506,6 +14526,15 @@ arm_expand_prologue (void)
           RTX_FRAME_RELATED_P (insn) = 1;
         }
     }
+  else if (TARGET_THUMB2_FAKE_APCS_FRAME &&
+           (offsets->saved_regs_mask & (1 << ARM_HARD_FRAME_POINTER_REGNUM))) {
+    rtx arm_fp_rtx = gen_raw_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM);
+
+    insn = GEN_INT (saved_regs);
+    insn = emit_insn (gen_addsi3 (arm_fp_rtx, stack_pointer_rtx, insn));
+    /* This is not "frame-related", because it doesn't set the frame
+       pointer that a debugger would use to find things.  */
+  }
 
   if (offsets->outgoing_args != offsets->saved_args + saved_regs)
     {
diff --git a/gcc-4.4.3/gcc/config/arm/arm.h b/gcc-4.4.3/gcc/config/arm/arm.h
index 1189914..d50525e 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.h
+++ b/gcc-4.4.3/gcc/config/arm/arm.h
@@ -837,11 +837,12 @@ extern int arm_structure_size_boundary;
      is an easy way of ensuring that it remains valid for all \
      calls.  */ \
   if (TARGET_APCS_FRAME || TARGET_CALLER_INTERWORKING \
-      || TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME) \
+      || TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME \
+      || TARGET_THUMB2_FAKE_APCS_FRAME) \
     { \
       fixed_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1; \
       call_used_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1; \
-      if (TARGET_CALLER_INTERWORKING) \
+      if (TARGET_CALLER_INTERWORKING || TARGET_THUMB2_FAKE_APCS_FRAME) \
         global_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1; \
     } \
   SUBTARGET_CONDITIONAL_REGISTER_USAGE \
diff --git a/gcc-4.4.3/gcc/config/arm/arm.opt b/gcc-4.4.3/gcc/config/arm/arm.opt
index 6aca395..5c8c0c1 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.opt
+++ b/gcc-4.4.3/gcc/config/arm/arm.opt
@@ -37,6 +37,10 @@ mapcs-frame
 Target Report Mask(APCS_FRAME)
 Generate APCS conformant stack frames
 
+mthumb2-fake-apcs-frame
+Target Report Mask(THUMB2_FAKE_APCS_FRAME)
+Emulate APCS conformant stack frames in Thumb2 code
+
 mapcs-reentrant
 Target Report Mask(APCS_REENT)
 Generate re-entrant, PIC code
Re: Should -Wmaybe-uninitialized be included in -Wall?
On Wed, Jul 10, 2013 at 06:11:11PM +0200, Andi Kleen wrote:
> FWIW basically -Werror -Wall defines a compiler version specific
> variant of C. May be great for individual developers, but it's always
> a serious mistake in any distributed Makefile.

Not always. Any project large enough (or serious enough about build reproducibility) to include its own toolchain can be written in that compiler-version-specific subset and nonetheless be worked on by more than one person. This is not uncommon in the BSDs, for example; see instances of "WARNS=4".

It's an uncommon use case (and, I think, not a justification for changing -Wall), but it does exist and it is useful.

--Jed