An unusual x86_64 code model

2011-08-09 Thread Jed Davis

The vSphere Hypervisor (ESXi) kernel runs on x86_64 and loads all text
and data sections (for the kernel itself and for modules) within a 2GB
window that lives around virtual address 0x418000000000 (65.5 TiB).
Thus, 32-bit absolute addresses won't work, but %rip-relative addressing
is fine.  Additionally, because this is a kernel, the usual issues of
"shared text" that discourage text relocations are inapplicable.

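To make the addressing constraint concrete, here is a minimal sketch in C
of the range checks involved.  It assumes the window starts at exactly
65.5 TiB and spans 2 GiB; the macro and function names are invented for
illustration and do not come from GCC or ESXi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed window: 65.5 TiB = 131 * 2^39, spanning 2 GiB. */
#define WINDOW_BASE ((uint64_t)131 << 39)
#define WINDOW_SIZE ((uint64_t)1 << 31)

static_assert(WINDOW_BASE == 0x418000000000ULL, "65.5 TiB in hex");

/* True if addr can be encoded as a sign-extended 32-bit immediate:
   either the lowest 2 GiB or the highest 2 GiB of the address space. */
bool fits_sign_extended_32(uint64_t addr)
{
    return addr <= 0x7fffffffULL || addr >= 0xffffffff80000000ULL;
}

/* True if addr can be encoded as a zero-extended 32-bit immediate. */
bool fits_zero_extended_32(uint64_t addr)
{
    return addr <= 0xffffffffULL;
}

/* True if target is reachable from the instruction that ends at
   next_insn using a signed 32-bit %rip-relative displacement. */
bool rip_relative_reaches(uint64_t next_insn, uint64_t target)
{
    uint64_t delta = target - next_insn;   /* modular, i.e. two's complement */
    return fits_sign_extended_32(delta);
}
```

Under these assumptions no address in the window passes either of the
absolute checks, while any two addresses inside the window are within
reach of each other through %rip-relative addressing.
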
What this means in terms of GCC is: the usual small code model won't
work, nor -mcmodel=kernel, because they assume signed 32-bit addresses.
The large code model probably will work, but that turns everything into
movabs and indirect calls, which is unnecessarily inefficient.  The
closest approximation is -fPIC or -fPIE, but that assumes we want to
implement the PLT/GOT machinery in our loader, which we don't; it imposes
overhead for no benefit.

The existing workaround, which predates my personal involvement, is to
use -fPIE together with a -include'd file that uses a #pragma to set the
default symbol visibility to hidden, which suppresses the PLTness.
That works on GCC 4.1, but with newer versions that no longer affects
implicitly declared functions (which turn up occasionally in third-party
drivers), or coverage instrumentation's calls to __gcov_init, or probably
other things that have not yet been discovered.  Also, it was never an
ideal solution, except in that it didn't require modifying the compiler
(at the time).
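
For readers unfamiliar with that workaround, a minimal sketch of it
follows.  The header name hidden_default.h and the symbols counter, bump,
and call_bump_twice are made up for illustration; only the #pragma itself
is the GCC feature the text refers to.

```c
#include <assert.h>

/* In the workaround this pragma would sit alone in a force-included
   header (hypothetical name: hidden_default.h), used roughly as
       gcc -fPIE -include hidden_default.h ...
   Everything is kept in one file here so the sketch compiles on its own. */
#pragma GCC visibility push(hidden)

/* Declarations seen after the pragma default to hidden visibility, so
   the compiler may assume they resolve within the same linked image. */
extern int counter;
int bump(void);

int counter = 0;

int bump(void)
{
    return ++counter;
}

int call_bump_twice(void)
{
    int first = bump();
    int second = bump();
    assert(second == first + 1);
    return second;
}

/* Popped only to keep this example tidy; a force-included header would
   leave the default in place for the rest of the translation unit. */
#pragma GCC visibility pop
```

The pragma only reaches declarations that actually appear in the source,
which is why implicitly declared functions and compiler-generated calls
fall outside it.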

Thus, I'm trying to find the right solution.  My current attempt is to
add an -mno-plt flag in i386.opt, and add it to the list of reasons not
to print "@PLT" after symbol names.  This seems to work, although I've
only done minimal testing so far.

But is that the right way to do that, do people think?  Or should I
look into making this its own -mcmodel option?  (Which would raise the
question of what to call it -- medsmall? smallhigh? altkernel?)  Or is
there some other way that this ought to be done?

Thanks,
--Jed

Re: An unusual x86_64 code model

2011-08-12 Thread Jed Davis

On Tue, Aug 09, 2011 at 04:58:01PM -0700, Andrew Pinski wrote:
> On Tue, Aug 9, 2011 at 4:26 PM, Jed Davis wrote:
> > The existing workaround, which predates my personal involvement, is to
> > use -fPIE together with a -include'd file that uses a #pragma to set the
> > default symbol visibility to hidden, which suppresses the PLTness.
> > That works on GCC 4.1, but with newer versions that no longer affects
> > implicitly declared functions (which turn up occasionally in third-party
> > drivers), or coverage instrumentation's calls to __gcov_init, or probably
> > other things that have not yet been discovered.  Also, it was never an
> > ideal solution, except in that it didn't require modifying the compiler
> > (at the time).
> 
> Have you tried -fvisibility=hidden option ?

Sadly, that doesn't work:

$ cat test.c
int baz();
int quux() { return baz(); }
int foo() { return bar() + quux(); }
$ gcc -fprofile-arcs -fvisibility=hidden -fPIE -S test.c
$ grep call test.s
	call	baz@PLT
	call	bar@PLT
	call	quux
	call	__gcov_init@PLT

The fine manual states that "extern declarations are not affected by
-fvisibility", so that result is expected.

Adding "#pragma GCC visibility push(hidden)" also takes care of baz
(declared and extern), but not bar (implicit) or __gcov_init (very
implicit).

--Jed

Re: An unusual x86_64 code model

2011-08-12 Thread Jed Davis

On Tue, Aug 09, 2011 at 04:26:06PM -0700, Jed Davis wrote:
> Thus, I'm trying to find the right solution.  My current attempt is to
> add an -mno-plt flag in i386.opt, and add it to the list of reasons not
> to print "@PLT" after symbol names.  This seems to work, although I've
> only done minimal testing so far.

Emphasis on "minimal"; a reference to the address of an extern variable
yields a @GOTPCREL.  So that wasn't going to work in any case.  The more
of i386.c I read, the more I realize that the resemblance to CM_SMALL_PIC
was mostly coincidental.

--Jed

Re: An unusual x86_64 code model

2011-08-17 Thread Jed Davis

Second attempt: I now have a modified GCC 4.4.3 which recognizes
-mcmodel=smallhigh; in CM_SMALLHIGH, pic_32bit_operand acts as it does
for PIC (to get lea instead of movabs), and legitimate_address_p accepts
SYMBOLIC_CONSTs with no indexing (for anything with a memory constraint).

Beyond that, the defaults (i.e., what happens if the code model isn't
any of the previously defined ones) happen to be more or less what I
want.  In particular, the operand printer chooses %rip-relative mode
over plain disp32 for any symbolic displacement in all the smallish
modes; this is commented as if it were just a space optimization
(avoiding the SIB byte), but in fact it's necessary for -fPIC to work,
so I think I can depend on it.

One thing I'm not so sure about is accepting any SYMBOLIC_CONST as a
legitimate address.  That allows, for example, a symbol address cast
to uintptr_t and added to (6ULL << 32), which will never fit.  On the
other hand, -fPIC allows offsets of up to +/- 16MiB for some unexplained
reason, meaning that I can break it by pushing the code+data size almost
to 2GiB with a large .bss and evaluating (uintptr_t)&_end+0xff.

I think I could try to fix that by interrogating the SYMBOL_REF_DECL for
the object's size, but given that -fPIC doesn't go that far, it's not
clear that I need to.

Thoughts?

Also, it may actually work now.  I've successfully bootstrapped with
BOOT_CFLAGS='-mcmodel=smallhigh -O2 -g', after which the only _32 or
_32S relocations in the gcc/ subdirectory of the objdir were either
.debug references that I assume are safe, or in crtstuff that's part of
the libgcc build and not affected by BOOT_CFLAGS.  (It also successfully
builds ESXi with kernel coverage enabled, but that's less informative
for people on this list.)

Once our lawyers approve, I can also send the actual diff; by my count,
it makes nontrivial changes to a whole seven lines of code.

--Jed

Re: An unusual x86_64 code model

2011-08-23 Thread Jed Davis

On Thu, Aug 18, 2011 at 03:37:15PM +0200, Michael Matz wrote:
> On Wed, 17 Aug 2011, Jed Davis wrote:
> 
> > One thing I'm not so sure about is accepting any SYMBOLIC_CONST as a
> > legitimate address.  That allows, for example, a symbol address cast
> > to uintptr_t and added to (6ULL << 32), which will never fit.  On the
> > other hand, -fPIC allows offsets of up to +/- 16MiB for some unexplained
> > reason,
> 
> The x86-64 ABI specifies this.  All symbols have to be located between 0x0 
> and 2^31-2^24-1, and that is so that everything in memory objects of 
> length less than 2^24 can be addressed directly.

Oh, of course.  For some reason I went through the ELF spec, but
didn't think to see what the x86_64 ABI had to say about code models.
Everything makes much more sense now.  Thanks for the pointer.
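
The arithmetic behind that ABI rule can be sketched in a few lines of C.
This is only an illustration of the bound quoted above; the macro and
function names are invented here, and treating offsets of magnitude below
2^24 as "safe" is my reading of the rule, not text from the ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest symbol address allowed by the quoted rule: 2^31 - 2^24 - 1. */
#define SYMBOL_LIMIT ((INT64_C(1) << 31) - (INT64_C(1) << 24) - 1)
/* Offsets strictly below 2^24 in magnitude are treated as safe. */
#define OFFSET_LIMIT (INT64_C(1) << 24)

/* Worst case: the highest symbol plus the largest safe offset still
   fits in a non-negative signed 32-bit value. */
static_assert(SYMBOL_LIMIT + (OFFSET_LIMIT - 1) <= INT32_MAX,
              "symbol + offset must stay below 2^31");

/* True if a constant offset can be folded into the address without
   knowing where the symbol ends up. */
bool offset_is_safe(int64_t offset)
{
    return offset > -OFFSET_LIMIT && offset < OFFSET_LIMIT;
}

/* True if symbol + offset is representable as a signed 32-bit value. */
bool address_fits(int64_t symbol, int64_t offset)
{
    int64_t addr = symbol + offset;
    return addr >= INT32_MIN && addr <= INT32_MAX;
}
```

With these definitions an offset such as 0xff is always safe, while
(6ULL << 32) is rejected no matter where the symbol lives.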

> Otherwise only the base address of symbols would be addressable
> directly and any offsetted variant would have to be calculated
> explicitly.

Right; that's what I was trying to avoid doing.

It looks like I can reuse legitimate_pic_address_disp_p for this; this
is not quite PIC, but the same set of non-immediate displacement-only
addresses is usable in general operands.

Except that then pic_32bit_operand does the wrong thing, because actual
PIC has hooks in the MI recog.c that affect the constraints (I think?),
and I don't.  But... what "pic_32bit_operand" actually means is "can I
use LEA to obtain this value?", and anything that's a legitimate address
in strict RTL can be LEA'ed.  So that takes care of that.  Back to
testing, I guess.

--Jed

x86_64 -mcmodel=smallhigh, cont'd

2012-01-17 Thread Jed Davis

I posted about this a few months ago, but I've been busy with
higher-priority work and have only just picked it back up.  I think
I've now fixed the last bug.

To review: my goal is to give the x86 backend a code model where code and
data reside within an arbitrary 2GiB of the address space, and where the
loader/runtime is not required to support the ELF PIC facilities.

The motivating example for this is the vSphere Hypervisor (ESXi) kernel
and its modules, which are loaded outside the range of both the "small"
and "kernel" models of the x86_64 ABI for technical reasons which,
for the sake of brevity, I will not attempt to explain here.  This
being a kernel environment and not a "shared text" user program, text
relocations are free; thus, the overhead of PLT/GOT indirection is
unnecessary and, indeed, unwelcome.

Implementing this code model appears to be relatively simple: legitimate
addresses are computed as for CM_SMALL_PIC but treating every symbol
as local, and any constant_address_p immediate can be loaded with lea
instead of movabs.

As before, if it sounds like I'm still doing this wrong, that would be
nice to know; this project is more or less my first nontrivial exposure
to GCC internals.

Our legal department appears to be fine with contributing this back,
although I want to wait until I get specific confirmation before mailing
any diffs.  At the moment I have diffs against 4.4.3 (yes, I know it's
old) and HEAD, but they still need documentation changes and testcases
before they'd be useful.

But my other question is: is there interest in having this change
contributed?  It's not inherently vendor-specific, but I'm having trouble
thinking of anyone else who'd want to use it, and I don't know what GCC's
policy is on features like that.

--Jed

Re: How to GTYize a struct properly?

2006-08-14 Thread Jed Davis

"Laurynas Biveinis" <[EMAIL PROTECTED]> writes:

> After my best effort so far:
>
> struct histogram_value_t GTY(())
> {
>   struct
> {/* <--- line 48, error below occurs here */
>   tree value; /* The value to profile.  */
>   tree stmt;  /* Insn containing the value.  */
>   gcov_type *counters;/* Pointer to first counter.  */
>   struct histogram_value_t GTY((chain_next("%h.next")) *next; /* Linked list pointer.  */

At the risk of stating the obvious, those parentheses after "GTY" look
unbalanced to me.

-- 
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l))  (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k)))'((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)

Re: Merging identical functions in GCC

2006-09-18 Thread Jed Davis

Mike Stump <[EMAIL PROTECTED]> writes:

> On Sep 15, 2006, at 2:32 PM, Ross Ridge wrote:
>> Also, I don't think it's safe if you merge only functions in COMDAT
>> sections.
>
> Sure it is, one just needs to merge them as:
>
> variant1: nop
> variant2: nop
> variant3: nop
>   [ ... ]
>
> this way important inequalities still work.

As a convenient side-effect, setting breakpoints on only one variant
will also still work.
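
The "important inequalities" in question are function-address
comparisons.  A small sketch in C, with hypothetical function names, of
the property any merging scheme has to preserve:

```c
#include <assert.h>
#include <stdbool.h>

/* Two functions with identical bodies: candidates for merging. */
int variant1(int x) { return x + 1; }
int variant2(int x) { return x + 1; }

typedef int (*unary_fn)(int);

/* C requires pointers to distinct functions to compare unequal, so
   this must return false for (variant1, variant2) even if the two
   bodies end up sharing code. */
bool same_function(unary_fn a, unary_fn b)
{
    assert(a != 0 && b != 0);
    return a == b;
}
```

Giving each variant its own entry point ahead of the shared body keeps
the addresses distinct while the behaviour stays identical.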


ARM, stack unwinding, and Firefox OS

2013-04-25 Thread Jed Davis

I've been working on profiling tools for Firefox OS, and one of the
central problems is getting stack traces for sample-based profiling.
The old APCS frame pointer variant (where r11/fp heads a linked list
of {fp, sp, lr, pc} frames) is convenient -- it's compatible with the
Linux kernel profiler as-is, it's simple to work with in general, and
in particular it's relatively easy to inject dynamically generated
pseudo-frames into the profile.

Of course, this doesn't work on Thumb, where the full stmdb/ldmia don't
exist.  But it almost works on Thumb2 -- the sp and pc can't be stored,
and the sp can't be loaded, but it's possible to save {r11, r12, r14}
(i.e., {fp, ip, lr}) along with the other saved registers, and obtain a
frame with the fp and lr fields at the same offsets as for -mapcs-frame.

This is, conveniently, enough for the Linux kernel profiler's user stack
walker.  It makes it possible to lose the second-last stack frame if
sampled between a call and committing the new frame to r11 -- I assume
this is what the saved PC is for? -- but this is the same situation
that, e.g., x86 frame pointer walking is in; and this is for profiling,
so full correctness isn't an absolute requirement.
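
A minimal sketch of such a frame-pointer walk, run over a simulated stack
rather than real memory.  It assumes the APCS convention that fp points at
the saved pc slot, with the saved lr one word below it and the caller's fp
three words below it; the function name, the word-index addressing, and
the loop guard are illustrative choices, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets below the slot that fp points at (the saved pc). */
#define SLOT_SAVED_FP 3   /* caller's fp lives at fp - 3 words */
#define SLOT_SAVED_LR 1   /* return address lives at fp - 1 word */

/* Walk a chain of frame records in a simulated stack.  "Addresses" are
   word indices into stack[]; fp is the index of the newest frame's pc
   slot.  Stores each frame's saved lr in out[] and returns the count. */
size_t walk_frames(const uint32_t *stack, size_t nwords, size_t fp,
                   uint32_t *out, size_t max_out)
{
    size_t n = 0;

    assert(stack != NULL && out != NULL);
    while (n < max_out && fp >= SLOT_SAVED_FP && fp < nwords) {
        size_t next = stack[fp - SLOT_SAVED_FP];

        out[n++] = stack[fp - SLOT_SAVED_LR];
        /* The stack grows down, so older frames sit at higher indices;
           anything else means the chain has ended or is corrupt. */
        if (next <= fp)
            break;
        fp = next;
    }
    return n;
}
```

Only the fp and lr slots are read, which is why a Thumb2 frame that
stores {fp, ip, lr} at the same offsets is enough for this kind of
walker.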

(At some point I should mention -mtpcs-frame, which as of GCC 4.4.3
emits a nontrivial number of instructions to put the entire {fp, sp,
lr, pc} in the expected places... on Thumb1, and is silently ignored on
Thumb2, and seems to have no test coverage, and seems to have bit-rotted
in more recent versions.)

I've attached a patch (against GCC 4.4.3, because that's what we're
currently using) for comment.  The option probably needs a more serious
name than -mthumb2-fake-apcs-frame, and how it interacts with related
options may not be ideal.  But, more importantly, I'm essentially
inventing a vendor-specific ABI here, and I don't know if that's the
kind of thing that would be accepted.

--Jed

Re: ARM, stack unwinding, and Firefox OS

2013-04-25 Thread Jed Davis

On Thu, Apr 25, 2013 at 07:25:42PM -0700, Jed Davis wrote:
> I've attached a patch

Let's try that again.

--Jed

diff --git a/gcc-4.4.3/gcc/config/arm/arm.c b/gcc-4.4.3/gcc/config/arm/arm.c
index bef07e3..ce6acf1 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.c
+++ b/gcc-4.4.3/gcc/config/arm/arm.c
@@ -1381,6 +1381,21 @@ arm_override_options (void)
   target_flags &= ~MASK_APCS_FRAME;
 }
 
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && !(insn_flags & FL_THUMB2))
+{
+  warning (0, "ignoring -mthumb2-fake-apcs-frame for non-Thumb2 target");
+  target_flags &= ~MASK_THUMB2_FAKE_APCS_FRAME;
+}
+
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && TARGET_ARM)
+{
+  target_flags &= ~MASK_THUMB2_FAKE_APCS_FRAME;
+  if (!TARGET_APCS_FRAME)
+	{
+	  warning (0, "-mthumb2-fake-apcs-frame but not -mapcs-frame specified when compiling for ARM");
+	}
+}
+
   /* Callee super interworking implies thumb interworking.  Adding
  this to the flags here simplifies the logic elsewhere.  */
   if (TARGET_THUMB && TARGET_CALLEE_INTERWORKING)
@@ -12696,6 +12711,11 @@ arm_compute_save_reg_mask (void)
   if (cfun->machine->lr_save_eliminated)
 save_reg_mask &= ~ (1 << LR_REGNUM);
 
+  if (TARGET_THUMB2_FAKE_APCS_FRAME && (save_reg_mask & (1 << LR_REGNUM)))
+save_reg_mask |=
+  (1 << ARM_HARD_FRAME_POINTER_REGNUM)
+  | (1 << IP_REGNUM);
+
   if (TARGET_REALLY_IWMMXT
   && ((bit_count (save_reg_mask)
 	   + ARM_NUM_INTS (crtl->args.pretend_args_size +
@@ -14506,6 +14526,15 @@ arm_expand_prologue (void)
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
 }
+  else if (TARGET_THUMB2_FAKE_APCS_FRAME &&
+	   (offsets->saved_regs_mask & (1 << ARM_HARD_FRAME_POINTER_REGNUM))) {
+rtx arm_fp_rtx = gen_raw_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM);
+
+insn = GEN_INT (saved_regs);
+insn = emit_insn (gen_addsi3 (arm_fp_rtx, stack_pointer_rtx, insn));
+/* This is not "frame-related", because it doesn't set the frame
+   pointer that a debugger would use to find things. */
+  }
 
   if (offsets->outgoing_args != offsets->saved_args + saved_regs)
 {
diff --git a/gcc-4.4.3/gcc/config/arm/arm.h b/gcc-4.4.3/gcc/config/arm/arm.h
index 1189914..d50525e 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.h
+++ b/gcc-4.4.3/gcc/config/arm/arm.h
@@ -837,11 +837,12 @@ extern int arm_structure_size_boundary;
  is an easy way of ensuring that it remains valid for all	\
  calls.  */			\
   if (TARGET_APCS_FRAME || TARGET_CALLER_INTERWORKING		\
-  || TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME)		\
+  || TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME		\
+  || TARGET_THUMB2_FAKE_APCS_FRAME)\
 {\
   fixed_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1;		\
   call_used_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1;	\
-  if (TARGET_CALLER_INTERWORKING)\
+  if (TARGET_CALLER_INTERWORKING || TARGET_THUMB2_FAKE_APCS_FRAME) \
 	global_regs[ARM_HARD_FRAME_POINTER_REGNUM] = 1;		\
 }\
   SUBTARGET_CONDITIONAL_REGISTER_USAGE\
diff --git a/gcc-4.4.3/gcc/config/arm/arm.opt b/gcc-4.4.3/gcc/config/arm/arm.opt
index 6aca395..5c8c0c1 100644
--- a/gcc-4.4.3/gcc/config/arm/arm.opt
+++ b/gcc-4.4.3/gcc/config/arm/arm.opt
@@ -37,6 +37,10 @@ mapcs-frame
 Target Report Mask(APCS_FRAME)
 Generate APCS conformant stack frames
 
+mthumb2-fake-apcs-frame
+Target Report Mask(THUMB2_FAKE_APCS_FRAME)
+Emulate APCS conformant stack frames in Thumb2 code
+
 mapcs-reentrant
 Target Report Mask(APCS_REENT)
 Generate re-entrant, PIC code

Re: Should -Wmaybe-uninitialized be included in -Wall?

2013-07-12 Thread Jed Davis

On Wed, Jul 10, 2013 at 06:11:11PM +0200, Andi Kleen wrote:
> FWIW basically -Werror -Wall defines a compiler version specific
> variant of C. May be great for individual developers, but it's always
> a serious mistake in any distributed Makefile.

Not always.  Any project large enough (or serious enough about build
reproducibility) to include its own toolchain can be written in that
compiler-version-specific subset and nonetheless be worked on by more
than one person.  This is not uncommon in the BSDs, for example; see
instances of "WARNS=4".

It's an uncommon use case (and, I think, not a justification for changing
-Wall), but it does exist and it is useful.

--Jed
