dwarf2 basic block start information
hi,

Since the CVS version of gas supports extensions for the dwarf2 basic_block location information, I thought I could try to add support for this feature to gcc. My use of this feature is related to binary code analysis: being able to gather the basic block boundaries through gcc's debugging output would save me reverse engineering them from the binary code itself.

The attached code is the start of a patch to do this. It would be really nice to have feedback on:
- the approach chosen
- the bugs which I have stumbled upon.

The patch itself is pretty straightforward. I have simply added an argument to the source_line debug hook and I have implemented it correctly (I think) for the dwarf2 backend. The final.c pass now reads the rtl BASIC_BLOCK note to invoke source_line correctly. Is this the right approach?

I have tested this patch lightly on the sample C code below on x86 with gcc svn HEAD and binutils cvs HEAD:

#include <stdio.h>

static int foo (void)
{
  if (3)
    {
      int i = 0;
      while (i < 100)
        {
          printf ("test\n");
          i++;
        }
    }
  return 8;
}

int main (int argc, char *argv[])
{
  foo ();
  return 0;
}

While the debugging output looks quite correct at -O0, the -O2 output seems broken:

   0:	8d 4c 24 04          	lea    0x4(%esp),%ecx
   4:	83 e4 f0             	and    $0xfffffff0,%esp
   7:	ff 71 fc             	pushl  0xfffffffc(%ecx)
   a:	55                   	push   %ebp
   b:	89 e5                	mov    %esp,%ebp
   d:	53                   	push   %ebx
   e:	31 db                	xor    %ebx,%ebx
  10:	51                   	push   %ecx
  11:	83 ec 10             	sub    $0x10,%esp
  14:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
  1a:	8d bf 00 00 00 00    	lea    0x0(%edi),%edi
  20:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
  27:	43                   	inc    %ebx
  28:	e8 fc ff ff ff       	call   29
  2d:	83 fb 64             	cmp    $0x64,%ebx
  30:	75 ee                	jne    20
  32:	83 c4 10             	add    $0x10,%esp
  35:	31 c0                	xor    %eax,%eax
  37:	59                   	pop    %ecx
  38:	5b                   	pop    %ebx
  39:	5d                   	pop    %ebp
  3a:	8d 61 fc             	lea    0xfffffffc(%ecx),%esp
  3d:	c3                   	ret

With this list of basic block boundaries as reported by the debugging information:

ad: 0x0
ad: 0x11
ad: 0x20
ad: 0x32

Clearly, 0x11 is not a bb boundary so we have a bug.
Despite the fact that my understanding of gcc internals is close to nil, it seems to me that this problem is most likely related to some sort of inlining pass which did not update the rtl BASIC_BLOCK note. Thus, the following questions:

1) is it expected that some rtl optimization passes would bork the BASIC_BLOCK notes?
2) if it is, are these known culprits and would there be interest in patches to try to fix this?
3) does anyone have an idea of which passes might be the culprits? (it might save a lot of time wandering through gcc sources)

If someone is interested in trying out this patch, the released version of readelf on my system seems to be able to dump the basic block dwarf2 instructions with --debug-dump=line. I have also written a small tool to dump only this information, here: http://cutebugs.net/code/bozo-profiler/

The test binary generated by the top-level Makefile in bin/test/ can be invoked with: test dw2_bb [BINARY FILE]

[EMAIL PROTECTED] bozo-profiler]$ make
make: Nothing to be done for `all'.
[EMAIL PROTECTED] bozo-profiler]$ ./bin/test/test dw2_bb bin/test/test

regards,
Mathieu

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 106485)
+++ gcc/final.c (working copy)
@@ -129,6 +129,8 @@
 static rtx debug_insn;
 rtx current_output_insn;
 
+int current_start_basic_block = 0;
+
 /* Line number of last NOTE.  */
 static int last_linenum;
 
@@ -1744,6 +1746,7 @@
       else
 	*seen |= SEEN_BB;
 
+      current_start_basic_block = 1;
       break;
 
     case NOTE_INSN_EH_REGION_BEG:
@@ -2071,8 +2074,21 @@
 	 note in a row.  */
       if (notice_source_line (insn))
 	{
-	  (*debug_hooks->source_line) (last_linenum, last_filename);
+	  if (current_start_basic_block)
+	    {
+	      current_start_basic_block = 0;
+	      (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	    }
+	  else
+	    {
+	      (*debug_hooks->source_line) (last_linenum, last_filename, 0);
+	    }
 	}
+      else if (current_start_basic_block)
+	{
+	  current_start_basic_block = 0;
+	  (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	}
 
       if (GET_CODE (body) == ASM_INPUT)
 	{
@@ -2498,6 +2
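[Editorial note: the gas extension used by this patch maps onto the DWARF2 line-number program's basic_block register, which is set by the DW_LNS_set_basic_block standard opcode and reset after each row is appended to the line matrix. The sketch below illustrates just that part of the state-machine semantics; the struct and function names are mine, not gas's or gcc's.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the DWARF2 line-number state machine registers. */
struct line_state {
        uint32_t address;
        uint32_t line;
        bool basic_block;       /* true if the next row starts a basic block */
};

/* DW_LNS_set_basic_block: mark the next emitted row as a bb start. */
static void set_basic_block (struct line_state *s)
{
        s->basic_block = true;
}

/* Append a row to the line matrix; the basic_block register is reset
   to false afterwards, so the flag applies to exactly one row. */
static bool emit_row (struct line_state *s)
{
        bool was_bb = s->basic_block;
        s->basic_block = false;
        return was_bb;
}
```

A consumer such as the bozo-profiler tool mentioned above only has to collect the addresses of rows whose basic_block flag was true.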
Re: dwarf2 basic block start information
On Mon, 2005-11-14 at 21:30 -0500, Daniel Jacobowitz wrote:
> On Wed, Nov 09, 2005 at 07:19:45PM +0100, mathieu lacage wrote:
> > While the debugging output looks quite correct at -O0, the -O2 output
> > seems broken:
> >
> >    0:	8d 4c 24 04          	lea    0x4(%esp),%ecx
> >    4:	83 e4 f0             	and    $0xfffffff0,%esp
> >    7:	ff 71 fc             	pushl  0xfffffffc(%ecx)
> >    a:	55                   	push   %ebp
> >    b:	89 e5                	mov    %esp,%ebp
> >    d:	53                   	push   %ebx
> >    e:	31 db                	xor    %ebx,%ebx
> >   10:	51                   	push   %ecx
> >   11:	83 ec 10             	sub    $0x10,%esp
> >   14:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
> >
> > With this list of basic block boundaries as reported by the debugging
> > information:
> > ad: 0x0
> > ad: 0x11
> >
> > Clearly, 0x11 is not a bb boundary so we have a bug.

This was a bug in my dwarf2 reading code. I fixed it and this testcase works for me now.

> No, not clear at all. Every place which could be the target of a jump
> will be the start of a basic block, but you are not guaranteed that all
> sequential basic blocks are combined.

It would be nice if you could post an example where they are not combined.

> Probably either Jim's right and it's related to the end of the
> prologue, or it's a different basic block because of some artifact of
> inlining. This shouldn't present any problem for a tool using the
> basic block information.

Inlining or end-of-prologue do not seem to have an influence on this. It seems to actually work quite well. I will send an updated version of the patch in another email.

Mathieu
--
Re: dwarf2 basic block start information
Here is an updated version with a few bugs fixed (how I managed to introduce bugs in a 20-liner patch still eludes me).

On Mon, 2005-11-14 at 21:26 -0500, Daniel Jacobowitz wrote:
> On Mon, Nov 14, 2005 at 06:24:47PM -0800, Jim Wilson wrote:
> > mathieu lacage wrote:
> > > Clearly, 0x11 is not a bb boundary so we have a bug.
> >
> > Looks like it could be the prologue end, but I don't see any obvious
> > reason why this patch could do that. I suggest you try debugging your
> > patch to see why you are getting the extra call with
> > LINE_FLAG_BASIC_BLOCK set in this case.
> >
> > Using -p would make the diff more readable.

svn diff -x -p does not work here. Is there a magic incantation I should run to produce such a diff?

> > We get complaints every time the debug info size increases. Since this
> > is apparently only helpful to an optional utility, this extra debug info
> > should not be emitted by default. There should be an option to emit it.

Any suggestion on a name?

> I'd like to know what the size impact of including basic block
> information would be, first; a lot of tools, including GDB, could make
> use of it if it were available.

linux-2.6.14, stock default config, size of the dw2 .debug_line section:
- without patch: 1433756
- with patch: 1557345

Out of curiosity, I wonder what gdb would use it for.

regards,
Mathieu
--

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 106485)
+++ gcc/final.c (working copy)
@@ -129,6 +129,8 @@
 static rtx debug_insn;
 rtx current_output_insn;
 
+int current_start_basic_block = 0;
+
 /* Line number of last NOTE.  */
 static int last_linenum;
 
@@ -1744,6 +1746,7 @@
       else
 	*seen |= SEEN_BB;
 
+      current_start_basic_block = 1;
       break;
 
     case NOTE_INSN_EH_REGION_BEG:
@@ -2067,11 +2070,26 @@
 	  break;
 	}
 
+      /* Output this line note if it is the first or the last line
+	 note in a row.  */
       if (notice_source_line (insn))
 	{
-	  (*debug_hooks->source_line) (last_linenum, last_filename);
+	  if (current_start_basic_block)
+	    {
+	      current_start_basic_block = 0;
+	      (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	    }
+	  else
+	    {
+	      (*debug_hooks->source_line) (last_linenum, last_filename, 0);
+	    }
+	}
+      else if (current_start_basic_block)
+	{
+	  current_start_basic_block = 0;
+	  (*debug_hooks->source_line) (insn_line (insn), insn_file (insn), LINE_FLAG_BASIC_BLOCK);
 	}
 
       if (GET_CODE (body) == ASM_INPUT)
@@ -2498,6 +2516,7 @@
       current_output_insn = debug_insn = 0;
     }
 }
+
 return NEXT_INSN (insn);
}

Index: gcc/debug.c
===================================================================
--- gcc/debug.c (revision 106485)
+++ gcc/debug.c (working copy)
@@ -33,7 +33,7 @@
   debug_nothing_int_int,	 /* begin_block */
   debug_nothing_int_int,	 /* end_block */
   debug_true_tree,		 /* ignore_block */
-  debug_nothing_int_charstar,	 /* source_line */
+  debug_nothing_int_charstar_int, /* source_line */
   debug_nothing_int_charstar,	 /* begin_prologue */
   debug_nothing_int_charstar,	 /* end_prologue */
   debug_nothing_int_charstar,	 /* end_epilogue */
@@ -94,6 +94,13 @@
 }
 
 void
+debug_nothing_int_charstar_int (unsigned int line ATTRIBUTE_UNUSED,
+				const char *text ATTRIBUTE_UNUSED,
+				unsigned int flags ATTRIBUTE_UNUSED)
+{
+}
+
+void
 debug_nothing_int (unsigned int line ATTRIBUTE_UNUSED)
 {
 }

Index: gcc/debug.h
===================================================================
--- gcc/debug.h (revision 106485)
+++ gcc/debug.h (working copy)
@@ -59,7 +59,7 @@
   bool (* ignore_block) (tree);
 
   /* Record a source file location at (FILE, LINE).  */
-  void (* source_line) (unsigned int line, const char *file);
+  void (* source_line) (unsigned int line, const char *file, unsigned int flags);
 
   /* Called at start of prologue code.  LINE is the first line in the
      function.  This has been given the same prototype as source_line,
@@ -129,12 +129,16 @@
   int start_end_main_source_file;
 };
 
+#define LINE_FLAG_BASIC_BLOCK ((unsigned int)1)
+
 extern const struct gcc_debug_hooks *debug_hooks;
 
 /* The do-nothing hooks.  */
 extern void debug_nothing_void (void);
 extern void debug_nothing_charstar (const char *);
 extern void debug_nothing_int_charstar (unsigned int, const char *);
+extern void debug_nothing_int_charstar_int (unsigned int, const char *, unsigned int flags);
 extern void debug_nothing_int (unsigned int);
 extern void debug_nothing_int_int (unsigned int, unsigned int);
 extern void debug_nothing_tree (tree);

Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c (revision 106485)
+++ gcc/dwarf2out.c (wor
Re: Link-time optimzation
hi,

Daniel Berlin wrote:
> I discovered this when deep hacking into the symbol code of GDB a while
> ago. Apparently, some people enjoy breakpointing symbols by using the
> fully mangled name, which appears (nowadays) mainly in the minsym table.

This sort of hack is often used to work around what appears to be the inability of gdb to put breakpoints in c++ constructors (or maybe it is bad dwarf2 debugging output by gcc, I don't know).

regards,
Mathieu
Re: LTO, LLVM, etc.
hi mark,

On Mon, 2005-12-05 at 21:33 -0800, Mark Mitchell wrote:
> I'm not saying that having two different formats is necessarily a bad
> thing (we've already got Tree and RTL, so we're really talking about two
> levels or three), or that switching to LLVM is a bad idea, but I don't
> think there's any inherent reason that we must necessarily have multiple
> representations.

In what I admit is a relatively limited experience (compared to that of you or other gcc contributors) of working with a few large, old, sucky codebases, I think I have learned one thing: genericity is most often bad. Specifically, I think that trying to re-use the same data structures/algorithms/code for widely different scenarios is what most often leads to large overall complexity and fragility.

It seems to me that the advantages of using the LTO representation for both frontend dumping and optimization (code reuse, etc.) are not worth the cost: a single piece of code used for two very different use-cases will necessarily be more complex and thus prone to design bugs. Hubris will lead developers to ignore the latter because they believe they can avoid the complexity trap of code reuse. It might work in the short term because you and others might be able to achieve this feat, but I fail to see how you will be able to avoid the inevitable decay of code inherent to this solution in the long run.

A path where different solutions for different problems are evolved independently and then merged where it makes sense seems better to me than a path where a single solution to two different problems is attempted from the start. Which is thus why I think that "there are inherent reasons that you must necessarily have multiple representations".

regards,
Mathieu

PS: I know I am oversimplifying the problem and your position and I apologize for this.
--
Re: detailed comparison of generated code size for GCC and other compilers
On Tue, 2009-12-15 at 11:24 +0100, Andi Kleen wrote:
> John Regehr writes:
> > > I would only be worried for cases where no warning is issued *and*
> > > uninitialized accesses are eliminated.
> >
> > Yeah, it would be excellent if GCC maintained the invariant that for
> > all uses of uninitialized storage, either the compiler or else
> > valgrind will issue a warning.
>
> My understanding was that valgrind's detection of uninitialized
> local variables is not 100% reliable because it cannot track
> all updates of the frames (it's difficult to distinguish stack
> reuse from uninitialized stack)

I am not a valgrind expert so take the following with a grain of salt, but I think that the above statement is wrong: valgrind reliably detects use of uninitialized variables if you define 'use' as meaning 'affects the control flow of your program'. i.e., try this:

[mlac...@diese ~]$ cat > test.c
int f(void)
{
  int x;
  return x;
}
int main (int argc, char *argv[])
{
  if (f()) {
    printf ("something\n");
  }
  return 0;
}
^C
[mlac...@diese ~]$ gcc ./test.c
./test.c: In function ‘main’:
./test.c:10: warning: incompatible implicit declaration of built-in function ‘printf’
[mlac...@diese ~]$ valgrind ./a.out
==18933== Memcheck, a memory error detector.
==18933== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==18933== Using LibVEX rev 1804, a library for dynamic binary translation.
==18933== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==18933== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==18933== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==18933== For more details, rerun with: -v
==18933==
==18933== Conditional jump or move depends on uninitialised value(s)
==18933==    at 0x80483D7: main (in /home/mlacage/a.out)
something
==18933==
==18933== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==18933== malloc/free: in use at exit: 0 bytes in 0 blocks.
==18933== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==18933== For counts of detected errors, rerun with: -v
==18933== All heap blocks were freed -- no leaks are possible.
[mlac...@diese ~]$
Re: Split Stacks proposal
comments below,

On Thu, 2009-02-26 at 14:05 -0800, Ian Lance Taylor wrote:
> I've put a project proposal for split stacks on the wiki at
> http://gcc.gnu.org/wiki/SplitStacks . The idea is to permit the stack
> of a single thread to be split into discontiguous segments, thus
> permitting many more threads to be active at one time without worrying
> about stack overflow or about wasting lots of stack space for inactive
> threads. The compiler would have to generate code to support detecting
> when new stack space is needed, and to deal with some of the
> consequences of moving to a new stack.

It would be totally awesome to do this if you could provide an option to delegate to a user-provided function the allocation and deallocation of the stack blobs needed by threads. i.e., the problem I run into is that I create a lot of user-space threads and I need to allocate at the very least 2 pages (1 normal page, 1 guard page) for each thread:
- 2 pages of address space is a lot if you have a lot of threads: it is easy to run out of address space (physical memory is less of a concern for me);
- 2 pages is not enough for a lot of threads, and if one of my threads hits the 2-page limit, I have to stop my program and restart it with a bigger stack space for the offending thread, which can quickly get "annoying".

So, ideally, I would be able to not allocate statically any address space for my threads and defer stack space allocation until it is needed. Ideally, I would even be able to use heap memory for that stack space if I wanted to. Another use-case I could foresee for this would be to profile the runtime stack usage of an application/set of threads to optimize it.

> I would be interested in hearing comments about this.
>
> I'm not currently working on this, but I may start working on it at some
> point.

I looked a bit at the page you pointed to.
A couple of questions:
- if you want to use the stack protector and split stacks, it should be fairly trivial to extend the data structure which contains the stack protector with a new field, no?
- what would be a typical size for the stack space slop? (for example, on i386?)
- I understand that you need to copy the function parameters from the old stack to the new stack, but why would you need to invoke the C++ copy or move constructors for this? Would a memcpy not be sufficient to ensure proper C++ semantics in this case? An example which shows how a memcpy would break might be interesting.

Mathieu
Re: Split Stacks proposal
On Fri, 2009-02-27 at 08:54 -0800, Ian Lance Taylor wrote:
> > It would be totally awesome to do this if you could provide an option to
> > delegate to a user-provided function the allocation and deallocation of
> > the stack blobs needed by threads.
>
> Yes, this would be a goal.

The main reason I asked about this is that it is not obvious to me how it could be done: yes, you can call any function from your compiler-generated code, but what would the user need to do to change which address is called?

1) specify a function name for allocation on the command-line during compilation, and link statically into every binary an object file which contains the specified symbol? This would allow you to have one allocation function per generated binary (shared library or executable), which might not be very desirable from a user perspective.

2) generate code which uses a well-known name, calls this symbol through the PLT (on ELF systems), and relies on the ELF loader to resolve that symbol in each binary to a single user-provided function. The libc could provide its own default implementation which is overridden either with LD_PRELOAD or by linking the function into the main executable.

I don't care much about which option is chosen, but it would be nice to know how you intend to deal with this aspect of the project.

Mathieu
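[Editorial note: option 2) above is the usual ELF pattern of a weak default definition overridden by a strong one. A minimal sketch, assuming hypothetical hook names (`__split_stack_alloc` / `__split_stack_free` are invented here for illustration; they are not the names gcc ended up using):]

```c
#include <stdlib.h>

/* Weak default implementations.  Compiler-generated code would call
   these well-known symbols through the PLT; a user can override them
   by linking strong definitions into the main executable, or via
   LD_PRELOAD, exactly as described in option 2) above. */
void * __attribute__ ((weak))
__split_stack_alloc (size_t size)
{
        /* default policy: plain heap allocation of the stack segment */
        return malloc (size);
}

void __attribute__ ((weak))
__split_stack_free (void *stack)
{
        free (stack);
}
```

A simulator would then provide strong definitions that, for example, carve stack segments out of a per-virtual-process arena.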
Re: GCC 4.4.0 Status Report (2009-03-13)
On Mon, 2009-03-23 at 11:54 +1100, Ben Elliston wrote:
> Can you give some indication of how the subset is enforced?

I find it weird that you choose to ignore the obvious: code reviews, maintainer management, etc. Just like what you (gcc developers) do in gcc's C codebase every day. Unless, of course, gcc grew an automatic code checker overnight for ugly code?

regards,
Mathieu
Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!
hi,

On Wed, 2006-08-30 at 16:44 -0500, Mark Mitchell wrote:
[snip]
> (Implied, but not stated, in your mail is the fact that the abbreviation
> table cannot be indexed directly. If it could be, then you wouldn't
> have to read the entire abbreviation table for each function; you would
> just read the referenced abbreviations. Because the abbreviation table
> records are of variable length, it is indeed true that you cannot make
> random accesses to the table. So, this paragraph is just fleshing out
> your argument.)

I have spent a considerable amount of time looking at this, and the abbrev tables output by gcc are not totally random: their entries are sorted by abbrev code. That is, the abbrev code of entry i+1 is higher than that of entry i.

I don't know how useful this would be for you, but for me it made a _huge_ difference because it means I did not have to parse the whole abbrev table when looking for an entry with a specific abbrev code, and I did not have to create a cache of code->offset mappings for the full table. You can use a very small (I use 16 entries) cache of abbrev codes for each abbrev table which tells you where the entry which contains that abbrev code is located in the table. The key here is that you do not need to cache all codes: you can use a cached code smaller than the one you need and start parsing the table at that point.

The attached file implements such a cache. The dwarf2_abbrev_cu_read_decl function searches for an entry in the abbrev table which matches a given abbrev code using the code cache. I cannot remember if this version actually works because I remember hacking it quite a bit and stopping in the middle of the rework, but this code should illustrate what I was suggesting.
I hope this helps,

regards,
Mathieu

/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 8 -*- */
/*
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License version 2 as
  published by the Free Software Foundation.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

  Copyright (C) 2004,2005 Mathieu Lacage
  Author: Mathieu Lacage <[EMAIL PROTECTED]>
*/

#ifndef DWARF2_ABBREV_H
#define DWARF2_ABBREV_H

#include <stdint.h>
#include <stdbool.h>

struct reader;

struct dwarf2_abbrev {
        uint32_t start; /* start of .debug_abbrev from start of file */
        uint32_t end;   /* end of .debug_abbrev from start of file */
};

#define CACHE_SIZE (16)

struct dwarf2_abbrev_cu {
        uint32_t start; /* offset from start of file */
        uint32_t end;   /* offset from start of file */
        struct cache {
                uint8_t keys[CACHE_SIZE];
                uint32_t values[CACHE_SIZE];
                uint8_t last_used[CACHE_SIZE];
                uint8_t time;
        } cache;
};

struct dwarf2_abbrev_decl {
        uint32_t offset;    /* offset from start of file to this decl.  */
        uint64_t abbr_code; /* abbrev code for this entry */
        uint64_t tag;       /* abbrev tag for this entry */
        uint8_t children;   /* children for this entry */
};

struct dwarf2_abbrev_attr {
        uint64_t form;
        uint64_t name;
};

void dwarf2_abbrev_initialize (struct dwarf2_abbrev *abbrev,
                               uint32_t abbrev_start /* offset from start of file */,
                               uint32_t abbrev_end /* offset from start of file */);

void dwarf2_abbrev_initialize_cu (struct dwarf2_abbrev *abbrev,
                                  struct dwarf2_abbrev_cu *abbrev_cu,
                                  uint32_t offset /* offset from start of file */);

void dwarf2_abbrev_cu_read_decl (struct dwarf2_abbrev_cu *abbrev_cu,
                                 struct dwarf2_abbrev_decl *decl,
                                 uint64_t code,
                                 struct reader *reader);

void dwarf2_abbrev_decl_read_attr_first (struct dwarf2_abbrev_decl *decl,
                                         struct dwarf2_abbrev_attr *attr,
                                         uint32_t *new_offset,
                                         struct reader *reader);

void dwarf2_abbrev_read_attr (uint32_t cur_offset,
                              struct dwarf2_abbrev_attr *attr,
                              uint32_t *new_offset,
                              struct reader *reader);

bool dwarf2_abbrev_attr_is_last (struct dwarf2_abbrev_attr *attr);

#endif /* DWARF2_ABBREV_H */

/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 8 -
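[Editorial note: to make the lookup idea above concrete, here is a standalone sketch of a scan that exploits the sorted-by-code property: a cached entry with a code smaller than or equal to the one sought gives a valid resume point, so the scan rarely starts from the beginning. The array-of-structs model stands in for the real ULEB128 parsing of .debug_abbrev, and all names here are mine:]

```c
#include <stddef.h>
#include <stdint.h>

struct abbrev_entry {
        uint64_t code;   /* abbrev code; strictly increasing in the table */
        uint32_t offset; /* where the entry's body starts */
};

/* Return the offset of the entry whose code is CODE, or UINT32_MAX if
   absent.  If CACHED points at an entry whose code is <= CODE, resume
   the linear scan there instead of at the start of the table; this is
   exactly what the 16-entry code cache in the attached header buys. */
static uint32_t
abbrev_lookup (const struct abbrev_entry *table, size_t n,
               const struct abbrev_entry *cached, uint64_t code)
{
        size_t i = 0;
        if (cached != NULL && cached->code <= code)
                i = (size_t) (cached - table);  /* resume from the cache hit */
        for (; i < n; i++) {
                if (table[i].code == code)
                        return table[i].offset;
                if (table[i].code > code)
                        break;  /* sorted: the code cannot appear later */
        }
        return UINT32_MAX;
}
```

The early exit on `table[i].code > code` is only valid because of the sortedness observation; an unsorted table would force a full scan.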
a new pass, "globalization", for user-space virtualization
hi,

I am looking into implementing a new instrumentation pass in gcc, called a "globalizer", and I would be really grateful for feedback on whether or not such a pass could be considered for inclusion (from a purely technical perspective).

1) Rationale

I work on network simulation tools. These tools are used to describe network topologies, simulate traffic flows through them, and analyze the behavior of the network during the simulation. One of the things the network guys would like to be able to do is run a number of instances of an existing user-space routing daemon in the simulator. To be able to do this, you need, among other things, to run multiple virtual processes in a single user-space process, and each of these virtual processes must access a private version of its global variables (static or not). In a perfect world, we should be able to deal correctly with TLS variables and make each simulation process maintain as many instances of its TLS variables as needed.

TLS variables aside, there are a number of ways to implement that globalization process:
1) implement a C/C++ globalizer which edits the source code.
2) on ELF systems, recompile all the code as PIC and play crazy tricks with the dynamic loader.
3) add an option to the compiler to perform the transformations done by 1).

1) has already been implemented but only deals with C source code. Adding support for C++ is a matter of ripping out an existing C++ parser and hacking transformations into it. I feel this path is a dead end, so I would like to avoid going down that road.

2) I implemented this solution but it is pretty icky and requires hacking a copy of the glibc dynamic loader. There are also a bunch of issues related to debugging (gdb needs to be taught about multiple versions of the same function in memory at different virtual addresses) which make this solution rather unattractive.
So, I am looking into getting 3) to work.

2) Proposed pass solution

The idea is to change the way the address of a global variable is calculated: rather than merely access a memory area directly or through the GOT, we need to add another level of indirection. The simplest way to do this is to replace every declaration and definition of a static variable by another static variable which would be an array of the original variables:

static int a;
void foo (void)
{
  a = 1;
}

would be transformed into:

static int a[MAX_PROCESS_NUMBER];
void foo (void)
{
  a[process_id] = 1;
}

Another solution would be to do something like this:

extern void *magic_function (void *variable_uid);

static int a;
void foo (void)
{
  void *ptr = magic_function (&a);
  int *pa = (int *)ptr;
  *pa = 1;
}

and then make the user provide magic_function at link time, just like __cyg_profile_func_enter. magic_function would need to look up the variable, uniquely identified by its address, within the context of the current simulation process. This solution solves the problem of a fixed-size array (no need to rebuild when the number of processes changes) because it is then up to the simulator's magic_function to do the right thing. However, it would be much slower. Would the speed difference really matter in my use-cases? I doubt it.

3) Questions

So, I have many questions, but the main one is that I would like to know whether or not there would be interest in integrating such a pass in gcc proper, or if this is deemed to be too domain-specific to be considered.

thank you,
Mathieu
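[Editorial note: the simulator-side magic_function is left abstract above. A minimal sketch of what it could look like, assuming a fixed-size table, a `current_process_id` variable maintained by the simulator's scheduler, and lazy per-process copies initialized from the original static storage; all names and limits here are invented for illustration:]

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PROCESSES 64
#define MAX_VARIABLES 128

/* Set by the simulator's scheduler on every virtual context switch. */
static int current_process_id;

struct var_slot {
        void *uid;                    /* canonical address of the variable */
        void *copies[MAX_PROCESSES];  /* lazily-allocated private copies */
        size_t size;
};
static struct var_slot slots[MAX_VARIABLES];
static size_t n_slots;

/* Resolve VARIABLE_UID to the current process's private copy,
   allocating and initializing it from the original storage on
   first access.  (The size argument matches the corrected
   signature discussed in the follow-up below.) */
void *
magic_function (void *variable_uid, int size)
{
        struct var_slot *s = NULL;
        for (size_t i = 0; i < n_slots; i++)
                if (slots[i].uid == variable_uid) {
                        s = &slots[i];
                        break;
                }
        if (s == NULL) {
                s = &slots[n_slots++];
                s->uid = variable_uid;
                s->size = (size_t) size;
        }
        if (s->copies[current_process_id] == NULL)
                /* seed the private copy from the static initializer value */
                s->copies[current_process_id] =
                        memcpy (malloc ((size_t) size), variable_uid, (size_t) size);
        return s->copies[current_process_id];
}
```

The linear scan is obviously not what a real simulator would ship (a hash on the variable address would do), but it shows why this scheme avoids the fixed-size-array rebuild problem: the number of processes only affects runtime allocation.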
RE: a new pass, "globalization", for user-space virtualization
On Sat, 2006-09-09 at 16:52 +0100, Dave Korn wrote:
> I think this would be a great feature to have, even if it did only work with
> simple globals and couldn't handle TLS.
>
> Disclaimer: I haven't thought it through thoroughly yet :) Nor am I sure
> whether the better solution might not be to just force all globals to be
> accessed via the GOT and allow multiple GOT pointers? That would also keep

I am not sure what you are suggesting, but unless we change in major ways how ELF PIC works, we cannot allow multiple GOT pointers, since the GOT is necessarily located at a fixed delta from the code base address. To have multiple GOT pointers, you need multiple code base addresses; that is, you need to map the same binary multiple times at multiple base addresses and make sure the dynamic loader also loads multiple times each library these binaries depend on. This is basically what some ELF systems implement as "loader namespaces", and it is the solution I alluded to when I referred to using magic ELF PIC tricks.

There are 2 ways to implement this if you rebuild everything as PIC:
1) use dlmopen. The only problem is that the glibc version of dlmopen cannot accommodate more than 16 namespaces, which would give me roughly 16 simulation processes per unix process: nothing close to what I need.
2) rip out the glibc loader and change the dlmopen implementation to use a higher number of namespaces.

In both cases, the big problem is that the same function will then be located at two places in the virtual address space, so if you want to place a breakpoint in such a function, it is cumbersome to make sure you break in every instance of that function. And it appears that this use-case is pretty common: the network guys I talked to about this solution went crazy because they spend their time putting a single breakpoint for all instances of a single function.
> all the per-process data together, as opposed to grouping all the data for
> each individual object across all processes together in an array, which might
> be preferable.

Yes, it would be preferable, but I don't see how to make it work (other than by running the same simulation twice: once to gather the memory requirements of the static area of each process, and a second time to allocate the memory only once for each process).

Mathieu
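[Editorial note: for readers unfamiliar with option 1) above, glibc's dlmopen is a real API: passing LM_ID_NEWLM loads a library into a fresh link-map namespace, duplicating its code, data, and GOT. A minimal sketch; error handling is trimmed, and on older glibcs the program must be linked with -ldl:]

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Load FILENAME into a brand-new link-map namespace, so that each call
   yields an independent copy of the library's global state.  glibc
   caps the number of namespaces at 16, which is the limitation
   discussed above.  Returns NULL on failure (see dlerror()). */
static void *
load_private_copy (const char *filename)
{
        return dlmopen (LM_ID_NEWLM, filename, RTLD_NOW | RTLD_LOCAL);
}
```

Loading the same library twice this way gives two handles whose identical symbol names resolve to distinct addresses, which is exactly the multiple-base-addresses situation (and the gdb breakpoint problem) described above.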
Re: a new pass, "globalization", for user-space virtualization
On Sat, 2006-09-09 at 17:34 +0200, mathieu lacage wrote:
> Another solution would be to do something like this:
>
> extern void *magic_function (void *variable_uid);
>
> static int a;
> void foo (void)
> {
>   void *ptr = magic_function (&a);
>   int *pa = (int *)ptr;
>   *pa = 1;
> }

I think that the code above should be:

extern void *magic_function (void *variable_uid, int size);

static int a;
void foo (void)
{
  void *ptr = magic_function (&a, sizeof (a));
  int *pa = (int *)ptr;
  *pa = 1;
}

Mathieu
an inter-procedural SSA-based pass
hi,

I am trying to write an inter-procedural SSA-based pass: all the existing (in trunk) IPA passes seem to run on a non-SSA representation, and I have been unable to figure out how to hack passes.c to make it schedule an inter-procedural pass right after SSA construction or after the end of all_optimizations. Is this possible? If so, could someone suggest how to hack passes.c to do this?

Maybe the idea of writing an IPA pass operating on SSA is just plain braindead, in which case it would be nice for someone to tell me so :)

thank you,
Mathieu