dwarf2 basic block start information
hi,

Since the CVS version of gas supports extensions for the dwarf2 basic_block location information, I thought I could try to add support for this feature to gcc. My use of this feature is related to binary code analysis: being able to gather the basic block boundaries through gcc's debugging output would save me reverse engineering them from the binary code itself.

The attached code is the start of a patch to do this. It would be really nice to have feedback on:
- the approach chosen
- the bugs which I have stumbled upon.

The patch itself is pretty straightforward. I have simply added an argument to the source_line debug hook and I have implemented it correctly (I think) for the dwarf2 backend. The final.c pass now reads the rtl BASIC_BLOCK note to invoke source_line correctly. Is this the right approach?

I have tested this patch lightly on the sample C code below on x86 with gcc svn HEAD and binutils cvs HEAD:

#include <stdio.h>

static int foo (void)
{
  if (3)
    {
      int i = 0;
      while (i < 100)
        {
          printf ("test\n");
          i++;
        }
    }
  return 8;
}

int main (int argc, char *argv[])
{
  foo ();
  return 0;
}

While the debugging output looks quite correct at -O0, the -O2 output seems broken:

   0:	8d 4c 24 04          	lea    0x4(%esp),%ecx
   4:	83 e4 f0             	and    $0xfffffff0,%esp
   7:	ff 71 fc             	pushl  0xfffffffc(%ecx)
   a:	55                   	push   %ebp
   b:	89 e5                	mov    %esp,%ebp
   d:	53                   	push   %ebx
   e:	31 db                	xor    %ebx,%ebx
  10:	51                   	push   %ecx
  11:	83 ec 10             	sub    $0x10,%esp
  14:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
  1a:	8d bf 00 00 00 00    	lea    0x0(%edi),%edi
  20:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
  27:	43                   	inc    %ebx
  28:	e8 fc ff ff ff       	call   29
  2d:	83 fb 64             	cmp    $0x64,%ebx
  30:	75 ee                	jne    20
  32:	83 c4 10             	add    $0x10,%esp
  35:	31 c0                	xor    %eax,%eax
  37:	59                   	pop    %ecx
  38:	5b                   	pop    %ebx
  39:	5d                   	pop    %ebp
  3a:	8d 61 fc             	lea    0xfffffffc(%ecx),%esp
  3d:	c3                   	ret

With this list of basic block boundaries as reported by the debugging information:

ad: 0x0
ad: 0x11
ad: 0x20
ad: 0x32

Clearly, 0x11 is not a bb boundary so we have a bug.
Despite the fact that my understanding of gcc internals is close to nil, it seems to me that this problem is most likely related to some sort of inlining pass which did not update the rtl BASIC_BLOCK note. Thus, the following questions:

1) is it expected that some rtl optimization passes would bork the BASIC_BLOCK notes?
2) if it is, are these known culprits and would there be interest in patches to try to fix this?
3) does anyone have an idea of which passes might be the culprits? (it might save a lot of time wandering through gcc sources)

If someone is interested in trying out this patch, the released version of readelf on my system seems to be able to dump the basic block dwarf2 instructions with --debug-dump=line. I have also written a small tool to dump only this information, here: http://cutebugs.net/code/bozo-profiler/

The test binary generated by the top-level Makefile in bin/test/ can be invoked with: test dw2_bb [BINARY FILE]

[EMAIL PROTECTED] bozo-profiler]$ make
make: Nothing to be done for `all'.
[EMAIL PROTECTED] bozo-profiler]$ ./bin/test/test dw2_bb bin/test/test

regards,
Mathieu

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 106485)
+++ gcc/final.c (working copy)
@@ -129,6 +129,8 @@
 static rtx debug_insn;
 rtx current_output_insn;
 
+int current_start_basic_block = 0;
+
 /* Line number of last NOTE.  */
 static int last_linenum;
 
@@ -1744,6 +1746,7 @@
       else
 	*seen |= SEEN_BB;
 
+      current_start_basic_block = 1;
       break;
 
     case NOTE_INSN_EH_REGION_BEG:
@@ -2071,8 +2074,21 @@
 	 note in a row.  */
       if (notice_source_line (insn))
 	{
-	  (*debug_hooks->source_line) (last_linenum, last_filename);
+	  if (current_start_basic_block)
+	    {
+	      current_start_basic_block = 0;
+	      (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	    }
+	  else
+	    {
+	      (*debug_hooks->source_line) (last_linenum, last_filename, 0);
+	    }
 	}
+      else if (current_start_basic_block)
+	{
+	  current_start_basic_block = 0;
+	  (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	}
 
       if (GET_CODE (body) == ASM_INPUT)
 	{
@@ -2498,6 +2
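[Editorial note: the gas extension used by this patch maps onto the DWARF2 line-number program's basic_block register, which is set by the DW_LNS_set_basic_block standard opcode and reset after each row is appended to the line matrix. The sketch below illustrates just that part of the state-machine semantics; the struct and function names are mine, not gas's or gcc's.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the DWARF2 line-number state machine registers. */
struct line_state {
        uint32_t address;
        uint32_t line;
        bool basic_block;       /* true if the next row starts a basic block */
};

/* DW_LNS_set_basic_block: mark the next emitted row as a bb start. */
static void set_basic_block (struct line_state *s)
{
        s->basic_block = true;
}

/* Append a row to the line matrix; the basic_block register is reset
   to false afterwards, so the flag applies to exactly one row. */
static bool emit_row (struct line_state *s)
{
        bool was_bb = s->basic_block;
        s->basic_block = false;
        return was_bb;
}
```

A consumer such as the bozo-profiler tool mentioned above only has to collect the addresses of rows whose basic_block flag was true.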
Re: dwarf2 basic block start information
On Mon, 2005-11-14 at 21:30 -0500, Daniel Jacobowitz wrote:
> On Wed, Nov 09, 2005 at 07:19:45PM +0100, mathieu lacage wrote:
> > While the debugging output looks quite correct at -O0, the -O2 output
> > seems broken:
> >
> >    0:	8d 4c 24 04          	lea    0x4(%esp),%ecx
> >    4:	83 e4 f0             	and    $0xfffffff0,%esp
> >    7:	ff 71 fc             	pushl  0xfffffffc(%ecx)
> >    a:	55                   	push   %ebp
> >    b:	89 e5                	mov    %esp,%ebp
> >    d:	53                   	push   %ebx
> >    e:	31 db                	xor    %ebx,%ebx
> >   10:	51                   	push   %ecx
> >   11:	83 ec 10             	sub    $0x10,%esp
> >   14:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
> >
> > With this list of basic block boundaries as reported by the debugging
> > information:
> > ad: 0x0
> > ad: 0x11
> >
> > Clearly, 0x11 is not a bb boundary so we have a bug.

This was a bug in my dwarf2 reading code. I fixed it and this testcase works for me now.

> No, not clear at all. Every place which could be the target of a jump
> will be the start of a basic block, but you are not guaranteed that all
> sequential basic blocks are combined.

It would be nice if you could post an example where they are not combined.

> Probably either Jim's right and it's related to the end of the
> prologue, or it's a different basic block because of some artifact of
> inlining. This shouldn't present any problem for a tool using the
> basic block information.

Inlining or end-of-prologue do not seem to have an influence on this. It seems to actually work quite well. I will send an updated version of the patch in another email.

Mathieu
--
Re: dwarf2 basic block start information
Here is an updated version with a few bugs fixed (how I managed to introduce bugs in a 20-liner patch still eludes me).

On Mon, 2005-11-14 at 21:26 -0500, Daniel Jacobowitz wrote:
> On Mon, Nov 14, 2005 at 06:24:47PM -0800, Jim Wilson wrote:
> > mathieu lacage wrote:
> > > Clearly, 0x11 is not a bb boundary so we have a bug.
> >
> > Looks like it could be the prologue end, but I don't see any obvious
> > reason why this patch could do that. I suggest you try debugging your
> > patch to see why you are getting the extra call with
> > LINE_FLAG_BASIC_BLOCK set in this case.
> >
> > Using -p would make the diff more readable.

svn diff -x -p does not work here. Is there a magic incantation I should run to produce such a diff?

> > We get complaints every time the debug info size increases. Since this
> > is apparently only helpful to an optional utility, this extra debug info
> > should not be emitted by default. There should be an option to emit it.

Any suggestion on a name?

> I'd like to know what the size impact of including basic block
> information would be, first; a lot of tools, including GDB, could make
> use of it if it were available.

linux-2.6.14, stock default config, size of the dw2 .debug_line section:
- without patch: 1433756
- with patch: 1557345

Out of curiosity, I wonder what gdb would use it for.

regards,
Mathieu
--

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 106485)
+++ gcc/final.c (working copy)
@@ -129,6 +129,8 @@
 static rtx debug_insn;
 rtx current_output_insn;
 
+int current_start_basic_block = 0;
+
 /* Line number of last NOTE.  */
 static int last_linenum;
 
@@ -1744,6 +1746,7 @@
       else
 	*seen |= SEEN_BB;
 
+      current_start_basic_block = 1;
       break;
 
     case NOTE_INSN_EH_REGION_BEG:
@@ -2067,11 +2070,26 @@
 	  break;
 	}
 
+      /* Output this line note if it is the first or the last line
+	 note in a row.  */
       if (notice_source_line (insn))
 	{
-	  (*debug_hooks->source_line) (last_linenum, last_filename);
+	  if (current_start_basic_block)
+	    {
+	      current_start_basic_block = 0;
+	      (*debug_hooks->source_line) (last_linenum, last_filename, LINE_FLAG_BASIC_BLOCK);
+	    }
+	  else
+	    {
+	      (*debug_hooks->source_line) (last_linenum, last_filename, 0);
+	    }
+	}
+      else if (current_start_basic_block)
+	{
+	  current_start_basic_block = 0;
+	  (*debug_hooks->source_line) (insn_line (insn), insn_file (insn), LINE_FLAG_BASIC_BLOCK);
 	}
 
       if (GET_CODE (body) == ASM_INPUT)
@@ -2498,6 +2516,7 @@
       current_output_insn = debug_insn = 0;
     }
 }
+
 return NEXT_INSN (insn);
}

Index: gcc/debug.c
===================================================================
--- gcc/debug.c (revision 106485)
+++ gcc/debug.c (working copy)
@@ -33,7 +33,7 @@
   debug_nothing_int_int,	 /* begin_block */
   debug_nothing_int_int,	 /* end_block */
   debug_true_tree,		 /* ignore_block */
-  debug_nothing_int_charstar,	 /* source_line */
+  debug_nothing_int_charstar_int, /* source_line */
   debug_nothing_int_charstar,	 /* begin_prologue */
   debug_nothing_int_charstar,	 /* end_prologue */
   debug_nothing_int_charstar,	 /* end_epilogue */
@@ -94,6 +94,13 @@
 }
 
 void
+debug_nothing_int_charstar_int (unsigned int line ATTRIBUTE_UNUSED,
+				const char *text ATTRIBUTE_UNUSED,
+				unsigned int flags ATTRIBUTE_UNUSED)
+{
+}
+
+void
 debug_nothing_int (unsigned int line ATTRIBUTE_UNUSED)
 {
 }

Index: gcc/debug.h
===================================================================
--- gcc/debug.h (revision 106485)
+++ gcc/debug.h (working copy)
@@ -59,7 +59,7 @@
   bool (* ignore_block) (tree);
 
   /* Record a source file location at (FILE, LINE).  */
-  void (* source_line) (unsigned int line, const char *file);
+  void (* source_line) (unsigned int line, const char *file, unsigned int flags);
 
   /* Called at start of prologue code.  LINE is the first line in the
      function.  This has been given the same prototype as source_line,
@@ -129,12 +129,16 @@
   int start_end_main_source_file;
 };
 
+#define LINE_FLAG_BASIC_BLOCK ((unsigned int)1)
+
 extern const struct gcc_debug_hooks *debug_hooks;
 
 /* The do-nothing hooks.  */
 extern void debug_nothing_void (void);
 extern void debug_nothing_charstar (const char *);
 extern void debug_nothing_int_charstar (unsigned int, const char *);
+extern void debug_nothing_int_charstar_int (unsigned int, const char *, unsigned int flags);
 extern void debug_nothing_int (unsigned int);
 extern void debug_nothing_int_int (unsigned int, unsigned int);
 extern void debug_nothing_tree (tree);

Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c (revision 106485)
+++ gcc/dwarf2out.c (wor
Re: Link-time optimzation
hi,

Daniel Berlin wrote:
> I discovered this when deep hacking into the symbol code of GDB a while
> ago. Apparently, some people enjoy breakpointing symbols by using the
> fully mangled name, which appears (nowadays) mainly in the minsym table.

This sort of hack is often used to work around what appears to be the inability of gdb to put breakpoints in c++ constructors (or maybe it is bad dwarf2 debugging output by gcc, I don't know).

regards,
Mathieu
Re: LTO, LLVM, etc.
hi mark,

On Mon, 2005-12-05 at 21:33 -0800, Mark Mitchell wrote:
> I'm not saying that having two different formats is necessarily a bad
> thing (we've already got Tree and RTL, so we're really talking about two
> levels or three), or that switching to LLVM is a bad idea, but I don't
> think there's any inherent reason that we must necessarily have multiple
> representations.

In what I admit is a relatively limited experience (compared to that of you or other gcc contributors) of working with a few large, old, sucky codebases, I think I have learned one thing: genericity is most often bad. Specifically, I think that trying to re-use the same data structures/algorithms/code for widely different scenarios is what most often leads to large overall complexity and fragility.

It seems to me that the advantages of using the LTO representation for both frontend dumping and optimization (code reuse, etc.) are not worth the cost: a single piece of code used for two very different use-cases will necessarily be more complex and thus prone to design bugs. Hubris will lead developers to ignore the latter because they believe they can avoid the complexity trap of code reuse. It might work in the short term because you and others might be able to achieve this feat, but I fail to see how you will be able to avoid the inevitable decay of code inherent to this solution in the long run.

A path where different solutions for different problems are evolved independently and then merged where it makes sense seems better to me than a path where a single solution to two different problems is attempted from the start. Which is thus why I think that "there are inherent reasons that you must necessarily have multiple representations".

regards,
Mathieu

PS: I know I am oversimplifying the problem and your position and I apologize for this.
--
Re: detailed comparison of generated code size for GCC and other compilers
On Tue, 2009-12-15 at 11:24 +0100, Andi Kleen wrote:
> John Regehr writes:
> > > I would only be worried for cases where no warning is issued *and*
> > > uninitialized accesses are eliminated.
> >
> > Yeah, it would be excellent if GCC maintained the invariant that for
> > all uses of uninitialized storage, either the compiler or else
> > valgrind will issue a warning.
>
> My understanding was that valgrind's detection of uninitialized
> local variables is not 100% reliable because it cannot track
> all updates of the frames (it's difficult to distinguish stack
> reuse from uninitialized stack)

I am not a valgrind expert so take the following with a grain of salt, but I think that the above statement is wrong: valgrind reliably detects use of uninitialized variables if you define 'use' as meaning 'affects the control flow of your program'. i.e., try this:

[mlac...@diese ~]$ cat > test.c
int f(void)
{
  int x;
  return x;
}
int main (int argc, char *argv[])
{
  if (f()) {
    printf ("something\n");
  }
  return 0;
}
^C
[mlac...@diese ~]$ gcc ./test.c
./test.c: In function ‘main’:
./test.c:10: warning: incompatible implicit declaration of built-in function ‘printf’
[mlac...@diese ~]$ valgrind ./a.out
==18933== Memcheck, a memory error detector.
==18933== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==18933== Using LibVEX rev 1804, a library for dynamic binary translation.
==18933== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==18933== Using valgrind-3.3.0, a dynamic binary instrumentation framework.
==18933== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==18933== For more details, rerun with: -v
==18933==
==18933== Conditional jump or move depends on uninitialised value(s)
==18933==    at 0x80483D7: main (in /home/mlacage/a.out)
something
==18933==
==18933== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 1)
==18933== malloc/free: in use at exit: 0 bytes in 0 blocks.
==18933== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==18933== For counts of detected errors, rerun with: -v
==18933== All heap blocks were freed -- no leaks are possible.
[mlac...@diese ~]$
Re: Split Stacks proposal
comments below,

On Thu, 2009-02-26 at 14:05 -0800, Ian Lance Taylor wrote:
> I've put a project proposal for split stacks on the wiki at
> http://gcc.gnu.org/wiki/SplitStacks . The idea is to permit the stack
> of a single thread to be split into discontiguous segments, thus
> permitting many more threads to be active at one time without worrying
> about stack overflow or about wasting lots of stack space for inactive
> threads. The compiler would have to generate code to support detecting
> when new stack space is needed, and to deal with some of the
> consequences of moving to a new stack.

It would be totally awesome to do this if you could provide an option to delegate to a user-provided function the allocation and deallocation of the stack blobs needed by threads. i.e., the problem I run into is that I create a lot of user-space threads and I need to allocate at the very least 2 pages (1 normal page, 1 guard page) for each thread:
- 2 pages of address space is a lot if you have a lot of threads: it is easy to run out of address space (physical memory is less of a concern for me);
- 2 pages is not enough for a lot of threads, and if one of my threads hits the 2-page limit, I have to stop my program and restart it with a bigger stack space for the offending thread, which can quickly get "annoying".

So, ideally, I would be able to not allocate statically any address space for my threads and defer stack space allocation until it is needed. Ideally, I would even be able to use heap memory for that stack space if I wanted to. Another use-case I could foresee for this would be to profile the runtime stack usage of an application/set of threads to optimize it.

> I would be interested in hearing comments about this.
>
> I'm not currently working on this, but I may start working on it at some
> point.

I looked a bit at the page you pointed to.
A couple of questions:
- if you want to use the stack protector and split stacks, it should be fairly trivial to extend the data structure which contains the stack protector with a new field, no?
- what would be a typical size for the stack space slop? (for example, on i386?)
- I understand that you need to copy the function parameters from the old stack to the new stack, but why would you need to invoke the C++ copy or move constructors for this? Would a memcpy not be sufficient to ensure proper C++ semantics in this case? An example which shows how a memcpy would break might be interesting.

Mathieu
Re: Split Stacks proposal
On Fri, 2009-02-27 at 08:54 -0800, Ian Lance Taylor wrote:
> > It would be totally awesome to do this if you could provide an option to
> > delegate to a user-provided function the allocation and deallocation of
> > the stack blobs needed by threads.
>
> Yes, this would be a goal.

The main reason I asked about this is that it is not obvious to me how it could be done: yes, you can call any function from your compiler-generated code, but what would the user need to do to change which address is called?

1) specify a function name for allocation on the command-line during compilation, and link statically into every binary an object file which contains the specified symbol? This would allow you to have one allocation function per generated binary (shared library or executable), which might not be very desirable from a user perspective.

2) generate code which uses a well-known name, calls this symbol through the PLT (on ELF systems), and relies on the ELF loader to resolve that symbol in each binary to a single user-provided function. The libc could provide its own default implementation which is overridden either with LD_PRELOAD or by linking the function into the main executable.

I don't care much about which option is chosen, but it would be nice to know how you intend to deal with this aspect of the project.

Mathieu
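[Editorial note: option 2) above is the usual ELF pattern of a weak default definition overridden by a strong one. A minimal sketch, assuming hypothetical hook names (`__split_stack_alloc` / `__split_stack_free` are invented here for illustration; they are not the names gcc ended up using):]

```c
#include <stdlib.h>

/* Weak default implementations.  Compiler-generated code would call
   these well-known symbols through the PLT; a user can override them
   by linking strong definitions into the main executable, or via
   LD_PRELOAD, exactly as described in option 2) above. */
void * __attribute__ ((weak))
__split_stack_alloc (size_t size)
{
        /* default policy: plain heap allocation of the stack segment */
        return malloc (size);
}

void __attribute__ ((weak))
__split_stack_free (void *stack)
{
        free (stack);
}
```

A simulator would then provide strong definitions that, for example, carve stack segments out of a per-virtual-process arena.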
Re: GCC 4.4.0 Status Report (2009-03-13)
On Mon, 2009-03-23 at 11:54 +1100, Ben Elliston wrote:
> Can you give some indication of how the subset is enforced?

I find it weird that you choose to ignore the obvious: code reviews, maintainer management, etc. Just like what you (gcc developers) do in gcc's C codebase every day. Unless, of course, gcc grew an automatic code checker overnight for ugly code?

regards,
Mathieu
Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!
hi,

On Wed, 2006-08-30 at 16:44 -0500, Mark Mitchell wrote:
[snip]
> (Implied, but not stated, in your mail is the fact that the abbreviation
> table cannot be indexed directly. If it could be, then you wouldn't
> have to read the entire abbreviation table for each function; you would
> just read the referenced abbreviations. Because the abbreviation table
> records are of variable length, it is indeed true that you cannot make
> random accesses to the table. So, this paragraph is just fleshing out
> your argument.)

I have spent a considerable amount of time looking at this, and the abbrev tables output by gcc are not totally random: their entries are sorted by abbrev code. That is, the abbrev code of entry i+1 is higher than that of entry i.

I don't know how useful this would be for you, but for me it made a _huge_ difference because it means I did not have to parse the whole abbrev table when looking for an entry with a specific abbrev code, and I did not have to create a cache of code->offset mappings for the full table. You can use a very small (I use 16 entries) cache of abbrev codes for each abbrev table which tells you where the entry which contains that abbrev code is located in the table. The key here is that you do not need to cache all codes: you can use a cached code smaller than the one you need and start parsing the table at that point.

The attached file implements such a cache. The dwarf2_abbrev_cu_read_decl function searches for an entry in the abbrev table which matches a given abbrev code using the code cache. I cannot remember if this version actually works because I remember hacking it quite a bit and stopping in the middle of the rework, but this code should illustrate what I was suggesting.
I hope this helps,

regards,
Mathieu

/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 8 -*- */
/*
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License version 2 as
  published by the Free Software Foundation.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

  Copyright (C) 2004,2005 Mathieu Lacage
  Author: Mathieu Lacage <[EMAIL PROTECTED]>
*/

#ifndef DWARF2_ABBREV_H
#define DWARF2_ABBREV_H

#include <stdint.h>
#include <stdbool.h>

struct reader;

struct dwarf2_abbrev {
        uint32_t start; /* start of .debug_abbrev from start of file */
        uint32_t end;   /* end of .debug_abbrev from start of file */
};

#define CACHE_SIZE (16)

struct dwarf2_abbrev_cu {
        uint32_t start; /* offset from start of file */
        uint32_t end;   /* offset from start of file */
        struct cache {
                uint8_t keys[CACHE_SIZE];
                uint32_t values[CACHE_SIZE];
                uint8_t last_used[CACHE_SIZE];
                uint8_t time;
        } cache;
};

struct dwarf2_abbrev_decl {
        uint32_t offset;    /* offset from start of file to this decl.  */
        uint64_t abbr_code; /* abbrev code for this entry */
        uint64_t tag;       /* abbrev tag for this entry */
        uint8_t children;   /* children for this entry */
};

struct dwarf2_abbrev_attr {
        uint64_t form;
        uint64_t name;
};

void dwarf2_abbrev_initialize (struct dwarf2_abbrev *abbrev,
                               uint32_t abbrev_start /* offset from start of file */,
                               uint32_t abbrev_end /* offset from start of file */);

void dwarf2_abbrev_initialize_cu (struct dwarf2_abbrev *abbrev,
                                  struct dwarf2_abbrev_cu *abbrev_cu,
                                  uint32_t offset /* offset from start of file */);

void dwarf2_abbrev_cu_read_decl (struct dwarf2_abbrev_cu *abbrev_cu,
                                 struct dwarf2_abbrev_decl *decl,
                                 uint64_t code,
                                 struct reader *reader);

void dwarf2_abbrev_decl_read_attr_first (struct dwarf2_abbrev_decl *decl,
                                         struct dwarf2_abbrev_attr *attr,
                                         uint32_t *new_offset,
                                         struct reader *reader);

void dwarf2_abbrev_read_attr (uint32_t cur_offset,
                              struct dwarf2_abbrev_attr *attr,
                              uint32_t *new_offset,
                              struct reader *reader);

bool dwarf2_abbrev_attr_is_last (struct dwarf2_abbrev_attr *attr);

#endif /* DWARF2_ABBREV_H */

/* -*- Mode: C; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 8 -
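[Editorial note: to make the lookup idea above concrete, here is a standalone sketch of a scan that exploits the sorted-by-code property: a cached entry with a code smaller than or equal to the one sought gives a valid resume point, so the scan rarely starts from the beginning. The array-of-structs model stands in for the real ULEB128 parsing of .debug_abbrev, and all names here are mine:]

```c
#include <stddef.h>
#include <stdint.h>

struct abbrev_entry {
        uint64_t code;   /* abbrev code; strictly increasing in the table */
        uint32_t offset; /* where the entry's body starts */
};

/* Return the offset of the entry whose code is CODE, or UINT32_MAX if
   absent.  If CACHED points at an entry whose code is <= CODE, resume
   the linear scan there instead of at the start of the table; this is
   exactly what the 16-entry code cache in the attached header buys. */
static uint32_t
abbrev_lookup (const struct abbrev_entry *table, size_t n,
               const struct abbrev_entry *cached, uint64_t code)
{
        size_t i = 0;
        if (cached != NULL && cached->code <= code)
                i = (size_t) (cached - table);  /* resume from the cache hit */
        for (; i < n; i++) {
                if (table[i].code == code)
                        return table[i].offset;
                if (table[i].code > code)
                        break;  /* sorted: the code cannot appear later */
        }
        return UINT32_MAX;
}
```

The early exit on `table[i].code > code` is only valid because of the sortedness observation; an unsorted table would force a full scan.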
a new pass, "globalization", for user-space virtualization
hi,

I am looking into implementing a new instrumentation pass in gcc, called a "globalizer", and I would be really grateful for feedback on whether or not such a pass could be considered for inclusion (from a purely technical perspective).

1) Rationale

I work on network simulation tools. These tools are used to describe network topologies, simulate traffic flows through them, and analyze the behavior of the network during the simulation. One of the things the network guys would like to be able to do is run a number of instances of an existing user-space routing daemon in the simulator. To be able to do this, you need, among other things, to run multiple virtual processes in a single user-space process, and each of these virtual processes must access a private version of its global variables (static or not). In a perfect world, we should be able to deal correctly with TLS variables and make each simulation process maintain as many instances of its TLS variables as needed.

TLS variables aside, there are a number of ways to implement that globalization process:
1) implement a C/C++ globalizer which edits the source code.
2) on ELF systems, recompile all the code as PIC and play crazy tricks with the dynamic loader.
3) add an option to the compiler to perform the transformations done by 1).

1) has already been implemented but only deals with C source code. Adding support for C++ is a matter of ripping out an existing C++ parser and hacking transformations into it. I feel this path is a dead end, so I would like to avoid going down that road.

2) I implemented this solution but it is pretty icky and requires hacking a copy of the glibc dynamic loader. There are also a bunch of issues related to debugging (gdb needs to be taught about multiple versions of the same function in memory at different virtual addresses) which make this solution rather unattractive.
So, I am looking into getting 3) to work.

2) Proposed pass solution

The idea is to change the way the address of a global variable is calculated: rather than merely access a memory area directly or through the GOT, we need to add another level of indirection. The simplest way to do this is to replace every declaration and definition of a static variable by another static variable which would be an array of the original variables:

static int a;
void foo (void)
{
  a = 1;
}

would be transformed into:

static int a[MAX_PROCESS_NUMBER];
void foo (void)
{
  a[process_id] = 1;
}

Another solution would be to do something like this:

extern void *magic_function (void *variable_uid);

static int a;
void foo (void)
{
  void *ptr = magic_function (&a);
  int *pa = (int *)ptr;
  *pa = 1;
}

and then make the user provide magic_function at link time, just like __cyg_profile_func_enter. magic_function would need to look up the variable, uniquely identified by its address, within the context of the current simulation process. This solution solves the problem of a fixed-size array (no need to rebuild when the number of processes changes) because it is then up to the simulator's magic_function to do the right thing. However, it would be much slower. Would the speed difference really matter in my use-cases? I doubt it.

3) Questions

So, I have many questions, but the main one is that I would like to know whether or not there would be interest in integrating such a pass in gcc proper, or if this is deemed to be too domain-specific to be considered.

thank you,
Mathieu
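[Editorial note: the simulator-side magic_function is left abstract above. A minimal sketch of what it could look like, assuming a fixed-size table, a `current_process_id` variable maintained by the simulator's scheduler, and lazy per-process copies initialized from the original static storage; all names and limits here are invented for illustration:]

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PROCESSES 64
#define MAX_VARIABLES 128

/* Set by the simulator's scheduler on every virtual context switch. */
static int current_process_id;

struct var_slot {
        void *uid;                    /* canonical address of the variable */
        void *copies[MAX_PROCESSES];  /* lazily-allocated private copies */
        size_t size;
};
static struct var_slot slots[MAX_VARIABLES];
static size_t n_slots;

/* Resolve VARIABLE_UID to the current process's private copy,
   allocating and initializing it from the original storage on
   first access.  (The size argument matches the corrected
   signature discussed in the follow-up below.) */
void *
magic_function (void *variable_uid, int size)
{
        struct var_slot *s = NULL;
        for (size_t i = 0; i < n_slots; i++)
                if (slots[i].uid == variable_uid) {
                        s = &slots[i];
                        break;
                }
        if (s == NULL) {
                s = &slots[n_slots++];
                s->uid = variable_uid;
                s->size = (size_t) size;
        }
        if (s->copies[current_process_id] == NULL)
                /* seed the private copy from the static initializer value */
                s->copies[current_process_id] =
                        memcpy (malloc ((size_t) size), variable_uid, (size_t) size);
        return s->copies[current_process_id];
}
```

The linear scan is obviously not what a real simulator would ship (a hash on the variable address would do), but it shows why this scheme avoids the fixed-size-array rebuild problem: the number of processes only affects runtime allocation.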
RE: a new pass, "globalization", for user-space virtualization
On Sat, 2006-09-09 at 16:52 +0100, Dave Korn wrote:
> I think this would be a great feature to have, even if it did only work with
> simple globals and couldn't handle TLS.
>
> Disclaimer: I haven't thought it through thoroughly yet :) Nor am I sure
> whether the better solution might not be to just force all globals to be
> accessed via the GOT and allow multiple GOT pointers? That would also keep

I am not sure what you are suggesting, but unless we change in major ways how ELF PIC works, we cannot allow multiple GOT pointers, since the GOT is necessarily located at a fixed delta from the code base address. To have multiple GOT pointers, you need multiple code base addresses; that is, you need to map the same binary multiple times at multiple base addresses and make sure the dynamic loader also loads multiple times each library these binaries depend on. This is basically what some ELF systems implement as "loader namespaces", and it is the solution I alluded to when I referred to using magic ELF PIC tricks.

There are 2 ways to implement this if you rebuild everything as PIC:
1) use dlmopen. The only problem is that the glibc version of dlmopen cannot accommodate more than 16 namespaces, which would give me roughly 16 simulation processes per unix process: nothing close to what I need.
2) rip out the glibc loader and change the dlmopen implementation to use a higher number of namespaces.

In both cases, the big problem is that the same function will then be located at two places in the virtual address space, so if you want to place a breakpoint in such a function, it is cumbersome to make sure you break in every instance of that function. And it appears that this use-case is pretty common: the network guys I talked to about this solution went crazy because they spend their time putting a single breakpoint for all instances of a single function.
> all the per-process data together, as opposed to grouping all the data for
> each individual object across all processes together in an array, which might
> be preferable.

Yes, it would be preferable, but I don't see how to make it work (other than by running the same simulation twice: once to gather the memory requirements of the static area of each process, and a second time to allocate the memory only once for each process).

Mathieu
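[Editorial note: for readers unfamiliar with option 1) above, glibc's dlmopen is a real API: passing LM_ID_NEWLM loads a library into a fresh link-map namespace, duplicating its code, data, and GOT. A minimal sketch; error handling is trimmed, and on older glibcs the program must be linked with -ldl:]

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Load FILENAME into a brand-new link-map namespace, so that each call
   yields an independent copy of the library's global state.  glibc
   caps the number of namespaces at 16, which is the limitation
   discussed above.  Returns NULL on failure (see dlerror()). */
static void *
load_private_copy (const char *filename)
{
        return dlmopen (LM_ID_NEWLM, filename, RTLD_NOW | RTLD_LOCAL);
}
```

Loading the same library twice this way gives two handles whose identical symbol names resolve to distinct addresses, which is exactly the multiple-base-addresses situation (and the gdb breakpoint problem) described above.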
Re: a new pass, "globalization", for user-space virtualization
On Sat, 2006-09-09 at 17:34 +0200, mathieu lacage wrote:
> Another solution would be to do something like this:
>
> extern void *magic_function (void *variable_uid);
>
> static int a;
> void foo (void)
> {
>   void *ptr = magic_function (&a);
>   int *pa = (int *)ptr;
>   *pa = 1;
> }

I think that the code above should be:

extern void *magic_function (void *variable_uid, int size);

static int a;
void foo (void)
{
  void *ptr = magic_function (&a, sizeof (a));
  int *pa = (int *)ptr;
  *pa = 1;
}

Mathieu
an inter-procedural SSA-based pass
hi,

I am trying to write an inter-procedural SSA-based pass: all the existing (in trunk) IPA passes seem to run on a non-SSA representation, and I have been unable to figure out how to hack passes.c to make it schedule an inter-procedural pass right after SSA construction or after the end of all_optimizations. Is this possible? If so, could someone suggest how to hack passes.c to do this?

Maybe the idea of writing an IPA pass operating on SSA is just plain braindead, in which case it would be nice for someone to tell me so :)

thank you,
Mathieu