vtrelocs: large/modular C++ app speedup ...
Hi guys,

I spent a little time recently researching ways to reduce the number of unique named relocations that must be processed at dlopen time for large C++ libraries[1]. Apologies for spamming all 3 lists like this, but it touches all 3 projects.

Since almost all function relocations of this type are inside vtables, I implemented a new way of relocating vtables. This is a new '.suse.vtrelocs' section. As we inherit a class across a shared library boundary we construct new vtables that are often extremely similar to their parents. However - this similarity is not exposed - instead we fill the new vtable with many unique named relocations, one per method. This generates lots of .rel entries, and emits lots of external symbols; worse, these symbols tend to be duplicated across ~all libraries deriving from the base class.

Instead a vtreloc section contains a sorted array:

    struct { void **src, **dest; int copy_slot_bitmask; } vtreloc_entries[] = { ... }

The run-time cost of processing these is insignificant in comparison to the cost of processing the remaining relocations, giving a pleasant speed win.

A brief slide-deck with the results of my research is here:

    http://www.gnome.org/~michael/vtrelocs-gcc.pdf

and has a comparison against the current state of the art wrt. reducing relocations: -Bsymbolic-functions [ in itself a substantial optimisation ].

The 3 prototype patches for discussion are attached. There are a number of trivial hacks in there (of course) - eg. environment variables to turn the feature on, leaving an empty .vtrelocs section in object files etc. The more interesting problems are:

* glibc - the memory protection semantics need adjusting - since we need to fix up relocations in 'init' order: shouldn't be impossibly hard to fix but I just turn off protection ;-)
  + subsequent dlopens can (I think) avoid touching already relocated libraries they don't own, avoiding this sort of problem.

* gcc - the code to generate the vtreloc sections is written for comfort not speed. This is a fall-back from having initially tried to integrate the work into build_vtbl_initializer & friends with some success, but rather a tangling of the code.

* vtreloc section design - the section should be readonly, and probably refer by offset to .bss relocations that can be re-used for implementing indirect calls via the parent vtable to virtual functions. That should save relocs, but make each entry slightly larger.

Of course, apart from the run-time speed wins, some of the nicest potential size wins come from breaking the ABI[2] & depending on the vtrelocs to fix up vtables: eg. hiding all thunks (implemented), or potentially hiding all virtual function symbols & invoking them via their parent vtable (not implemented).

Wrt. testing, I can build & run an OO.o built with this - clearly not a unit-test ;-) but perhaps helpful.

Feedback much appreciated,

Thanks,

Michael.

[1] - specifically OpenOffice.org ;-)
[2] - which while bad, can be done in isolated islands like OO.o.
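As a rough illustration of why processing such entries could be cheap, here is a minimal C sketch of applying vtreloc entries at load time. It is not taken from the attached patches. The field meanings are assumptions: `src` is taken to be the already relocated parent vtable, `dest` the derived vtable, and bit i of the mask is taken to mean "copy slot i from parent to child". The names `vtreloc_entry`, `vtreloc_apply` and `vtreloc_apply_all` are hypothetical, and the mask is declared unsigned here (the message uses int) to keep the shifts well defined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Entry layout as given in the message; field semantics are assumed. */
struct vtreloc_entry {
    void **src;                  /* assumed: parent vtable, already relocated */
    void **dest;                 /* assumed: derived vtable to fix up */
    unsigned copy_slot_bitmask;  /* assumed: bit i set => copy slot i */
};

/* Hypothetical helper: apply one entry, return the number of slots copied. */
size_t vtreloc_apply(const struct vtreloc_entry *e)
{
    size_t copied = 0;
    size_t nbits = sizeof e->copy_slot_bitmask * CHAR_BIT;

    assert(e->src != NULL && e->dest != NULL);
    for (size_t i = 0; i < nbits; i++) {
        if (e->copy_slot_bitmask & (1u << i)) {
            /* A plain pointer copy: no symbol lookup by name is needed. */
            e->dest[i] = e->src[i];
            copied++;
        }
    }
    return copied;
}

/* Hypothetical helper: apply a whole table of entries in order. */
size_t vtreloc_apply_all(const struct vtreloc_entry *entries, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += vtreloc_apply(&entries[i]);
    return total;
}
```

Under these assumptions each inherited slot costs one load and one store, whereas a named relocation needs a symbol lookup. Slots whose bit is clear (overridden methods) are left for the ordinary relocations.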
-- 
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot

diff -u -r -x '*~' -x testsuite -x libjava -x cc-nptl -x build-dir -x '*.orig' -x obj-i586-suse-linux -x texis -x Makeconfig -x version.h -x '*.o' -x '*.1' -x 'Makefile*' -x 'config*' -x libtool -x '*.info' -x '*.tex' pristine-binutils-2.17.50/bfd/elf.c binutils-2.17.50/bfd/elf.c
--- pristine-binutils-2.17.50/bfd/elf.c 2008-01-09 16:45:22.0 +
+++ binutils-2.17.50/bfd/elf.c 2008-01-23 16:48:45.0 +
@@ -1240,6 +1240,7 @@
 	    case DT_USED: name = "USED"; break;
 	    case DT_FILTER: name = "FILTER"; stringp = TRUE; break;
 	    case DT_GNU_HASH: name = "GNU_HASH"; break;
+	    case DT_SUSE_VTRELOC: name = "SUSE_VTRELOC"; break;
 	    }
 
 	  fprintf (f, " %-11s ", name);
diff -u -r -x '*~' -x testsuite -x libjava -x cc-nptl -x build-dir -x '*.orig' -x obj-i586-suse-linux -x texis -x Makeconfig -x version.h -x '*.o' -x '*.1' -x 'Makefile*' -x 'config*' -x libtool -x '*.info' -x '*.tex' pristine-binutils-2.17.50/bfd/elflink.c binutils-2.17.50/bfd/elflink.c
--- pristine-binutils-2.17.50/bfd/elflink.c 2008-01-09 16:45:22.0 +
+++ binutils-2.17.50/bfd/elflink.c 2008-01-23 16:50:07.0 +
@@ -5652,6 +5652,13 @@
 	    return FALSE;
 	}
 
+      s = bfd_get_section_by_name (output_bfd, ".suse.vtrelocs");
+      if (s != NULL)
+	{
+	  if (!_bfd_elf_add_dynamic_entry (info, DT_SUSE_VTRELOC, 0))
+	    return FALSE;
+	}
+
       dynstr = bfd_get_section_by_name (dynobj, ".dynstr");
       /* If .dynstr is excluded from the link, we don't want any of
 	 these tags. Strictly, we should be checking each section
@@ -10869,6 +10876
Bootstrap failure due to a typo in gcc/fwprop.c
While rebuilding gcc I got the following failure:

/opt/gcc/i686-darwin/./prev-gcc/xgcc -B/opt/gcc/i686-darwin/./prev-gcc/ -B/opt/gcc/gcc4.4w/i686-apple-darwin9/bin/ -c -g -O2 -fomit-frame-pointer -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-4.4-work/gcc -I../../gcc-4.4-work/gcc/. -I../../gcc-4.4-work/gcc/../include -I./../intl -I../../gcc-4.4-work/gcc/../libcpp/include -I/sw/include -I../../gcc-4.4-work/gcc/../libdecnumber -I../../gcc-4.4-work/gcc/../libdecnumber/dpd -I../libdecnumber ../../gcc-4.4-work/gcc/fwprop.c -o fwprop.o
cc1: warnings being treated as errors
../../gcc-4.4-work/gcc/fwprop.c:234: error: comma at end of enumerator list
make[3]: *** [fwprop.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [all-stage2-gcc] Error 2
make[1]: *** [stage2-bubble] Error 2
make: *** [all] Error 2

This is due to revision 133828 and fixed by the following patch:

--- ../_gcc_clean/gcc/fwprop.c 2008-04-02 12:12:57.0 +0200
+++ gcc/fwprop.c 2008-04-02 13:44:07.0 +0200
@@ -231,7 +231,7 @@
      PR_HANDLE_MEM is set when the source of the propagation was not
      another MEM. Then, it is safe not to treat non-read-only MEMs as
      ``opaque'' objects. */
-  PR_HANDLE_MEM = 2,
+  PR_HANDLE_MEM = 2
 };

Dominique
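For readers wondering why a single comma stops the bootstrap: ISO C90 does not allow a comma after the last enumerator (C99 does), so with -pedantic the compiler diagnoses it, and -Werror turns the diagnostic into a hard error. The sketch below shows the accepted form. Only `PR_HANDLE_MEM = 2` comes from the patch; the enum tag `example_flags`, the enumerator `EX_OTHER` and the helper `has_handle_mem` are hypothetical and exist only to make the fragment self-contained.

```c
#include <assert.h>

/* Only PR_HANDLE_MEM = 2 is taken from the patch above; the enum tag and
   EX_OTHER are hypothetical placeholders. */
enum example_flags {
    EX_OTHER = 1,
    PR_HANDLE_MEM = 2   /* no trailing comma: accepted by C90 and later */
};

/* Hypothetical helper: returns 1 if the PR_HANDLE_MEM bit is set. */
int has_handle_mem(int flags)
{
    assert(flags >= 0);
    return (flags & PR_HANDLE_MEM) != 0;
}
```

Writing `PR_HANDLE_MEM = 2,` instead compiles quietly in C99 mode, which is presumably how the typo got past its author.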
Re: Bootstrap failure due to a typo in gcc/fwprop.c
> This is due to revision 133828 and fixed by the following patch:
>
> --- ../_gcc_clean/gcc/fwprop.c 2008-04-02 12:12:57.0 +0200
> +++ gcc/fwprop.c 2008-04-02 13:44:07.0 +0200
> @@ -231,7 +231,7 @@
>       PR_HANDLE_MEM is set when the source of the propagation was not
>       another MEM. Then, it is safe not to treat non-read-only MEMs as
>       ``opaque'' objects. */
> -  PR_HANDLE_MEM = 2,
> +  PR_HANDLE_MEM = 2
>  };

Committed as 133833.

Paolo
Re: memory leak on regular expression (regex.c)
amihud bruchim <[EMAIL PROTECTED]> writes:

> I found a memory leak on regcomp function - gcc-4.4.2 (i used Memory
> validator tool to confirm it) .

regcomp is part of glibc (or whatever C library you are using). It is not part of gcc. For more information, including where to report bugs, please see http://sourceware.org/glibc/ .

Ian
Re: vtrelocs: large/modular C++ app speedup ...
Michael Meeks <[EMAIL PROTECTED]> writes:

> Since almost all function relocations of this type are inside vtables,
> I implemented a new way of relocating vtables. This is a new
> '.suse.vtrelocs' section.

It's an interesting idea. Some comments:

* Use GNU instead of SUSE, as this is for the GNU tools.

* Don't check for explicit section names. Instead, give the section a magic type.

* It seems that this is not backward compatible--an executable built in this way will not work if the dynamic linker does not know about it. The section should have the SHF_OS_NONCONFORMING bit set.

* Aren't you going to get a lot of duplicate vtreloc entries? Shouldn't they be grouped with the vtables themselves?

* The idea is useless without support in the dynamic linker, so you need to get signoff there first.

Ian
Re: vtrelocs: large/modular C++ app speedup ...
Ian Lance Taylor <[EMAIL PROTECTED]> writes:

> * It seems that this is not backward compatible--an executable built
> in this way will not work if the dynamic linker does not know about
> it. The section should have the SHF_OS_NONCONFORMING bit set.

I wonder if it could be made backwards compatible. As in keep the old style relocations too, but the new linker would not process them when seeing the new special relocations.

That would make it much easier to deploy this in the field because you wouldn't need two sets of executables, one for the new linkers and one for old linkers, during the transition time.

The only drawback would be some more memory and some more disk space for the old relocations (and a little more IO bandwidth), but all of those are cheap.

-Andi
Re: vtrelocs: large/modular C++ app speedup ...
Hi Ian / Andi,

On Wed, 2008-04-02 at 07:56 -0700, Ian Lance Taylor wrote:
> * Use GNU instead of SUSE, as this is for the GNU tools.

Ah yes; you noticed the subliminal advertising ;-) If you're happy for me to trample on the GNU section namespace that's fine, but I hesitate to tread there by default.

> * Don't check for explicit section names. Instead, give the section a
> magic type.
> * It seems that this is not backward compatible--an executable built
> in this way will not work if the dynamic linker does not know about
> it. The section should have the SHF_OS_NONCONFORMING bit set.

Not clear how to fix either of those :-) I binned a redundant string section name lookup in the binutils patch though.

> * Aren't you going to get a lot of duplicate vtreloc entries?
> Shouldn't they be grouped with the vtables themselves?

That's entirely possible; perhaps I misunderstand the question, but I had hoped that by making the _ZVTR_ section weak the linker would discard any duplicate vtreloc records for the same vtable.

> * The idea is useless without support in the dynamic linker, so you
> need to get signoff there first.

Naturally :-)

On Wed, 2008-04-02 at 17:06 +0200, Andi Kleen wrote:
> I wonder if it could be made backwards compatible. As in keep the old
> style relocations too, but the new linker would not process them
> when seeing the new special relocations.

It's certainly possible; of course it loses you any size savings. I imagine that using the dynsort code we could shuffle the relevant relocs to the end of the list fairly easily - that is if we could identify whether they overlapped with the vtrelocs (or not): perhaps some big bit-mask for the whole data section or something (?).

Thanks,

Michael.

-- 
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot
Re: vtrelocs: large/modular C++ app speedup ...
> It's certainly possible; of course it loses you any size savings. I

If it's all in one place and only on disk it doesn't really matter, does it? Even if loaded into memory, as long as it is read in one block without much seeking it shouldn't be that bad.

Backwards compatibility is always very important, as long as the price to be paid for it is not excessive (which it isn't here, I think).

-Andi
Re: vtrelocs: large/modular C++ app speedup ...
Michael Meeks <[EMAIL PROTECTED]> writes:

> On Wed, 2008-04-02 at 07:56 -0700, Ian Lance Taylor wrote:
>> * Use GNU instead of SUSE, as this is for the GNU tools.
>
> Ah yes; you noticed the subliminal advertising ;-) If you're happy for
> me to trample on the GNU section namespace that's fine, but I hesitate
> to tread there by default.

In as much as the GNU namespace is managed at all, it is managed by the people who read these mailing lists. Don't worry about trampling on it.

>> * Don't check for explicit section names. Instead, give the section a
>> magic type.
>> * It seems that this is not backward compatible--an executable built
>> in this way will not work if the dynamic linker does not know about
>> it. The section should have the SHF_OS_NONCONFORMING bit set.
>
> Not clear how to fix either of those :-) I binned a redundant string
> section name lookup in the binutils patch though.

You need to emit an appropriate .section pseudo-op.

>> * Aren't you going to get a lot of duplicate vtreloc entries?
>> Shouldn't they be grouped with the vtables themselves?
>
> That's entirely possible; perhaps I misunderstand the question, but I
> had hoped that by making the _ZVTR_ section weak the linker would discard
> any duplicate vtreloc records for the same vtable.

The linker doesn't work that way. Symbols are weak, not sections. Read up on section groups.

Ian
Re: vtrelocs: large/modular C++ app speedup ...
Andi Kleen wrote:

>> It's certainly possible; of course it loses you any size savings. I
>
> If it's all in one place and only on disk it doesn't really matter, does it?

It sure does for embedded applications. And, there backwards compatibility is often less of an issue; you're not concerned about putting a new application into the field on an old OS, but rather about rolling out a new device with kernel, applications, and all.

So, I think we want both options here.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
gcc-4.2-20080402 is now available
Snapshot gcc-4.2-20080402 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.2-20080402/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.2 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_2-branch revision 133848

You'll find:

gcc-4.2-20080402.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.2-20080402.tar.bz2       C front end and core compiler
gcc-ada-4.2-20080402.tar.bz2        Ada front end and runtime
gcc-fortran-4.2-20080402.tar.bz2    Fortran front end and runtime
gcc-g++-4.2-20080402.tar.bz2        C++ front end and runtime
gcc-java-4.2-20080402.tar.bz2       Java front end and runtime
gcc-objc-4.2-20080402.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.2-20080402.tar.bz2  The GCC testsuite

Diffs from 4.2-20080326 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.2 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
GCC4 version compatibility?
I have a question regarding GCC4 version compatibility. In general, should two versions with the same major version number be compatible? Specifically, I want to confirm whether a C++ library built with gcc 4.1.X will link correctly using gcc 4.2.2.

Thanks,
Xiaoxiang
Re: GCC4 version compatibility?
On Wed, Apr 02, 2008 at 05:16:23PM -0700, Xiaoxiang Liu wrote:
> I have a question regarding GCC4 version compatibility? In general,
> should two versions with same major version number be compatible?
> Specifically, I want to confirm whether a C++ library built with gcc
> 4.1.X will link correctly using gcc 4.2.2?

For C++, you need the first two numbers to match. So, 4.1.x with 4.1.y should work, 4.1.x with 4.2.y probably not.
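The rule of thumb in this reply can be written down as a small check. This is only a sketch of the rule as stated here, not an official GCC compatibility guarantee; the function name `gxx_versions_match` and its parameters are hypothetical. In a real build the compiler's own numbers are available through the predefined macros `__GNUC__` and `__GNUC_MINOR__`.

```c
#include <assert.h>

/* Hypothetical helper encoding the rule of thumb from the reply:
   two GCC versions are treated as C++-compatible only when both the
   major and the minor number match. The patch level is ignored. */
int gxx_versions_match(int major_a, int minor_a, int major_b, int minor_b)
{
    assert(major_a >= 0 && minor_a >= 0 && major_b >= 0 && minor_b >= 0);
    return major_a == major_b && minor_a == minor_b;
}
```

For the case asked about, `gxx_versions_match(4, 1, 4, 2)` evaluates to 0, matching the "probably not" in the reply, while any two 4.1.x releases compare as matching.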
version control process improvement
I was speaking to Andrew Tridgell yesterday about how he uses svn with the Samba project. He mentioned an idea that we could pursue in the GCC project.

As you know, Subversion keeps all branches and the trunk under different paths in the repository. Thus, it's possible to check out multiple branches under a single directory tree. eg:

  ~/source
    gcc
      branches/gcc-4.2
      branches/gcc-4.3
      trunk

I don't know if anyone else does it this way; I don't. By doing it this way, it's possible to apply a patch to multiple branches and commit them in a single changeset. This has the advantage of allowing us to track all of the branches a patch was committed to (at least initially; someone may of course backport the patch at a later stage) with svn log -v.

Thoughts?

Cheers, Ben
instruction scheduling PowerPC target(s)
Hi I am in the process of verifying object code to source code traceability for gcc (C source): gcc 3.3.2 Wind River VxWorks PowerPC target I need to demonstrate that the difference in instruction scheduling / branch scheduling between PPC604 core and PPC603 core does not introduce untraceable code structure. Ideally I'd like to see the source code for the scheduler but I don't know where to find it. Can someone let me know where to get it, please? Thanks Duncan
Re: instruction scheduling PowerPC target(s)
> Ideally I'd like to see the source code for the scheduler but I don't
> know where to find it. Can someone let me know where to get it,
> please?

See http://gcc.gnu.org/svn.html. You can also look into the version control system via a web interface, but that isn't well suited to grep. ;-)

Cheers, Ben