vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Michael Meeks
Hi guys,

I spent a little time recently researching ways to reduce the number of
unique named relocations that must be processed at dlopen time for large
C++ libraries[1]. Apologies for spamming all 3 lists like this, but it
touches all 3 projects.

Since almost all function relocations of this type are inside vtables,
I implemented a new way of relocating vtables. This is a new
'.suse.vtrelocs' section.

As we inherit a class across a shared library boundary we construct new
vtables that are often extremely similar to their parents. However -
this similarity is not exposed - instead we fill the new vtable with
many unique named relocations, one per method. This generates lots
of .rel entries, and emits lots of external symbols; worse these symbols
tend to be duplicated across ~all libraries deriving from the base
class.

Instead, a vtreloc section contains a sorted array of entries:

struct {
    void **src, **dest;
    int    copy_slot_bitmask;
} vtreloc_entries[] = { ... };

The run-time cost of processing these is insignificant in comparison to
the cost of processing the remaining relocations, giving a pleasant
speed win.
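
To make the entry format concrete, here is a minimal sketch of how a
loader might apply such entries. It is not the code from the attached
patches: the names vtreloc_entry, apply_vtreloc and apply_vtrelocs are
hypothetical, the bitmask is declared unsigned to keep the shifts well
defined, and it assumes that bit i of copy_slot_bitmask means "copy slot
i from the parent vtable", which the mail above does not spell out.

```c
#include <stddef.h>

/* Hypothetical entry layout, mirroring the struct in the mail. */
struct vtreloc_entry {
    void **src;                  /* parent vtable (slots to copy from) */
    void **dest;                 /* derived vtable (slots to fill in)  */
    unsigned copy_slot_bitmask;  /* assumed: bit i set => copy slot i  */
};

/* Copy every slot selected by the bitmask from src to dest.
   Assumption: one bit per slot, so an entry covers at most as many
   slots as there are bits in an unsigned int. */
static void apply_vtreloc(const struct vtreloc_entry *e)
{
    unsigned mask = e->copy_slot_bitmask;
    for (size_t slot = 0; mask != 0; slot++, mask >>= 1)
        if (mask & 1u)
            e->dest[slot] = e->src[slot];
}

/* Walk the whole array; no symbol lookup is involved, only copies. */
static void apply_vtrelocs(const struct vtreloc_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++)
        apply_vtreloc(&entries[i]);
}
```

Under these assumptions, each entry costs a handful of pointer copies
rather than one named symbol lookup per inherited method.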

A brief slide-deck with the results of my research is here:

http://www.gnome.org/~michael/vtrelocs-gcc.pdf

and has a comparison against the current state of the art wrt. reducing
relocations: -Bsymbolic-functions [ in itself a substantial
optimisation ].

The 3 prototype patches for discussion are attached. There are a number
of trivial hacks in there (of course) - eg. environment variables to
turn the feature on, leaving an empty .vtrelocs section in object files
etc.

The more interesting problems are:

* glibc - the memory protection semantics need adjusting - since
  we need to fixup relocations in 'init' order: shouldn't be
  impossibly hard to fix but I just turn off protection ;-)
  + subsequent dlopens can (I think) avoid touching
    already relocated libraries they don't own, avoiding
    this sort of problem.

* gcc - the code to generate the vtreloc sections is
  written for comfort not speed. This is a fall-back from having
  initially tried to integrate the work into
  build_vtbl_initializer & friends, with some success but rather
  a tangling of the code.

* vtreloc section design - the section should be read-only, and
  probably refer by offset to .bss relocations that can be re-used
  for implementing indirect calls via the parent vtable to virtual
  functions. That should save relocs, but make each entry
  slightly larger.

Of course, apart from the run-time speed wins, some of the nicest
potential size wins come from breaking the ABI[2] & depending on the
vtrelocs to fixup vtables: eg. hiding all thunks (implemented), or
potentially hiding all virtual function symbols & invoking them via
their parent vtable (not implemented).

Wrt. testing, I can build & run an OO.o built with this - clearly not a
unit-test ;-) but perhaps helpful.

Feedback much appreciated,

Thanks,

Michael.

[1] - specifically OpenOffice.org ;-)
[2] - which while bad, can be done in isolated islands like OO.o.
-- 
 [EMAIL PROTECTED]  <><, Pseudo Engineer, itinerant idiot

diff -u -r -x '*~' -x testsuite -x libjava -x cc-nptl -x build-dir -x '*.orig' -x obj-i586-suse-linux -x texis -x Makeconfig -x version.h -x '*.o' -x '*.1' -x 'Makefile*' -x 'config*' -x libtool -x '*.info' -x '*.tex' pristine-binutils-2.17.50/bfd/elf.c binutils-2.17.50/bfd/elf.c
--- pristine-binutils-2.17.50/bfd/elf.c	2008-01-09 16:45:22.0 +
+++ binutils-2.17.50/bfd/elf.c	2008-01-23 16:48:45.0 +
@@ -1240,6 +1240,7 @@
 	case DT_USED: name = "USED"; break;
 	case DT_FILTER: name = "FILTER"; stringp = TRUE; break;
 	case DT_GNU_HASH: name = "GNU_HASH"; break;
+	case DT_SUSE_VTRELOC: name = "SUSE_VTRELOC"; break;
 	}
 
 	  fprintf (f, "  %-11s ", name);

diff -u -r -x '*~' -x testsuite -x libjava -x cc-nptl -x build-dir -x '*.orig' -x obj-i586-suse-linux -x texis -x Makeconfig -x version.h -x '*.o' -x '*.1' -x 'Makefile*' -x 'config*' -x libtool -x '*.info' -x '*.tex' pristine-binutils-2.17.50/bfd/elflink.c binutils-2.17.50/bfd/elflink.c
--- pristine-binutils-2.17.50/bfd/elflink.c	2008-01-09 16:45:22.0 +
+++ binutils-2.17.50/bfd/elflink.c	2008-01-23 16:50:07.0 +
@@ -5652,6 +5652,13 @@
 	return FALSE;
 	}
 
+  s = bfd_get_section_by_name (output_bfd, ".suse.vtrelocs");
+  if (s != NULL)
+	{
+  if (!_bfd_elf_add_dynamic_entry (info, DT_SUSE_VTRELOC, 0))
+	return FALSE;
+	}
+
   dynstr = bfd_get_section_by_name (dynobj, ".dynstr");
   /* If .dynstr is excluded from the link, we don't want any of
 	 these tags.  Strictly, we should be checking each section
@@ -10869,6 +10876

Bootstrap failure due to a typo in gcc/fwprop.c

2008-04-02 Thread Dominique Dhumieres
While rebuilding gcc I got the following failure:

/opt/gcc/i686-darwin/./prev-gcc/xgcc -B/opt/gcc/i686-darwin/./prev-gcc/ 
-B/opt/gcc/gcc4.4w/i686-apple-darwin9/bin/ -c  -g -O2 -fomit-frame-pointer 
-DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros  
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I../../gcc-4.4-work/gcc -I../../gcc-4.4-work/gcc/. 
-I../../gcc-4.4-work/gcc/../include -I./../intl 
-I../../gcc-4.4-work/gcc/../libcpp/include -I/sw/include  
-I../../gcc-4.4-work/gcc/../libdecnumber 
-I../../gcc-4.4-work/gcc/../libdecnumber/dpd -I../libdecnumber  
../../gcc-4.4-work/gcc/fwprop.c -o fwprop.o
cc1: warnings being treated as errors
../../gcc-4.4-work/gcc/fwprop.c:234: error: comma at end of enumerator list
make[3]: *** [fwprop.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [all-stage2-gcc] Error 2
make[1]: *** [stage2-bubble] Error 2
make: *** [all] Error 2

This is due to revision 133828 and fixed by the following patch:

--- ../_gcc_clean/gcc/fwprop.c  2008-04-02 12:12:57.0 +0200
+++ gcc/fwprop.c  2008-04-02 13:44:07.0 +0200
@@ -231,7 +231,7 @@
  PR_HANDLE_MEM is set when the source of the propagation was not
  another MEM.  Then, it is safe not to treat non-read-only MEMs as
  ``opaque'' objects.  */
-  PR_HANDLE_MEM = 2,
+  PR_HANDLE_MEM = 2
 };
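
For readers unfamiliar with the diagnostic, the following is a small
stand-alone illustration, not code from fwprop.c. The enum and function
names are hypothetical. ISO C90 does not permit a comma after the last
enumerator, while C99 and later do; with -pedantic GCC diagnoses it in
C90 mode, and -Werror turns the diagnostic into the build failure shown
above.

```c
/* Hypothetical flag names, used only for illustration; the real enum
   lives in gcc/fwprop.c. */
enum prop_flags {
    FLAG_ADDRESS = 1,
    FLAG_HANDLE_MEM = 2   /* no comma after the last enumerator */
};

/* Writing "FLAG_HANDLE_MEM = 2," before the closing brace is accepted
   by C99 and later, but is what -pedantic complains about in C90 mode. */
static int has_flag(int flags, enum prop_flags f)
{
    return (flags & f) != 0;
}
```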

Dominique


Re: Bootstrap failure due to a typo in gcc/fwprop.c

2008-04-02 Thread Paolo Bonzini



> This is due to revision 133828 and fixed by the following patch:
>
> --- ../_gcc_clean/gcc/fwprop.c  2008-04-02 12:12:57.0 +0200
> +++ gcc/fwprop.c  2008-04-02 13:44:07.0 +0200
> @@ -231,7 +231,7 @@
>   PR_HANDLE_MEM is set when the source of the propagation was not
>   another MEM.  Then, it is safe not to treat non-read-only MEMs as
>   ``opaque'' objects.  */
> -  PR_HANDLE_MEM = 2,
> +  PR_HANDLE_MEM = 2
>  };


Committed as 133833.

Paolo


Re: memory leak on regular expression (regex.c)

2008-04-02 Thread Ian Lance Taylor
amihud bruchim <[EMAIL PROTECTED]> writes:

> I found a memory leak on regcomp function - gcc-4.4.2 (i used Memory 
> validator tool to confirm it) . 

regcomp is part of glibc (or whatever C library you are using).  It is
not part of gcc.  For more information, including where to report
bugs, please see http://sourceware.org/glibc/ .

Ian


Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Ian Lance Taylor
Michael Meeks <[EMAIL PROTECTED]> writes:

>   Since almost all function relocations of this type are inside vtables,
> I implemented a new way of relocating vtables. This is a new
> '.suse.vtrelocs' section.

It's an interesting idea.  Some comments:

* Use GNU instead of SUSE, as this is for the GNU tools.

* Don't check for explicit section names.  Instead, give the section a
  magic type.

* It seems that this is not backward compatible--an executable built
  in this way will not work if the dynamic linker does not know about
  it.  The section should have the SHF_OS_NONCONFORMING bit set.

* Aren't you going to get a lot of duplicate vtreloc entries?
  Shouldn't they be grouped with the vtables themselves?

* The idea is useless without support in the dynamic linker, so you
  need to get signoff there first.

Ian


Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Andi Kleen
Ian Lance Taylor <[EMAIL PROTECTED]> writes:

> * It seems that this is not backward compatible--an executable built
>   in this way will not work if the dynamic linker does not know about
>   it.  The section should have the SHF_OS_NONCONFORMING bit set.

I wonder if it could be made backwards compatible. As in keep the old
style relocations too, but the new linker would not process them
when seeing the new special relocations.

That would make it much easier to deploy this in the field because
you wouldn't need two sets of executables, one for the new linkers
and one for old linkers, during the transition time.

The only drawback would be some more memory and some more disk space
for the old relocations (and a little more IO bandwidth), but all of
those are cheap.

-Andi


Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Michael Meeks
Hi Ian / Andi,

On Wed, 2008-04-02 at 07:56 -0700, Ian Lance Taylor wrote:
> * Use GNU instead of SUSE, as this is for the GNU tools.

Ah yes; you noticed the subliminal advertising ;-) If you're happy for
me to trample on the GNU section namespace that's fine, but I hesitate
to tread there by default.

> * Don't check for explicit section names.  Instead, give the section a
>   magic type.
> * It seems that this is not backward compatible--an executable built
>   in this way will not work if the dynamic linker does not know about
>   it.  The section should have the SHF_OS_NONCONFORMING bit set.

Not clear how to fix either of those :-) I binned a redundant string
section name lookup in the binutils patch though.

> * Aren't you going to get a lot of duplicate vtreloc entries?
>   Shouldn't they be grouped with the vtables themselves?

That's entirely possible; perhaps I misunderstand the question, but I
had hoped that by making the _ZVTR_ section weak the linker would discard
any duplicate vtreloc records for the same vtable.

> * The idea is useless without support in the dynamic linker, so you
>   need to get signoff there first.

Naturally :-)

On Wed, 2008-04-02 at 17:06 +0200, Andi Kleen wrote:
> I wonder if it could be made backwards compatible. As in keep the old
> style relocations too, but the new linker would not process them
> when seeing the new special relocations.
 
It's certainly possible; of course it loses you any size savings. I
imagine that using the dynsort code we could shuffle the relevant relocs
to the end of the list fairly easily - that is if we could identify
whether they overlapped with the vtrelocs (or not): perhaps some big
bit-mask for the whole data section or something (?).
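
To illustrate the bit-mask idea, here is a rough sketch under several
assumptions: one bit per pointer-sized word of the data section, set
when that word is already fixed up by a vtreloc. None of this exists in
the patches; coverage_map, covered_by_vtreloc and count_legacy_needed
are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one bit per pointer-sized word of the data section. */
struct coverage_map {
    uintptr_t data_start;       /* address of the first covered word   */
    size_t nwords;              /* number of pointer-sized words       */
    const unsigned char *bits;  /* bit w set => word w has a vtreloc   */
};

/* Return 1 if the word at addr is fixed up by a vtreloc, else 0. */
static int covered_by_vtreloc(const struct coverage_map *m, uintptr_t addr)
{
    if (addr < m->data_start)
        return 0;
    size_t w = (addr - m->data_start) / sizeof(void *);
    if (w >= m->nwords)
        return 0;
    return (m->bits[w / 8] >> (w % 8)) & 1;
}

/* Count the old-style relocations a vtreloc-aware loader would still
   have to process; an old loader would simply process all n of them. */
static size_t count_legacy_needed(const struct coverage_map *m,
                                  const uintptr_t *targets, size_t n)
{
    size_t needed = 0;
    for (size_t i = 0; i < n; i++)
        if (!covered_by_vtreloc(m, targets[i]))
            needed++;
    return needed;
}
```

A real loader would skip the covered relocations rather than count
them; counting just keeps the sketch easy to check.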

Thanks,

Michael.

-- 
 [EMAIL PROTECTED]  <><, Pseudo Engineer, itinerant idiot




Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Andi Kleen
>   It's certainly possible; of course it loses you any size savings. I

If it's all in one place and only on disk, it doesn't really matter, does it?
Even if loaded into memory, as long as it is read in one block without
much seeking it shouldn't be that bad.

Backwards compatibility is always very important, as long as the price
to be paid for it is not excessive (which it isn't here I think)

-Andi


Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Ian Lance Taylor
Michael Meeks <[EMAIL PROTECTED]> writes:

> On Wed, 2008-04-02 at 07:56 -0700, Ian Lance Taylor wrote:
>> * Use GNU instead of SUSE, as this is for the GNU tools.
>
>   Ah yes; you noticed the subliminal advertising ;-) If you're happy for
> me to trample on the GNU section namespace that's fine, but I hesitate
> to tread there by default.

In as much as the GNU namespace is managed at all, it is managed by
the people who read these mailing lists.  Don't worry about trampling
on it.


>> * Don't check for explicit section names.  Instead, give the section a
>>   magic type.
>> * It seems that this is not backward compatible--an executable built
>>   in this way will not work if the dynamic linker does not know about
>>   it.  The section should have the SHF_OS_NONCONFORMING bit set.
>
>   Not clear how to fix either of those :-) I binned a redundant string
> section name lookup in the binutils patch though.

You need to emit an appropriate .section pseudo-op.


>> * Aren't you going to get a lot of duplicate vtreloc entries?
>>   Shouldn't they be grouped with the vtables themselves?
>
>   That's entirely possible; perhaps I misunderstand the question, but I
> had hoped that by making the _ZVTR_ section weak the linker would discard
> any duplicate vtreloc records for the same vtable.

The linker doesn't work that way.  Symbols are weak, not sections.
Read up on section groups.

Ian


Re: vtrelocs: large/modular C++ app speedup ...

2008-04-02 Thread Mark Mitchell

Andi Kleen wrote:

>> It's certainly possible; of course it loses you any size savings. I
>
> If it's all in one place and only on disk, it doesn't really matter, does it?

It sure does for embedded applications.  And there, backwards
compatibility is often less of an issue; you're not concerned about
putting a new application into the field on an old OS, but rather about
rolling out a new device with kernel, applications, and all.


So, I think we want both options here.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


gcc-4.2-20080402 is now available

2008-04-02 Thread gccadmin
Snapshot gcc-4.2-20080402 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.2-20080402/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.2 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_2-branch 
revision 133848

You'll find:

gcc-4.2-20080402.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.2-20080402.tar.bz2       C front end and core compiler

gcc-ada-4.2-20080402.tar.bz2        Ada front end and runtime

gcc-fortran-4.2-20080402.tar.bz2    Fortran front end and runtime

gcc-g++-4.2-20080402.tar.bz2        C++ front end and runtime

gcc-java-4.2-20080402.tar.bz2       Java front end and runtime

gcc-objc-4.2-20080402.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.2-20080402.tar.bz2  The GCC testsuite

Diffs from 4.2-20080326 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.2
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


GCC4 version compatibility?

2008-04-02 Thread Xiaoxiang Liu
I have a question regarding GCC4 version compatibility. In general,
should two versions with the same major version number be compatible?
Specifically, I want to confirm whether a C++ library built with gcc
4.1.X will link correctly using gcc 4.2.2.


Thanks,

Xiaoxiang


Re: GCC4 version compatibility?

2008-04-02 Thread Joe Buck
On Wed, Apr 02, 2008 at 05:16:23PM -0700, Xiaoxiang Liu wrote:
> I have a question regarding GCC4 version compatibility? In general,
> should two versions with same major version number be compatible?
> Specifically, I want to confirm whether a C++ library built with gcc
> 4.1.X will link correctly using gcc 4.2.2?

For C++, you need the first two numbers to match.  So, 4.1.x with 4.1.y
should work, 4.1.x with 4.2.y probably not.



version control process improvement

2008-04-02 Thread Ben Elliston
I was speaking to Andrew Tridgell yesterday about how he uses svn with
the Samba project.  He mentioned an idea that we could pursue in the GCC
project.

As you know, Subversion keeps all branches and the trunk under different
paths in the repository.  Thus, it's possible to check out multiple
branches under a single directory tree.  eg:

  ~/source
    gcc
      branches/gcc-4.2
      branches/gcc-4.3
      trunk

I don't know if anyone else does it this way; I don't.  By doing it this
way, it's possible to apply a patch to multiple branches and commit them
in a single changeset.  This has the advantage of allowing us to track
all of the branches a patch was committed to (at least initially;
someone may of course backport the patch at a later stage) with
"svn log -v".

Thoughts?

Cheers, Ben



instruction scheduling PowerPC target(s)

2008-04-02 Thread Duncan Purll
Hi

I am in the process of verifying object code to source code traceability for 
gcc (C source):

gcc 3.3.2 Wind River VxWorks PowerPC target

I need to demonstrate that the difference in instruction scheduling / branch 
scheduling between PPC604 core and PPC603 core does not introduce untraceable 
code structure.

Ideally I'd like to see the source code for the scheduler but I don't know 
where to find it. Can someone let me know where to get it, please?

Thanks

Duncan








Re: instruction scheduling PowerPC target(s)

2008-04-02 Thread Ben Elliston
> Ideally I'd like to see the source code for the scheduler but I don't
> know where to find it. Can someone let me know where to get it,
> please?

See http://gcc.gnu.org/svn.html.  You can also look into the version
control system via a web interface, but that isn't well suited to
grep.  ;-)

Cheers, Ben