On Tue, 2013-07-02 at 06:37 -0500, Gabriel Dos Reis wrote:
> On Tue, Jul 2, 2013 at 1:16 AM, Marc Glisse <marc.gli...@inria.fr> wrote:
> > On Mon, 1 Jul 2013, Gabriel Dos Reis wrote:
> >
> >> On Mon, Jul 1, 2013 at 10:36 AM, David Malcolm <dmalc...@redhat.com>
> >> wrote:
> >>>
> >>> My plan for removal of global variables in gcc 4.9 [1] calls for several
> >>> hundred new classes, which will be singletons in a classic monolithic
> >>> build, but have multiple instances in a shared-library build.
> >>>
> >>> In order to avoid the register pressure of passing a redundant "this"
> >>> pointer around for the classic case, I've been looking at optimizing
> >>> singletons.
> >>>
> >>> I'm attaching an optimization for this: a new "force_static" attribute
> >>> for the C++ frontend, which when added to a class implicitly adds
> >>> "static" to all members of said class.  This gives a way of avoiding a
> >>> "this" pointer in the classic build (in stages 2 and 3, once the
> >>> attribute is recognized), whilst supporting it in a shared-library
> >>> build, with relatively little boilerplate, preprocessor hackery or
> >>> syntactic differences.
> >>>
> >>> See:
> >>>
> >>> http://dmalcolm.fedorapeople.org/gcc/global-state/singletons.html#another-singleton-removal-optimization
> >>> for more information on how this would be used in GCC itself.
> >>
> >>
> >> Hi David,
> >>
> >> I am still a little bit confused by this.  Help me out:
> >>   1. if we don't need to pass `this', why should we ever find
> >>       ourselves to writing functions that need one in the first place?
> >>       How do shared libraries get into this water?
> >
> >
> > In theory, there is always this regular class with data and member functions
> > that use *this. However, for traditional use (not in a library), there will
> > be a single global instance of this class. For optimization purposes, it
> > seems better in that case to make all members (variables and functions)
> > static to let the compiler use a constant address instead of passing "this"
> > around.
> 
> Thanks, Marc!
> 
> From the description, I have the impression you are saying that the class
> is essentially a singleton class.  Is that right?
> 
> Sorry, I am still confused about what the problem is and why we need this
> solution -- I read several times David's links but I can't get my head
> around the fundamental problem.

Sorry about that.

I want to rework the internals of GCC so that they become thread-safe
and embeddable as a library inside another program.

For example, a web browser may want to compile JavaScript to machine
code: each tab within the browser could be running on a separate thread
within one process, and each may want to create "compilation contexts"
that turn JavaScript into gimple, say, create a dedicated optimization
pipeline (perhaps with some custom passes), and send the gimple+pipeline
to the GCC-as-a-shared-library to get machine code back.  [well,
assembler, but binutils-as-a-library is currently out-of-scope for my
plan].

Given that this could all be happening on different threads, we could
just have one big mutex to avoid interference, but the better approach
is to isolate all of the state inside GCC so each compilation context
gets its own world of state.  Ideally there would be no global variables
internally in gcc, but currently there are about 3500 of them.

I want to provide the above, but I don't want to slow down the current
use-case of GCC: the family of monolithic binaries.  For example, simply
building as a shared library slows things down, since you'd have to
build gcc as position-independent code, which itself incurs a slowdown.
So I envisage a "--enable-shared" configuration switch to opt-in to the
shared library code, but I want to minimize the difference between the
two cases.

The natural way to eliminate the various globals is to group
logically-related global variables into classes, together with the
functions that operate on them, as methods.  For example, the
callgraph-related code could be placed into a new "class callgraph", the
pass-management code into a new "class pipeline", the garbage-collector
internals into a new "class gc_heap", etc.  The various passes with
state typically get their own classes.
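
To illustrate the grouping described above, here is a minimal sketch.
The member names and the trivial bodies are hypothetical and are not
taken from the GCC sources; only the name "class callgraph" comes from
the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before (hypothetical): file-scope state plus free functions.
//   static std::vector<int> nodes;
//   static int edge_count;
//   void add_node (int id);
//   void add_edge (int from, int to);

// After: the same state and functions grouped into one class, so that
// each instance carries its own copy of what used to be global.
class callgraph
{
public:
  void add_node (int id) { m_nodes.push_back (id); }
  void add_edge (int /*from*/, int /*to*/) { ++m_edge_count; }

  std::size_t num_nodes () const { return m_nodes.size (); }
  int num_edges () const { return m_edge_count; }

private:
  std::vector<int> m_nodes;  // was a global variable
  int m_edge_count = 0;      // was a global variable
};
```

Two callgraph objects built this way do not see each other's nodes or
edges, which is the property the globals lack.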

I estimate about 300 such classes.

Each client of gcc (e.g. each tab in a web browser) would have its own
context pointer into the GCC shared library, a handle to one "parallel
universe" of state, independent of all other such parallel universes.
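
A rough sketch of such a context follows.  "class context", its fields
and compile_function are assumptions made for illustration, not an
existing GCC API; the point is only that all state is reached through
the handle, so two clients never touch the same data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the per-subsystem state classes.
class callgraph
{
public:
  void add_node (int id) { m_nodes.push_back (id); }
  std::size_t num_nodes () const { return m_nodes.size (); }

private:
  std::vector<int> m_nodes;
};

class gc_heap
{
public:
  void note_allocation (std::size_t bytes) { m_bytes += bytes; }
  std::size_t bytes_allocated () const { return m_bytes; }

private:
  std::size_t m_bytes = 0;
};

// One "parallel universe": it owns every piece of formerly-global state.
class context
{
public:
  callgraph cg;
  gc_heap heap;
};

// Work done on behalf of one client touches only that client's context.
void compile_function (context &ctxt, int id)
{
  ctxt.cg.add_node (id);
  ctxt.heap.note_allocation (64);  // arbitrary size, for illustration
}
```

Each browser tab in the earlier example would hold its own context
object and pass it to every call it makes into the library.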

Many of GCC's existing functions would become methods of one of the
above classes.  For example, most of the functions in cgraph.c,
cgraphbuild.c, cgraphclones.c and cgraphunit.c would become methods of a
"class callgraph" (many of these methods would be private); each
parallel universe of state would have its own callgraph instance.  The
graph of internal functions implementing the callgraph code becomes a
graph of *methods*, passing around a "this" pointer.

This should give us a relatively smooth transition path from the current
code to such a parallel-universe world, but there is a performance
concern: thousands of callsites throughout the compiler would gain a
"this" parameter that gets silently passed around, and dereferenced
throughout the code when reading/writing data which would have been a
global in an old version of gcc.

We'd have hundreds of these new classes which would be regular classes
when configured --with-shared, but these classes are all singletons when
configured --without-shared: the "this" pointer is redundant in this
latter configuration (as is the case for a conventional build of gcc, as
binaries).

The purpose of the attribute is to provide a way to write code as
regular classes, but have a relatively simple way of eliminating all of
the "this" uses in the --without-shared build, thus trivially making the
performance characteristics of the new gcc with --without-shared be the
same as the old gcc.
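
The effect can be approximated in standard C++ as below.  The
"force_static" attribute exists only in the attached patch, so this
sketch emulates it with hypothetical SHARED_BUILD and MAYBE_STATIC
macros, which is exactly the kind of preprocessor hackery the attribute
is meant to make unnecessary.  All names other than "callgraph" are
assumptions.

```cpp
#include <cassert>

// Hypothetical switch: define SHARED_BUILD for the multi-instance case.
#ifdef SHARED_BUILD
# define MAYBE_STATIC            /* regular members, "this" is passed */
#else
# define MAYBE_STATIC static     /* singleton, no "this" is needed */
#endif

class callgraph
{
public:
  MAYBE_STATIC void add_node () { ++m_num_nodes; }
  MAYBE_STATIC int num_nodes () { return m_num_nodes; }

private:
#ifdef SHARED_BUILD
  int m_num_nodes = 0;
#else
  static int m_num_nodes;
#endif
};

#ifndef SHARED_BUILD
// A static data member needs an out-of-class definition; it then lives
// at a fixed address, like the global it replaced.
int callgraph::m_num_nodes = 0;
#endif

// Calling code is spelled the same way in both configurations, since a
// static member function may also be called through an object.
int build_two_nodes (callgraph &cg)
{
  cg.add_node ();
  cg.add_node ();
  return cg.num_nodes ();
}
```

Without SHARED_BUILD the callgraph object is empty and the calls do not
need its address; with SHARED_BUILD each object has its own counter.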

Hence we get relatively clean code (IMHO), the ability to have multiple
independent compilation contexts in a thread-safe gcc shared library,
and the same code shared with a non-shared-library gcc that has the same
performance characteristics as an existing version of gcc.

Hope the above makes more sense.
Dave
