On Wed, 2013-06-26 at 20:21 +0000, Joseph S. Myers wrote:
> For a shared library you need a well-defined namespace for GCC functions /
> variables so it doesn't interfere with users. Are you going to put
> everything inside a "gcc" namespace or similar?
FWIW I deliberately avoided talking about API/ABIs in the document:
removal of global state is IMHO a necessary but not sufficient condition
for being able to use GCC as a shared library; I want to focus on the
global variables.
That said, putting everything in a "gcc" namespace sounds reasonable to
me (apart from classes used solely within a single source file, which
can go in anonymous namespaces, right?).
> (You also need to avoid
> host libraries such as libiberty - which have C interfaces - interfering
> with users of the shared library. If the initial hosts for shared library
> builds are ELF, I suppose you can do that with a version map to hide
> everything not in the "gcc" namespace.)
(nods)
> Some observations specifically on your top-40 globals:
FWIW I wonder to what extent the discussions that follow all exhibit a
tradeoff between the desire to provide a clean API vs the desire to
minimize the size of the patch (to reduce backporting pain).
I guess I have a vested interest in the latter approach, since a smaller
patch is likely to be easier for me to write. However it also helps
people backporting to 4.8 and 4.7
> * global_options: where available, using a pointer to a gcc_options
> structure is of course better than using a universe (some attributes could
> be implemented more cleanly if there were per-function gcc_options
> pointers, for example). Note also the various TARGET_* macros that are
> generated to AND a variable with a mask - really, such macros should gain
> a parameter that is a pointer to a struct gcc_options (and then you'd pass
> &global_options to them if no more specific structure is available).
Interesting.
> * For dump_file and associated variables such as dump_flags, I sort of
> think there should be a proper API for code writing to dumps rather than
> directly accessing dump_file all over the compiler. That should massively
> reduce the number of places needing to access those fields of the
> universe.
Right, but doing this would be a big patch, touching numerous files.
At one extreme, a minimal patch (with no new API) would be something
like this:
#define dump_file GET_UNIVERSE().dump_file_
#define dump_flags GET_UNIVERSE().dump_flags_
The approach currently in my plan document is somewhat more invasive, to
try to avoid all of the GET_UNIVERSE() TLS uses in a shared-library
build.
What would a new API look like? The classic problem with
logging/dumping is that you want to avoid doing work in the no-logging
case, so that rather than:
log ("some message: %s", perform_some_expensive_computation ());
with enable/disable in "log" (and thus always doing the expensive work
and typically discarding it), you have things like:
if (logging)
{
log ("some message: %s", perform_some_expensive_computation ());
}
If we're going to have a "logging" conditional there, that's going to
involve passing around or looking up a variable, so why not simply make
it the dump_file?
FWIW I originally came up with a very simple API, but I filed it in the
"rejected ideas" appendix:
http://dmalcolm.fedorapeople.org/gcc/global-state/rejected-ideas.html#rejected-idea-dump-if-details
> * For targetm, make it const (which would move any further enhancements to
> it firmly into the domain of Andrew MacLeod's proposal and out of yours,
> because once it's const it's not global state for a compiler only
> supporting one target at a time). There are only a few places, for a few
> targets, that write to targetm at runtime; I think it should be possible
> to fix those by e.g. changing some data hooks in targetm into function
> hooks.
Interesting; I'll explore this.
> * For stderr and stdout, really the compiler should only be using them
> through narrow diagnostic interfaces and not other code using them
> directly. Put them in global_dc (which would, I suppose, in turn go in
> the universe)? A shared library user should be able to replace them with
> streams opened with open_memstream, for example, rather than having
> diagnostics go direct to stderr/stdout at all. Yes, this requires dealing
> with implicit uses through functions such as printf as well as explicit
> references directly in GCC.
Doublechecking, I think many of the reported uses of stdout/stderr are
false-positives, where stdout is merely used in gen* tools when building
the compiler, which isn't a problem; also in various debug functions
intended to be invoked from inside the debugger - again, not an issue.
> * For input_location, you no doubt know we want almost everywhere to use
> the location of some well-defined source-code construct instead of the
> global (the diagnostic functions that implicitly use input_location should
> go away completely, for example). But changing that is a lot of work so
> things probably do need to start by putting it in the universe as you
> suggest.
Yes; this is a case of minimizing patch size.
> * For flag_isoc99 and associated variables defined in c-common.c, move
> them to Variable entries in c.opt (i.e., into global_options). In
> general, if a variable describes state determined by command-line options,
> moving it into global_options is probably sensible.
>
> * For asm_out_file, see dump_file, stdout and stderr above - there should
> be a well-defined API for writing assembler output and only the
> implementation of that API should refer directly to this global.
In the plan I suggested using TLS for this in the shared-library build,
out of a desire to minimize the patch size.
An alternative approach would be to move much of output.h into a class;
something like this:
class MAYBE_SINGLETON asm_out
{
public:
void putc (char );
void puts (const char *);
void printf (const char *, ...) some_attribute;
/* Most out output.h gets moved to here. */
void assemble_zeros (unsigned HOST_WIDE_INT);
void assemble_align (int);
void assemble_string (const char *, int);
void assemble_external_libcall (rtx);
void assemble_label (FILE *, const char *);
// etc
private:
FILE *asm_out_file;
};
and have an "asm_out_" in the universe.
The TLS approach would be much less work for me, so I prefer it.
> * For lang_hooks, see targetm and make it const. Some hooks are
> deliberately modified in free_lang_data - the answer to that is, I think,
> to stop using those hooks directly except via a wrapper that checks
> whether free_lang_data has been called (one new global) and decides
> whether to call the langhook based on that.
Interesting; again, I'll explore this approach.
Thanks for the great feedback
Dave