On Wed, 2013-06-26 at 20:21 +0000, Joseph S. Myers wrote:
> For a shared library you need a well-defined namespace for GCC functions / 
> variables so it doesn't interfere with users.  Are you going to put 
> everything inside a "gcc" namespace or similar?  

FWIW I deliberately avoided talking about API/ABIs in the document:
removal of global state is IMHO a necessary but not sufficient condition
for being able to use GCC as a shared library; I want to focus on the
global variables.

That said, putting everything in a "gcc" namespace sounds reasonable to
me (apart from classes used solely within a single source file, which
can go in anonymous namespaces, right?).

> (You also need to avoid 
> host libraries such as libiberty - which have C interfaces - interfering 
> with users of the shared library.  If the initial hosts for shared library 
> builds are ELF, I suppose you can do that with a version map to hide 
> everything not in the "gcc" namespace.)

(nods)

> Some observations specifically on your top-40 globals:

FWIW I wonder to what extent the discussions that follow all exhibit a
tradeoff between the desire to provide a clean API vs the desire to
minimize the size of the patch (to reduce backporting pain).

I guess I have a vested interest in the latter approach, since a smaller
patch is likely to be easier for me to write.  However it also helps
people backporting to 4.8 and 4.7

> * global_options: where available, using a pointer to a gcc_options 
> structure is of course better than using a universe (some attributes could 
> be implemented more cleanly if there were per-function gcc_options 
> pointers, for example).  Note also the various TARGET_* macros that are 
> generated to AND a variable with a mask - really, such macros should gain 
> a parameter that is a pointer to a struct gcc_options (and then you'd pass 
> &global_options to them if no more specific structure is available).

Interesting.

> * For dump_file and associated variables such as dump_flags, I sort of 
> think there should be a proper API for code writing to dumps rather than 
> directly accessing dump_file all over the compiler.  That should massively 
> reduce the number of places needing to access those fields of the 
> universe.

Right, but doing this would be a big patch, touching numerous files.

At one extreme, a minimal patch (with no new API) would be something
like this:

  #define dump_file  GET_UNIVERSE().dump_file_
  #define dump_flags GET_UNIVERSE().dump_flags_

The approach currently in my plan document is somewhat more invasive, to
try to avoid all of the GET_UNIVERSE() TLS uses in a shared-library
build.

What would a new API look like?  The classic problem with
logging/dumping is that you want to avoid doing work in the no-logging
case, so that rather than:

   log ("some message: %s", perform_some_expensive_computation ());

with enable/disable in "log" (and thus always doing the expensive work
and typically discarding it), you have things like:

   if (logging)
     {
        log ("some message: %s", perform_some_expensive_computation ());
     }

If we're going to have a "logging" conditional there, that's going to
involve passing around or looking up a variable, so why not simply make
it the dump_file?

FWIW I originally came up with a very simple API, but I filed it in the
"rejected ideas" appendix:
http://dmalcolm.fedorapeople.org/gcc/global-state/rejected-ideas.html#rejected-idea-dump-if-details


> * For targetm, make it const (which would move any further enhancements to 
> it firmly into the domain of Andrew MacLeod's proposal and out of yours, 
> because once it's const it's not global state for a compiler only 
> supporting one target at a time).  There are only a few places, for a few 
> targets, that write to targetm at runtime; I think it should be possible 
> to fix those by e.g. changing some data hooks in targetm into function 
> hooks.

Interesting; I'll explore this.

> * For stderr and stdout, really the compiler should only be using them 
> through narrow diagnostic interfaces and not other code using them 
> directly.  Put them in global_dc (which would, I suppose, in turn go in 
> the universe)?  A shared library user should be able to replace them with 
> streams opened with open_memstream, for example, rather than having 
> diagnostics go direct to stderr/stdout at all.  Yes, this requires dealing 
> with implicit uses through functions such as printf as well as explicit 
> references directly in GCC.

Doublechecking, I think many of the reported uses of stdout/stderr are
false-positives, where stdout is merely used in gen* tools when building
the compiler, which isn't a problem; also in various debug functions
intended to be invoked from inside the debugger - again, not an issue.

> * For input_location, you no doubt know we want almost everywhere to use 
> the location of some well-defined source-code construct instead of the 
> global (the diagnostic functions that implicitly use input_location should 
> go away completely, for example).  But changing that is a lot of work so 
> things probably do need to start by putting it in the universe as you 
> suggest.

Yes; this is a case of minimizing patch size.

> * For flag_isoc99 and associated variables defined in c-common.c, move 
> them to Variable entries in c.opt (i.e., into global_options).  In 
> general, if a variable describes state determined by command-line options, 
> moving it into global_options is probably sensible.
> 
> * For asm_out_file, see dump_file, stdout and stderr above - there should 
> be a well-defined API for writing assembler output and only the 
> implementation of that API should refer directly to this global.

In the plan I suggested using TLS for this in the shared-library build,
out of a desire to minimize the patch size.

An alternative approach would be to move much of output.h into a class;
something like this:

class MAYBE_SINGLETON asm_out
{
public:
  void putc (char );
  void puts (const char *);
  void printf (const char *, ...) some_attribute;

  /* Most out output.h gets moved to here.  */
  void assemble_zeros (unsigned HOST_WIDE_INT);
  void assemble_align (int);
  void assemble_string (const char *, int);
  void assemble_external_libcall (rtx);
  void assemble_label (FILE *, const char *);
  // etc

private:
  FILE *asm_out_file;
};

and have an "asm_out_" in the universe.

The TLS approach would be much less work for me, so I prefer it.

> * For lang_hooks, see targetm and make it const.  Some hooks are 
> deliberately modified in free_lang_data - the answer to that is, I think, 
> to stop using those hooks directly except via a wrapper that checks 
> whether free_lang_data has been called (one new global) and decides 
> whether to call the langhook based on that.

Interesting; again, I'll explore this approach.

Thanks for the great feedback
Dave


Reply via email to