On Fri, Nov 16, 2012 at 5:06 AM, Basile Starynkevitch <bas...@starynkevitch.net> wrote:

> On Thu, Nov 15, 2012 at 07:59:36PM -0500, Diego Novillo wrote:
>> As we continue adding new C++ features in the compiler, gengtype
>> is becoming an increasing source of pain. In this proposal, we
>> want to explore different approaches to GC that we could
>> implement.
>
> Just a minor remark: we are not speaking only of gengtype, but also of
> ggc and of PCH. I agree that they are all closely related (and perhaps
> even LTO serialization might be affected). And I am biased, because GCC
> MELT is built around the GCC garbage collector (if you don't know about
> MELT, see http://gcc-melt.org/ ; MELT is a domain-specific language to
> extend GCC). However, I will probably be able to adapt MELT to new
> conventions.
>
>> At this point, we are trying to reach consensus on the general
>> direction that we should take. Given how intertwined GC and PCH
>> are, the choices we make for one affect the other.
>
> My belief was that PCH (pre-compiled headers) is deprecated in favour
> of PPH (preprocessed headers). Will PCH continue to exist once the PPH
> effort goes mainline? Or is PPH abandoned? How is the idea of "getting
> rid of GC" related to PPH?
PPH in its current implementation is on hold. It will come back in a
different shape if/when C++ adds modules to the language.

> I actually disagree with the "Get rid of GC" idea, but I am not sure
> that we all understand the same thing about it (and I have the feeling
> of the opposite). I would probably agree with "Get rid of
> gengtype+ggc+PCH and replace it with something better", which might be
> what "Get rid of GC" means.

We mean get rid of it. No garbage collection whatsoever. We both think
that it is better to structure the compiler around memory pools.
However, we also concede that we are probably in the minority and we
will need to keep GC around.

> My strong belief is that a compiler project as gigantic as GCC needs
> some kind of garbage collection. I also believe that the current (4.7)
> garbage collection *implementation* (which is probably what both Diego
> and Lawrence call the "GC" to get rid of) is grossly unsatisfactory
> (so I parse "Get rid of GC" with a lot of ambiguity).

No. We mean no garbage collection. Period.

> To be more specific, I call garbage collection a scheme where (small,
> newbie) GCC contributors can easily contribute some code to GCC
> without having to understand when, and how precisely, some data will
> be freed. If a user adds a pass (or a builtin) which creates e.g. some
> GIMPLE, he does not immediately know when his GIMPLE should be freed
> (and it certainly should be freed outside of his pass).

Memory pools. At the end of your pass, you simply discard the pool you
were using (see the rough obstack sketch further down in this message).
Additionally, by using C++ one can use other type-based memory
strategies, like smart pointers.

> The fact that a compiler deals with a lot of circular data makes me
> think that naive reference-counting approaches (and in my opinion,
> reference counting is just a very poor method of doing garbage
> collection, which does not work well with circular references) cannot
> work inside a compiler, whereas they do work inside graphical widget
> libraries à la Qt or GTK.
>
> Hence, I don't understand how a pool allocator would work in GCC
> without touching a huge amount of code.

Right. It would be a large effort to sort out, particularly since GC
has been around for a while and we've gotten lazy and are probably
relying on it quite a bit. For these reasons, even if we convinced the
community to go in this direction, it would take a while to get there.

> I would also remark that GC might be related to LTO, even if currently
> it is not. LTO is the serialization of GCC internal representations to
> disk, and that problem is closely related to memory management (in
> both cases, we are dealing with some transitive closure of pointer
> references, so copying GCs use exactly the same algorithms as
> serialization). I actually don't understand why PCH uses gengtype but
> LTO does not.

Because LTO relies on proper bytecode streaming, while PCH simply
writes memory pages out. PCH uses the wrong approach to streaming
(though we understand why it was implemented this way).

>> === Approach: Limit the Language Used
>>
>> We could avoid the problem by limiting the language we use to
>> near that which gengtype currently understands. This approach
>> has significant consequences. It will make the standard library
>> incompatible with GTY.
>
> Which standard library are we talking about? I guess it is libstdc++
> and its standard containers like std::map and std::vector, but I am
> not sure. (Maybe it is just libiberty?)

Yes, libstdc++.
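To make the memory-pool idea above a bit more concrete, here is a rough
sketch built on the obstacks we already have in libiberty. Only the
obstack_* calls are the real, existing interface; pass_pool_init,
pass_alloc and pass_pool_fini are invented names for illustration, not
a proposed API:

  /* Sketch only: one pool per pass.  Everything allocated through
     pass_alloc lives exactly as long as the pass and is released in
     a single call at the end.  */
  #include <stdlib.h>
  #include <obstack.h>

  #define obstack_chunk_alloc malloc
  #define obstack_chunk_free free

  static struct obstack pass_obstack;

  static void
  pass_pool_init (void)
  {
    obstack_init (&pass_obstack);
  }

  static void *
  pass_alloc (size_t size)
  {
    return obstack_alloc (&pass_obstack, size);
  }

  static void
  pass_pool_fini (void)
  {
    /* Discard every object allocated from this pool in one shot; no
       need to know what was allocated or in what order.  Call
       obstack_init again before reusing the obstack.  */
    obstack_free (&pass_obstack, NULL);
  }

Mechanically this is nothing new; GCC already uses obstacks in many
places. The hard part is deciding which data belongs to which pool and
what has to outlive the pass.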
>> Full C++ support would essentially require building a new C++
>> parser.
>
> I tend to disagree with that conclusion. I believe we should strongly
> separate the header parts from the code parts. It seems to me that we
> could perhaps afford to require changes to the header gcc/*.h &
> gcc/*.def files [...]

No, you still need gengtype to crawl through C++ declarations to find
the fields it wants to handle. It needs to understand enough C++ to
know what is a data field and what isn't.

> An alternative route might be to describe all the data types inside
> GCC in some other files (perhaps using a GTY-friendly syntax, or not),
> a bit like some IDLs
> (http://en.wikipedia.org/wiki/Interface_description_language) do.

Which amounts to the same disconnect needed for user-written marking
functions, without the advantage of better debuggability.

> Such an approach does not limit the use of libstdc++ containers; we
> could have the IDL generate std::vector or std::map things.

Well, then you get into the whole business of having to keep your
meta-annotations in sync with libstdc++ changes. I don't really like
this idea.

>> This solution would require a first boot stage that did not
>> support PCH, because we cannot rely on the seed compiler
>> supporting GTY. We would probably need to use the Boehm
>> collector during the first stage as well.
>>
>> Because the first stage would be fundamentally different from the
>> second stage, we may need to add an additional pass for correct
>> object file comparisons.
>
> Did you mean "additional stage" instead of "additional pass"? I
> understand it as an additional stage.

Yes.

>> Common facilities to mark arrays will be provided via a base GC
>> class. No generated header files to #include at the end of the
>> source file. No new dependencies to add to the Makefile. No need
>> to parse C++.
>
> I don't follow you. Why don't you like generated header files?

Because they introduce magic. Magic is bad when you are debugging a
compiler failure, particularly when the bug manifests itself inside
the magic area.

> A big advantage of using Boehm GC is that the interface is familiar
> to everyone, and that it handles local pointers very nicely, without
> pain. So data inside passes could also be GC-ed.

This may be the most practical approach long term. Boehm GC is a well
supported and solid GC solution. But we need to make sure it works for
us.


Diego.
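P.S. For concreteness, this is roughly what Basile means about the
Boehm interface. Only the GC_* calls below are the actual Boehm API
from gc.h (which may be installed as <gc/gc.h> on some systems); the
structure and helper are made up for illustration:

  /* Sketch only: build a linked list out of collectable memory and
     never free it explicitly.  */
  #include <gc.h>
  #include <string.h>

  struct demo_node
  {
    struct demo_node *next;  /* Pointers inside GC'd objects are traced.  */
    char *name;
  };

  static struct demo_node *
  make_node (const char *name, struct demo_node *next)
  {
    /* GC_MALLOC returns zeroed, collectable memory.  */
    struct demo_node *n = (struct demo_node *) GC_MALLOC (sizeof *n);
    /* GC_MALLOC_ATOMIC is for data known to contain no pointers.  */
    n->name = (char *) GC_MALLOC_ATOMIC (strlen (name) + 1);
    strcpy (n->name, name);
    n->next = next;
    return n;
  }

  int
  main (void)
  {
    GC_INIT ();
    struct demo_node *list = NULL;
    for (int i = 0; i < 1000; i++)
      list = make_node ("node", list);
    /* Drop the only reference; the collector reclaims the whole list
       during some later allocation.  The local pointer on the stack
       was found by the conservative scan with no marking code from
       us; that is the "local pointers without pain" part.  */
    list = NULL;
    return 0;
  }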