Dear GCC devs,
I hope you don't mind me posting on this list. I'm trying to finish an
AST-to-XML converter: a port of GCC-XML (a patched version of GCC-4.2) to a
GCC plugin, which I started quite a while ago now.
I'd really appreciate any help finishing it. There's a lot to learn about
GCC's internal garbage collection mechanisms, and I can't afford to spend
much time on this at the moment, but I would like it to actually work...
(Please excuse the use of `VEC(tree, gc)` instead of `vec<tree, va_gc>` in
this email; my code accommodates both. Also, sorry about the length of
this; I'm not very good at being concise...)
Current plugin deficiency, cf. original GCCXML implementation
-----
One of the (last?) limitations of the plugin as it stands is that each
`cp_binding_level` is missing an extra `VEC(tree, gc)*` member, which was
originally patched into name-lookup.h. This VEC, `cp_binding_level->all_decls`,
stored all (grand-)child declarations that `ht_forall(ident_hash, callback, 0);`
passes to `callback` just before the XML dump starts. (`callback` is
implemented as `xml_fill_all_decls`[1].)
I've tried generating this `all_decls` vector on the fly, during the main
dump routine, but it seems that the information needs to be gathered in a
separate, preliminary pass, which is what the `ht_forall` call achieves.
Each `cp_binding_level`'s `all_decls` member is populated by walking
backwards along each `cxx_binding`'s `previous` chain, while `ht_forall`
iterates forwards over every identifier in `ident_hash`.
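To make the two directions concrete, here is a minimal standalone C sketch of such a preliminary pass. It deliberately avoids GCC headers: `decl`, `decl_vec`, `binding`, `binding_level`, `ident`, `fill_all_decls` and `for_all_idents` are hypothetical, heavily simplified stand-ins for `tree`, `VEC(tree, gc)`, `cxx_binding`, `cp_binding_level`, an identifier node, `xml_fill_all_decls` and `ht_forall`. It assumes each binding's declaration belongs in the `all_decls` of the scope that owns that binding; it is not a reproduction of what GCC-XML's callback actually does.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for GCC internals. */
typedef struct decl { const char *name; } decl;

typedef struct decl_vec {        /* plays the role of VEC(tree, gc) */
  decl **items;
  size_t len, cap;
} decl_vec;

typedef struct binding_level {   /* cp_binding_level, reduced to the patched-in member */
  decl_vec all_decls;
} binding_level;

typedef struct binding {         /* cxx_binding: one per scope binding the name */
  decl *value;
  binding_level *scope;
  struct binding *previous;      /* binding of the same name in another scope */
} binding;

typedef struct ident {           /* identifier node plus its namespace bindings */
  const char *name;
  binding *bindings;             /* most recent binding first */
} ident;

/* Append D to V, growing the array as needed. Returns 0 on allocation failure. */
static int decl_vec_push(decl_vec *v, decl *d)
{
  if (v->len == v->cap) {
    size_t cap = v->cap ? 2 * v->cap : 8;
    decl **p = realloc(v->items, cap * sizeof *p);
    if (!p)
      return 0;
    v->items = p;
    v->cap = cap;
  }
  v->items[v->len++] = d;
  return 1;
}

/* Walk backwards along the `previous` chain, recording each declaration
   in the all_decls of the binding level that owns it. Like an ht_forall
   callback, return nonzero to keep iterating. */
static int fill_all_decls(ident *id)
{
  for (binding *b = id->bindings; b; b = b->previous)
    if (b->value && !decl_vec_push(&b->scope->all_decls, b->value))
      return 0;
  return 1;
}

/* Stand-in for ht_forall (ident_hash, callback, 0): visit every
   identifier once, stopping early if the callback returns zero. */
static void for_all_idents(ident *ids, size_t n, int (*cb)(ident *))
{
  for (size_t i = 0; i < n; i++)
    if (!cb(&ids[i]))
      return;
}
```

The forward iteration only ever touches each identifier once; the backward walk is what distributes declarations to every scope that binds the name, which is why the pass has to finish before the dump starts.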
The full `all_decls` member is used during the XML dump only when writing
out complete `NAMESPACE_DECL`s (see the lines preceding 1673 in
`xml_output_namespace_decl`[2]).
Custom hash table
--------
I've been browsing the GCC code and reading the internals manual, and it
seems to me that one way to replicate this functionality in a plugin
would be to use `ht_forall(ident_hash, ..)` to populate a separate hash
table mapping IDENTIFIER_NODEs to VECs.
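As a rough illustration of the data structure being proposed, here is a standalone C sketch of a chained hash table keyed by pointer (standing in for an IDENTIFIER_NODE `tree`) whose values are growable arrays (standing in for `VEC(tree, gc)`). All names are hypothetical. It uses plain `malloc`/`realloc`, so it is not GC-aware: in a real plugin the table would presumably need GTY markup or registered roots so the garbage collector knows about the trees it references, and GCC's own hash table types would likely be preferable to a hand-rolled one.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical side table: identifier node -> vector of declarations.
   Plain malloc'd memory, invisible to GCC's garbage collector. */
typedef struct entry {
  const void *key;      /* stand-in for an IDENTIFIER_NODE tree */
  void **decls;         /* stand-in for VEC(tree, gc) */
  size_t len, cap;
  struct entry *next;   /* next entry in the same bucket */
} entry;

typedef struct ident_map {
  entry **buckets;
  size_t nbuckets;
} ident_map;

static int ident_map_init(ident_map *m, size_t nbuckets)
{
  m->nbuckets = nbuckets;
  m->buckets = calloc(nbuckets, sizeof *m->buckets);
  return m->buckets != NULL;
}

/* Pointers are aligned, so drop the low bits before reducing. */
static size_t hash_ptr(const void *p, size_t nbuckets)
{
  return (size_t)(((uintptr_t)p >> 4) % nbuckets);
}

/* Find the entry for KEY, creating an empty one if it is absent. */
static entry *ident_map_get(ident_map *m, const void *key)
{
  size_t h = hash_ptr(key, m->nbuckets);
  for (entry *it = m->buckets[h]; it; it = it->next)
    if (it->key == key)
      return it;
  entry *e = calloc(1, sizeof *e);
  if (!e)
    return NULL;
  e->key = key;
  e->next = m->buckets[h];
  m->buckets[h] = e;
  return e;
}

/* Append DECL to KEY's vector. Returns 0 on allocation failure. */
static int ident_map_add(ident_map *m, const void *key, void *decl)
{
  entry *e = ident_map_get(m, key);
  if (!e)
    return 0;
  if (e->len == e->cap) {
    size_t cap = e->cap ? 2 * e->cap : 4;
    void **p = realloc(e->decls, cap * sizeof *p);
    if (!p)
      return 0;
    e->decls = p;
    e->cap = cap;
  }
  e->decls[e->len++] = decl;
  return 1;
}

static void ident_map_free(ident_map *m)
{
  for (size_t i = 0; i < m->nbuckets; i++) {
    entry *e = m->buckets[i];
    while (e) {
      entry *next = e->next;
      free(e->decls);
      free(e);
      e = next;
    }
  }
  free(m->buckets);
  m->buckets = NULL;
  m->nbuckets = 0;
}
```

An `ht_forall` callback would call something like `ident_map_add` for each declaration it finds, and the dump would look a namespace's identifier up with `ident_map_get`.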
I'm sure you're all aware that implementing that is much easier said than
done! I grepped the GCC sources for code using `ht_` functions and came
across stringpool.c, so I started adapting code from there, and then hit
a bit of a wall: `struct GTY(()) string_pool_data`.
If I understand stringpool.c correctly, one `string_pool_data` instance is
assigned to each `hashnode`, but I don't know how to get the
`string_pool_data` out of its hashnode. Is there some gengtype-generated
function that achieves this, or is a cast all that's required?
If this is the way to go about getting that `all_decls` VEC, please could
someone help me out(!), or point me at some source code that has a GTY'd
mapping of IDENTIFIER_NODEs to VECs? I've got chapter 22 of the
Internals manual in front of me (Memory Management and Type Information),
but it's a lot to take in. It also looks like I'll have to figure out
section 22.4, on how to use the `PLUGIN_GGC_(START|END)` callbacks, which
will also take some time. Pointers to any existing examples where this is
done would be really appreciated!
Using the existing hash table, `ident_hash`
----
This would be ideal. I think it would need the least code, and would
require neither gengtype nor the call to `ht_forall`. I'm sure it's
possible, but I've failed to get a working implementation. At first I
changed `xml_fill_all_decls` to put the VEC of declarations into each
`cxx_binding`'s `static_decls` instead. However, this gives incorrect
results, I think because it duplicates declarations and messes with things
I shouldn't touch.
My second attempt got rid of the `ht_forall` call and instead used
`ht_lookup(ident_hash, ...)` during the dump to get a namespace's
hashnode. I haven't got this to work, though, because I haven't found a
way to get a VEC of all declarations, recursively, given the namespace node
(as a `tree`, `hashnode` or `cxx_binding`). One inefficiency of this
approach is that each nested namespace would be walked again when it is
dumped, even though its parent namespace already recursed through it when
populating its own VEC. A potential benefit is reduced memory usage during
the dump. Still, I can't figure out which API functions or macros I need;
I get lost looking through the tree.h files.
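The repeated work in that second approach can be seen in a small standalone C sketch. The `ns` struct is a hypothetical stand-in for a namespace's binding level (in GCC-4.8 its two arrays would roughly correspond to a binding level's `names` and `namespaces` chains, but that mapping is an assumption, and whether those chains hold everything `all_decls` did is exactly the open question). `collect_all_decls` gathers a namespace's declarations recursively, and the `ns_visits` counter is only there to expose how often nested namespaces are re-walked.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified namespace: its own declarations plus nested namespaces. */
typedef struct ns {
  const char *name;
  const char **decls;     /* declarations made directly in this namespace */
  size_t ndecls;
  struct ns **children;   /* nested namespaces */
  size_t nchildren;
} ns;

typedef struct str_vec {
  const char **items;
  size_t len, cap;
} str_vec;

/* Append S to V, growing the array as needed. Returns 0 on allocation failure. */
static int str_vec_push(str_vec *v, const char *s)
{
  if (v->len == v->cap) {
    size_t cap = v->cap ? 2 * v->cap : 8;
    const char **p = realloc(v->items, cap * sizeof *p);
    if (!p)
      return 0;
    v->items = p;
    v->cap = cap;
  }
  v->items[v->len++] = s;
  return 1;
}

static size_t ns_visits; /* counts namespace visits, to expose repeated work */

/* Append every declaration in N and, recursively, in its nested namespaces. */
static int collect_all_decls(const ns *n, str_vec *out)
{
  ns_visits++;
  for (size_t i = 0; i < n->ndecls; i++)
    if (!str_vec_push(out, n->decls[i]))
      return 0;
  for (size_t i = 0; i < n->nchildren; i++)
    if (!collect_all_decls(n->children[i], out))
      return 0;
  return 1;
}
```

Dumping an outer namespace and then its nested one visits the nested namespace twice; for a chain of d nested namespaces, collecting on demand for each one costs d(d+1)/2 visits. Caching each namespace's vector, or filling them all in one preliminary pass as GCC-XML does, avoids that at the cost of keeping the vectors in memory.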
Either way, the method doesn't matter so much, as long as the result is
accurate and the implementation saves some time.
Ways to help
----
If you're familiar with the GTY datatypes, gengtype, hash tables and/or
`tree.h`, please could you help me decide how to replace the
`cp_binding_level->all_decls` member, and help me find usage examples of
the relevant internal GCC API(s)? If you'd be happy to contribute code,
that would be even better! Credit will, of course, be given where due.
Current code
---
If you'd like to see the current state of the hash table code I've tried,
please let me know and I can easily create a fork on GitHub with
`xml_ident_hash`. I haven't yet figured out the exact gengtype commands I
need to put in the build system files, but that's on its way...
The attempt to use `static_decls` in place of `all_decls` is what's
currently live in my GitHub repo[3]. This appears to work fine when tested
against the 80 C++ STL headers provided by GCC-4.8's libstdc++: only four
of the tests fail. However, further digging showed me that the missing
`all_decls` is a much bigger problem than I'd initially thought...
Any help, pointers or advice would be really, really appreciated! If and
when it's up to standard, I'd like to propose it for inclusion on the GCC
plugins wiki, but it's not quite there yet...
Yours sincerely,
Alex
[1]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L3709
[2]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L1652
[3]: https://github.com/alexleach/gccxml_plugin