Dear GCC devs,

I hope you don't mind me posting on this list. I'm trying to finish up an AST-to-XML converter, which I started porting from GCC-XML (a patched version of GCC-4.2) to a GCC plugin quite a while ago now.

I'd really appreciate any help with finishing this up, as there's a lot to learn about GCC's internal garbage collection mechanisms, and I can't afford to burn much time on this at the moment, but I would like it to actually work...


(Please excuse the use of `VEC(tree, gc)` instead of `vec<tree, va_gc>` in this email; my code accommodates both. Also, sorry about the length of this; I'm not very good at being concise...)


Current plugin deficiency, cf. original GCCXML implementation
-----

One of the (last?) limitations of the plugin as it stands is that each `cp_binding_level` is missing an extra `VEC(tree, gc)*` member, which was originally patched into name-lookup.h. This VEC, i.e. `cp_binding_level->all_decls`, stored all (grand-)child declarations passed by `ht_forall(ident_hash, callback, 0);` to `callback`, just before the XML dump starts. (`callback` is implemented as `xml_fill_all_decls`[1].)

I've tried generating this `all_decls` vector on the fly during the main dump routine, but it seems that the information needs to be gathered in a separate, preliminary pass, which is what the `ht_forall` call achieves. Each `cp_binding_level`'s `all_decls` member is populated by walking backwards through each `cxx_binding`'s `previous` member, while `ht_forall` iterates forward over the identifiers in `ident_hash`.
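To make the preliminary pass concrete, here is a minimal standalone sketch of the chain walk. It uses none of GCC's real types: `Decl`, `Scope`, `Binding` and `fill_all_decls` are hypothetical stand-ins that only mirror the shape of `cxx_binding`, `cp_binding_level` and `xml_fill_all_decls`, so please treat it as an illustration of the idea rather than plugin code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are GCC's real types.
struct Decl { std::string name; };

struct Scope {
    std::string name;
    std::vector<const Decl*> all_decls;  // models the patched-in member
};

struct Binding {
    const Decl* value;   // declaration bound in `scope`; may be null
    Scope* scope;        // scope that owns this binding; may be null
    Binding* previous;   // next-older binding of the same identifier
};

// Walk one identifier's binding chain from newest to oldest and file each
// declaration under the scope that owns it. Returns the number filed.
int fill_all_decls(Binding* newest) {
    int filed = 0;
    for (Binding* b = newest; b != nullptr; b = b->previous) {
        if (b->value != nullptr && b->scope != nullptr) {
            b->scope->all_decls.push_back(b->value);
            ++filed;
        }
    }
    return filed;
}
```

In this model the function would be called once per identifier, so every scope's vector is complete before any dumping begins.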

The full `all_decls` member is used during the XML dump only when writing out complete `NAMESPACE_DECL`s (see the lines preceding line 1673, in `xml_output_namespace_decl`[2]).


Custom hash table
--------

I've been browsing the GCC code and reading the internals manual, and it seems to me that one way to replicate this functionality in a plugin would be to use `ht_forall(ident_hash, ..)` to populate a separate hash table, mapping `IDENTIFIER_NODE`s to VECs.
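As a rough picture of what I mean, here is a standalone sketch of a side table filled by a callback-driven traversal. Everything in it is hypothetical: `Identifier`, `SideTable`, `for_all` and `fill_side_table` are my own stand-ins and only model the callback style of `ht_forall`. It also says nothing about GTY marking, which a real plugin would still have to deal with.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; none of these are GCC's real types.
struct Decl { std::string name; };

struct Identifier {
    std::string spelling;
    std::vector<const Decl*> bound;  // declarations bound to this name
};

// Side table kept outside the identifiers themselves.
using SideTable =
    std::unordered_map<const Identifier*, std::vector<const Decl*>>;
using Callback = int (*)(const Identifier*, void* user_data);

// Models a callback-driven traversal: call `cb` on every entry and stop
// early if it returns 0.
void for_all(const std::vector<Identifier>& table, Callback cb,
             void* user_data) {
    for (const Identifier& id : table)
        if (cb(&id, user_data) == 0) break;
}

// Callback that copies each identifier's declarations into the side table.
int fill_side_table(const Identifier* id, void* user_data) {
    auto* side = static_cast<SideTable*>(user_data);
    if (!id->bound.empty())
        (*side)[id] = id->bound;
    return 1;  // keep iterating
}
```

The point of the sketch is only that the extra data lives next to the existing table, keyed by pointer, so nothing in the original structures needs an added member.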

I'm sure you're all aware that this is much easier said than done! I grepped the GCC source for uses of the `ht_` functions and came across stringpool.c. So I started modifying code from there, and then hit a bit of a wall: `struct GTY(()) string_pool_data`.

If I understand stringpool.c correctly, one `string_pool_data` instance is assigned to each `hashnode`, but I don't know how to get the `string_pool_data` out of its hashnode. Is there some gengtype-generated function that achieves this, or is a cast all that's required?

If this is the way to go about getting that `all_decls` VEC, please could someone help me out(!), or point me at some source code that has a GTY'd mapping of `IDENTIFIER_NODE`s to VECs? I've got chapter 22 of the Internals manual in front of me (Memory Management and Type Information), but it's a lot to take in. It also looks like I'll have to figure out section 22.4, on how to use the `PLUGIN_GGC_(START|END)` callbacks, which will also take some time. Pointers to any existing examples where this is done would be really appreciated!


Using the existing hash table, `ident_hash`
----

This would be ideal. I think it would need the least code, and wouldn't require gengtype or the call to `ht_forall`. If this is possible (I'm sure it is), I've failed to get a working implementation. At first I changed `xml_fill_all_decls` to instead put the VEC of declarations into each `cxx_binding`'s `static_decls`. However, this gives incorrect results, I think because it duplicates declarations and interferes with things I shouldn't touch.

My second attempt got rid of the `ht_forall` call, and instead used `ht_lookup(ident_hash, ...)` during the dump to get a namespace's hashnode. But I haven't got this to work, because I haven't found a way to get a VEC of all declarations, recursively, given the namespace node (as either a `tree`, `hashnode` or `cxx_binding`). One inefficiency of doing it this way is that each time a nested namespace is encountered, the work would have to be repeated, as the parent namespace has already recursed through it when populating its own VEC. Potential benefit: reduced minimum memory usage during the dump. Still, I can't figure out which API functions or macros are needed. I get lost looking in the tree.h files.
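For what it's worth, the repeated work for nested namespaces could in principle be avoided by caching each namespace's result the first time it is computed. Below is a standalone sketch of that idea only; `Namespace`, `Cache` and `collect_decls` are hypothetical names, declarations are reduced to plain strings, and it assumes namespaces nest as a tree with no cycles. It does not answer the real question of which GCC macros would enumerate a namespace's members.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model of a namespace; not GCC's NAMESPACE_DECL.
struct Namespace {
    std::string name;
    std::vector<std::string> decls;          // direct declarations
    std::vector<const Namespace*> children;  // nested namespaces
};

using Cache =
    std::unordered_map<const Namespace*, std::vector<std::string>>;

// Returns the direct declarations of `ns` followed by those of every nested
// namespace, depth-first. Each namespace is computed once and then cached.
const std::vector<std::string>& collect_decls(const Namespace& ns,
                                              Cache& cache) {
    auto hit = cache.find(&ns);
    if (hit != cache.end()) return hit->second;

    std::vector<std::string> all = ns.decls;
    for (const Namespace* child : ns.children) {
        const std::vector<std::string>& sub = collect_decls(*child, cache);
        all.insert(all.end(), sub.begin(), sub.end());
    }
    return cache.emplace(&ns, std::move(all)).first->second;
}
```

The cache trades memory for time, so it would give back some of the memory saving mentioned above.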

Either way, the method doesn't matter so much, as long as the result is accurate and the implementation saves some time.


Ways to help
----

If you're familiar with the GTY datatypes, gengtype, hash tables and / or `tree.h`, please could you help me decide how to replace the `cp_binding_level->all_decls` member, and help me find usage examples of the relevant internal GCC API(s)? If you'd be happy to contribute code, that would be even better! All due credit will be given where deserved, of course.


Current code
---

If you'd like to see the current state of the hash table code I've tried, please let me know and I can easily create a fork on GitHub with the `xml_ident_hash`. I haven't figured out the exact gengtype commands I need to put in the build system files yet, but that's on its way...

The attempt to use `static_decls` in place of `all_decls` is currently what's live in my GitHub repo[3]. This appears to work fine when testing against the 80 C++ STL headers provided by GCC-4.8's libstdc++. Only four of the tests fail, but further digging led me to figure out that the missing `all_decls` is a much bigger problem than I'd initially thought.


Any help, pointers or advice would be really, really appreciated! If / when it's up to standard, I'd like to propose it for inclusion on the GCC plugins wiki, but it's not quite there yet...

Yours sincerely,
Alex


[1]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L3709
[2]: https://github.com/gccxml/gccxml/blob/master/GCC/gcc/cp/xml.c#L1652
[3]: https://github.com/alexleach/gccxml_plugin
