[Dwarf-Discuss] About self-referencial sized types
Hi,

I'm working on improvements for the DWARF information produced by GCC for Ada, and I have hit the following issue. In Ada, it is possible to define array types inside structures (which are called records). The size of such array types can then depend on members of such records. For instance:

    type Array_Type is array (Integer range <>) of Integer;
    type Record_Type (N : Integer) is record
       A : Array_Type (1 .. N);
    end record;

Here, A's type is a "subtype" of Array_Type. It is an array whose upper bound is the special record member "N". I'm wondering how such bounds should be translated into DW_AT_{lower,upper}_bound attributes. The DWARFv4 specification (Appendix D, subsection 2.2 Ada Example) suggests the following DIEs (I'm stripping a few attributes that are not relevant for this issue):

    1$: DW_TAG_structure_type
          DW_AT_name("Record_Type")
    2$:   DW_TAG_member
            DW_AT_name("N")
            DW_AT_type(reference to Integer)
    3$:   DW_TAG_array_type
            DW_AT_type(reference to Integer)
    4$:     DW_TAG_subrange_type
              DW_AT_type(reference to Integer)
              DW_AT_lower_bound(constant 1)
              DW_AT_upper_bound(reference to member N at 2$)
    5$:   DW_TAG_member
            DW_AT_name("A")
            DW_AT_type(reference to array type at 3$)

With this debug info, the upper bound of "A" indeed completely mirrors the value of "N". In GCC, however, computing the upper bound of "A" is more subtle: it is internally represented as:

    max(0, .N)

so that when "N" is negative, 0 is returned. While it is straightforward to reference a DIE from the DW_AT_upper_bound attribute, I struggle to do so inside a DWARF expression, and I do need a DWARF expression to correctly describe the computation of the upper bound. I guess I need an operation sequence that looks like:

    # Push N, then 0
    ??? Get the value of the "N" member;
    DW_OP_lit0;
    # Is N > 0?
    DW_OP_over; DW_OP_over;
    DW_OP_gt;
    DW_OP_bra: 1;
    # If not then return 0, else return N.
    DW_OP_swap
    DW_OP_drop

So the issue for me is to know what to put instead of the "???" part.
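
The branch logic of this sequence can be checked independently of the open "???" question. The following is a minimal Python sketch of a stack machine that covers only the operations used above. It assumes the value of N has somehow already been pushed, and the function name `upper_bound` is hypothetical (it is not part of GCC, GDB, or any DWARF library). "DW_OP_bra: 1" is modeled as skipping the single following operation, which matches the intent here because DW_OP_swap is one byte long.

```python
def upper_bound(n):
    """Evaluate the max(0, N) operation sequence from the message above.

    Assumption: the "???" step has already pushed N; how to obtain that
    value is exactly the open question of this thread.
    """
    stack = [n]                      # ???: value of the "N" member
    stack.append(0)                  # DW_OP_lit0            -> N 0
    stack.append(stack[-2])          # DW_OP_over            -> N 0 N
    stack.append(stack[-2])          # DW_OP_over            -> N 0 N 0
    top = stack.pop()                # DW_OP_gt pops two entries and
    second = stack.pop()             # pushes 1 if second > top, else 0
    stack.append(1 if second > top else 0)
    take_branch = stack.pop() != 0   # DW_OP_bra pops the flag; when it is
    if not take_branch:              # non-zero, DW_OP_swap is skipped
        stack[-1], stack[-2] = stack[-2], stack[-1]   # DW_OP_swap
    stack.pop()                      # DW_OP_drop
    return stack[-1]
```

With this model, `upper_bound(5)` leaves 5 on the stack, while `upper_bound(-3)` and `upper_bound(0)` both leave 0, which is the max(0, N) behavior described above.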
It looks like the DW_OP_push_object_address (defined in section 2.5.1.3 Stack Operations) was introduced specifically for this kind of computation, but I'm not sure what it is supposed to mean in this context. Indeed, this operation would appear as part of a DWARF expression under a DW_TAG_subrange_type DIE, itself under a DW_TAG_array_type DIE, itself under a DW_TAG_structure_type. So what address would this operation push on top of the stack? The address of the "A" member, or the address of the embedding record?

The offsets of discriminants (the special record members that can be used to determine the size of regular record members) inside the record are statically known, so getting the address of the embedding record would be enough to be able to fetch the value of the discriminant. On the other hand, getting the address of the "A" member would not be sufficient: in more complex cases, the offset of the "A" member can depend on discriminants!

I tried to look at the implementation of DW_OP_push_object_address in GDB, but it looks like it's not implemented yet. What do you think about its expected behavior? And if I cannot use this operation for such array bound expressions, what should I use?

Thank you in advance for your answers. :-)

--
Pierre-Marie de Rodat
___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
Re: [Dwarf-Discuss] About self-referencial sized types
On 04/23/2014 04:34 PM, Tom Tromey wrote:
> Jakub> That's strange, for Fortran arrays GCC emits DW_OP_push_object_address
> Jakub> heavily.
>
> AFAICT it's never come "seriously" to anybody's attention.

Looking for more information about DW_OP_push_object_address, I found a thread from 2007[1] in which Jan Kratochvil submits patches for VLA handling on the gdb-patches@ mailing list, including support for this operation. It seems that these patches were never pushed, though[2]. As Joel said, today we have VLA handling in GDB, but not this specific hunk.

[1] https://sourceware.org/ml/gdb-patches/2007-11/msg00321.html
[2] https://sourceware.org/ml/gdb/2011-03/msg00021.html

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
On 04/23/2014 04:14 PM, Tom Tromey wrote:
> Just to address the gdb part -- the only reason this isn't implemented is
> that presumably no gdb developer has ever encountered a compiler that
> emits it. It shouldn't be hard to add should you need it. Please file a
> bug in gdb bugzilla.

As a matter of fact, I'm working on GCC, and specifically on the part that emits DW_OP_push_object_address operations (as Jakub said). Still, I know nothing about Fortran yet, so I'm not able to file a proper bug report. Besides, I would not be able to say what I expect GDB to do when provided a DW_OP_push_object_address... hence my question in this thread. ;-)

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
On 04/28/2014 05:29 PM, Agovic, Sanimir wrote:
>> In GCC, however, computing the upper bound of "A" is more subtle: it is
>> internally represented as: max(0, .N) so that when "N" is negative, 0 is
>> returned.
>
> I'm not familiar with internal gcc representations, but what is
> preventing you from referencing the member N?

Nothing: actually, the internal GCC representation for Ada array bounds (GENERIC trees) is already there and has worked well for quite a while. My problem here is that I do not know what DWARF operations to output in the DW_AT_{lower,upper}_bound attributes in order to retrieve the array's "neighbor" members so that we can compute the array bounds using them.

>> So what address would this operation [DW_OP_push_object_address] push on
>> top of the stack? The address of the "A" member, or the address of the
>> embedding record?
>
> It pushes the address of the currently evaluated object, in your case it
> is the address of member "A". You may have a look at 'D.2 Aggregate
> Examples' and Figure 51 in the latest dwarf standard.
> DW_OP_push_object_address is usually used to address meta information of
> a type e.g. bound information of an array. This information is usually
> part of the array descriptor hence the address of the object is needed
> and not the embedding type.
>
> Can you illustrate the record descriptor representation? e.g. a simple
> representation in form of a C struct. My knowledge about how vla work is
> pretty limited to C99 and Fortran. It might help to understand the
> problem a bit better.

Sure, I should have done it earlier. The Record_Type I was referring to in my example would be translated to something like:

    struct record_type {
        int n;
        int a[1 .. max(n, 0)];
    };

>> On the other hand, getting the address of the "A" member would not be
>> sufficient: in more complex cases, the offset of the "A" member can
>> depend on discriminants!
>
> So the bounds information of an array is not stored as part of the array
> descriptor but rather in the embedding type?

Indeed, there is no descriptor.
That's why the DW_TAG_array_type DIE is put under the DW_TAG_structure_type one: the array type has no meaning without the embedding record type.

> We have implemented the opcode in our vla project at [1], see commits [2]
> and [3].
>
> [1] https://github.com/intel-gdb/vla/commits/vla-fortran
> [2] https://github.com/intel-gdb/vla/commit/2d354ffed66a91a0ecf5848f33179c2b4e84c115
> [3] https://github.com/intel-gdb/vla/commit/014913ff8433a661ee8a1dfe158d66465c8f343c

Thank you for the pointers. So, if I understand correctly, the C99 VLAs rely on "neighbor" local variables, the Fortran ones rely on descriptors (i.e. wrapped pointers that embed information about the array), while the Ada ones can rely on both _plus_ other cases, like the example I introduced at the beginning of the thread. DW_OP_push_object_address seems to be suited only for descriptors, and thus it is not suited to describing such bound computations in DWARF.

By the way, I came across an issue on dwarfstd.org[1]: it defines a new operation (DW_OP_implicit_pointer) which can be used to get the address of an object from the corresponding DIE. While it was created to deal with "optimized out" objects (that obviously cannot be referenced with a real pointer), it fits what we would need to do here better than DW_OP_push_object_address does. Using it would look more like a hack to me, however...

[1] http://dwarfstd.org/ShowIssue.php?issue=100831.1

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
Paul,

Thank you for your answer.

On 05/14/2014 09:48 PM, Robinson, Paul wrote:
>> struct record_type { int n; int a[1 .. max(n, 0)]; };
>
> I had to do something like this for a COBOL compiler once, except it was
> simply [1 .. N] and so I had the upper bound be a reference to the member
> DIE for N. If you're computing an expression on N then yes it's more
> complicated.

Agreed.

>> On the other hand, getting the address of the "A" member would not be
>> sufficient: in more complex cases, the offset of the "A" member can
>> depend on discriminants!
>
> Does that mean the offset between "A" and "N" is not constant? You'd have
> to produce a sub-expression to compute that offset... This sounds
> complicated but not infeasible.

When I started to work on this matter, I thought about the following Ada type declaration:

    type Record_Type2 (N : Integer) is record
       A1 : Array_Type (1 .. N);
       A2 : Array_Type (1 .. N);
    end record;

The corresponding memory layout would be as follows:

    struct record_type2 {
        int n;
        int a[1 .. max(n, 0)];
        int b[1 .. max(n, 0)];
    };

In this case indeed, the offset between the "n" field and the "b" one is not constant, and I can't find a way to compute it from the DW_AT_upper_bound attribute of the DW_TAG_subrange_type DIE corresponding to "b". The point is that this offset depends on the "n" field and if we had its value, we would not need to compute this offset in the first place.

--
Pierre-Marie de Rodat
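
To make the dependency between the discriminant and the field offsets concrete, here is a small Python sketch of the record_type2 layout described in this message. It assumes 4-byte ints and no padding between fields, neither of which is stated in the thread, and the helper name `record_type2_offsets` is hypothetical.

```python
INT_SIZE = 4  # assumption: 4-byte "int", no padding between fields


def record_type2_offsets(n):
    """Return the byte offsets of fields n, a and b for a discriminant n."""
    length = max(n, 0)                        # each array holds max(n, 0) ints
    offset_n = 0
    offset_a = offset_n + INT_SIZE            # "a" directly follows "n": constant
    offset_b = offset_a + length * INT_SIZE   # "b" follows "a": depends on n
    return {"n": offset_n, "a": offset_a, "b": offset_b}
```

Under these assumptions, "b" sits at offset 16 when n is 3, at offset 44 when n is 10, and at offset 4 when n is negative. Stepping back from the address of "b" to the "n" field therefore requires a distance that is itself a function of n.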
Re: [Dwarf-Discuss] About self-referencial sized types
Sanimir,

On 05/15/2014 04:22 PM, Agovic, Sanimir wrote:
> For this particular case you might do something like:
>
>     DW_TAG_subrange_type:
>       DW_AT_upper_bound: DW_OP_push_object_address; DW_OP_lit4;
>                          DW_OP_minus; DW_OP_deref_size 4;
>
> Of course it is very limiting, please see this as FYI only. It
> illustrates how to use DW_OP_push_object_address.

Absolutely. That was my first try in GCC, before I thought about the nasty example with two array fields.

> An alternative is
>
>     DW_TAG_subrange_type:
>       DW_AT_upper_bound: DW_OP_call_ref "R"; DW_OP_deref_size 4;
>
> The first operation evaluates the DW_AT_location attribute of the
> referenced DIE. We end up having the value of n, thus the upper bound.
> But I'm not sure if I misuse the op here, as it requires a type to be
> bound/instantiated for a particular variable.

Oh, smart: I missed this operation. I just had a look at the DWARFv4 standard, however:

    These operations transfer control of DWARF expression evaluation to
    the DW_AT_location attribute of the referenced debugging information
    entry. If there is no such attribute, then there is no effect.

In our case, the "Record_Type" DW_TAG_structure_type DIE does not have a DW_AT_location attribute, so regardless of how DW_OP_push_object_address might be evaluated in such cases, the DW_OP_call_ref operation would have no effect.

> The way you implement the record above is quite interesting. It requires
> member offsets to be computed rather than being constant.

Indeed. I only had a look at how it was implemented by GNAT, though: it has been working this way for ages. ;-) By the way, variable record field offsets are not an issue in DWARF, even for our example: the evaluation of the DWARF expression in DW_AT_data_member_location starts with a stack that contains the address of the record, so it's easy to compute a field offset (for instance a2) using the discriminant values (here: n).

> struct record_type {
>     int n;
>     int a1[1]; // array_type with constant member offset
>     int a2[1]; // ...
> };
>
> In this scenario you have constant offsets, upper_bound for a1,a2 could
> look like:
>
>     DW_OP_push_object_address; DW_OP_deref_size 4
>
> The actual location of a1, a2 could be described via DW_AT_data_location.
> The attribute allows indirection to the actual payload. I hope the
> examples above make sense to you.

I'm not sure what you meant: how could a1 and a2 still be arrays with dynamic bounds? (assuming they have constant member offsets) I don't know if this is what you had in mind, but I thought you were suggesting to use the whole record as a descriptor for each array. That would be, in the DWARF info:

1. setting the DW_AT_data_member_location attribute of the a1 and a2 structure members to an "identity" expression (i.e. DW_OP_plus_uconst: 0);
2. setting the array type DIEs' DW_AT_data_location attribute to add the field offset (which sometimes depends on discriminants, but that isn't an issue anymore since we have the address of the record);
3. likewise for the subrange type DIEs' DW_AT_{lower,upper}_bound attributes.

... It would work pretty well, actually! I'm not sure if this would really be the way to go:

- it looks like DWARF hacking: I'm not sure that the record can be properly considered as an array descriptor
- it may require changing a bunch of things in the compiler, but I guess this is not an issue if we know we will generate quality DWARF info. :-)

> Btw, gcc accepts vla as members of structures as an extension see ['1]
> and Tom pointed out ['2] in a source comment:
>
> | [...] GCC does have an extension that allows a VLA in the middle of a
> | structure, but the DWARF it emits is relatively useless
> | to us, so we can't represent such a type properly --
>
> This might be related to your current work.
>
> ['1] https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html
> ['2] https://sourceware.org/ml/gdb-patches/2014-05/msg00097.html

Interesting. This is still about arrays whose size depends on local variables (or function arguments, which is the same to me), though.
So crafting DWARF expressions for the DW_AT_{lower,upper}_bound attributes looks reasonably easy to me: a sequence of regular register/stack operations and computations on them should be sufficient.

--
Pierre-Marie de Rodat
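
The first suggestion quoted in this message (DW_OP_push_object_address; DW_OP_lit4; DW_OP_minus; DW_OP_deref_size 4) can be simulated to see why it is limited to the first array. This is only a sketch under several assumptions: the record is modeled as a little-endian byte buffer of 4-byte ints without padding, addresses are plain offsets into that buffer, the object address is taken to be the address of the array member, and the loaded value is read as a signed integer for simplicity. The names `pack_record` and `upper_bound_via_object_address` are hypothetical.

```python
import struct


def pack_record(n, a, b):
    """Lay out {n, a[...], b[...]} as consecutive little-endian 4-byte ints."""
    return struct.pack(f"<i{len(a)}i{len(b)}i", n, *a, *b)


def upper_bound_via_object_address(memory, object_address):
    """Model of: DW_OP_push_object_address; DW_OP_lit4; DW_OP_minus;
    DW_OP_deref_size 4 (addresses are offsets into `memory`)."""
    stack = [object_address]          # DW_OP_push_object_address
    stack.append(4)                   # DW_OP_lit4
    top = stack.pop()                 # DW_OP_minus: second entry minus top
    stack.append(stack.pop() - top)
    address = stack.pop()             # DW_OP_deref_size 4: load 4 bytes
    (value,) = struct.unpack_from("<i", memory, address)
    stack.append(value)
    return stack[-1]


memory = pack_record(3, [10, 20, 30], [40, 50, 60])
bound_a = upper_bound_via_object_address(memory, 4)    # "a" starts at offset 4
bound_b = upper_bound_via_object_address(memory, 16)   # "b" starts at offset 16
```

Here `bound_a` is 3, the value of n, but `bound_b` is 30: the four bytes just before the second array hold the last element of the first array, not the discriminant. This is the "nasty example with two array fields" in executable form.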
Re: [Dwarf-Discuss] About self-referencial sized types
On 05/15/2014 04:30 PM, Jakub Jelinek wrote:
>> Nothing: actually, the internal GCC representation for Ada array bounds
>> (GENERIC trees) is already there and has worked well for quite a while.
>> My problem here is that I do not know what DWARF operations to output in
>> the DW_AT_{lower,upper}_bound attributes in order to retrieve the array's
>> "neighbor" members so that we can compute the array bounds using them.
>
> If you are talking about GCC infrastructure here, look what Fortran uses
> for its VLAs, most likely this is just a matter of implementing
> LANG_HOOKS_GET_ARRAY_DESCR_INFO for Ada for the cases where the bounds
> live somewhere in some descriptor.

Yes, I've actually already started to work with this lang-hook so we can master the DWARF information output for Ada array types (very useful!). However, it does not solve the issue of knowing what DWARF operations to output in order to compute the bounds of VLAs *without* descriptors. (see the end of my 05/14/2014 mail)

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
On 05/15/2014 06:42 PM, Jakub Jelinek wrote:
>> Yes, I've actually already started to work with this lang-hook so we can
>> master the DWARF information output for Ada array types (very useful!).
>> However, it does not solve the issue of knowing what DWARF operations to
>> output in order to compute the bounds of VLAs *without* descriptors.
>> (see the end of my 05/14/2014 mail)
>
> You still build some trees in that langhook that describe the bounds etc.
> and dwarf2out.c just transforms those trees into DWARF4 expression
> opcodes.

Absolutely. In my use case, dwarf2out.c currently doesn't know how to translate trees from the Ada frontend into DWARF opcodes. I started this thread in order to know how it should do so.

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
On 05/16/2014 03:00 PM, Agovic, Sanimir wrote:
> Indeed, therefore you have to reference a DW_TAG_variable. But this
> introduces a type/variable dependency. So a record type maps to a single
> variable, you end up with a 1:1 relation for this kind of types.

The size of the debug information is already an issue, so I guess such 1:1 relations would make things worse. ;-)

> Unfortunately gdb only allows constant offsets or constant dwarf
> expressions.

[...]

>> ... It would work pretty well, actually! I'm not sure if this would
>> really be the way to go:
>
> It is indeed quite hackish and we should rather add the necessary bits to
> gdb.

Agreed.

> The problem is that the member offset depends on runtime information
> similar to sizeof which needs to be evaluated at runtime if the operand is
> a vla. Given the following snippet:
>
>     struct foo {int a[n]; int b[i];};
>     &((struct foo *)0)->b;
>
> What value do you expect here? And should the value be different if it is
> evaluated at runtime e.g. &f.b - &f.a?

I cannot tell for C, but the corresponding expression is supposed to raise a Constraint_Error exception in Ada (this is how all null pointer dereferences are supposed to behave).

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
Todd,

On 05/27/2014 06:16 PM, Todd Allen wrote:
> Sorry for the late response on this. But perhaps an example from our Ada
> compiler will help you.

Well, thank you very much for this!

> Anyway, the significant thing is that we're not referencing "n" directly,
> but rather the stored values for the upper bound. If you store the result
> of max(0,n) somewhere, you could reference it directly. For the case of
> general expression upper bounds, you'd have to store the values somewhere
> (freeze them) in case they might produce different results on subsequent
> evaluations. But I don't know if the max(0,n) might be a special case
> where you don't do that.

That's interesting. Unfortunately, we do not store the value of the upper bounds in the record (nor anywhere else): the only way to get it at runtime is to compute it from the discriminants. The discriminants are the only runtime arguments that can determine the array bounds, so there is no risk that the bound expressions would produce different results sometimes.

--
Pierre-Marie de Rodat
Re: [Dwarf-Discuss] About self-referencial sized types
Hi,

For the record, here is a status update on this issue:

On 06/10/2014 11:30 AM, Pierre-Marie de Rodat wrote:
> Unfortunately, we do not store the value of the upper bounds in the
> record (nor anywhere else): the only way to get it at runtime is to
> compute it from the discriminants.

After more investigation, we eventually realized that defining the lower and upper bounds directly as discriminant values (and thus as references to DW_TAG_member DIEs) is actually perfectly fine for the debugger. It is also more correct from the point of view of the Ada standard: getting the bounds at runtime (thanks to the Array_Object'First and 'Last constructs) must yield the discriminants.

The original issue may still be there for other languages, but in Ada it is not legal to write the following:

    type Record_Type (N : Integer) is record
       S : String (1 .. N + 1);
    end record;

Bounds either are precisely discriminants or aren't based on them at all. So the current DWARF specification is expressive enough to describe Ada array bounds in type info.

Many thanks to everyone who participated in this interesting discussion! :-)

--
Pierre-Marie de Rodat
Re: [Dwarf-discuss] Representing vtables in DWARF for downcasting
Hello,

On Wed, May 7, 2025 at 2:49 PM Todd Allen via Dwarf-discuss wrote:
> In 250506.2, the use of a rnglist is throwing me. I would expect the
> lifetime of a vtable to be the whole program. Or did you envision the
> rnglist to be the range of data/rodata addresses of the vtable object? 2.17
> clarifies that they're code addresses (i.e. text), though.
>
> We did have a discussion sometime in the last year about describing
> data/rodata address ranges, but that was in .debug_aranges (RIP). And, IIRC,
> no actual compiler was generating data/rodata address there either.

If it helps the design: there are languages where vtables are not necessarily statically allocated. Here is a small Ada example, involving a tagged type (equivalent to a C++ class) nested in a procedure, and with a primitive (C++ method) that actually has up-level references to the procedure locals (so the vtable is actually tied to the current stack frame):

     1  with Ada.Text_IO; use Ada.Text_IO;
     2
     3  procedure Main is
     4     Msg : constant String := "Hello world";
     5
     6     package Pkg is
     7        type T is tagged null record;
     8        procedure Print (Self : T);
     9     end Pkg;
    10
    11     package body Pkg is
    12        procedure Print (Self : T) is
    13        begin
    14           Put_Line (Msg);
    15        end Print;
    16     end Pkg;
    17
    18     Object : Pkg.T;
    19  begin
    20     Object.Print;
    21  end Main;

GDB allows us to observe where the vtable for T is stored (tested on an x86_64-linux machine):

    $ gdb ./main
    (gdb) b main.adb:20
    Breakpoint 1 at 0x6ae9: file main.adb, line 20.
    (gdb) r
    […]
    Breakpoint 1, main () at main.adb:20
    20         Object.Print;
    (gdb) set lang c
    Warning: the current language does not match this frame.
    (gdb) print object
    $1 = {_tag = 0x7fffda30}
    (gdb) p $rsp
    $2 = (void *) 0x7fffd7d0
    (gdb) p $rbp
    $3 = (void *) 0x7fffda70

“_tag” is an artificial component for the record T that GCC (currently) generates in the debug info to materialize the vtable: it points to a structure that is in the current stack frame (between $rsp and $rbp).
-- Pierre-Marie de Rodat -- Dwarf-discuss mailing list Dwarf-discuss@lists.dwarfstd.org https://lists.dwarfstd.org/mailman/listinfo/dwarf-discuss
Re: [Dwarf-discuss] Representing vtables in DWARF for downcasting
On Wed, May 7, 2025 at 4:11 PM Todd Allen wrote:
> I think that's orthogonal to the point, which was that a rnglist is
> meant to describe a pc range, not a data range.

Got it; I had not realized this was your point. I was mainly reacting to the part of your message “I would expect the lifetime of a vtable to be the whole program.”

> BTW, I don't think you need to create the vtable on the fly in that
> case. The Pkg.Print function will need a static link, but the vtable
> doesn't need to encode that. I just checked our Ada compiler. We
> stopped development on this compiler after Ada 95, but the only change I
> had to make to your example to make it Ada 95-friendly, was the call to
> Print: Pkg.Print(Object). (They were avoiding the Object.Function
> syntax in Ada 95, but I assume they relented and added some syntactic
> sugar in Ada 2005 or 2012.) Our compiler does indeed generate a static
> vtable.

Interesting: this indeed works if Main.Pkg.Print calls take a static link, but how can the caller find the static link to pass in the general case? This is obvious in my previous example (the call happens in the same scope that owns the static link), but thanks to type derivation, calls to Main.Pkg.Print can actually appear in other places.
For instance:

    package Base is
       type T is abstract tagged null record;
       procedure Print (Self : T) is abstract;
       procedure Call_Print (Self : T'Class);
       function Get_Msg return String;
    end Base;

    package body Base is
       procedure Call_Print (Self : T'Class) is
       begin
          Print (Self);
       end Call_Print;

       function Get_Msg return String is
       begin
          return "Hello world";
       end Get_Msg;
    end Base;

    with Ada.Text_IO; use Ada.Text_IO;
    with Base;

    procedure Main is
       Msg : constant String := Base.Get_Msg;

       package Pkg is
          type T is new Base.T with null record;
          overriding procedure Print (Self : T);
       end Pkg;

       package body Pkg is
          overriding procedure Print (Self : T) is
          begin
             Put_Line (Msg);
          end Print;
       end Pkg;

       Object : Pkg.T;
    begin
       Base.Call_Print (Object);
    end Main;

Since Main.Pkg.Print overrides a library-level primitive, it can’t take a static link so the only way for it to have access to the Main.Msg local is through Self. I guess a compiler could decide to put the static link in each Main.Pkg.T object rather than in their vtable (and thus have a static vtable), but as far as I can tell, GNAT stores the static link in the vtable instead, so the vtable cannot be static.

> But if GNAT is generating the vtable at run-time, possibly even on the
> stack (maybe there's some other, more compelling, reason?), then we need
> to make sure the proposal isn't assuming a static location.

Agreed.

--
Pierre-Marie de Rodat