SFN stands for Statement Frontier Notes, a technique I specified several years ago and presented at the GCC Summit to allow for finer-grained location information, so as to enable debug information consumers to compute/observe multiple views of an underlying program's state at the same program counter. More specifically, it would enable debuggers to single-step over source code lines even though no actual instructions were generated between the recommended breakpoints for those lines. For some more details about the idea, please have a look at the paper and the slides of the presentation at http://people.redhat.com/aoliva/papers/sfn/
TL;DR: each state in the source program's execution in the target language's virtual machine amounts to a view of the program state. The idea is to assign identifiers to (some) such views, and to add those view identifiers to the line number table and to location lists. This message is about how to accomplish that in a compact and backward-compatible way.

Although we could have view identifiers global within a CU, if we took the PC as part of the view identifier, we'd just need a new column to discriminate different views associated with the same PC. Depending on details to be discussed below, we might need an explicit increment operation, and an explicit reset-to-zero operation.

I don't intend to fundamentally change the format of location lists, so I'm thinking of an indication that a location list is augmented by view identifiers. In the paper, I suggested view pairs corresponding to address pairs to follow the location list, i.e., we'd have

  L0b, L0e, <DWexpr0>, L1b, L1e, <DWexpr1>, ..., Lnb, Lne, <DWexprn>, 0, 0,
  V0b, V0e, V1b, V1e, ..., Vnb, Vne

but I'm now reconsidering that detail, for two reasons: (i) a location list entry is occasionally misinterpreted as the end of the list, if it happens to be an empty range at the base address for the list, and (ii) other extensions to location lists might make sense, and this arrangement would prevent more than one such extension from following the location list.

So I'm now inclined to use an offset attribute, rather than a boolean attribute, to indicate the presence of views to augment a location list. A leb128 offset from the location list address to the view list address would encourage placing the two close to each other, without mandating any specific relative placement, at a reasonably small cost: probably one or two bytes in each DIE that uses a typical location list.
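To make the pairing of address ranges with view pairs concrete, here is a minimal Python sketch of how a consumer might look up a location for a given (PC, view) program point. The in-memory layout, the `lookup` helper, and the addresses are all hypothetical. The sketch also assumes that a range covers the points from (begin address, begin view) up to, but not including, (end address, end view), compared lexicographically; the message itself does not spell out that rule.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory forms; these names are not part of DWARF.
LocEntry = Tuple[int, int, str]   # (begin address, end address, expression)
ViewPair = Tuple[int, int]        # (view at begin, view at end)


def lookup(loclist: List[LocEntry], views: List[ViewPair],
           pc: int, view: int) -> Optional[str]:
    """Return the expression that applies at program point (pc, view)."""
    for (begin, end, expr), (vbegin, vend) in zip(loclist, views):
        # Assumed semantics: half-open range over (address, view) points,
        # ordered first by address and then by view number.
        if (begin, vbegin) <= (pc, view) < (end, vend):
            return expr
    return None


# Made-up addresses: a variable that only becomes available at the
# second view (view 1) of address 0x18 and stays live up to 0x20.
loclist = [(0x18, 0x20, "DW_OP_reg6")]
views = [(1, 0)]

print(lookup(loclist, views, 0x18, 0))  # first view at 0x18: not yet live
print(lookup(loclist, views, 0x18, 1))  # second view at 0x18: live
```

A consumer that ignores the view list sees an ordinary location list and treats the variable as live from address 0x18 onwards, which is the backward-compatible behavior the offset attribute is meant to preserve.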
(A radical departure from current location lists would be to introduce another location list type with view identifier ranges rather than address ranges. That would suggest using global identifiers within a CU, perhaps even an implicit counter that identifies each explicitly specified line in the line number table. View identifiers would then map to a PC, and lookup in location lists would be somewhat indirect. That might work, but such location lists would be unusable by existing consumers, so I'm not inclined to explore this possibility any further.)

One of the challenges is to enable either the compiler or the assembler to generate line number programs, while only the compiler can generate location lists (and view lists).

Consider this: the compiler can't always know whether two labels are at the same address, if they are separated by alignment padding that turns out to be empty (even between different sections!), or by other pseudo-instructions or asm statements that don't advance the PC. So, if we were to mandate that any opcodes that change the PC reset the view counter, and that those that don't increment the view counter after adding a line to the table, we could end up with out-of-sync view numbers in location lists, because the compiler could guess wrong whether the view counter was reset, and it has to fill in the view numbers itself.

This suggests that any opcode that advances the PC by an offset that could be computed by the assembler should NOT reset the view counter, whereas any one that requires the compiler to know the exact offset on its own could do so if the offset is nonzero.

Unless we're speaking of VLIW: must the compiler be able to distinguish between operation advances within the same instruction address, and those that change the address?
I'm thinking view numbers should advance rather than reset when we advance to another operation within the same instruction pack (i.e., without an address change), but I'm not sure compilers must always keep track of that. I suspect so, given all the other complexities of VLIW, but I'm still a bit concerned about getting compiler and DWARF view numbers out of sync if the compiler advances one operation expecting us to remain at the same address with a higher view number, but the next operation happens to imply a different address, in which case the view counter in the line table would get implicitly reset. Thoughts?

Should we even worry about this, considering that line number tables can be handled by assemblers, and then the compiler wouldn't have to worry about any of this?

Indeed, once we start using the assembler to deal with view counts and addresses, things get a lot simpler. We could still have view counts handled entirely implicitly, and have the compiler refer to view numbers of labels in augmented location lists, to denote the view assigned by the assembler (or computed by the assembler given the implicit calculations performed as part of the line number program).

(I've considered the possibility of having the compiler explicitly supply view numbers to the assembler in .loc directives, to then use them explicitly in location lists, but this seems to make little sense; the only case in which it might be sensible would be to go back to a PC for which we've already emitted line number table entries and reset the view counter so that it doesn't overlap with already-emitted view numbers.
I don't see that we might ever have to do this: if we're going back to a PC, even if we've already emitted line entries for it, it must have been as the end of a sequence, with a just-reset view count, so starting over at view number zero, at a different sequence, won't/shouldn't be a problem: it was used as one-past-the-end-of-a-range before, and it's used as the beginning-of-a-range now. Am I missing anything?)

As for how to represent view numbers in augmented location lists... We could emit them as a sequence of uleb128-encoded view numbers and be done with it. However, we could make them even more compact if we assumed that we won't have very many views at the same PC very often. Say, we could allow a pair of view numbers to be encoded in a single uleb128 octet, shifting the second view count left by four and the first view count left by one, and setting the LSB to indicate that this number encodes a pair of view numbers whose first element fits in 3 bits. If it doesn't fit, then we just shift the first view count left by one, leaving the LSB reset, and output that amount as uleb128. Now, does this approach make sense, or am I overdoing it?

To sum it up, here's the design that I'm leaning towards in a smallish picture:

Source program:

1 int f(int a, int b, int c, int d) {
2   int x = a + b;
3   int y = c * d;
4   x -= y;
5   return x;
6 }

Optimized asm, output by the compiler:

.Ltext:
[...]
f:
.LVU0:
        .loc 1 1 is_stmt 0      # view 0
        mov r4 <- *(sp+12)
        mov r5 <- *(sp+16)
        mov r2 <- *(sp+4)
        mov r3 <- *(sp+8)
.LVU1:
        .loc 1 3 is_stmt 0      # view 0
        mul r6 <- r4, r5
.LVU2:
        .loc 1 2 is_stmt 1      # view 0
        add r7 <- r2, r3
.LVU3:
        .loc 1 3 is_stmt 1      # view 0
.LVU4:
        .loc 1 4 is_stmt 1      # view 1
        sub r1 <- r7, r6
.LVU5:
        .loc 1 5 is_stmt 1      # view 0
        ret
.LFE0:
[...]
        .uleb128 <?>            # DW_TAG_variable
        .ascii "x\0"            # DW_AT_name x
        .byte 1                 # DW_AT_decl_file
        .byte 2                 # DW_AT_decl_line
        .long ??                # DW_AT_type
        .long .LLST0            # DW_AT_location
        .leb128 .LVST0 - .LLST0 # DW_AT_locviews
        .uleb128 <?>            # DW_TAG_variable
        .ascii "y\0"            # DW_AT_name y
        .byte 1                 # DW_AT_decl_file
        .byte 3                 # DW_AT_decl_line
        .long ??                # DW_AT_type
        .long .LLST1            # DW_AT_location
        .leb128 .LVST1 - .LLST1 # DW_AT_locviews
[...]
.LVST0: # it could be right before the corresponding LLST
        .view .LVU3, .LVU5, .LVU5, .LVU6
.LLST0:
        .long .LVU3 - .Ltext, .LVU5 - .Ltext
        .byte ...               # DW_OP_reg7
        .long .LVU5 - .Ltext, .LVU6 - .Ltext
        .byte ...               # DW_OP_reg1
        .long 0, 0
.LLST1:
        .long .LVU4 - .Ltext, .LVU6 - .Ltext
        .byte ...               # DW_OP_reg6
        .long 0, 0
.LVST1: # or it could be right after the corresponding LLST (or anywhere)
        .view .LVU4, .LVU6

Line number program generated by the assembler:

[Line = 1, is_stmt = 0]
XOp2: set PC to <.LVU0> (resets View)
Copy (View++: use View, and then tentatively increment it for subsequent use)
Spec: advance PC by <.LVU1-.LVU0> (resets View) and Line by 2 (to 3) (View++)
Negate is_stmt (to 1)
Spec: advance PC by <.LVU2-.LVU1> (resets View) and Line by -1 (to 2) (View++)
Spec: advance PC by <.LVU3-.LVU2> (resets View) and Line by 1 (to 3) (View++)
Spec: advance PC by <.LVU4-.LVU3> (View is 1) and Line by 1 (to 4) (View++)
Spec: advance PC by <.LVU5-.LVU4> (resets View) and Line by 1 (to 5) (View++)
Advance PC by <.LFE0-.LVU5> (resets View) (*)
XOp1: End of Sequence

(*) this is DW_LNS_advance_pc, not DW_LNS_fixed_advance_pc; the latter, to be used by a compiler dealing with view computations internally in situations of uncertainty about whether the offset is zero, would NOT reset View.

Compact view encodings:

.LVST0:
        .uleb128 (0<<4|0<<1|1), (0<<4|0<<1|1)
.LVST1:
        .uleb128 (0<<4|1<<1|1)

Can anyone spot any problems with this proposal, particularly WRT the fully implicit handling of view numbers in line number programs and their use by compilers?
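The two mechanisms in the example above, implicit view numbering and the compact pair encoding, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation. `assign_views`, `uleb128` and `encode_view_pair` are hypothetical helper names, and the instruction sizes are made up (4 bytes each). Only opcodes that reset View on a nonzero advance are modelled, so DW_LNS_fixed_advance_pc is left out. The message also does not say how the second view number is emitted when the first does not fit in 3 bits, so the sketch assumes it follows as a separate plain uleb128.

```python
def assign_views(rows):
    """rows: (label, pc_advance) pairs in line-table order -> {label: view}.

    Models the implicit rule: a nonzero PC advance resets View, and each
    emitted row uses the current View and then increments it.
    """
    views, view = {}, 0
    for label, advance in rows:
        if advance != 0:
            view = 0
        views[label] = view
        view += 1
    return views


def uleb128(n):
    """Standard unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_view_pair(first, second):
    if first < 8:
        # Pair form: LSB set, first view in bits 1-3, second from bit 4 up.
        return uleb128(second << 4 | first << 1 | 1)
    # LSB clear: first view alone, shifted left by one.
    # ASSUMPTION: the second view then follows as its own plain uleb128.
    return uleb128(first << 1) + uleb128(second)


# Rows of the example; advances assume 4-byte instructions (made up).
# .LVU3 and .LVU4 share an address, so the advance between them is zero.
rows = [("LVU0", 0), ("LVU1", 16), ("LVU2", 4),
        ("LVU3", 4), ("LVU4", 0), ("LVU5", 4)]
views = assign_views(rows)
print(views)

# .LVU6 does not appear in the asm listing; view 0 is assumed for it,
# as the compact encodings in the message imply.
print(encode_view_pair(views["LVU3"], views["LVU5"]).hex())  # first .LVST0 pair
print(encode_view_pair(views["LVU4"], 0).hex())              # .LVST1 pair
```

With these inputs only .LVU4 gets a nonzero view number, and each pair in the example fits in a single octet, which is the common case the compact form is designed for.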
Is this (view numbers in line number tables, location list augmentation with view numbers referenced by a new attribute, the compact encoding of view numbers) something that DWARF might want to adopt in a future standard (presumably not version 5)? Are there any amendments that are deemed necessary right away?

Thanks in advance,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer