location list
Hi all,

I am trying to parse a location list given as a sec_offset. How do I get the offset value that points into .debug_loc so I can call dwarf_getlocations()? Should I pass this offset as the second parameter of this call?

Sasha
Re: location list
Well, I have been trying to use the Dwarf_Attribute. In the snippet below, locationAttribute is acquired by calling dwarf_attr(&e, DW_AT_location, &locationAttribute), where &e is the DW_TAG_variable DIE.

    Dwarf_Op *exprs = NULL;
    size_t exprlen = 0;
    std::vector<LocDesc> locDescs;
    ptrdiff_t offset = 0;
    Dwarf_Addr basep, start, end;
    do {
        /* Pass 0 on the first call, then the value returned by the previous call. */
        offset = dwarf_getlocations(&locationAttribute, offset, &basep,
                                    &start, &end, &exprs, &exprlen);
        if (offset == -1) return false;  /* error */
        if (offset == 0) break;          /* no more entries */
        LocDesc ld;
        ld.ld_lopc = start;
        ld.ld_hipc = end;
        ld.dwarfOp = exprs;
        ld.opLen = exprlen;
        locDescs.push_back(ld);
    } while (offset > 0);

But what happens is that I always get the very first entry in .debug_loc, even though the location list (sec_offset) for this variable is clearly at [4a] of that section. Maybe I am using the offset or basep wrongly?

Sasha

From: Mark Wielaard
Sent: Tuesday, June 2, 2020 12:19 PM
To: Sasha Da Rocha Pinheiro; elfutils-devel@sourceware.org
Subject: Re: location list

Hi,

On Tue, 2020-06-02 at 14:18 +, Sasha Da Rocha Pinheiro wrote:
> I am trying to parse a location list given as an sec_offset.
> How do I get this offset value that points to .debug_loc so I can
> call dwarf_getlocations()?
> Should I pass this offset as the second parameter of this call?

Normally an offset isn't enough information to resolve a DIE attribute reference. So if at all possible you should try to use a Dwarf_Attribute you got from some DIE.

It isn't really supported, but you could try creating a "fake" attribute that carries all the information, e.g.:

    Dwarf_Attribute loc_attr;
    loc_attr.code = DW_AT_location;
    loc_attr.form = DW_FORM_data4; /* Assuming 32bit DWARF. */
    loc_attr.valp = &offset;       /* Your offset should be a 32bit type. */
    loc_attr.cu = cu;
    dwarf_getlocations (&loc_attr, offset, ...);

Note that the CU needs to be version 3 or lower for the above to work. If the CU is > 3 then the only correct form to use is DW_FORM_sec_offset, and your valp should point to a uleb128 encoded offset value.

But in general I would not recommend this approach. It isn't really supported, and some code might do sanity checks on the valp pointer, decide it looks bogus and just error out. You also need a valid Dwarf_CU pointer around, because you cannot easily create a valid fake CU.

So try to keep a reference to (or a copy of) the Dwarf_Attribute from which you got this offset and use that for your dwarf_getlocations call.

Cheers,
Mark
Re: location list
As you can see, the following variables have distinct locations:

    [81] variable       abbrev: 5
         name           (string) "a"
         decl_file      (data1) sasha.c (1)
         decl_line      (data1) 12
         type           (ref4) [cd]
         location       (sec_offset) location list [ 0]
    [9f] variable       abbrev: 5
         name           (string) "g"
         decl_file      (data1) sasha.c (1)
         decl_line      (data1) 15
         type           (ref4) [cd]
         location       (sec_offset) location list [4a]
    [bd] variable       abbrev: 5
         name           (string) "z"
         decl_file      (data1) sasha.c (1)
         decl_line      (data1) 16
         type           (ref4) [cd]
         location       (sec_offset) location list [6e]

But when I use the code I sent before to list the three variables, I always get:

    [main01.cpp:73] - Variable and location found (a), size(1).
    [main01.cpp:78] - interval: (0x0,0x5)
    [main01.cpp:78] - interval: (0x5,0xa)
    [main01.cpp:78] - interval: (0x16,0x24)
    [main01.cpp:73] - Variable and location found (g), size(1).
    [main01.cpp:78] - interval: (0x0,0x5)
    [main01.cpp:78] - interval: (0x5,0xa)
    [main01.cpp:78] - interval: (0x16,0x24)
    [main01.cpp:73] - Variable and location found (z), size(1).
    [main01.cpp:78] - interval: (0x0,0x5)
    [main01.cpp:78] - interval: (0x5,0xa)
    [main01.cpp:78] - interval: (0x16,0x24)

No matter which locationAttribute I pass, the code always gets the first location descriptors in .debug_loc:

    DWARF section [ 7] '.debug_loc' at offset 0x1c6:

    CU [ b] base: .text+00
    [ 0] range 0, 5
         .text+00 .. .text+0x0004
          [ 0] lit0
          [ 1] stack_value
         range 5, a
         .text+0x0005 .. .text+0x0009
          [ 0] reg1
         range 16, 24
         .text+0x0016 .. .text+0x0023
          [ 0] reg1
    [4a] range 0, 5
         .text+00 .. .text+0x0004
          [ 0] lit0
          [ 1] stack_value
    [6e] range 5, a
         .text+0x0005 .. .text+0x0009
          [ 0] lit0
          [ 1] stack_value
         range a, e
         .text+0x000a .. .text+0x000d
          [ 0] const4u 65537
          [ 5] breg0 0
          [ 7] minus
          [ 8] stack_value

Sasha
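To make concrete what the section offset in DW_AT_location selects, here is a minimal sketch that walks a DWARF 2-4 style .debug_loc image by hand. It is not libdw code and not how libdw is implemented; it assumes 8-byte little-endian addresses, and the names LocEntry, read_loclist and sample_debug_loc are hypothetical. The sample bytes are chosen to mirror the first two lists in the eu-readelf dump above (DW_OP_lit0 is 0x30, DW_OP_stack_value is 0x9f, DW_OP_reg1 is 0x51). In real code dwarf_getlocations() does this walk for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of a DWARF 2-4 location list.
struct LocEntry {
    std::uint64_t begin;             // start of range, relative to the CU base
    std::uint64_t end;               // end of range (exclusive)
    std::vector<std::uint8_t> expr;  // raw, undecoded DW_OP_* bytes
};

// Read a little-endian unsigned integer of `width` bytes and advance `pos`.
// Returns false if the buffer is too short.
static bool read_le(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                    std::size_t width, std::uint64_t& out)
{
    if (pos > buf.size() || buf.size() - pos < width) return false;
    out = 0;
    for (std::size_t i = 0; i < width; ++i)
        out |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
    pos += width;
    return true;
}

// Decode the location list that starts at `offset` in a .debug_loc image.
std::vector<LocEntry> read_loclist(const std::vector<std::uint8_t>& debug_loc,
                                   std::size_t offset)
{
    std::vector<LocEntry> entries;
    std::size_t pos = offset;
    std::uint64_t begin = 0, end = 0, len = 0;
    while (read_le(debug_loc, pos, 8, begin) && read_le(debug_loc, pos, 8, end)) {
        if (begin == 0 && end == 0) break;   // end-of-list marker
        if (begin == UINT64_MAX) continue;   // base address selection, ignored in this sketch
        if (!read_le(debug_loc, pos, 2, len) || debug_loc.size() - pos < len) break;
        const auto first = debug_loc.begin() + static_cast<std::ptrdiff_t>(pos);
        const auto last = first + static_cast<std::ptrdiff_t>(len);
        entries.push_back(LocEntry{begin, end, std::vector<std::uint8_t>(first, last)});
        pos += static_cast<std::size_t>(len);
    }
    return entries;
}

// Append `value` as `width` little-endian bytes.
static void put_le(std::vector<std::uint8_t>& buf, std::uint64_t value, std::size_t width)
{
    for (std::size_t i = 0; i < width; ++i)
        buf.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Hypothetical sample: rebuilds lists [0] and [4a] from the dump above.
std::vector<std::uint8_t> sample_debug_loc()
{
    std::vector<std::uint8_t> s;
    auto entry = [&s](std::uint64_t b, std::uint64_t e, std::vector<std::uint8_t> ops) {
        put_le(s, b, 8);
        put_le(s, e, 8);
        put_le(s, ops.size(), 2);
        s.insert(s.end(), ops.begin(), ops.end());
    };
    auto end_of_list = [&s] { put_le(s, 0, 8); put_le(s, 0, 8); };
    entry(0x0, 0x5, {0x30, 0x9f});  // lit0; stack_value
    entry(0x5, 0xa, {0x51});        // reg1
    entry(0x16, 0x24, {0x51});      // reg1
    end_of_list();                  // list [0] ends; the next list starts at 0x4a
    entry(0x0, 0x5, {0x30, 0x9f});  // list [4a]
    end_of_list();                  // the section is now 0x6e bytes long
    return s;
}
```

With this sample, read_loclist(sample_debug_loc(), 0) yields the three intervals reported for every variable, while read_loclist(sample_debug_loc(), 0x4a) yields only the single entry that belongs to "g". Getting the first list for all three variables therefore suggests that the offset being read is 0 each time.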
Re: location list
Hi Mark,

first of all, thanks for giving me a direction here.

I am now trying to design the changes that need to be made in Dyninst. So far we have only used the dwarf_* functions in libdw. What I understood is that libdw is kind of divided into subsets of functions: dwarf_*, dwfl_* and dwelf_*. I didn't find any documentation about this split, or about the purpose of each subset. (What does the "fl" in dwfl stand for?) But my understanding is that I can't use data structures from one subset with another. That alone will need some design work to change the way we parse DWARF info into Dyninst. Currently the lifetime of a dwarf handle lasts through one execution, because we parse DWARF data when the user needs it.

Can you point me to more documentation here, or schedule a call so I can get a clearer view of this?

Regards,
Sasha

From: Mark Wielaard
Sent: Saturday, June 6, 2020 9:05 AM
To: Sasha Da Rocha Pinheiro; elfutils-devel@sourceware.org
Subject: Re: location list

Hi Sasha,

On Sat, 2020-06-06 at 00:30 +, Sasha Da Rocha Pinheiro wrote:
> As you can see the following variables have distinct locations:
> [ 81] variable       abbrev: 5
>       name           (string) "a"
>       decl_file      (data1) sasha.c (1)
>       decl_line      (data1) 12
>       type           (ref4) [ cd]
>       location       (sec_offset) location list [ 0]
> [ 9f] variable       abbrev: 5
>       name           (string) "g"
>       decl_file      (data1) sasha.c (1)
>       decl_line      (data1) 15
>       type           (ref4) [ cd]
>       location       (sec_offset) location list [ 4a]
> [ bd] variable       abbrev: 5
>       name           (string) "z"
>       decl_file      (data1) sasha.c (1)
>       decl_line      (data1) 16
>       type           (ref4) [ cd]
>       location       (sec_offset) location list [ 6e]
>
> But when I use the code I sent before to list the three variables, I
> always get:
>
> [main01.cpp:73] - Variable and location found (a), size(1).
> [main01.cpp:78] - interval: (0x0,0x5)
> [main01.cpp:78] - interval: (0x5,0xa)
> [main01.cpp:78] - interval: (0x16,0x24)
> [main01.cpp:73] - Variable and location found (g), size(1).
> [main01.cpp:78] - interval: (0x0,0x5)
> [main01.cpp:78] - interval: (0x5,0xa)
> [main01.cpp:78] - interval: (0x16,0x24)
> [main01.cpp:73] - Variable and location found (z), size(1).
> [main01.cpp:78] - interval: (0x0,0x5)
> [main01.cpp:78] - interval: (0x5,0xa)
> [main01.cpp:78] - interval: (0x16,0x24)
>
> No matter the locationAttribute the code always get the first
> location descriptors in .debug_loc:
>
> DWARF section [ 7] '.debug_loc' at offset 0x1c6:
>
> CU [ b] base: .text+00
> [ 0] range 0, 5
>      .text+00 .. .text+0x0004
>       [ 0] lit0
>       [ 1] stack_value
>      range 5, a
>      .text+0x0005 .. .text+0x0009
>       [ 0] reg1
>      range 16, 24
>      .text+0x0016 .. .text+0x0023
>       [ 0] reg1
> [ 4a] range 0, 5
>      .text+00 .. .text+0x0004
>       [ 0] lit0
>       [ 1] stack_value
> [ 6e] range 5, a
>      .text+0x0005 .. .text+0x0009
>       [ 0] lit0
>       [ 1] stack_value
>      range a, e
>      .text+0x000a .. .text+0x000d
>       [ 0] const4u 65537
>       [ 5] breg0 0
>       [ 7] minus
>       [ 8] stack_value

I think I see what is happening. The fact that the base is at .text+00 suggests that this is actually an ET_REL file (an object file that has not been linked). The libdw dwarf_xxx calls don't apply relocations, but eu-readelf does. So while eu-readelf shows some offsets as their relocated values, your program, which only uses dwarf_xxx calls, does not see them. Specifically, the DW_AT_location list attributes will all point to zero, which explains why every location list seems to be the same.

We don't have a public function that just applies all relocations to an object file, but opening the file through dwfl_begin () will do it. Something like the attached.

Hope that helps,

Mark
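The effect Mark describes can be illustrated without libdw. The sketch below is a toy model, not the real ELF relocation machinery: it assumes RELA-style relocations (as on x86-64) against a section symbol whose value is 0, so the relocated value is just the addend, and it assumes 4-byte little-endian fields. Rela32, read_u32 and apply_relocations are hypothetical names. Before relocation every sec_offset field reads 0; afterwards the fields hold the offsets eu-readelf printed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for one Elf64_Rela entry against a section symbol
// whose value is 0, so the relocated value (S + A) is just the addend.
struct Rela32 {
    std::size_t offset;    // byte offset of the 4-byte field to patch
    std::uint32_t addend;  // value the field should hold after relocation
};

// Read a 4-byte little-endian field, the way a DWARF reader would fetch a
// sec_offset value in 32-bit DWARF.
std::uint32_t read_u32(const std::vector<std::uint8_t>& sec, std::size_t off)
{
    assert(off + 4 <= sec.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(sec[off + i]) << (8 * i);
    return v;
}

// Patch every relocated field in place.
void apply_relocations(std::vector<std::uint8_t>& sec,
                       const std::vector<Rela32>& relocs)
{
    for (const Rela32& r : relocs) {
        assert(r.offset + 4 <= sec.size());
        for (std::size_t i = 0; i < 4; ++i)
            sec[r.offset + i] = static_cast<std::uint8_t>(r.addend >> (8 * i));
    }
}
```

For example, take a 12-byte zero-filled buffer standing in for three location fields and the relocations {0, 0x0}, {4, 0x4a}, {8, 0x6e}. read_u32 returns 0 for all three fields before apply_relocations and 0x0, 0x4a, 0x6e after it. A reader that skips the relocation step sees only the first state, which matches the symptom in this thread.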
Re: location list
Hi Mark,

this was very useful. Thanks.

Since we are now handling not only executables and .so files but ".o" files too, I'm trying to decide whether I can use the same functions for all of them, like the code you pointed out for dealing with ".o" files. Would that work for EXEC, SHARED, and RELOC?

The idea is not to have two code paths for parsing modules and DIEs. We would need two because, as you pointed out, ".o" files need some relocations to be performed and therefore need dwfl_*, while for executables and .so files we only use dwarf_* functions.

In light of that, do you foresee bigger changes, or things we should worry about, if we use only dwfl_* to open all the ELF files with DWARF data and drop the way we used to open them? Because our code base has used only the dwarf_* functions for a long time, this would be a big change.

Sasha

From: Mark Wielaard
Sent: Wednesday, June 10, 2020 6:33 AM
To: Sasha Da Rocha Pinheiro; elfutils-devel@sourceware.org
Subject: Re: location list

Hi Sasha,

On Tue, 2020-06-09 at 16:38 +, Sasha Da Rocha Pinheiro via Elfutils-devel wrote:
> I am now trying to design the changes needed to be done in Dyninst.
> So far we have only used the functions dwarf_* under libdw.
> What I understood is that libdw is kinda divided in subsets of functions,
> dwarf_*, dwfl_* and dwelf_*.
> I didn't find any documentation about it, or the purpose of these subset of
> functions.
> (Whats fl in dwfl for?)
> But my understanding is that I can't use data structures from one on the
> other one.
> That alone will need some design to modify the way we parse dwarf info into
> Dyninst.
> Currently the lifetime of a dwarf handle lasts through one execution,
> because we parse dwarf data when the user needs it.

So elfutils contains 4 libraries:

libelf, which is a semi-standardized "unix" library to read and manipulate ELF files.

libdw, which adds reading of DWARF data, linux process and kernel mappings, and various elf/dwarf utility functions.

libasm, which provides an assembler and disassembler interface, but which isn't really finished/recommended at the moment (it only provides a partial x86 assembler/disassembler and a bpf disassembler).

And libdebuginfod, which provides a way to fetch remotely stored executables, debuginfo and sources based on build-ids (from a debuginfod server).

There used to be non-public, internal "libebl" backend libraries for each elfutils supported architecture (libebl_aarch64.so, libebl_riscv.so, etc.), which were loaded dynamically to save a bit of memory in case the backend/arch wasn't used. But with 0.178 the libraries are built into libdw.so directly and are no longer dynamically loaded. libebl was never intended to be used directly.

[lib]ebl stands for ELF Backend Library. [lib]dw is short for DWARF. [lib]dwfl then can be read as DWARF Frontend library functions. And [lib]dwelf are the DWARF and ELF utility functions.

The main data structure of libelf is the Elf handle, which can be used to go through an ELF file via section (Shdrs) or program (Phdrs) headers.

The main data structure that the libdw dwarf_* functions work on is the Dwarf handle, which is associated with one Elf handle.

The main data structure of the libdwfl dwfl_* functions is the Dwfl handle. A Dwfl represents a program (or kernel) with its library (or kernel module) memory layout. Each Dwfl_Module represents a piece of executable code mapped at a certain memory range. The Dwfl uses build-ids to associate/create Elf images and Dwarf handles associated with each Dwfl_Module (it can optionally use libdebuginfod to download/cache any it doesn't have yet). Since kernel modules are ET_REL files (non-relocated object files), libdwfl also resolves any relocations between .debug sections (this is the property we abused in the example code I gave you, where we construct a Dwfl from a single ET_REL object file).

Given a Dwfl_Module you can get the associated Elf or Dwarf with dwfl_module_getelf or dwfl_module_getdwarf. You will note that those functions also provide a Dwarf_Addr bias, which might be non-zero if the address range where the Dwfl_Module is mapped is different (at an offset) from the addresses found in the Elf image or Dwarf data.

You would use the libdwfl functions if you want to represent a whole program as it would be mapped into memory (or the kernel and its modules). It is convenient if you have a process map (dwfl_linux_proc_report) or a core file (dwfl_core_file_report). The libdwfl functions will automatically associate an Elf image and find the Dwarf data for you.

It is even nice to use for "single file" programs, like we did in the example with the single file, because it does the automatic lookup of the Dwarf handle and because, if the file is an ET_REL object, you get the relocation between .debug sections for free.

It might make sense to provide utility fun
multi debug files and artificial module
Hi,

we are currently dealing with multiple separate debug files: the normal stripped ones put in the .debug/ folder, and now the ones generated by DWZ and put into the .dwz/ folder.

When loading a normal stripped debug file that has a dwz file, I saw the same DIE (same id) twice with different data. Would that be a bug in DWZ or a correct DWARF state?

Also, is "<artificial>" the name of the following compilation unit? Or is it a bug in eu-readelf/libdw?

Thanks,
Sasha

    Compilation unit at offset 946:
     Version: 4, Abbreviation section offset: 0, Address size: 8, Offset size: 4
     [ 3bd] compile_unit   abbrev: 63
            producer       (strp) "GNU GIMPLE 10.2.1 20200723 (Red Hat 10.2.1-1) -m64 -mtune=generic -march=x86-64 -g -g -O2 -O2 -fno-openmp -fno-openacc -fPIC -fstack-protector-strong -fltrans -fplugin=annobin"
            language       (data1) C99 (12)
            name           (GNU_strp_alt) "<artificial>"
            comp_dir       (GNU_strp_alt) "/usr/src/debug/libiscsi-1.19.0-2.fc33.x86_64/lib"
            low_pc         (addr) +0x8030
            high_pc        (udata) 51811 (+0x00014a93 <.annobin_iscsi_extended_copy_task.end>)
            stmt_list      (sec_offset) 0
Re: multi debug files and artificial module
Hi Mark,

Thanks for your response.

In libdw.h it says:

    /* The offset can be computed from the address. */

How do I get the CU DIE offset from the address? Only by saving the first CU and subtracting it from the others to get the offset?

When we go through .debug_info using dwarf_nextcu, we are getting partial units too. How should we deal with them? If they're not actually CUs, shouldn't they be left out of the results and only used internally by libdw in order to 'complete' the other CUs?

It seems that libdw is automatically searching for and loading the supplemental dwz file. Given the following:

    ** file .debug **
     <1>: Abbrev Number: 37 (DW_TAG_subprogram)
        DW_AT_abstract_origin:
        DW_AT_low_pc      : 0x8650
        DW_AT_high_pc     : 229
        DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
        DW_AT_GNU_all_call_sites: 1
        DW_AT_sibling     : <0x107b>
     ...
     <4><5112>: Abbrev Number: 0

    ** file dwz **
     <1><5112>: Abbrev Number: 81 (DW_TAG_subprogram)
        <5113>   DW_AT_external    : 1
        <5113>   DW_AT_name        : (indirect string, offset: 0x9852): iscsi_tcp_free_pdu
        <5117>   DW_AT_decl_file   : 51
        <5118>   DW_AT_decl_line   : 142
        <5119>   DW_AT_decl_column : 1
        <511a>   DW_AT_prototyped  : 1
        <511a>   DW_AT_sibling     : <0x5135>

When I'm parsing f0d, libdw automatically fetches the abstract origin from the dwz file. But when we try to parse the abstract origin on our own, it gives us the one in the same file, which is 0. Is it because we should look at the form of the DW_AT_abstract_origin? In this case it seems to be GNU_ref_alt, correct?

Regards,
Sasha

From: Mark Wielaard
Sent: Wednesday, November 4, 2020 7:35 AM
To: Sasha Da Rocha Pinheiro; elfutils-devel@sourceware.org
Cc: Tim Haines; b...@cs.wisc.edu
Subject: Re: multi debug files and artificial module

Hi Sasha,

On Tue, 2020-11-03 at 21:37 +, Sasha Da Rocha Pinheiro via Elfutils-devel wrote:
> we are currently dealing with multiple separate debug files, the
> normal stripped ones put in .debug/ folder and now the ones generated
> by DWZ and put into .dwz/ folder.
> When loading a normal stripped debug files who has a dwz file, I saw
> the same DIE (same id) twice with different data. Would it be a bug
> in DWZ or a correct dwarf state?
> Also is "<artificial>" the name of the following compilation unit? Or
> is it a bug in eu-readelf/libdw?

Looking at what you posted, you are actually looking at 3 different types of CU DIEs: the "normal" separate .debug DIEs, the supplemental (dwz alt file) DIEs, and LTO (gcc -flto generated) DIEs. For the last ones (which have GNU GIMPLE, the internal GCC representation of the program, as producer) it is correct to have them marked "artificial". These CUs contain common code/types from the objects combined by LTO (Link Time Optimization).

If by "same id" you mean "offset" (the hex value in square brackets) then yes, DIE offsets in separate files (Dwarf objects) can be the same. The DIEs from the .debug file and the DIEs from the .multi (supplemental) file are represented by different Dwarf objects, and DIEs with the same offset in separate Dwarf objects are different DIEs.

Cheers,

Mark

> Compilation unit at offset 946:
> Version: 4, Abbreviation section offset: 0, Address size: 8, Offset
> size: 4
> [ 3bd] compile_unit   abbrev: 63
>        producer       (strp) "GNU GIMPLE 10.2.1 20200723
> (Red Hat 10.2.1-1) -m64 -mtune=generic -march=x86-64 -g -g -O2 -O2
> -fno-openmp -fno-openacc -fPIC -fstack-protector-strong -fltrans
> -fplugin=annobin"
>        language       (data1) C99 (12)
>        name           (GNU_strp_alt) "<artificial>"
>        comp_dir       (GNU_strp_alt)
> "/usr/src/debug/libiscsi-1.19.0-2.fc33.x86_64/lib"
>        low_pc         (addr) +0x8030
>        high_pc        (udata) 51811 (+0x00014a93
> <.annobin_iscsi_extended_copy_task.end>)
>        stmt_list      (sec_offset) 0
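Mark's point that a DIE is identified by its Dwarf object plus its offset, together with Sasha's question about looking at the attribute form, can be sketched as follows. This is an illustration only, with hypothetical names (DebugFile, DieRef, resolve_ref); it is not libdw code. The form codes are the DWARF 4 values for DW_FORM_ref_addr and DW_FORM_ref4 and the GNU extension value for DW_FORM_GNU_ref_alt. It assumes that a GNU_ref_alt value is an offset into the supplemental file's .debug_info, that ref4 is relative to the start of the referencing CU, and it leaves out the other reference forms. In real code, dwarf_formref_die() is the supported way to follow such an attribute.

```cpp
#include <cassert>
#include <cstdint>

// DWARF form codes: DWARF 4 values plus the GNU extension emitted by dwz.
constexpr unsigned kFormRefAddr   = 0x10;    // DW_FORM_ref_addr
constexpr unsigned kFormRef4      = 0x13;    // DW_FORM_ref4
constexpr unsigned kFormGnuRefAlt = 0x1f20;  // DW_FORM_GNU_ref_alt

// Which Dwarf object a DIE lives in (hypothetical two-file model).
enum class DebugFile { Main, Supplementary };

// A DIE is identified by the file it lives in AND its offset there.
struct DieRef {
    DebugFile file;
    std::uint64_t offset;  // offset into that file's .debug_info
};

bool operator==(const DieRef& a, const DieRef& b)
{
    return a.file == b.file && a.offset == b.offset;
}

// Turn a raw reference attribute value into a (file, offset) pair.
// `cu_offset` is the .debug_info offset of the CU holding the attribute.
// Returns false for forms this sketch does not handle.
bool resolve_ref(unsigned form, std::uint64_t value, std::uint64_t cu_offset,
                 DieRef& out)
{
    switch (form) {
    case kFormRef4:       // CU-relative, same file
        out = {DebugFile::Main, cu_offset + value};
        return true;
    case kFormRefAddr:    // section-relative, same file
        out = {DebugFile::Main, value};
        return true;
    case kFormGnuRefAlt:  // section-relative, supplemental (dwz) file
        out = {DebugFile::Supplementary, value};
        return true;
    default:
        return false;
    }
}
```

Under these assumptions, a GNU_ref_alt value of 0x5112 resolves to {Supplementary, 0x5112}, which is the DW_TAG_subprogram in the dwz dump, while treating the same value as a same-file reference lands on {Main, 0x5112}, the "Abbrev Number: 0" entry. The two DieRef values compare unequal even though the offsets match.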