Re: [lldb-dev] Making a new symbol provider

Greg Clayton via lldb-dev Thu, 11 Feb 2016 17:35:55 -0800

> On Feb 11, 2016, at 3:41 PM, Zachary Turner via lldb-dev 
> <lldb-dev@lists.llvm.org> wrote:
> 
> Hi,
> 
> I want to make a new symbol provider to teach LLDB to understand microsoft 
> PDB files.  I've been looking over the various symbol APIs, and I have a few 
> questions.  
> 
> 1. Under what circumstances do I need a custom SymbolVendor?  The way pdb 
> works is that generally there is 1 file that contains all the debug info 
> needed for a single binary (.so or executable).  Given a list of paths, we can 
> then determine if there is a matching PDB in one of those paths.  Is it 
> better to do this in the CalculateAbilities() function of the symbol file 
> plugin (by just returning 0 if we don't find a match) or do we need to do 
> something more complicated?


I would suggest making a SymbolVendorPDB that only enables itself if it is able 
to find the PDB files for your COFF file. So look at your COFF file; I presume 
that somewhere in it there is a pointer to one or more PDB files. 
CalculateAbilities is the correct place to check whether a COFF file has 
pointers to PDB files, and to make sure those files exist before you say that 
you can provide any abilities.

> 
> 2. Why is there a function called ParseCompileUnitLanguage?  The CompileUnit 
> class already stores the language when ParseCompileUnit is called, and 
> ParseCompileUnitLanguage is implemented by just getting that value out.  What 
> is the point of this function?

If we construct CompileUnit instances with a valid language, we will never 
need to call ParseCompileUnitLanguage on the SymbolVendor/SymbolFile. If the 
language is eLanguageTypeInvalid, we will lazily populate it later through 
that call.
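A minimal sketch of that lazy pattern, with hypothetical names 
(CompileUnitSketch, Language) standing in for lldb_private::CompileUnit and 
lldb::LanguageType. The parser callback plays the role of 
ParseCompileUnitLanguage and only runs when the language was unknown at 
construction time.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins for lldb::LanguageType values.
enum class Language { Invalid, C99, CPlusPlus };

// A compile unit that asks its symbol file for the language only when it
// was constructed without one.
class CompileUnitSketch {
public:
  using LanguageParser = std::function<Language()>;

  CompileUnitSketch(Language lang, LanguageParser parser)
      : m_language(lang), m_parser(std::move(parser)) {}

  Language GetLanguage() {
    // Lazy path: only consult the symbol file if the language is unknown.
    if (m_language == Language::Invalid && m_parser)
      m_language = m_parser();
    return m_language;
  }

private:
  Language m_language;
  LanguageParser m_parser;
};
```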

> 
> 3. There's a function called ParseCompileUnitDebugMacros.  Is this referring 
> to C / C++ macros?  Like #define FOO 7?  What is that used for?  I don't 
> believe info about preprocessor definitions are stored in PDB.  Is this going 
> to cause problems?

Nope, just don't implement it. Hopefully there is a default implementation 
that does nothing. Having a default implementation for this should make it 
clear that there is nothing wrong with not filling it in.
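The idea is roughly a virtual hook with a do-nothing default, so a format 
without macro information can leave it alone. This is a hypothetical sketch 
(SymbolFileSketch, ParseDebugMacros, MacroSketch), not the actual SymbolFile 
interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical macro record: e.g. "#define FOO 7" -> {"FOO", "7"}.
struct MacroSketch {
  std::string name;
  std::string value;
};

// Base class whose macro hook defaults to "no macro information".
class SymbolFileSketch {
public:
  virtual ~SymbolFileSketch() = default;
  virtual bool ParseDebugMacros(std::vector<MacroSketch> &macros) {
    (void)macros; // nothing to add by default
    return false;
  }
};

// A PDB-style plug-in simply does not override the hook.
class SymbolFilePDBSketch : public SymbolFileSketch {};
```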

> 
> 4. ParseCompileUnitSupportFiles.  What are "support files"?  Given a file 
> "foo.cpp" is this supposed to be header files etc?

This largely mirrors how DWARF structures its data, but in general a compile 
unit may refer to files from its line table and from the declaration file 
("decl file") of things like variables.

So any files in your line table should be in here. In DWARF, line tables 
refer to files by index to save space. Also, any DWARF info that says "I am 
declared on line 12 of file 'Foo.c'" will use an index to refer to 'Foo.c'. 
We use the compile unit support files for this:

    case DW_AT_decl_file:
        // file_idx indexes into the compile unit's support file list.
        decl.SetFile(sc.comp_unit->GetSupportFiles().GetFileSpecAtIndex(file_idx));
        break;

The macro support you mention above also uses file indexes when referring to 
files.
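To illustrate the index scheme, here is a simplified, hypothetical 
support-file list where a declaration stores a file index (like 
DW_AT_decl_file) and resolves it through the compile unit's list. The types 
SupportFilesSketch and DeclSketch are illustrative stand-ins, not LLDB's 
FileSpecList.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Per-compile-unit file list: other records store a small index into it
// instead of repeating the full path.
struct SupportFilesSketch {
  std::vector<std::string> files;

  // Adds a file and returns the index other records should store.
  uint32_t Append(const std::string &path) {
    files.push_back(path);
    return static_cast<uint32_t>(files.size() - 1);
  }

  // Resolves an index; returns an empty string if it is out of range.
  std::string GetFileAtIndex(uint32_t idx) const {
    return idx < files.size() ? files[idx] : std::string();
  }
};

// A declaration refers to its file by index, like DW_AT_decl_file.
struct DeclSketch {
  uint32_t file_idx;
  uint32_t line;
};
```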

So the support files should be a list of files that make sense to your PDB 
parser, in case your PDB uses file indexes when referring to files. Since LLDB 
parses debug info partially, we only expand it into format-agnostic LLDB 
structures lazily, as the information is needed. Each symbol file also gets to 
pick its own identifiers for everything. For DWARF, we use the DIE offset as 
the identifier. So say you parse DWARF that looks like:

0x0000000b: TAG_compile_unit [1] *
             AT_producer( "Apple LLVM version 7.0.0 (clang-700.1.72)" )
             AT_language( DW_LANG_C99 )
             AT_name( "main.c" )
             AT_stmt_list( 0x00000000 )
             AT_comp_dir( "/Volumes/work/gclayton/Documents/src/args" )
             AT_low_pc( 0x0000000100000cf0 )
             AT_high_pc( 0x0000000100000e9b )

0x0000002e:     TAG_subprogram [2] *
                 AT_low_pc( 0x0000000100000cf0 )
                 AT_high_pc( 0x0000000100000e9b )
                 AT_frame_base( rbp )
                 AT_name( "main" )
                 AT_decl_file( "main.c" )
                 AT_decl_line( 9 )
                 AT_prototyped( 0x01 )
                 AT_type( {0x000000c6} ( int ) )
                 AT_external( 0x01 )

0x0000004d:         TAG_formal_parameter [3]  
                     AT_location( fbreg -1048 )
                     AT_name( "argc" )
                     AT_decl_file( "main.c" )
                     AT_decl_line( 9 )
                     AT_type( {0x000000c6} ( int ) )

0x0000005c:         TAG_formal_parameter [3]  
                     AT_location( fbreg -1056 )
                     AT_name( "argv" )
                     AT_decl_file( "main.c" )
                     AT_decl_line( 9 )
                     AT_type( {0x000000cd} ( const char** ) )

The ID of the compile unit is 0x0000000b, since that is the DIE offset of the 
compile unit. Whenever we are asked about the compile unit through 
lldb_private::CompileUnit, we can extract its ID, which tells us where to find 
the original DWARF so we can parse more info lazily, only as needed.

Likewise, the TAG_subprogram represents a function. We might parse only the 
function "main" at 0x0000002e, and then later be asked to parse the blocks and 
variables inside of it. If we use 0x0000002e for the ID of the function, we can 
quickly find the DWARF for it and parse its child variables and blocks. 
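A small sketch of ID-driven lazy parsing, assuming the raw debug info can be 
looked up by offset. The IDs in the usage reuse the DIE offsets from the dump 
above; user_id_t here is a local alias, and RawRecord and LazyParserSketch are 
hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using user_id_t = uint64_t; // local alias for this sketch

// Stand-in for raw debug info: each record is keyed by its offset and lists
// the offsets of its children.
struct RawRecord {
  std::string name;
  std::vector<user_id_t> children;
};

class LazyParserSketch {
public:
  explicit LazyParserSketch(std::map<user_id_t, RawRecord> raw)
      : m_raw(std::move(raw)) {}

  // Parses only the requested record's children, on demand. The ID handed
  // out earlier is all that is needed to find the raw data again.
  std::vector<std::string> ParseChildren(user_id_t id) const {
    std::vector<std::string> names;
    auto it = m_raw.find(id);
    if (it == m_raw.end())
      return names;
    for (user_id_t child : it->second.children)
      names.push_back(m_raw.at(child).name);
    return names;
  }

private:
  std::map<user_id_t, RawRecord> m_raw;
};
```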

So be sure to pick identifiers that make sense for PDB. Hopefully this will be 
easy.
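One possible, purely hypothetical, way to do that for PDB is to pack a record 
kind and a 32-bit record index into a single 64-bit ID, so the ID alone is 
enough to locate the record again. Nothing here is taken from the PDB format 
or from LLDB; PDBRecordKind, MakePDBUserID, GetKind and GetIndex only 
illustrate the idea of IDs you can map back to the source data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record kinds a PDB reader might want to distinguish.
enum class PDBRecordKind : uint8_t { CompileUnit = 1, Function = 2, Type = 3 };

// Kind tag in the top 8 bits, record index in the low 32 bits.
constexpr uint64_t MakePDBUserID(PDBRecordKind kind, uint32_t index) {
  return (static_cast<uint64_t>(kind) << 56) | index;
}

constexpr PDBRecordKind GetKind(uint64_t uid) {
  return static_cast<PDBRecordKind>(uid >> 56);
}

constexpr uint32_t GetIndex(uint64_t uid) {
  return static_cast<uint32_t>(uid & 0xFFFFFFFFu);
}
```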

> 

> 5. ParseCompileUnitLineTable.  On the LineTable class you can add "line 
> sequences" or individual entries.  What's the difference here?  Is there any 
> disadvantage to adding every single line entry in the line table using the 
> InsertLineEntry instead of building a line sequence and inserting the 
> sequence?

The rule follows DWARF line tables: a line sequence must be an array of line 
entries whose addresses always increase. You can add every line entry 
individually, as long as you add the entries in increasing address order. We 
are going to sort the line entries into a single array for quick lookups.
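A simplified sketch of that rule: each inserted sequence must already be in 
address order, and all entries are sorted once so that an address lookup is a 
binary search. LineTableSketch is hypothetical and ignores end-of-sequence 
markers and file indexes for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <vector>

struct LineEntrySketch {
  uint64_t addr;
  uint32_t line;
};

class LineTableSketch {
public:
  // A sequence must never go backwards in address, as in DWARF.
  void InsertSequence(const std::vector<LineEntrySketch> &seq) {
    for (std::size_t i = 1; i < seq.size(); ++i)
      if (seq[i].addr < seq[i - 1].addr)
        throw std::invalid_argument("line sequence addresses must not decrease");
    m_entries.insert(m_entries.end(), seq.begin(), seq.end());
  }

  // Sort once after all sequences are added so lookups can binary search.
  void Finalize() {
    std::stable_sort(m_entries.begin(), m_entries.end(),
                     [](const LineEntrySketch &a, const LineEntrySketch &b) {
                       return a.addr < b.addr;
                     });
  }

  // Returns the last entry whose address is <= addr, or nullptr if none.
  const LineEntrySketch *FindEntry(uint64_t addr) const {
    auto it = std::upper_bound(
        m_entries.begin(), m_entries.end(), addr,
        [](uint64_t a, const LineEntrySketch &e) { return a < e.addr; });
    if (it == m_entries.begin())
      return nullptr;
    return &*std::prev(it);
  }

private:
  std::vector<LineEntrySketch> m_entries;
};
```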

> 
> I will probably have some more questions as I continue down this path.  For 
> now I'm planning to implement the minimum amount of functionality required 
> just to make LLDB locate and open a PDB for an executable without actually 
> returning anything useful from it.  So when I start filling out types, 
> functions, etc I may have some more questions.

I am the person you will need to ask, as I implemented everything in the 
symbols so far. If you have any questions, feel free to ask and I will get 
back to you as quickly as I can. If I am not around, take a look at the DWARF 
spec or talk to someone familiar with DWARF: it is a safe bet that we are very 
similar to DWARF in many respects, since it is a very powerful and complete 
format.

Let me know what questions you have! I look forward to seeing the PDB plug-in 
make it into LLDB.

Greg Clayton

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
