clayborg added a comment.

In D113789#3130741 <https://reviews.llvm.org/D113789#3130741>, @labath wrote:

> Maybe it's good for starting a discussion, but I am surprised that we would 
> need to cache symtab information. This is a fairly simple format that needs 
> to be read at application runtime as well, so I am surprised that reading it 
> from the cache is substantially faster than doing it from the object file. 
> What is the slow part there? Is it demangling by any chance?

Symbol tables for LLDB are made up by scouring every bit of data we can out of 
one or more symbol tables (ELF has normal symbol table and the dynamic symbol 
table, and optionally the symbol table in the zipped debug info). There are 
also many runtime sections like .eh_frame and others that can give us the start 
address of functions. If a binary is stripped, then we might not have a symbol 
for local functions, but many runtime sections have information on where 
functions start.

So while the symbol table seems simple, there is a ton of work that goes into 
creating these. Some things that are expensive:

- parse all symbol tables and unique the results
- reduce duplicate symbols as we parse one or more symbol tables (this happens 
a lot in mach-o files)
- keep a map of symbols that have been emitted so we can skip creating symbols 
from runtime information
- mach-o files have many symbols that describe the same thing: normal symbols 
and debug map symbols, so these symbols need to be coalesced into one symbol so 
we don't end up with multiple symbols with the same name and different info

I will post some performance results on loading a large Mac debug session as 
soon as my release build finishes. I was getting some really good performance 
improvements when I first started this patch as I tried this out before taking 
the time to try and finalize this patch.

> Now to the discussion. Overall, I think this is an interesting feature, but I 
> do have some remarks/questions:
>
> - I don't think that  "lldb module cache" is very good name for this feature 
> as we already have the "platform module cache" feature (see 
> `Target/ModuleCache.h` and its callers), which deals with downloading (and 
> caching) modules (object files) from remote systems. And then there's the 
> clang module cache, of course.. (and the llvm modules, but we fortunately 
> don't cache those). It would be better if we chose a slightly less overloaded 
> name. Maybe "index cache" ? (I assume you will also want to store dwarf 
> indexes there)

yes debug info is the next item I want to put into this cache. "index cache" 
sounds fine. I will rename once I get a few NFC patches submitted and rebase.

> - I don't see anything which would ensure cache consistency for the case 
> where multiple debugger instances try to cache the same module 
> simultaneously. What's the plan for that?

I was hoping to learn LLVM already had this kind of thing and using that! I see 
below there are some things already in LLVM I should be able to use.

> - Have you considered the building upon the caching infrastructure in llvm 
> (`llvm/Caching.h`, `llvm/CachePruning.h`)? I believe it already has some size 
> limiting features and multiprocess synchronization built-in, so using it 
> would take care of the previous item, and of @wallace's size limit request.

I will check into that and see if I can use that! I figured it would have been 
done already, glad to see it has!

> - it's moderately strange to be using the gsym file writer for this purpose

When I made gsym I create the FileWriter as something that can easily create 
binary content without having to go with the ASM print/writer stuff in LLVM as 
that requires tons of stuff including LLVM targets and object file support.

What suggestions do people have on this front? Should I make a patch to move 
FileWriter out of GSYM? I don't know how if llvm::gym::FileWriter is useful 
enough for other folks in LLVM. Any thoughts? Should we just make our own 
FileWriter class that only does what we need for the caching? Is there any 
other tech in LLVM that does this kind of thing without having to use the full 
ARM printer/writer stuff?

> I haven't looked at the code in detail, but I noted that the filesystem 
> changes, and some of the unrelated formatting cleanups could be put into 
> separate patches.

I can break those out into separate patches.

So my plan is to:

- make a separate patch for the symbol table parsing code in Module/ObjectFile
- make a separate patch for FileSystem stuff
- figure out what to do with llvm::gsym::FileWriter (input would be appreciated 
here!)
- rebase this patch on all above patches and try using the llvm caching 
libraries



================
Comment at: lldb/include/lldb/Host/FileSystem.h:93
+  /// Set the access and modification time of the given file.
+  bool SetAccessAndModificationTime(const FileSpec &file_spec,
+                                    llvm::sys::TimePoint<> time);
----------------
wallace wrote:
> this should return an llvm::Error and the caller should decide whether to 
> consume the error or not.
I can return a Status as like many other calls to be more consistent with the 
functions in this directory.


================
Comment at: lldb/source/Core/Module.cpp:1676
+  if (mtime > 0)
+    id_strm << mtime;
+  return llvm::djbHash(id_strm.str());
----------------
Sounds good.


================
Comment at: lldb/source/Core/Module.cpp:1680
+
+llvm::Optional<FileSpec> Module::GetCacheDirectory() {
+  using namespace llvm;
----------------
wallace wrote:
> this function is doing many things and reading is harder. Besides splitting 
> it, we should log in the target stats if the cache was enabled or not, if the 
> cache was used or not in a per module basis and any errors associated. This 
> will help us fix issues without having to wait for users' reports.
> you should also print the error in a log channel
Yes, stats should log if the cache was enabled, but each module should also 
emit stats for anything that was loaded from the cache. If the cache is enabled 
and nothing is in the cache yet, then we should emit stats for anything that 
was loaded from the cache.


================
Comment at: lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp:2722
       m_symtab_up = std::make_unique<Symtab>(symtab->GetObjectFile());
+      if (m_symtab_up->LoadFromCache())
+        return m_symtab_up.get();
----------------
labath wrote:
> Can we make it so that we don't need to add these bits to every object file 
> implementation? Maybe create a protected `CreateSymtab` function, which would 
> get only called if the Symtab really needs to be created?
Yeah, I was thinking about this as well as I was doing it. I will make a patch 
that does this as a NFC patch first and then rebase this patch on that.


================
Comment at: lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp:2723
+      if (m_symtab_up->LoadFromCache())
+        return m_symtab_up.get();
       symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, symtab);
----------------
wallace wrote:
> log here in the target stats if the cache was effectively used or not
Yeah, each object file should remember what/if it loads anything from a cache 
and emit stats. This patch was already getting too big, and I was planning on 
doing a quick follow up patch.


================
Comment at: lldb/source/Symbol/Symbol.cpp:645
+  file.writeU16(m_type_data);
+  file.writeU16((&m_type_data)[1]);
+  m_mangled.Encode(file);
----------------
labath wrote:
> It would be (much) better to put the bitfields in an (anonymous) union that 
> can be accessed both as a uint16_t and as individual bitfields.
Agreed. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113789/new/

https://reviews.llvm.org/D113789

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to