[Lldb-commits] [PATCH] D115324: Added the ability to cache the finalized symbol tables subsequent debug sessions to start faster.

Greg Clayton via Phabricator via lldb-commits Wed, 15 Dec 2021 14:04:02 -0800

clayborg added inline comments.


================
Comment at: lldb/source/Core/Module.cpp:1679
+
+std::string Module::GetCacheKey() {
+  std::string key;
----------------
labath wrote:
> The cache key and cache signature algorithms are fairly similar but subtly 
> different (IIUC, one is used to locate the cache entry on the disk and the 
> second one to verify that it refers to the correct file). I think it'd be 
> better to have the two functions next to each other, to highlight the 
> differences. If you moved the signature construction function here (by 
> changing it from a Module constructor to a CacheSignature getter function), 
> then that would also the dependency problem I mentioned last time as I 
> believe these constructors were the only non-utility dependency of that code 
> (I don't think that code has to be moved back to Utility, but it *could* be 
> moved, it need arises).
So the signature only contains an optional UUID, optional mod time, and 
optional object mod time, so they really are not related. The cache key must be 
the same for the same file on disk even if the file gets updated. So the cache 
key is designed to stay the same for the same file on disk, but also allow for 
the same file on disk to have more than one architecture (universal files) or 
for a file to contain multiple files like .a files that contain .o files. Since 
we only include the basename of the path for the module file, we also include 
the hash from Module::Hash() which _does_ include the full file path. That way 
we can have many "a.out" cache files from different directories and the main 
thing that will differ is the hash that is appended at the end. 

The DataCacheFile.cpp was relying on the ModuleList.cpp in order to read the 
"symbols.* preferences via "ModuleList::GetGlobalModuleListProperties()", so 
the ModuleList is actually what was the part that violated the dependency issue.


================
Comment at: lldb/source/Symbol/Symbol.cpp:649
+/// The only tricky thing in this encoding is encoding all of the bits in the
+/// bitfields. We use a trick to store all bitfields as a 16 bit value and we
+/// do the same thing when decoding the symbol. There are test that ensure this
----------------
labath wrote:
> I guess this isn't a "trick" anymore, just manual encoding of boolean 
> bitfields into an int.
It is. I tried doing your approach of making an anonymous union and an 
anonymous struct but I got a bunch of warnings that these were GNU extensions 
and I feared that some buildbots would fail with warnings as errors. I also 
didn't know if this would mess up the windows buildbots in case they didn't 
support these. So to be safe I just manually encode stuff to be safe.


================
Comment at: lldb/source/Symbol/Symtab.cpp:118-147
+        if (!pair.second.IsEmpty()) {
+          std::multimap<uint32_t, ConstString> index_to_names;
+          s->Printf("\n0x%2.2x name to index table:\n", pair.first);
+          for (const auto &entry: pair.second)
+            index_to_names.insert(std::make_pair(entry.value, entry.cstring));
+          std::set<ConstString> names;
+          uint32_t last_idx = UINT32_MAX;
----------------
labath wrote:
> If I managed to understand what this is doing correctly, it could be more 
> simply implemented as:
> ```
> if (pair.second.IsEmpty())
>   continue;
> s->Printf("\n0x%2.2x name to index table:\n", pair.first);
> std::map<uint32_t, std::vector<ConstString>> index_to_names;
> for (const auto &entry: pair.second)
>   index_to_names[entry.value].push_back(entry.cstring); // automatically 
> creates a new map entry for the first string
> for (const auto &pair: index_to_names) {
>   s->Printf("0x%8.8x: ", pair.first);
>   for (const auto &name: pair.second) // vector always non-empty
>     s->Printf("\"%s\" ", name.AsCString());
>   s->EOL();
> }
> ```
Whoops this was just dumping code I added to verify that the name tables were 
being reproduced correctly. No one will want to see these. I will remove the 
code.


================
Comment at: 
lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:21
+        self.cache_dir = os.path.join(self.getBuildDir(), 'lldb-module-cache')
+        print('build-dir: ' + self.getBuildDir())
+        # Set the lldb module cache directory to a directory inside the build
----------------
labath wrote:
> leftover from debugging?
yes! good catch


================
Comment at: 
lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:57-58
+
+        self.assertEqual(len(cache_files), 2,
+                "make sure there are two files in the module cache directory 
(%s) for %s" % (self.cache_dir, exe_basename))
----------------
labath wrote:
> IIUC, the python tests check the caching behavior, while the (c++) unit tests 
> check that the actual contents of the cache entries can be read/written 
> correctly. I like that.
Exactly. I wanted to make sure that encoding could be done with unit testing as 
that is easier. I could also verify that every bit in the Symbol class was 
encoded and decoded properly since I was playing tricks before with the 
bitfields, but it did help issues when I did things manually as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115324/new/

https://reviews.llvm.org/D115324

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D115324: Added the ability to cache the finalized symbol tables subsequent debug sessions to start faster.

Reply via email to