jansvoboda11 marked 2 inline comments as done.
jansvoboda11 added a comment.

In D114966#3168108 <https://reviews.llvm.org/D114966#3168108>, @dexonsmith 
wrote:

> Thanks for working on this; seems like a great start.

Thanks a lot for the extensive feedback! I'll work through it and create prep 
patches where sensible. It seems like things can be simplified quite a bit.

> At a high-level:
>
> - We should check overhead. It'd be good to benchmark scanning LLVM with 
> `clang-scan-deps` before and after this change.

In my testing, this patch causes a ~20% increase in memory usage.

> - The locking is getting harder to track, since the acquisition and release 
> are disconnected. I'd rather use a pattern that kept this simple.

I agree. I think I'll explore the direction you suggested in one of your inline 
comments.

> - Identified a few pre-existing issues that might be exacerbated by (and/or 
> harder to fix after) this refactor.

That makes sense.



================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:73
+/// The value type for original contents cache.
+using OriginalContents = Contents;
+
----------------
dexonsmith wrote:
> This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no 
> reason to `memcpy` it into a new allocation.
Fixed in new prep-patch: D115043.


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+    llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+    /// Cache of original file contents. Using \c std::map to ensure references
+    /// are not invalidated on insertion. 
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
dexonsmith wrote:
> You don't need the heavyweight std::map for reference validation. You can 
> just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive 
> due to allocation traffic, but it's still cheaper than a `std::map`.
> 
> But you can also avoid the allocation traffic by using a BumpPtrAllocator, 
> the same pattern as the StringMap above. E.g.:
> ```
> lang=c++
> llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc;
> llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents *> 
> OriginalContentsCache;
> 
> // insert into shard:
> OriginalContents &getOriginalContentContainer(...) {
>   std::scoped_lock<std::mutex> L(CacheMutex);
>   OriginalContents *&OC = OriginalContentsCache[UID];
>   if (!OC)
>     OC = new (OriginalContentsAlloc.Allocate()) OriginalContents();
>   return *OC;
> }
> 
> // get original content:
> StringRef getOriginalContentBuffer(...) {
>   OriginalContents &OC = getOriginalContentContainer(...);
>   if (OC.IsInitialized)
>     return OC.Content->getBuffer();
> 
>   // Could put this after the lock, I guess...
>   std::unique_ptr<MemoryBuffer> Content = readFile(...);
> 
>   // Check IsInitialized again after locking in case there's a race.
>   std::scoped_lock<std::mutex> L(CacheMutex);
>   if (OC.IsInitialized)
>     return OC.Content->getBuffer();
> 
>   OC.Content = std::move(Content);
>   OC.IsInitialized = true;
>   return OC.Content->getBuffer();
> }
> ```
> Same pattern for minimized content cache. Since the locks are only held 
> briefly there's no need to pass them around and lose clarity about how long 
> it's open. Also, IIRC, `std::unique_lock` is more expensive than 
> `std::scoped_lock` (but my memory could be faulty).
> 
> 
I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, 
thanks for the suggestion!


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+    llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+    /// Cache of original file contents. Using \c std::map to ensure references
+    /// are not invalidated on insertion. 
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > [same comment as quoted in full above; snipped]
> I didn't think of using `SpecificBumpPtrAllocator` this way, seems really 
> neat, thanks for the suggestion!
Yeah, `scoped_lock` should be cheaper; I'll create a prep-patch for that.
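For the record, a minimal illustration of the difference (plain `std::` types, nothing LLVM-specific, names are mine): `std::unique_lock` carries extra state (an owns-lock flag, support for deferred locking, `unlock()`, and condition variables), while `std::scoped_lock` is a bare RAII guard, so when none of that machinery is needed the latter can be marginally cheaper:

```cpp
#include <cassert>
#include <mutex>

std::mutex M;
int Counter = 0;

// unique_lock tracks whether it currently owns the mutex so it can support
// unlock()/lock() cycles, deferred locking, and condition variables.
void incrementUnique() {
  std::unique_lock<std::mutex> L(M);
  ++Counter;
} // destructor checks the owns-lock flag before unlocking

// scoped_lock is a plain RAII guard: lock in the constructor, unlock in the
// destructor, no extra state to maintain.
void incrementScoped() {
  std::scoped_lock<std::mutex> L(M);
  ++Counter;
}
```

Both guards protect the critical section equally well; the choice only matters when the extra `unique_lock` features go unused.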


================
Comment at: 
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:133-135
+    std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
+    /// Cache of minimized file contents.
+    std::map<llvm::sys::fs::UniqueID, MinimizedContents> 
MinimizedContentsCache;
----------------
dexonsmith wrote:
> I wonder if these should really be separate.
> 
> Seems better to have something like:
> ```
> lang=c++
> struct SharedContent {
>   // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free
>   // access/updates.
>   atomic<bool> HasOriginal;
>   std::unique_ptr<MemoryBuffer> Original;
> 
>   // Even better: std::atomic<MinimizedContent *>, with the latter
>   // bump-ptr-allocated, to enable lock-free access/updates.
>   atomic<bool> HasMinimized;
>   SmallString<0> Minimized; // would be nice to bump-ptr-allocate this string...
>   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> };
> SpecificBumpPtrAllocator<SharedContent> ContentAlloc;
> DenseMap<llvm::sys::fs::UniqueID, SharedContent *> ContentCache;
> ```
> With that in place, it seems like `SharedStat` can hold a
> `std::atomic<SharedContent *>` that caches the result of the UID lookup.
> This way the UID `DenseMap` lookup happens once per stat name, reducing
> contention on the per-shard lock.
> 
> Then in the local cache, the only map storage would be:
> ```
> llvm::StringMap<SharedStat *, llvm::BumpPtrAllocator> LocalStatCache;
> ```
> No need to duplicate the UID-keyed caches, since the lookups there would set 
> the pointer for the SharedContent.
I really like the idea of keeping a pointer to `SharedContent` in `SharedStat` 
and avoiding the locking & lookup in the content caches. Merging the original 
and minimized contents would probably simplify things quite a bit as well.
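To check my understanding, here's a rough sketch of that idea (hypothetical names, with `std::map`/`std::string` standing in for LLVM's `DenseMap`/`MemoryBuffer`): the stat entry caches an atomic pointer to its content entry, so after the first lookup no shard lock or map access is needed.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for the shared content entry.
struct SharedContent {
  std::string Original; // stands in for the MemoryBuffer
};

struct SharedStat {
  // Caches the result of the UID lookup, so later accesses through this
  // stat entry skip the shard lock and map lookup entirely.
  std::atomic<SharedContent *> Content{nullptr};
};

class Shard {
  std::mutex Mu;
  std::map<unsigned, std::unique_ptr<SharedContent>> ByUID;

public:
  SharedContent &getContent(unsigned UID, SharedStat &Stat) {
    // Fast path: pointer already cached in the stat entry, no lock taken.
    if (SharedContent *C = Stat.Content.load(std::memory_order_acquire))
      return *C;

    // Slow path: lock the shard, do the UID lookup once, publish the pointer.
    std::scoped_lock<std::mutex> L(Mu);
    std::unique_ptr<SharedContent> &Slot = ByUID[UID];
    if (!Slot)
      Slot = std::make_unique<SharedContent>();
    Stat.Content.store(Slot.get(), std::memory_order_release);
    return *Slot;
  }
};
```

`std::map` keeps the `unique_ptr` slots stable across insertions, so the published pointers stay valid; in the real patch a `DenseMap` of bump-ptr-allocated entries would serve the same purpose.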


================
Comment at: 
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:249
+
+  ensureLocked(Filename, SharedStat, Lock);
 
----------------
dexonsmith wrote:
> I don't love the lack of clarity between when the lock is taken and when it's 
> released caused by this being an out parameter. I don't have a specific 
> suggestion, but maybe there's another way to factor the code overall?
Fair point, I'll try to simplify this.


================
Comment at: 
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253-255
+  if (SharedOriginalContents.WasInserted)
+    SharedStat->Stat =
+        readFile(Filename, getUnderlyingFS(), SharedOriginalContents.Value);
----------------
dexonsmith wrote:
> Calling `readFile()` behind a lock doesn't seem great. I did confirm that the 
> original code seems to do the same thing (lock outside of 
> `createFilesystemEntry`), but this refactor seems to bake the pattern into a 
> few more places.
> 
> When races aren't very likely it's usually cheaper to:
> - lock to check cache, returning cached result if so
> - without a lock, compute result
> - lock to set cache, but if the cache has been filled in the meantime by 
> another thread, return that and throw out the just-computed one
> 
> Maybe it'd be useful to add:
> ```
> std::atomic<bool> IsInitialized;
> ```
> to the MinimizedContents and OriginalContents structures stored in the shared 
> cache. This could make it easier to decouple insertion in the shared cache 
> from initialization. I.e., it'd be safe to release the lock while doing work; 
> another thread won't think the default-constructed contents are correct.
Could you expand on this a bit more? If we have a lock for each file, how is 
locking, reading, unlocking slower than locking, unlocking, reading, locking, 
unlocking?
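For reference while I think about this, my reading of the suggested pattern is the following sketch (hypothetical `Entry`/`readFile` names, `std::string` in place of `MemoryBuffer`): the expensive read happens with no lock held, so other threads contending for the same shard lock aren't blocked behind one thread's disk I/O.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical cache entry; IsInitialized decouples insertion into the
// cache from initialization of the contents, as suggested above.
struct Entry {
  std::atomic<bool> IsInitialized{false};
  std::string Content;
};

std::mutex CacheMutex;

// Stand-in for the expensive uncached work (disk I/O).
std::string readFile() { return "file contents"; }

const std::string &getContent(Entry &E) {
  // Fast path: already initialized, no lock at all.
  if (E.IsInitialized.load(std::memory_order_acquire))
    return E.Content;

  // Compute OUTSIDE the lock; may be wasted work if another thread races,
  // but the lock is never held across the slow read.
  std::string New = readFile();

  std::scoped_lock<std::mutex> L(CacheMutex);
  // Re-check under the lock: another thread may have won the race.
  if (!E.IsInitialized.load(std::memory_order_relaxed)) {
    E.Content = std::move(New);
    E.IsInitialized.store(true, std::memory_order_release);
  }
  return E.Content; // either our result or the racing thread's identical one
}
```

The lock is only held for the brief check-and-publish, not for the read itself, which is where the win over lock-read-unlock would come from when multiple files hash to the same lock.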


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114966/new/

https://reviews.llvm.org/D114966

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
