jansvoboda11 marked 2 inline comments as done.
jansvoboda11 added a comment.
In D114966#3168108 <https://reviews.llvm.org/D114966#3168108>, @dexonsmith wrote:

> Thanks for working on this; seems like a great start.

Thanks a lot for the extensive feedback! I'll work through it and create prep patches where sensible. It seems like things can be simplified quite a bit.

> At a high-level:
>
> - We should check overhead. It'd be good to benchmark scanning LLVM with `clang-scan-deps` before and after this change.

In my testing, this patch causes a ~20% increase in memory usage.

> - The locking is getting harder to track, since the acquisition and release are disconnected. I'd rather use a pattern that kept this simple.

I agree. I think I'll explore the direction you suggested in one of your inline comments.

> - Identified a few pre-existing issues that might be exacerbated by (and/or harder to fix after) this refactor.

That makes sense.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:73
+/// The value type for original contents cache.
+using OriginalContents = Contents;
+
----------------
dexonsmith wrote:
> This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it into a new allocation.

Fixed in new prep-patch: D115043.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+  llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+  /// Cache of original file contents. Using \c std::map to ensure references
+  /// are not invalidated on insertion.
+  std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
dexonsmith wrote:
> You don't need the heavyweight `std::map` for reference validation. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`.
> But you can also avoid the allocation traffic by using a `BumpPtrAllocator`, the same pattern as the `StringMap` above. E.g.:
> ```
> lang=c++
> llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc;
> llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents *> OriginalContentsCache;
>
> // Insert into shard:
> OriginalContents &getOriginalContentContainer(...) {
>   std::scoped_lock<std::mutex> L(CacheMutex);
>   OriginalContents *&OC = OriginalContentsCache[UID];
>   if (!OC)
>     OC = new (OriginalContentsAlloc.Allocate()) OriginalContents;
>   return *OC;
> }
>
> // Get original content:
> StringRef getOriginalContentBuffer(...) {
>   OriginalContents &OC = getOriginalContentContainer(...);
>   if (OC.IsInitialized)
>     return OC.Content->getBuffer();
>
>   // Could put this after the lock, I guess...
>   std::unique_ptr<MemoryBuffer> Content = readFile(...);
>
>   // Check IsInitialized again after locking, in case there's a race.
>   std::scoped_lock<std::mutex> L(SharedStat.Mutex);
>   if (OC.IsInitialized)
>     return OC.Content->getBuffer();
>
>   OC.Content = std::move(Content);
>   OC.IsInitialized = true;
>   return OC.Content->getBuffer();
> }
> ```
> Same pattern for the minimized content cache. Since the locks are only held briefly, there's no need to pass them around and lose clarity about how long each one is open. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).

I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion!


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:131-132
+  llvm::StringMap<SharedStat, llvm::BumpPtrAllocator> StatCache;
+  /// Cache of original file contents. Using \c std::map to ensure references
+  /// are not invalidated on insertion.
+  std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > [...] Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).
>
> I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion!
Yeah, `scoped_lock` should be cheaper, I'll create a prep-patch for that.


================
Comment at: clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:133-135
+  std::map<llvm::sys::fs::UniqueID, OriginalContents> OriginalContentsCache;
+  /// Cache of minimized file contents.
+  std::map<llvm::sys::fs::UniqueID, MinimizedContents> MinimizedContentsCache;
----------------
dexonsmith wrote:
> I wonder if these should really be separate.
>
> Seems better to have something like:
> ```
> lang=c++
> struct SharedContent {
>   // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free access/updates.
>   std::atomic<bool> HasOriginal;
>   std::unique_ptr<MemoryBuffer> Original;
>
>   // Even better: std::atomic<MinimizedContent *>, with the latter bump-ptr-allocated,
>   // to enable lock-free access/updates.
>   std::atomic<bool> HasMinimized;
>   SmallString<0> Minimized; // would be nice to bump-ptr-allocate this string...
>   PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> };
> SpecificBumpPtrAllocator<SharedContent> ContentAlloc;
> DenseMap<llvm::sys::fs::UniqueID, SharedContent *> ContentCache;
> ```
> With that in place, seems like the `SharedStat` can have a `std::atomic<SharedContent *>`, which caches the result of the UID lookup. This way the UID `DenseMap` lookup happens once per stat name, reducing contention on the per-shard lock.
>
> Then in the local cache, the only map storage would be:
> ```
> lang=c++
> llvm::StringMap<SharedStat *, llvm::BumpPtrAllocator> LocalStatCache;
> ```
> No need to duplicate the UID-keyed caches, since the lookups there would set the pointer for the `SharedContent`.

I really like the idea of keeping a pointer to `SharedContent` in `SharedStat` and avoiding locking & lookup in the content caches. Merging the original and minimized contents would probably simplify things quite a bit as well.
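To check my understanding, here's a rough standalone sketch of the merged-cache + cached-pointer idea, with standard containers standing in for the LLVM types (`std::map` + `std::unique_ptr` in place of `DenseMap` + `SpecificBumpPtrAllocator`, `std::string` in place of `MemoryBuffer`; all names are hypothetical):

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for llvm::sys::fs::UniqueID.
using UniqueID = std::uint64_t;

struct SharedContent {
  std::atomic<bool> HasOriginal{false};
  std::string Original; // stands in for std::unique_ptr<MemoryBuffer>
};

struct SharedStat {
  // Caches the result of the UID lookup, so the shard's map (and its
  // lock) is only touched the first time this stat entry is used.
  std::atomic<SharedContent *> Content{nullptr};
};

class Shard {
  std::mutex CacheMutex;
  std::map<UniqueID, std::unique_ptr<SharedContent>> ContentCache;

public:
  SharedContent &getOrCreateContent(SharedStat &Stat, UniqueID UID) {
    // Fast path: the stat entry already points at its content container.
    if (SharedContent *C = Stat.Content.load(std::memory_order_acquire))
      return *C;
    // Slow path: lock the shard, find or create the container, and
    // publish the pointer into the stat entry for future lock-free use.
    std::scoped_lock<std::mutex> L(CacheMutex);
    std::unique_ptr<SharedContent> &Slot = ContentCache[UID];
    if (!Slot)
      Slot = std::make_unique<SharedContent>();
    Stat.Content.store(Slot.get(), std::memory_order_release);
    return *Slot;
  }
};
```

The point being that after the first lookup, repeated accesses through the same `SharedStat` never touch the UID map or the shard lock at all.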
================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:249
+
+  ensureLocked(Filename, SharedStat, Lock);
----------------
dexonsmith wrote:
> I don't love the lack of clarity between when the lock is taken and when it's released, caused by this being an out parameter. I don't have a specific suggestion, but maybe there's another way to factor the code overall?

Fair point, I'll try to simplify this.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:253-255
+  if (SharedOriginalContents.WasInserted)
+    SharedStat->Stat =
+        readFile(Filename, getUnderlyingFS(), SharedOriginalContents.Value);
----------------
dexonsmith wrote:
> Calling `readFile()` behind a lock doesn't seem great. I did confirm that the original code seems to do the same thing (lock outside of `createFilesystemEntry`), but this refactor seems to bake the pattern into a few more places.
>
> When races aren't very likely, it's usually cheaper to:
>
> - lock to check the cache, returning the cached result if present
> - without a lock, compute the result
> - lock to set the cache, but if the cache has been filled in the meantime by another thread, return that and throw out the just-computed one
>
> Maybe it'd be useful to add:
> ```
> lang=c++
> std::atomic<bool> IsInitialized;
> ```
> to the `MinimizedContents` and `OriginalContents` structures stored in the shared cache. This could make it easier to decouple insertion into the shared cache from initialization. I.e., it'd be safe to release the lock while doing work; another thread won't think the default-constructed contents are correct.

Could you expand on this a bit more? If we have a lock for each file, how is locking, reading, unlocking slower than locking, unlocking, reading, locking, unlocking?
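For reference while I think about this, the three-step pattern described above might look roughly like this in portable C++ (a hypothetical sketch: `Cache`, `getOrCompute`, and the use of `std::map`/`std::mutex` in place of the sharded LLVM structures are all my own stand-ins):

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Sketch of the check/compute/publish pattern: the expensive work
// happens outside the lock, and a racing thread's duplicate result
// is simply discarded.
class Cache {
  std::mutex M;
  std::map<int, std::string> Entries;

public:
  template <typename ComputeFn>
  const std::string &getOrCompute(int Key, ComputeFn Compute) {
    {
      // Step 1: lock briefly to check the cache.
      std::scoped_lock<std::mutex> L(M);
      auto It = Entries.find(Key);
      if (It != Entries.end())
        return It->second;
    }
    // Step 2: compute without holding the lock; other threads are free
    // to use the cache (even for this same key) in the meantime.
    std::string Value = Compute(Key);
    // Step 3: lock again to publish. try_emplace keeps the first value
    // inserted, so if another thread raced us, its result wins and our
    // just-computed one is thrown away.
    std::scoped_lock<std::mutex> L(M);
    return Entries.try_emplace(Key, std::move(Value)).first->second;
  }
};
```

With the simple lock-around-`readFile()` scheme, any other thread that wants the same entry blocks for the whole duration of the read; here the lock is only held for the two short map operations.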
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114966/new/

https://reviews.llvm.org/D114966

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits