jansvoboda11 marked 3 inline comments as done.
jansvoboda11 added inline comments.
================
Comment at:
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h:106-108
+ std::unique_ptr<llvm::MemoryBuffer> OriginalContents;
+ std::unique_ptr<llvm::MemoryBuffer> MinimizedContents;
PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
----------------
dexonsmith wrote:
> jansvoboda11 wrote:
> > dexonsmith wrote:
> > > jansvoboda11 wrote:
> > > > dexonsmith wrote:
> > > > > I'm finding it a bit subtle detecting if there are races on access /
> > > > > setting of these, but I think it's correct.
> > > > > - I think I verified that they are "set once".
> > > > > - All the setters are guarded by locks.
> > > > > - The first getter per "local" cache is guarded by a lock.
> > > > > - Subsequent getters are not.
> > > > >
> > > > > The key question: are the subsequent getters ONLY used when the first
> > > > > getter was successful?
> > > > >
> > > > > One way to make it more obvious:
> > > > > ```
> > > > > lang=c++
> > > > > struct ContentWithPPRanges {
> > > > > std::unique_ptr<llvm::MemoryBuffer> Content;
> > > > > PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> > > > > };
> > > > >
> > > > > private:
> > > > > // Always accessed,mutated under a lock. Not mutated after they
> > > > > escape.
> > > > > std::unique_ptr<llvm::MemoryBuffer> Original;
> > > > > std::unique_ptr<llvm::MemoryBuffer> Minimized;
> > > > > PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
> > > > >
> > > > > // Used in getters. Pointed-at memory immutable after these are set.
> > > > > std::atomic<const llvm::MemoryBuffer *> OriginalAccess;
> > > > > std::atomic<const llvm::MemoryBuffer *> MinimizedAccess;
> > > > > std::atomic<const PreprocessorSkippedRangeMapping *> PPRangesAccess;
> > > > > ```
> > > > > I don't think this is the only approach though.
> > > > >
> > > > I think there are no races on the original contents. The pointer is
> > > > unconditionally set on creation of `CachedFileSystemEntry` under a lock
> > > > that no threads get past without having set the pointer (or failing and
> > > > never accessing the pointer).
> > > >
> > > > For minimized contents, the latest revision adds check at the beginning
> > > > of the main function (`needsMinimization`) outside the critical
> > > > section. There are three paths I can think of:
> > > > * The check returns `true` in thread A (pointer is `null`), thread A
> > > > enters critical section, minimizes the contents and initializes the
> > > > pointer.
> > > > * The check returns `true` in thread A, but thread B entered the
> > > > critical section, minimized contents and initialized the pointer. When
> > > > thread A enters the critical section, it performs the check again,
> > > > figures that out and skips minimization.
> > > > * The check returns `false` and the local cache entry is returned.
> > > >
> > > > So there definitely is a race here, but I believe it's benign. Do you
> > > > agree? Do you think it's worth addressing?
> > > I don't trust myself to evaluate whether it's benign, but if there's no
> > > atomic mutation, then I think it's possible that when the setter changes
> > > a value from "x" to "y" then the racing reader can see a third value "z".
> > > You might gain some confidence by using `-fsanitize=thread` on a workload
> > > that's going to include this sort of thing -- probably hard to exercise:
> > > PCH already exists, try minimizing something that uses the PCH, and then
> > > try minimizing something that doesn't.
> > >
> > > I'd rather just avoid the race entirely so we don't need to guess though.
> > Interesting...
> >
> > After reading up on this a bit, my understanding is that reads of
> > `MinimizedContents` cannot be torn, because it's pointers-sized and
> > aligned. So we should never see a third value "z". Am I wrong?
> >
> > The potential data race is IMO somewhat independent from the read tearing
> > aspect and is avoided by defensively checking `MinimizedContents` again
> > under lock.
> >
> > To confirm, I ran the following test case with and without thread
> > sanitizer, never seeing data races or incorrect results.
> >
> > {F20978137}
> >
> > I'm happy to use the `std::atomic` pattern you suggested, but I want to be
> > sure I understand why that's necessary.
> Heh, I don't know what can and cannot tear (especially on different
> architectures/etc.), I'm just wary. I'll trust your homework, but please add
> a comment documenting why it's thread-safe to read without atomics/locks.
I believe I got to the bottom of it. The pattern I'm using is called
"double-checked locking" and is considered unsafe without atomics, since
compiler optimizations can break it on some platforms. Adding a memory fence
(`std::atomic`) here should make this safe and portable.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D115346/new/
https://reviews.llvm.org/D115346
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits