NoQ added a comment.
Had a look at the actual code, came up with some comments, most are fairly
minor, so good job! You did a good job explaining the overall idea, but a lot
of specifics were difficult to understand, so i made a few comments on how to
make the code a bit easier to read.
> I know, and I tried to look for ways to split this checker into smaller
> logical chunks, but I didn't find a satisfying solution.
What i would have done:
1. A checker that straightforwardly iterates through immediate fields of the
current object and reports uninitialized fields.
2. Add base classes support.
3. Heuristic for skipping sub-constructors could use a separate review.
4. Start skipping arrays and unions.
5. Add simple pointer dereference support - introduce chains of fields.
6. More heuristics - the one for symbolic regions, the one for system header
fields, etc.
etc.
I'm asking for this because it's, like, actually difficult and painful to
understand formal information in large chunks. One-solution-at-a-time would
have been so much easier. It is really a bad idea to start by presenting a
working solution; the great idea is to present a small non-working solution
with a reasonable idea behind it, and then extend it "in place" as much as it
seems necessary. It is very comfortable when parts of the patch with different
"status" (main ideas, corner-case fixes, heuristics, refactoring) stay in
separate patches. Alpha checkers are essentially branches: because our svn
doesn't have branches, we have alpha checkers and off-by-default flags. Even
before step 1, you should be totally fine with committing an empty checker with
a single empty callback and a header comment - even on such "step 0." we would
be able to discuss if your approach to the checker seems right or might have
inevitable issues, and it could have saved a lot of time.
Essentially every part of the solution that you implement, if you think it
might deserve a separate discussion, also deserves a separate patch. It is
reasonable to split the work into logical chunks before you start it. It's very
unlikely that you have a single idea that takes 500 non-trivial lines of code
to express. Splitting up ideas so that they were easy to grasp is an invaluable
effort. I had very positive experience with that recently, both as a coder and
as a reviewer, so i encourage that a lot.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:47
+class FieldChainInfo {
+ using FieldChain = llvm::SmallVector<const FieldRegion *, 10>;
+
----------------
At a glance it looks like a great use case for an immutable list. It's easy to
use the immutable list as a stack, and it's also O(1) to copy/move and take
snapshots of it.
Note that you copy 10 pointers every time you copy //or move// a `SmallVector`.
Small vectors don't make much sense as fields of actively used data structures
- their main purpose is to live on the stack as local variables.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:51-53
+ /// If this is a fieldchain whose last element is an uninitialized region of
a
+ /// pointer type, this variable will store whether the pointer itself or the
+ /// pointee is uninitialized.
----------------
Please move this doc to the getter/setter function. That's where people expect
to find it when they're reading code.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:58-59
+ FieldChainInfo() = default;
+ /// Delegates to the copy ctor and adds FR to Chain. Note that Chain cannot
be
+ /// modified in a FieldChainInfo object after its construction.
+ FieldChainInfo(const FieldChainInfo &Other, const FieldRegion *FR);
----------------
Do we really need the other field (`IsDereferenced`) to be mutable? Or maybe we
can keep the whole object immutable? I.e., instead of `dereference()` we'll get
a constructor that constructs a `FieldChainInfo` with the same chain but with
the dereferenced flag set.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:86-89
+ StoreManager &StoreMgr;
+ MemRegionManager &MrMgr;
+ SourceManager &SrcMgr;
+ Store S;
----------------
All of these can be replaced with a single `ProgramStateRef` object.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:93
+ const bool IsPedantic;
+ bool IsChecked = false;
+ bool IsAnyFieldInitialized = false;
----------------
This field is worth documenting.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:156-157
+ //
+ // We'll traverse each node of the above graph with the appropiate one of
+ // these methods:
+ bool isUnionUninit(const TypedValueRegion *R);
----------------
Please explain what these methods do in a more direct manner. Something like
"This method returns true if an uninitialized field was found within the
region. It should only be used with regions of union-type objects".
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:170-171
+
+/// Returns the with the object that was constructed by CtorDecl, or None if
+/// that isn't possible.
+Optional<nonloc::LazyCompoundVal>
----------------
Something's missing before `with`.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:176-177
+/// Checks whether the constructor under checking is called by another
+/// constructor. This avoids essentially the same error being reported multiple
+/// times.
+bool isCalledByConstructor(const CheckerContext &Context);
----------------
The second sentence would look great at the call site.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:181-182
+/// Returns whether FD can be (transitively) dereferenced to a void pointer
type
+/// (void*, void**, ...). The type of the region behind a void pointer isn't
+/// known, and thus FD can not be analyzed.
+bool isVoidPointer(const FieldDecl *FD);
----------------
Type of an object pointed to by a void pointer is something that our
path-sensitive engine is sometimes able to provide. It is not uncommon that a
void pointer points to a concrete variable on the current path, and we may know
it in the analyzer. I'm not sure this sort of check is necessary (i.e. require
more information).
You may also want to have a look at `getDynamicTypeInfo()` which is sometimes
capable of retrieving the dynamic type of the object pointed to by a pointer or
a reference - i.e. even if it's different from the pointee type of the pointer.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:185
+
+} // namespace
+
----------------
We're traditionally writing "end anonymous namespace" or "end of anonymous
namespace".
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:223-225
+ ExplodedNode *Node = Context.generateNonFatalErrorNode(Context.getState());
+ if (!Node)
+ return;
----------------
I suspect that a fatal error is better here. We don't want the user to receive
duplicate report from other checkers that catch uninitialized values; just one
report is enough.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:233-236
+ llvm::Twine WarningMsg = llvm::Twine(UninitFields.size()) +
+ llvm::Twine(" uninitialized field") +
+ llvm::Twine(UninitFields.size() == 1 ? "" : "s") +
+ llvm::Twine(" at the end of the constructor call");
----------------
I believe that you only need the first `Twine`; then all `operator+` calls will
be resolved to `operator+(Twine, ...)` automatically. Or just use a string and
a stream like pretty much all other checkers.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:244
+ SmallString<200> FieldChainBuf;
+ FieldChain.toString(FieldChainBuf);
+ SmallString<200> NoteBuf;
----------------
This is the third approach to concatenating strings in the last 10 lines of
code. I think you should make a method `FieldChainInfo::print(llvm::raw_ostream
&)` and use streams everywhere.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+ Report->addNote(NoteBuf,
+ PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+
Context.getSourceManager()));
----------------
Aha, ok, got it, so you're putting the warning at the end of the constructor
and squeezing all uninitialized fields into a single warning by presenting them
as notes.
Because for now notes aren't supported everywhere (and aren't always supported
really well), i guess it'd be necessary to at least provide an option to make
note-less reports before the checker is enabled by default everywhere. In this
case notes contain critical pieces of information and as such cannot be really
discarded.
I'm not sure that notes are providing a better user experience here than simply
saying that "field x.y->z is never initialized". With notes, the user will have
to scroll around (at least in scan-build) to find which field is uninitialized.
Notes will probably also not decrease the number of reports too much because in
most cases there will be just one uninitialized field. Though i agree that when
there is more than one report, the user will be likely to deal with all fields
at once rather than separately.
So it's a bit wonky.
We might want to enable one mode or another in the checker depending on the
analyzer output flag. Probably in the driver (`RenderAnalyzerOptions()`).
It'd be a good idea to land the checker into alpha first and fix this in a
follow-up patch because this review is already overweight.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:292
+ if (!IsChecked)
+ hasUnintializedFields();
+ return UninitFields;
----------------
Why is checking done lazily in the first place? Are there many situations in
which we want to construct the object but never do the actual scan?
If it was actually necessary, i would have split out the side effect of this
getter into a separate private method.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:346
+ // If this is a pointer or reference type.
+ if (V.getAs<Loc>()) {
+ if (isPointerOrReferenceUninit(FR, {LocalChain, FR}))
----------------
Please check the type explicitly, and then assert that the value is a `Loc`.
There are a lot of nuances that you don't want to be dealing with here (eg.
`ObjCObjectPointerType`, which can be present in C++ as well because
Objective-C++ is a thing, is not a `PointerType` or a `ReferenceType`, but it
is modeled with a `Loc`).
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:352
+
+ // At this point the field is a fundamental type.
+ if (isFundamentalUninit(V)) {
----------------
Please formalize what you mean by a fundamental type and assert that. You
probably mean something like "an integer or a floating-point number or maybe an
enum or, well, bool".
I suspect that many cases here are missed (complex types, vector types, ...).
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+ // Checking bases.
+ const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+ if (!CXXRD)
+ return ContainsUninitField;
+
+ for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+ const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
Are there many warnings that will be caused by this but won't be caused by
conducting the check for the constructor-within-constructor that's currently
disabled? Not sure - is it even syntactically possible to not initialize the
base class?
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:384
+bool FindUninitializedFields::isPointerOrReferenceUninit(
+ const FieldRegion *FR, FieldChainInfo LocalChain) {
+
----------------
Please assert here that `FR` is a pointer/reference-type field.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:386
+
+ SVal V = StoreMgr.getBinding(S, loc::MemRegionVal(FR));
+
----------------
As i mentioned above, you can simply store a `ProgramStateRef` within
`FindUninitializedFields`. In this case this code will be simplified to
`State->getSVal(FR)`.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:405-406
+ SVal DerefdV = StoreMgr.getBinding(S, V.castAs<Loc>());
+ while (Optional<Loc> L = DerefdV.getAs<Loc>())
+ DerefdV = StoreMgr.getBinding(S, *L);
+
----------------
Is this loop guaranteed to terminate? Is something like `void *v; v = &v;`
possible?
I think this loop should unwrap the AST type on every iteration and dereference
the pointer accordingly.
In general, i believe that every symbolic pointer dereference should be made
with awareness of the type of the object we're trying to retrieve. Which is an
optional argument for `State->getSVal(R, ...)`, but it seems that it should be
made mandatory because in a lot of cases omitting it causes problems.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+ // constructors for list-like classes are checked without being called, and
+ // the CSA will conjure a symbolic region for Node *next; or similar code
+ // snippets.
----------------
CSA is not a commonly used acronym around the codebase.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+ // constructors for list-like classes are checked without being called, and
+ // the CSA will conjure a symbolic region for Node *next; or similar code
+ // snippets.
----------------
NoQ wrote:
> CSA is not a commonly used acronym around the codebase.
I'd like to change the word "conjure" to something else because in most cases
it won't be an actual `SymbolConjured`, but it'd be a `SymbolRegionValue` or
`SymbolDerived`. Conjuring occurs in different circumstances.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+ if (T->isUnionType()) {
+ // TODO: does control ever reach here?
+ if (isUnionUninit(R)) {
----------------
Add `llvm_unreachable("")` to find out :)
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:432
+ if (isUnionUninit(R)) {
+ LocalChain.dereference();
+ return addFieldToUninits(std::move(LocalChain));
----------------
Needs a comment on why the `IsDereferenced` flag on this path.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:480
+
+// TODO: This function constructs an incorrect fieldchain string in the
+// following case:
----------------
Can we at least detect these situations and suppress the warning? It should be
relatively easy for lambdas (captured variable mapping should be stored
somewhere, as far as i remember), not sure how easy it'd be for double
inheritance (there are fancy methods for scanning this stuff, but that's
equivalent to actually implementing the correct warning message).
Weird warning messages are pretty scary to show to the users.
It'd be a good idea to land the checker into alpha first and fix this in a
follow-up patch because this review is already overweight.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:559-560
+ while (LC) {
+ if (isa<CXXConstructorDecl>(LC->getDecl()))
+ return true;
+
----------------
This doesn't check that the constructor on the parent stack frame is anyhow
related to the current constructor, so it may introduce false negatives. For
instance, a class may lazy-initialize a singleton but never store a reference
to it within itself (because, well, it's a singleton, you can always obtain the
reference to it). In this case we'll never find the bug in the constructor of
the singleton.
================
Comment at: test/Analysis/cxx-uninitialized-object.cpp:931
+// FIXME: As of writing this checker, there is no good support for union types
+// in the CSA. Here is non-exhaustive list of cases.
+// Note that the rules for unions are different in C++ and C.
----------------
CSA is not a commonly used acronym around the codebase.
https://reviews.llvm.org/D45532
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits