Szelethus added a comment.
I'm afraid it'll be even more time before I post a new diff. There are some
things that `ProgramState` is lacking, so I'll get that fixed first in a
different pull request first.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+ Report->addNote(NoteBuf,
+ PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+
Context.getSourceManager()));
----------------
NoQ wrote:
> NoQ wrote:
> > Szelethus wrote:
> > > NoQ wrote:
> > > > Aha, ok, got it, so you're putting the warning at the end of the
> > > > constructor and squeezing all uninitialized fields into a single
> > > > warning by presenting them as notes.
> > > >
> > > > Because for now notes aren't supported everywhere (and aren't always
> > > > supported really well), i guess it'd be necessary to at least provide
> > > > an option to make note-less reports before the checker is enabled by
> > > > default everywhere. In this case notes contain critical pieces of
> > > > information and as such cannot be really discarded.
> > > >
> > > > I'm not sure that notes are providing a better user experience here
> > > > than simply saying that "field x.y->z is never initialized". With
> > > > notes, the user will have to scroll around (at least in scan-build) to
> > > > find which field is uninitialized.
> > > >
> > > > Notes will probably also not decrease the number of reports too much
> > > > because in most cases there will be just one uninitialized field.
> > > > Though i agree that when there is more than one report, the user will
> > > > be likely to deal with all fields at once rather than separately.
> > > >
> > > > So it's a bit wonky.
> > > >
> > > > We might want to enable one mode or another in the checker depending on
> > > > the analyzer output flag. Probably in the driver
> > > > (`RenderAnalyzerOptions()`).
> > > >
> > > > It'd be a good idea to land the checker into alpha first and fix this
> > > > in a follow-up patch because this review is already overweight.
> > > > [...]i guess it'd be necessary to at least provide an option to make
> > > > note-less reports before the checker is enabled by default everywhere.
> > > > In this case notes contain critical pieces of information and as such
> > > > cannot be really discarded.
> > > This can already be achieved with `-analyzer-config
> > > notes-as-events=true`. There no good reason however not to make an
> > > additional flag that would emit warnings instead of notes.
> > > > I'm not sure that notes are providing a better user experience here
> > > > than simply saying that "field x.y->z is never initialized". With
> > > > notes, the user will have to scroll around (at least in scan-build) to
> > > > find which field is uninitialized.
> > > This checker had a previous implementation that emitted a warning for
> > > each uninitialized field, instead of squeezing them in a single warning
> > > per constructor call. One of the major reasons behind rewriting it was
> > > that it emitted so many of them that it was difficult to differentiate
> > > which warning belonged to which constructor call. Also, in the case of
> > > the LLVM/Clang project, the warnings from this checker alone would be in
> > > the thousands.
> > > > Notes will probably also not decrease the number of reports too much
> > > > because in most cases there will be just one uninitialized field.
> > > While this is true to some extent, it's not uncommon that a single
> > > constructor call leaves 7 uninitialized -- in the LLVM/Clang project I
> > > found one that had over 30!
> > > Since its practically impossible to determine whether this was a
> > > performance enhancement or just poor programming, the few cases where it
> > > is intentional, an object would emit many more warnings would make make
> > > majority of the findings (where an object truly had only 1 or 2 fields
> > > uninit) a lot harder to spot in my opinion.
> > >
> > > In general, I think one warning per constructor call makes the best user
> > > experience, but a temporary solution should be implemented until there's
> > > better support for notes.
> > >
> > > I agree, this should be a topic of a follow-up patch.
> > >
> > > This can already be achieved with `-analyzer-config notes-as-events=true`.
> >
> > Yep. But the resulting user experience seems pretty bad to me. It was a
> > good quick proof-of-concept trick to test things, but the fact that notes
> > aren't path pieces is way too apparent in case of this checker. So i guess
> > this flag was a bad idea.
> >
> > > Also, in the case of the LLVM/Clang project, the warnings from this
> > > checker alone would be in the thousands.
> >
> > This makes me a bit worried, i wonder what's the reason why does the
> > checker shout so loudly on a codebase that doesn't seem to have actual
> > uninitialized value bugs all over the place.
> >
> > Are any of these duplicates found in the same constructor for the same
> > field in different translation units? Suppose we discard the duplicates (if
> > any) and warn exactly once (across the whole LLVM) for every uninitialized
> > field (which is probably more than once per constructor). Then I wonder:
> > 1. How many uninitialize fields would we be able to find this way?
> > 2. How many of such fields were intentionally left uninitialized?
> > 3. How many were found by deep inspection of the heap through field chains
> > involving pointer dereference "links"?
> (when i'm asking this sort of stuff, i don't mean you should look through
> thousands of positives, a uniform random sample of ~50 should be just fine)
> (but we really do need this info before enabling stuff by default)
>>This can already be achieved with -analyzer-config notes-as-events=true.
>Yep. But the resulting user experience seems pretty bad to me. It was a good
>quick proof-of-concept trick to test things, but the fact that notes aren't
>path pieces is way too apparent in case of this checker. So i guess this flag
>was a bad idea.
I totally agree, I meant that a note-less solution should only be a (arguably
better) workaround just as `notes-as-events=true`.
>Are any of these duplicates found in the same constructor for the same field
>in different translation units? Suppose we discard the duplicates (if any) and
>warn exactly once (across the whole LLVM) for every uninitialized field (which
>is probably more than once per constructor).
Sure, having the same field field reported from the same constructor call in
different TUs happens quite a lot, but they can easily be uniqued.
In the checkers current state, meaning that one warning per ctor call, the
LLVM/Clang project in pedantic mode resulted in **409 non-unique warnings**,
and using CodeChecker I found that **181 of these is unique**.
>How many uninitialize fields would we be able to find this way?
In its current state, in pedantic mode, after uniqueing **~581**. Now that's
quite a bit off from what I remembered //"the warnings from this checker alone
would be in the thousands"//, but having one warning per uninit field would
still result in a ~320% increase in reports.
>How many of such fields were intentionally left uninitialized?
(I checked around 95 reports a couple weeks back before uploading this diff)
**Almost all of them.** Note that this is a very performance-critical project.
From what I saw, many times several fields are left uninitialized, but these
fields are inaccessible due to a `kind` field being asserted at their getter
functions.
Also, I found cases where the constructor wasn't defaulted with `= default;`.
In fact, I struggled to find a case where a field was 100% left out by
accident. Maybe in `llvm/lib/IR/ConstantsContext.h`, calling
`ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE)`.
This is not the case however in other projects I checked. In Xerces for example
every find was a true positive.
>How many were found by deep inspection of the heap through field chains
>involving pointer dereference "links"?
This is somewhat harder to measure. The best I could come up with is uniqueing
the fieldchains from the plist files with lexicographical sorting. Your
question can be answered by measuring the length of the fieldchains. I found
* 3 fieldchains with the length of 5
* 18 fieldchains with the length of 4
* 62 fieldchains with the lenfth of 3
* 108 fieldchains with the length of 2
* 137 fieldchains with the length of 1
(that is a total of 328 unique fieldchains lexicographically)
Because heap objects are not modeled, only 3 of the 328 fieldchains contains
pointer dereferences. However, reference objects are very common from what I
saw, but sadly I can't give you an exact number on those.
I hope these answers were satisfying, but I'll follow up with more information
if needed.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+ // Checking bases.
+ const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+ if (!CXXRD)
+ return ContainsUninitField;
+
+ for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+ const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
NoQ wrote:
> Are there many warnings that will be caused by this but won't be caused by
> conducting the check for the constructor-within-constructor that's currently
> disabled? Not sure - is it even syntactically possible to not initialize the
> base class?
I'm not a 100% sure what you mean. Can you clarify?
>Not sure - is it even syntactically possible to not initialize the base class?
If I understand the question correctly, no, as far as I know.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+ if (T->isUnionType()) {
+ // TODO: does control ever reach here?
+ if (isUnionUninit(R)) {
----------------
NoQ wrote:
> Add `llvm_unreachable("")` to find out :)
It does! :)
https://reviews.llvm.org/D45532
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits