martong added a comment.

I think this is a useful concept and it can nicely complement the ongoing 
summary based work in `StdLibraryFunctionsChecker`. Still, in my opinion, there 
are more discussions and work to be done here before we get to the final 
implementation.

Our analyzer does not reason about all paths, we try to find only ONE path 
where an error occurs. This leads to a non-trivial question: Is it an error if 
we have some paths that do not check for the return value and at least one path 
where we do?
(1) One answer could be that there is a bug if we find out that we use a 
rotten file descriptor or a nullptr; but if we don't "use" them, then that is 
not a bug. How do we define "use"? We could say that any function that takes 
the return value is using it, but that may lead to false positives if we 
cannot see into the function body (that function may check the value). I think 
we should define "use" in terms of the preconditions of other functions. In 
the previous example (`use(*P)`) we should be able to specify that the 
parameter of `use` must not be null, and only if that condition is violated 
should we diagnose the bug. In the CERT description we see that the different 
functions are related. For instance, we should use `fgetc` only with a valid 
`FILE*` that must not be NULL. This approach is being implemented in 
`StdLibraryFunctionsChecker`.
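To make that concrete, here is a minimal sketch of the precondition-based notion of "use" (the functions `use` and `caller` are hypothetical, and the precondition is assumed, not taken from any real summary):

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical callee: assume its summary states the precondition
// "the argument must not be NULL".
static int use(const int *p) {
  return *p + 1;
}

int caller(void) {
  int *p = malloc(sizeof(int)); // may return NULL on failure
  if (!p)                       // without this guard, use(p) could violate
    return -1;                  // use()'s precondition; that violation is
  *p = 41;                      // what should be diagnosed, not the mere
  int r = use(p);               // absence of a check after malloc
  free(p);
  return r;
}
```

The point is that the warning would be anchored to the precondition violation at the call to `use`, not to the missing check itself.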

(2) On the other hand, there are cases when there is no such visible data 
dependency between the functions. E.g. the example from ERR33-C:

  size_t read_at(FILE *file, long offset,
                 void *buf, size_t nbytes) {
    fseek(file, offset, SEEK_SET);
    return fread(buf, 1, nbytes, file);
  }

Here, there are no argument constraints of `fread` that could be violated by 
calling `fseek`! But let's just remove the call to `fread`:

  size_t read_at(FILE *file, long offset,
                 void *buf, size_t nbytes) {
    return fseek(file, offset, SEEK_SET);
  }

The bug is gone! I suggest that we create/describe function groups. In a group 
we would list the functions that are related, e.g. `fseek`, `fread`, `fopen`, 
etc. If they operate on the same stream and we can find a path where there is 
an `fseek` followed by an `fread` without the return value being checked, then 
that is a real bug. What do you think? Maybe we should have a generic 
infrastructure that could be reused in `StreamChecker`?
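For comparison, the fixed version (close to ERR33-C's compliant solution) checks `fseek`'s return value before the dependent `fread` on the same stream:

```c
#include <stdio.h>

// Checked variant of read_at: the result of fseek is tested before
// fread is called on the same stream, as ERR33-C requires.
size_t read_at(FILE *file, long offset,
               void *buf, size_t nbytes) {
  if (fseek(file, offset, SEEK_SET) != 0)
    return 0; // seek failed: report zero bytes read
  return fread(buf, 1, nbytes, file);
}
```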

> A problem with this approach is that it works only if the value (the return 
> value of the function) is not constrained already after the function call (at 
> least not for the error value), otherwise it will wrongly interpret this 
> constraint as the result of an error-check branch in the code. So it is not 
> possible to make a branch with a fixed error return value (if another checker 
> does this, that is probably bad too). And it is not possible (?) to make a 
> new branch without a constraint.
>  ...
>  For example a postCall put constraints on it if it is known that the 
> returned value is always non-negative. Or even worse, an error path is 
> generated when the return value is exactly the error return code. In the 
> mentioned case, if the f function is modeled and a path is added where it has 
> return value zero, the checker will "think" that there was an if (X == 0) 
> condition in the code.

In my understanding, if the return value is already constrained (to not have 
the error value) on a path, that means we are "ready" on that path, i.e. we 
don't have to check again for the return value there. However, there may be 
other paths where we still have to look for the check.
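A tiny hypothetical illustration of this path sensitivity: the same call's result is constrained on one path but not on another (`read_first` and its `strict` flag are made up for the example):

```c
#include <stdio.h>

// Hypothetical example: whether the return value of fgetc gets checked
// depends on which path the analyzer explores.
int read_first(FILE *fp, int strict) {
  int c = fgetc(fp);
  if (strict) {
    if (c == EOF)   // on this path c is already constrained: "ready"
      return -1;
    return c;
  }
  return c;         // on this path c may still be EOF: the check is missing
}
```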

> Generally, is it a problem if there are multiple checkers that may detect 
> same defects but with different approach? Or should the code of this checker 
> (for example) made more complicated only to filter out cases that may be 
> detected by other checkers too?

If we can avoid it, then we should; but, in my opinion, it is not a problem. 
There will be one checker which finds the error first, emits an error node, 
and then the analysis stops on that path; the next checker will not find the 
same bug, because execution has already stopped. We need to define the order 
of the colliding checkers, though. Actually, there is an example of such a 
case: D79420 <https://reviews.llvm.org/D79420>. (Note, I am not sure about 
non-fatal errors; does the analysis continue in that case?)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72705/new/

https://reviews.llvm.org/D72705


