[PATCH] D154838: [analyzer] Add check for null pointer passed to %p of printf family

Artem Dergachev via Phabricator via cfe-commits Tue, 25 Jul 2023 14:56:38 -0700

NoQ added a comment.

> Should I cover non-standard (i.e., non ISO C standard) functions from the 
> `printf_s` family? Should I cover non-standard functions like `dprintf`?

The static analyzer, unlike the compiler proper, isn't required to treat all 
code fairly. It's ok to have different behavior depending solely on the 
function's name. So if you want to support non-standard functions, and you know 
that they have the same portability issues, totally go for it!

> This check is part of the UnixAPIPortability checker, which is already 
> registered. Do you suggest creating a separate checker for this check?

Probably a separate check would be better. The consequences of `malloc(0)` are 
likely to be much more dire than consequences of `printf("%p", 0)`, so people 
may want to enable/disable them separately.

> What is the correct way to update expected plists for test inputs?

The run-lines are mostly self-explanatory. Just run it through the `grep` 
command in the other run-line. It filters out all the non-transferable stuff. 
(You clearly don't want to have paths like `/Users/georgiy.lebedev/Work/...` in 
test files.)

Or, put your test in a separate file with a simpler run-line that doesn't 
additionally verify plist output. We already have enough tests for plist output.

In D154838#4532929 <https://reviews.llvm.org/D154838#4532929>, @MitalAshok 
wrote:

> Instead of checking for hard-coded names, you can check functions with the 
> format(printf, x, y) 
> <https://clang.llvm.org/docs/AttributeReference.html#format> attribute:
>
>   if (auto *Format = FD->getAttr<FormatAttr>())
>     CheckPrintfPointerConversionSpecifierNULL(C, CE, Format->getFormatIdx());

Is the behavior necessarily implementation-defined for all such functions? I'm 
pretty sure you can find this attribute on a project-local function that will 
never have more than one implementation.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp:479-480
+    const clang::Expr *arg) const {
+  ExplodedNode *N =
+      C.generateNonFatalErrorNode(nullState ? nullState : C.getState());
+  if (!N)
----------------
> What is the right way to emit a bug report? I have studied the code base and 
> found several patterns: in some cases, an error node is simply generated, in 
> others a new transition is also added based on the analyzed state. In some 
> cases an expression value is also tracked.

You always need to track the value when reporting things about values, it adds 
notes to the report explaining why do we think the variable has that value 
(eg., it was assigned null or compared to null).

Most checkers need this, though for some checkers it's more important than for 
other checkers.

----

Whether to generate a non-fatal error node and what state to put into it, this 
is indeed a very case-by-case decision. In your case the behavior is not 
undefined, the program doesn't crash when the bug happens, it continues to run 
normally, so analysis should probably also continue to find more defects. There 
are a few unknowns about the program's behavior (in particular, we don't know 
what it actually printed), but we never promised to figure this out in the 
first place. And even if there are unknowns that matter, well, unknowns are 
part of life and the rest of the static analyzer is already built under the 
assumption that not everything is known.

If it was an actual null pointer dereference, then you'd have to terminate the 
analysis by generating a fatal error node. The user won't be happy to learn 
about a "bug" that only occurs under the assumption that the program has 
already crashed with null dereference. And if it doesn't *only* occur under 
that assumption, then we'll probably find the same bug anyway while exploring a 
different execution path that doesn't crash.

---

A separate question is, do we want to assume that the pointer `x` is non-null 
after its first use in `printf("%p", x)`? We may or may not need to assume this 
regardless of whether we see a path on which it's //definitely// null. Eg., 
consider:
```lang=c
void foo(int *x) {
  // (1) We probably can't want to warn here, because there's no indication 
that 'x' can be null.
  printf("%p\n", x);

  // This is an indication that 'x' can be null.
  //
  // Technically it applies retroactively to the first line, but retroactive 
warnings are hard
  // because multiple different paths could reach this check, and it's not 
necessarily there
  // specifically for our current path.
  if (x == NULL) {
    printf("WARNING: null passed to foo()!");
  }

  // (2) Now, do we want to warn here? We probably do.
  //
  // But note that if we reached (2), we've already exhibited the same 
non-portable behavior
  // at (1) where no warning was emitted.
  printf("%p\n", x);

  // (3) Do we want to warn again here?
  printf("%p\n", x);

  // (4) What if it was a plain null dereference?
  printf("%d\n", *x);
}
```

If at (1), (2), (3) instead of `printf("%p, x)` we had a plain null dereference 
`printf("%d\n", *x);`, the null dereference checker will not emit any warnings. 
Indeed, the true-branch of the if-statement is dead code because the only way 
to reach it would be to go through undefined behavior at (1), but we also can't 
prove that there's any undefined behavior at (1). Yes we could separately warn 
about dead code, but that's a job for a different checker.

The null dereference checker accomplishes this by invoking 
`addTransition(nonNullState)` regardless of whether the warning is emitted 
(i.e. whenever `notNullState` is an actual state), and making the 
`generateErrorNode(nullState)` fatal. In particular, after (1) the pointer will 
continue to be unconditionally non-null.

In the example as written, you probably want a warning at (2). You also really 
want a null dereference warning at (4). So it's really counter-productive to 
transition to the `notNullState` at (1).

You probably don't care all that much about (3). Indeed, if you reach (3) in 
this analysis, you'll probably also reach it once (2) is addressed. The value 
of emitting all warnings at once is usually secondary to the nuisance of having 
too many warnings at once, so we'd probably still prefer it to warn only at (2).

But in order to have a warning at (4), you will also need to allow a warning at 
(3). This is because //what has been assumed, cannot be unassumed//. You can 
suppress the warning at (3) artificially by tracking a set of values or 
variables about which your checker has already warned, but that'd require a bit 
of work and I don't think it's necessary. It's probably ok to just warn at (3).

So I think your solution is correct.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp:508-512
+    if (arg->isNullPointerConstant(C.getASTContext(),
+                                   Expr::NPC_ValueDependentIsNull)) {
+      ReportPrintfPointerConversionSpecifierNULL(C, nullptr, arg);
+      return;
+    }
----------------
This really doesn't accomplish anything, `assume()` already does a lot more 
than that. Just rely on `assume()`, or make it a completely path-insensitive 
check if you want to stick to literal constants.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154838/new/

https://reviews.llvm.org/D154838

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D154838: [analyzer] Add check for null pointer passed to %p of printf family

Reply via email to