[clang] [analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal (PR #161644)

Donát Nagy via cfe-commits Thu, 02 Oct 2025 06:40:45 -0700

https://github.com/NagyDonat commented:


I see that the old behavior had problems, but the new code also has a 
problematic aspect: in the case when the pointers may or may not be same (that 
is, when `StSameBuf && StNotSameBuf` is true) it always assumes that the 
pointers are not equal (because it assigns `state = StNotSameBuf;` in the line 
just below the diff context displayed by github).

To test this, you should replace the `// ...` placeholders in your new tests 
with `clang_analyzer_eval(a == b); // expected-warning{{UNKNOWN}}`. Ideally 
these checks should pass (if the strings are identical, the pointers may or may 
not be equal) but with your code they will produce FALSE (because the 
`StSameBuf` branch is silently discarded).

To avoid losing the "pointers may be equal" case you should remove the 
statement `state = StNotSameBuf;`. Unfortunately, deleting this statement also 
has a serious side effect:  after it `clang_analyzer_eval(a == b);` would also 
be `UNKNOWN` on the branch where `strcmp` is nonzero. (However, this side 
effect is just "not recording knowledge" which is IMO less problematic than 
"arbitrarily discarding a case that would be possible".)

This side effect can be easily mitigated in the case when the analyzer can 
evaluate the comparison of known strings (just re-add `state = StNotSameBuf;` 
on the branch where the checker records the assumption that the result is GT or 
LT compared to zero).

Mitigating the side effect is also possible in the case when the analyzer 
cannot evaluate the comparison (which includes e.g. the tests added in this 
PR): to do this we should:
- call `auto [StResultEqual, StResultNonEqual] = state->assume(resultVal)` 
(where `resultVal` is the freshly conjured otherwise unconstrained result value)
  - these states will be both non-null because the value was unconstrained
- add a transition to `StResultEqual` (i.e. the string values equal, pointers 
may or may not be equal)
- starting from `StResultNotEqual` (when the string values differ) assume that 
the two pointers are different, then add a second transition to the resulting 
state (if it is non-null ... in theory it should be always non-null, but there 
might be some rare corner case were it isn't).

This second mitigation introduces a state split, but IMO this is a much more 
"benign" state split than the one that was introduced by the old code. 
Separating "`strcmp` returns zero" and "`strcmp` returns nonzero" is a 
reasonable after a call to `strcmp` (in fact, this is very similar to the 
`eagerly-assume` heuristic, which works correctly and is enabled by default) -- 
while the old "pointers are equal" and "pointers are different" state split was 
indeed a surprising and unjustified consequence of calling `strcmp`.

_I admit that these suggestions are a bit complex, feel free to ask for details 
if you don't understand something..._

https://github.com/llvm/llvm-project/pull/161644
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal (PR #161644)

Reply via email to