[clang] [analyzer] Enhance array bound checking for `ConstantArrayType` (PR #159357)

Donát Nagy via cfe-commits Thu, 18 Sep 2025 09:49:17 -0700

NagyDonat wrote:

@alejandro-alvarez-sonarsource @steakhal What do you think about the code 
duplication situation? What are your reasons for proposing this implementation?


If I understand correctly the "new" type-based bounds checking differs from the 
"old" region/extent-based logic in the following areas:
- it performs calculations with an index instead of a byte offset,
- it uses a different logic to calculate the array size (which will be compared 
with the index),
- in multidimensional arrays (and probably other similar situations?) it has a 
different (more strict) understanding about the beginning of the array,
- it reports slightly different messages (I admit that I didn't analyze the 
differences in detail).

To achieve these differences the current version of the PR duplicates the 
following components of the code:
- the "main method" `performCheck` / `performCheckArrayTypeIndex`,
- the helper class `StateUpdateReporter` / `StateIndexUpdateReporter`,
- the various helper functions that produce the `Messages`.

Among these my rough impression is that the state update handler and message 
generation helpers are relatively straightforward and they could be unified by 
introducing a few well-placed additional parameters, boolean flags etc. (or 
even a BaseStateUpdateReporter class with two subclasses that implement 
slightly different behavior).

In the "main" methods I see that the "old" and "new" approaches begin with 
completely different logic for calculating the index/offset value, the base 
region (i.e. the "full array" to which the accessed location is compared) and 
the array size. However, after that I find it very unfortunate that lots of 
complex logic (the deeply nested `if`s that check underflow and then overflow, 
plus past-the-end expressions, taint etc.) is duplicated between the two 
variants. I hope that it would be possible to factor out a "check whether this 
index/offset is between zero and this extent value" method which could be 
shared between the "new" and "old" checks.

I see that there are some additional differences in this latter part of the 
"main" `performCheck` methods, but I feel that these are not _neccessary_ 
differences:
- I would guess that the new heuristics for better fake-FAM handling would be 
valuable in the old code as well (although `getDynamicExtent` has some flexible 
array member heuristics);
- I don't immediately see why can the new variant omit my 
"isObviouslyNonnegative" hack (and if you do have a better solution, then I 
would be very happy to see it applied in the "old" check as well).

Moreover I feel that we could and perhaps should reduce the amount of 
duplicated code by **removing some parts of the old logic** and keeping only 
the new approach:
- Perhaps we don't need the complex old logic that handles multidimensional 
arrays by calculating the offset from the beginning of the whole array (by 
combining multiple `ElementRegion`s) -- IIUC that logic was introduced when 
this checker was still using `check::Location` and `check::Bind` callbacks that 
provided less information than the current callbacks. 
- Perhaps we could switch to using indices instead of byte offsets by default 
-- because this would produce simpler symbolic expressions that would be "more 
useful" for the rest of the analyzer (e.g. instead of assuming `4 * x < 40` we 
could assume `x < 10` which can also occur as e.g. an `if` condition).

In this review I'll try to be as constructive as possible and I'm open to 
investing work into refactoring this checker to get an end result that provides 
the features that you need with as little code duplication as possible.

Note that I'm also planning significant changes in this file: I'm planning to 
reimplement `alpha.security.ReturnPtrRange` as a separate `CheckerFrontend` of 
`ArrayBound` (which would become a checker family) because `ReturnPtrRange` and 
`ArrayBound` check the exactly same thing (is this location in bounds?) in two 
different situations (returned pointer value vs accessed location). (In fact, I 
started to work on this change two days ago, but I'll probably need to restart 
that work because this PR affects lots of code within `ArrayBoundChecker.cpp`.) 
On a longer time horizon, I'm also planning to use the `ArrayBound` logic as a 
"backend" for the `CStringOutOfBound` checker, so perhaps we might also think 
about separating some core logic into a library-like bounds checking source 
file.

https://github.com/llvm/llvm-project/pull/159357
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Enhance array bound checking for `ConstantArrayType` (PR #159357)

Reply via email to