NagyDonat wrote: @alejandro-alvarez-sonarsource @steakhal What do you think about the code duplication situation? What are your reasons for proposing this implementation?
If I understand correctly the "new" type-based bounds checking differs from the "old" region/extent-based logic in the following areas: - it performs calculations with an index instead of a byte offset, - it uses a different logic to calculate the array size (which will be compared with the index), - in multidimensional arrays (and probably other similar situations?) it has a different (more strict) understanding about the beginning of the array, - it reports slightly different messages (I admit that I didn't analyze the differences in detail). To achieve these differences the current version of the PR duplicates the following components of the code: - the "main method" `performCheck` / `performCheckArrayTypeIndex`, - the helper class `StateUpdateReporter` / `StateIndexUpdateReporter`, - the various helper functions that produce the `Messages`. Among these my rough impression is that the state update handler and message generation helpers are relatively straightforward and they could be unified by introducing a few well-placed additional parameters, boolean flags etc. (or even a BaseStateUpdateReporter class with two subclasses that implement slightly different behavior). In the "main" methods I see that the "old" and "new" approaches begin with completely different logic for calculating the index/offset value, the base region (i.e. the "full array" to which the accessed location is compared) and the array size. However, after that I find it very unfortunate that lots of complex logic (the deeply nested `if`s that check underflow and then overflow, plus past-the-end expressions, taint etc.) is duplicated between the two variants. I hope that it would be possible to factor out a "check whether this index/offset is between zero and this extent value" method which could be shared between the "new" and "old" checks. I see that there are some additional differences in this latter part of the "main" `performCheck` methods, but I feel that these are not _neccessary_ differences: - I would guess that the new heuristics for better fake-FAM handling would be valuable in the old code as well (although `getDynamicExtent` has some flexible array member heuristics); - I don't immediately see why can the new variant omit my "isObviouslyNonnegative" hack (and if you do have a better solution, then I would be very happy to see it applied in the "old" check as well). Moreover I feel that we could and perhaps should reduce the amount of duplicated code by **removing some parts of the old logic** and keeping only the new approach: - Perhaps we don't need the complex old logic that handles multidimensional arrays by calculating the offset from the beginning of the whole array (by combining multiple `ElementRegion`s) -- IIUC that logic was introduced when this checker was still using `check::Location` and `check::Bind` callbacks that provided less information than the current callbacks. - Perhaps we could switch to using indices instead of byte offsets by default -- because this would produce simpler symbolic expressions that would be "more useful" for the rest of the analyzer (e.g. instead of assuming `4 * x < 40` we could assume `x < 10` which can also occur as e.g. an `if` condition). In this review I'll try to be as constructive as possible and I'm open to investing work into refactoring this checker to get an end result that provides the features that you need with as little code duplication as possible. Note that I'm also planning significant changes in this file: I'm planning to reimplement `alpha.security.ReturnPtrRange` as a separate `CheckerFrontend` of `ArrayBound` (which would become a checker family) because `ReturnPtrRange` and `ArrayBound` check the exactly same thing (is this location in bounds?) in two different situations (returned pointer value vs accessed location). (In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within `ArrayBoundChecker.cpp`.) On a longer time horizon, I'm also planning to use the `ArrayBound` logic as a "backend" for the `CStringOutOfBound` checker, so perhaps we might also think about separating some core logic into a library-like bounds checking source file. https://github.com/llvm/llvm-project/pull/159357 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
