donat.nagy added a comment.
As I thought more about this commit I realized that I have two vague but
significant concerns. I'm sorry that I wasn't able to describe these before you
started to dig into the details of the heuristics.
(1) I don't think that code like `int *ptr; scanf("%ld", &ptr);` should be
covered by an ArrayBound checker: this is not just a "be careful with the value
of the index" issue, but a raw pointer that's completely controlled by an
untrusted source. I'd try to cover this kind of bug with either the StdLibrary
checker or a small new checker [or perhaps just a clang-tidy check] which would
report pointers coming from `scanf` or other user input even if it doesn't see
a dereference of the pointer value.
(2) Taintedness of a memory region (i.e. `isTainted(state, location)` where
`location` is presumably a `MemRegionVal`) is poorly defined and might mean
three different things:
(a) **The 'begin' pointer value of the region is tainted** (controlled by an
untrusted source). This may be a "we're already dead" situation (e.g. the
pointer is an arbitrary value from `scanf`) or something completely harmless
(we are examining something like the second subscript operator in
`large_array[user_input].field[idx]` after verifying that the first subscript
with the tainted index is valid), but in either case, the reliability of the
'begin' pointer should not affect the comparison between the region extent and
the index value.
(b)** The extent of the region is tainted** (controlled by an untrusted source,
e.g. it's an user input string with an unknown length). In this case it's
reasonable to turn on the "paranoid" comparison mode -- however, this situation
should be recorded by marking the //extent symbol// of the region as tainted
instead of marking the region itself as tainted. I'd be happy to see a commit
that ensures that the extent symbol of user input strings is tainted and its
taintedness is handled in ArrayBoundsV2.
(c)** The contents of the region are tainted** (controlled by an untrusted
source). Arguably this should be recorded by marking the //contents// as
tainted, but that's difficult to implement on a string/array, so it's a
somewhat reasonable shortcut to put the taint on the region itself. This kind
of taint is completely irrelevant for out of bounds memory access and should
not affect the comparisons.
In summary, I'd say that this checker should not be affected by the taintedness
of the memory region itself (but it should continue to monitor tainted indices,
and if possible, it should be extended to handle tainted extent values). I'm
not completely opposed to merging this patch if you add enough heuristics and
workarounds to make it stable in practice, but I feel that this is not the
right way forward.
It's fortunate that we'll meet in person on Monday because there we can discuss
this topic (and other plans) in more detail.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D159105/new/
https://reviews.llvm.org/D159105
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits