donat.nagy added a comment.

As I thought more about this commit I realized that I have two vague but 
significant concerns. I'm sorry that I wasn't able to describe these before you 
started to dig into the details of the heuristics.
(1) I don't think that code like `int *ptr; scanf("%ld", &ptr);` should be 
covered by an ArrayBound checker: this is not just a "be careful with the value 
of the index" issue, but a raw pointer that's completely controlled by an 
untrusted source. I'd try to cover this kind of bug with either the StdLibrary 
checker or a small new checker [or perhaps just a clang-tidy check] which would 
report pointers coming from `scanf` or other user input even if it doesn't see 
a dereference of the pointer value.

(2) Taintedness of a memory region (i.e. `isTainted(state, location)` where 
`location` is presumably a `MemRegionVal`) is poorly defined and might mean 
three different things:
(a) **The 'begin' pointer value of the region is tainted** (controlled by an 
untrusted source). This may be a "we're already dead" situation (e.g. the 
pointer is an arbitrary value from `scanf`) or something completely harmless 
(we are examining something like the second subscript operator in 
`large_array[user_input].field[idx]` after verifying that the first subscript 
with the tainted index is valid), but in either case, the reliability of the 
'begin' pointer should not affect the comparison between the region extent and 
the index value.
(b)** The extent of the region is tainted** (controlled by an untrusted source, 
e.g. it's an user input string with an unknown length). In this case it's 
reasonable to turn on the "paranoid" comparison mode -- however, this situation 
should be recorded by marking the //extent symbol// of the region as tainted 
instead of marking the region itself as tainted. I'd be happy to see a commit 
that ensures that the extent symbol of user input strings is tainted and its 
taintedness is handled in ArrayBoundsV2.
(c)** The contents of the region are tainted** (controlled by an untrusted 
source). Arguably this should be recorded by marking the //contents// as 
tainted, but that's difficult to implement on a string/array, so it's a 
somewhat reasonable shortcut to put the taint on the region itself. This kind 
of taint is completely irrelevant for out of bounds memory access and should 
not affect the comparisons.

In summary, I'd say that this checker should not be affected by the taintedness 
of the memory region itself (but it should continue to monitor tainted indices, 
and if possible, it should be extended to handle tainted extent values). I'm 
not completely opposed to merging this patch if you add enough heuristics and 
workarounds to make it stable in practice, but I feel that this is not the 
right way forward.

It's fortunate that we'll meet in person on Monday because there we can discuss 
this topic (and other plans) in more detail.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to