DonatNagyE wrote:

I agree that it's a blocking issue that some numerical arguments act as taint 
sinks unconditionally (even if the analyzer knows about constraints that show 
that the actual value is valid). I'd suggest that those concrete issues should 
be addressed by separate improvement commits, and we should reconsider this 
"move out of alpha" commit afterwards.

As I briefly scrolled through the list of sink functions, it seems that there 
are two affected function families: the `malloc`-like functions and 
`memcpy`-like functions.

IMO, in the `malloc` function family the ideal behavior would be:
1. Report a _tainted buffer size can be huge_ warning when we cannot deduce 
that the allocated memory area is smaller than a certain threshold (e.g. 1 MiB; 
the user should be able to configure this). I'd say that it's a bug (a denial-of-
service opportunity) if untrusted data can trigger huge memory allocations 
(gigabytes, terabytes, `SIZE_MAX`, etc.).
2. Mark the extent symbol of the allocated memory area as tainted and ensure 
that this triggers the taint-aware (a.k.a. paranoid) mode of e.g. 
ArrayBoundsV2 and similar checkers.

I think these checks should be implemented in MallocChecker, where we already 
have lots of logic related to the size of allocated memory. 

The `memcpy` function family is a simpler case: there we should just eliminate 
the current "generic taint sink" behavior (= if the number of copied bytes is 
tainted, always report a bug) and replace it with taint-aware 
optimism/pessimism in CStringChecker (where, AFAIK, we already have logic that 
checks whether the copied chunk fits into the target memory region). This 
logic would let us accept
```c++
char buf[100], srcBuf[100] = {0};
size_t size = tainted();
if (size > 100)
  return;
memcpy(buf, srcBuf, size);
```
while creating a bug report for the variant where the early return condition is 
`(size > 101)`.

@haoNoQ I don't really understand your remark that
> The report may be technically correct but I think the entire idea of the 
> checker never made sense in the first place. It doesn't matter that untrusted 
> data is used to specify buffer size once; it matters that data with 
> _different levels of trust_, or _coming from different sources_, is used to 
> specify size of the _same buffer on different occasions_.

What does "size of the same buffer on different occasions" mean here? I'd guess 
that it refers to something like
```c++
void *get_mem(bool use_default_size) {
  size_t size = 1024;
  if (!use_default_size)
    size = get_tainted_number_from_config_file();
  // do something important that we want to do before each allocation...
  return malloc(size);
}
```
but I'd argue that this is actually valid (although a bit unusual) code.

Personally, I feel that the _core idea of this checker_ is that it marks the 
return value of some functions as tainted, so that security-conscious users 
can opt into pessimistic handling of that data. This is a direct and 
unavoidable "replace FNs with FPs" tradeoff, and I think users should have 
this option even if it produces a significant amount of FPs.

However, you're completely right that even in "pessimistic handling" the 
analyzer should at least consult the current state before reporting that "this 
number is tainted, so the `memcpy` / `memset` / etc. may cause a buffer overrun".

(Also note that this simplistic "tainted data + sink = bug report" logic is 
appropriate for functions like `system`, where the potentially tainted data is 
a string, because we cannot store meaningful constraints about strings.)

https://github.com/llvm/llvm-project/pull/67352