gamesh411 added inline comments.
================
Comment at: clang/docs/analyzer/checkers.rst:2358
Default sources defined by ``GenericTaintChecker``:
-``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``,
``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch``
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``,
``getch``, ``getchar``, ``getchar_unlocked``, ``getcw``, ``getcwd``,
``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``,
``getopt``, ``getopt_long``, ``getopt_only``, ``gets``, ``getseuserbyname``,
``readlink``, ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
----------------
steakhal wrote:
> typo/dup?
> I cannot recognize the `getcw()` call. Could you please refer to the
> specification or an instance where it was defined?
`getwd` is the right one instead of `getcw`
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:546-548
{{"gets"}, TR::Source({{0}, ReturnValueIndex})},
{{"scanf"}, TR::Source({{}, 1})},
+ {{"scanf_s"}, TR::Source({{}, {1}})},
----------------
steakhal wrote:
> If we handle `gets`, `scanf`, we should also model the `*_s` versions as well.
> ```lang=C
> char *gets_s(char *s, rsize_t n);
> int scanf_s(const char *restrict format, ...);
> int fscanf_s(FILE *restrict stream, const char *restrict format, ...);
> int sscanf_s(const char *restrict s, const char *restrict format, ...);
> int vscanf_s(const char *restrict format, va_list arg);
> int vfscanf_s(FILE *restrict stream, const char *restrict format, va_list
> arg);
> int vsscanf_s(const char *restrict s, const char *restrict format, va_list
> arg);
> ```
I have added gets_s, and will add the _s variants for the others in the other
patch that deals with the propagatorsj.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:550-552
+ {{"getopt"}, TR::Source({{ReturnValueIndex}})},
+ {{"getopt_long"}, TR::Source({{ReturnValueIndex}})},
+ {{"getopt_long_only"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> IMO these functions are highly domain-specific.
> On errors, they return specific/well-defined values e.g. `-1`, `'?'` or `':'`.
> That being said, the analyzer does not have this knowledge, thus it will
> model these as `conjured` symbols.
> If these values were modeled as tainted, we would likely introduce the number
> of false-positives regarding our limited capabilities of modeling the
> function accurately.
>
> tldr; I'm against these three rules; or alternatively prove that my concerns
> are not issues on real code bases.
I agree that the handling of these should be in another checker, I remember
some false positives ( mainly uninteresting, typical "just won't fix" errors )
relating to `getopt`, and a domain-specific checker could be more appropriate
here.
Removed them.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:556
+ {{"getwd"}, TR::Source({{0, ReturnValueIndex}})},
+ {{"readlink"}, TR::Source({{1, ReturnValueIndex}})},
+ {{"get_current_dir_name"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> We should check `readlinkat` as well.
Added
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:559
+ {{"gethostname"}, TR::Source({{0}})},
+ {{"getnameinfo"}, TR::Source({{2, 4}})},
+ {{"getseuserbyname"}, TR::Source({{1, 2}})},
----------------
steakhal wrote:
> In what cases can this function introduce taint?
The getnameinfo converts from
```
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
```
to hostname and servername strings.
One could argue that by crafting a specific IP address, that is known to
resolve to a specific hostname in the running environment could lead an
attacker injecting a chosen (in some circumstances arbitrary) string into the
code at the point of this function.
I know this is a bit contrived, and more on the cybersecurity side of things,
so I am not sure whether to add this here, or add this in a specific checker,
or just leave altogether. Please share your opinion about this.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561
+ {{"getseuserbyname"}, TR::Source({{1, 2}})},
+ {{"getgroups"}, TR::Source({{1}})},
+ {{"getlogin"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> >On success, `getgroups()` returns the number of supplementary group IDs. On
> >error, -1 is returned, and `errno` is set appropriately.
>
> According to this, the return value index should be also tainted.
Added
================
Comment at: clang/test/Analysis/taint-generic.c:385
+ struct option long_opts[] = {{0, 0, 0, 0}};
+ int opt = getopt_long_only(argc, argv, "a:b:02", long_opts, &option_index);
+ return 1 / opt; // expected-warning {{Division by a tainted value, possibly
zero}}
----------------
steakhal wrote:
> Well, this can never return zero.
> What we should do is to do a state-split for the failure case; since the
> application should definitely handle a failure in this part; thus the split
> would be justified.
Removed
================
Comment at: clang/test/Analysis/taint-generic.c:392
+int underscore_IO_getc_is_source(_IO_FILE *fp) {
+ char c = _IO_getc(fp);
+ return 1 / c; // expected-warning {{Division by a tainted value, possibly
zero}}
----------------
steakhal wrote:
> Sometimes we taint the `fd` and propagate based on that, and othertimes, we
> simply just return taint.
> However, I think it still looks like a better tradeoff this way.
> I just wanted to highlight this. Maybe a comment on the CallDescription would
> be beneficial in describing this discrepancy.
Added a comment
================
Comment at: clang/test/Analysis/taint-generic.c:417
+char *get_current_dir_name(void);
+int get_current_dir_name_is_source() {
+ char *d = get_current_dir_name();
----------------
steakhal wrote:
> Please avoid spelling the given function in the name of the test directly in
> a verbatim manner.
> If we would use the `CDF_MaybeBuiltin` matching mode, the
> `get_current_dir_name_is_source` would match for the `CallDescription
> {"get_current_dir_name"}` due to the way we fuzzy match for builtins.
> Prefixing with `Test` like `testGet_current_dir_name` would be fine though,
> only the underscores are handled differently.
> This applies to the rest of the test cases as well.
fixed the test names
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120236/new/
https://reviews.llvm.org/D120236
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits