On Thursday, May 9, 2019 9:14:40 PM CEST Paul Eggert wrote: > On 5/7/19 7:22 AM, Kamil Dudka wrote: > > It triggered the following false positives in the cryptsetup project: > > > > Error: TAINTED_SCALAR: > > lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling > > function "crypt_random_get" taints argument "salt". > > lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted > > variable "salt" to a tainted sink. > > > > Error: TAINTED_SCALAR: > > lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling > > function "crypt_random_get" taints argument "salt". > > lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted > > variable "salt" to a tainted sink. > > > > > > ... but it can affect other gnulib-based projects, too. > > Can you explain in more detail what these diagnostic messages mean,
1. crypt_random_get() reads data from untrusted source (file descriptor). 2. The data is given to base64_encode() without being checked first. > and why this is Gnulib's "fault"? Nobody said it was Gnulib's "fault". I am just proposing a one-line comment that would significantly help to many direct and indirect users of Gnulib. > For example, how do you know that the reports are false positives and not > true positives? I think it was obvious from my previous explanation: (1) You need to check (by manual review) that the source of data is really untrusted. (2) You need to check (by manual review) that there is no sufficient check on the data. (3) You need to check (by manual review) that the sink function is really vulnerable to data from untrusted source. When doing step (3), I verified that Gnulib's base64_encode() can safely process data from untrusted source. Then I wanted to record this information into the source code so that other users of Gnulib do not need to verify this each time they run Coverity on a project that bundles Gnulib's implementation of base64_encode(). > That sort of thing. > Normally I'm a fan of static analysis, but it can go too far sometimes.... This is a matter of taste. I do not think that all users of Gnulib share your opinion on this. > > gnulib's base64_encode() as seen by Coverity Analysis represents (3) > > because its implementation uses byte swaps. > > base64_encode does not actually swap bytes. For example, it wouldn't be > reasonable for base64.c to include byteswap.h in order to clarify > base64_encode, because the byteswap.h's byteswapping operations wouldn't > help. Also, even if base64_encode were swapping bytes, byte-swapping by > itself is not a problem and I don't see why Heartbleed is relevant here, > as the resulting subscripting operations are obviously in range. So I > would like to to know exactly why Coverity infers that this particular > byte-manipulation is problematic. > > For example, does the problem go away if you replace 'b64c[to_uchar > (in[0]) >> 2]' with 'b64c[(to_uchar (in[0]) >> 2) & 0x3f]'? If so, it > would indicate that Coverity merely needs to improve their range analysis. Who said that range analysis is actually used for the TAINTED_SCALAR checker implemented by Coverity? While the technical details are not public, analyses like this can also work more or less on a syntax level, yet still provide useful enough results. > More generally, does it help if you make simple changes to base64.c that > might clarify the code a bit and in addition pacify Coverity? If so, > that might be preferable. Something like the attached untested patch, > say. The code change you proposed does not make the false positives go away. > The idea is to somehow convince Coverity that the code is OK, while > not making it worse (and maybe making it better). I do not think it is a good idea to change a piece of working code to make a static analysis false positives magically disappear. Even if you were able to eliminate the false positives now, they can reappear with a future version of Coverity. The implementation of the sink detector in the TAINTED_SCALAR checker is a black box for us whereas the inline annotations are documented and usually backward compatible in future releases of Coverity. > > /* coverity[-tainted_data_sink: arg-0] */ > > Just adding a line like that would be too cryptic. We can't expect the > reader to know the ins and outs of a proprietary 3rd-party tool. OK, got it. > Instead, the comment would have to explain what's going on and why the > comment is needed, well enough for the Gnulib audience. Exactly what Bruno proposed and I like the idea. > However, I'm hoping that we don't need the line at all, because it > sounds like Coverity might be buggy here. It is important to realize that static analysis algorithms like this are successful because they are computationally cheap and cope easily with unmodified real-world code. They suffer from both false positives and false negatives by design. Getting precise results for checkers like this is computationally expensive and in the general case impossible. Kamil