Re: Coverity false positives triggered by gnulib's implementation of base64

Kamil Dudka Fri, 10 May 2019 04:32:21 -0700

On Thursday, May 9, 2019 9:14:40 PM CEST Paul Eggert wrote:
> On 5/7/19 7:22 AM, Kamil Dudka wrote:
> > It triggered the following false positives in the cryptsetup project:
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_digest_pbkdf2.c:117: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_digest_pbkdf2.c:157: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > Error: TAINTED_SCALAR:
> > lib/luks2/luks2_keyslot_luks2.c:445: tainted_data_argument: Calling
> > function "crypt_random_get" taints argument "salt".
> > lib/luks2/luks2_keyslot_luks2.c:448: tainted_data: Passing tainted
> > variable "salt" to a tainted sink.
> > 
> > 
> > ... but it can affect other gnulib-based projects, too.
> 
> Can you explain in more detail what these diagnostic messages mean,


1. crypt_random_get() reads data from untrusted source (file descriptor).

2. The data is given to base64_encode() without being checked first.

> and why this is Gnulib's "fault"?

Nobody said it was Gnulib's "fault".  I am just proposing a one-line comment 
that would significantly help to many direct and indirect users of Gnulib.

> For example, how do you know that the reports are false positives and not
> true positives?

I think it was obvious from my previous explanation:

(1) You need to check (by manual review) that the source of data is really 
untrusted.

(2) You need to check (by manual review) that there is no sufficient check
on the data.

(3) You need to check (by manual review) that the sink function is really 
vulnerable to data from untrusted source.

When doing step (3), I verified that Gnulib's base64_encode() can safely 
process data from untrusted source.  Then I wanted to record this information 
into the source code so that other users of Gnulib do not need to verify this 
each time they run Coverity on a project that bundles Gnulib's implementation 
of base64_encode().

> That sort of thing.
> Normally I'm a fan of static analysis, but it can go too far sometimes....

This is a matter of taste.  I do not think that all users of Gnulib share
your opinion on this.

> > gnulib's base64_encode() as seen by Coverity Analysis represents (3)
> > because its implementation uses byte swaps.
> 
> base64_encode does not actually swap bytes. For example, it wouldn't be
> reasonable for base64.c to include byteswap.h in order to clarify
> base64_encode, because the byteswap.h's byteswapping operations wouldn't
> help. Also, even if base64_encode were swapping bytes, byte-swapping by
> itself is not a problem and I don't see why Heartbleed is relevant here,
> as the resulting subscripting operations are obviously in range. So I
> would like to to know exactly why Coverity infers that this particular
> byte-manipulation is problematic.
> 
> For example, does the problem go away if you replace 'b64c[to_uchar
> (in[0]) >> 2]' with 'b64c[(to_uchar (in[0]) >> 2) & 0x3f]'? If so, it
> would indicate that Coverity merely needs to improve their range analysis.

Who said that range analysis is actually used for the TAINTED_SCALAR checker 
implemented by Coverity?

While the technical details are not public, analyses like this can also work 
more or less on a syntax level, yet still provide useful enough results.

> More generally, does it help if you make simple changes to base64.c that
> might clarify the code a bit and in addition pacify Coverity? If so,
> that might be preferable. Something like the attached untested patch,
> say.

The code change you proposed does not make the false positives go away.

> The idea is to somehow convince Coverity that the code is OK, while
> not making it worse (and maybe making it better).

I do not think it is a good idea to change a piece of working code to make
a static analysis false positives magically disappear.  Even if you were able 
to eliminate the false positives now, they can reappear with a future version 
of Coverity.  The implementation of the sink detector in the TAINTED_SCALAR 
checker is a black box for us whereas the inline annotations are documented 
and usually backward compatible in future releases of Coverity.

> > /* coverity[-tainted_data_sink: arg-0] */
> 
> Just adding a line like that would be too cryptic. We can't expect the
> reader to know the ins and outs of a proprietary 3rd-party tool.

OK, got it.

> Instead, the comment would have to explain what's going on and why the
> comment is needed, well enough for the Gnulib audience.

Exactly what Bruno proposed and I like the idea.

> However, I'm hoping that we don't need the line at all, because it
> sounds like Coverity might be buggy here.

It is important to realize that static analysis algorithms like this are 
successful because they are computationally cheap and cope easily with 
unmodified real-world code.  They suffer from both false positives and
false negatives by design.  Getting precise results for checkers like
this is computationally expensive and in the general case impossible.

Kamil

Re: Coverity false positives triggered by gnulib's implementation of base64

Reply via email to