On Thursday, May 9, 2019 10:35:18 PM CEST Bruno Haible wrote: > Hi Kamil, > > > There are 3 important state-transitions in the data-flow analysis: > > > > (1) obtaining data from untrusted source > > (2) sanitizing the data (checking bounds etc.) > > (3) unsafe use of untrusted data > > > > gnulib's base64_encode() as seen by Coverity Analysis represents (3) > > because its implementation uses byte swaps. This is a heuristic that > > is not always correct, so false positives may happen. If you ask why > > byte swaps are checked, I believe it was inspired by Heartbleed: > > > > https://www.synopsys.com/blogs/software-security/detecting-heartbleed-with > > -static-analysis/ > > > > The inline annotation that I proposed as a patch gives Coverity a hint > > that gnulib's implementation of base64_encode() can safely process data > > from untrusted sources. The annotation is specific to the implementation > > of the function, not to users of the function. > > Ah, thanks for explaining. Now I agree: base64_encode produces the > warning because of the (x << n) | (y >> m) expression patterns that > resemble a byte swap. It would do so also for any other program that > contains a base64_encode invocation with untrusted input as argument. > > > > Does it need to be done in the source code at all? > > > > Yes, in case of gnulib this is the only sensible option. > > ... > > Yes, various tools exist to waive false positives. The problem is that > > instances of these tools do not share data with each other in the > > universe. > > Consequently, developers have to repeatedly review these false positives > > and waive them in each single instance of these tools. And even worse > > with > > gnulib because these false positives are usually not matched across > > different project that bundle gnulib, even if you have a single instance > > of the waiving tool in your organisation. > > So, I propose to bite the bullet, but at least put a reasonable comment. > > > 2019-05-09 Kamil Dudka <kdu...@redhat.com> > Bruno Haible <br...@clisp.org> > > base64: Avoid false positive warning from Coverity. > * lib/base64.c (base64_encode): Add special comment for Coverity. > > diff --git a/lib/base64.c b/lib/base64.c > index f3f7298..80428bb 100644 > --- a/lib/base64.c > +++ b/lib/base64.c > @@ -84,6 +84,11 @@ base64_encode_fast (const char *restrict in, size_t > inlen, char *restrict out) If OUTLEN is less than BASE64_LENGTH(INLEN), > write as many bytes as possible. If OUTLEN is larger than > BASE64_LENGTH(INLEN), also zero terminate the output buffer. */ > +/* Tell Coverity that this function works fine when called with IN > + pointing to untrusted input. By default, Coverity, seeing the value > + shift expressions below, thinks that it is dangerous to call this > + function with untrusted input. > + coverity[-tainted_data_sink: arg-0] */ > void > base64_encode (const char *restrict in, size_t inlen, > char *restrict out, size_t outlen)
Perfect. I like the idea. Unfortunately, Coverity seems to be picky about the format, so it needs to be spelled like this: diff --git a/lib/base64.c b/lib/base64.c index bb4dce8..5cbef9c 100644 --- a/lib/base64.c +++ b/lib/base64.c @@ -84,6 +84,11 @@ base64_encode_fast (const char *restrict in, size_t inlen, char *restrict out) If OUTLEN is less than BASE64_LENGTH(INLEN), write as many bytes as possible. If OUTLEN is larger than BASE64_LENGTH(INLEN), also zero terminate the output buffer. */ +/* Tell Coverity that this function works fine when called with IN + pointing to untrusted input. By default, Coverity, seeing the value + shift expressions below, thinks that it is dangerous to call this + function with untrusted input. */ +/* coverity[-tainted_data_sink: arg-0] */ void base64_encode (const char *restrict in, size_t inlen, char *restrict out, size_t outlen)