Re: base64.c vs. newlines

Simon Josefsson Thu, 17 Jan 2008 06:19:23 -0800

Jim Meyering <[EMAIL PROTECTED]> writes:

> Hi Simon,
>
> coreutils' `base64 --decode's inability to decode its own encoded
> output (without the sledgehammer of --ignore-garbage) finally got
> to me :-)


Hi Jim!  Finally, reviewing this got to me. :-)

> So I bit the bullet and changed gnulib's base64.c and coreutils'
> src/base64.c so they decode embedded newlines unconditionally.

While that behaviour is likely better for src/base64.c (i.e., the
command line interface), it is not a good thing for lib/base64.c, where
the changes is done now.  I would prefer that this behaviour was
optional in the library part.  Either a flags parameter, or a new API?

> Note that it doesn't yet do anything about the examples
> in comments at the top of lib/base64.c -- IMHO, examples
> belong in test cases that can be checked automatically, so I'd
> like to remove those.  Ok with you?

Do you refer to the following snippet?  This was intended as
documentation for using the API, and I think it is useful to have it
somewhere.  I don't regard it as an example.  The error handling
interaction is a bit tricky to get right.

 * Be careful with error checking.  Here is how you would typically
 * use these functions:
 *
 * bool ok = base64_decode_alloc (in, inlen, &out, &outlen);
 * if (!ok)
 *   FAIL: input was not valid base64
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN
 *
 * size_t outlen = base64_encode_alloc (in, inlen, &out);
 * if (out == NULL && outlen == 0 && inlen != 0)
 *   FAIL: input too long
 * if (out == NULL)
 *   FAIL: memory allocation error
 * OK: data in OUT/OUTLEN.

> Note also that I haven't yet updated tests/test-base64.c.
> However, I have added quite a few to coreutils' tests/misc/base64.

I don't think merging all of these back is important.  But I've just
noticed that there is no modules/base64-tests in gnulib, so the self
test that we have is never invoked.  I'm fixing that separately.

> I've paid careful attention to the efficiency of the resulting code.
> The new decoder is just as fast as the original, at 170MB/s when
> the input contains no newlines.  However, when the input contains
> newlines, the new "base64 --decode" is more than 5 times faster than
> the old "base64 --decode --ignore-garbage" (163.6MB/s vs. 29.0MB/s).

Neat!  The --ignore-garbage was quite inefficient, so I'm not surprised.

> This is mainly to see if you're ok with the spirit of the change.
> I can imagine you may prefer not to change the existing base64_decode
> interfaces, and instead use a new pair of functions.  Let me know,
> and I'll finish the job.  E.g., I haven't updated the base64 documentation
> in coreutils yet, either.

Indeed, I think being able to base64 decode with just one API call is
important.  So I'd prefer not to change the API.  How about this
approach:

-extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
-extern bool base64_decode (struct base64_decode_context *ctx,
-                          const char *restrict in, size_t inlen,
+extern bool base64_decode (const char *restrict in, size_t inlen,
                           char *restrict out, size_t *outlen);
 
-extern bool base64_decode_alloc (struct base64_decode_context *ctx,
-                                const char *in, size_t inlen,
+extern bool base64_decode_alloc (const char *in, size_t inlen,
                                 char **out, size_t *outlen);
 
+extern void base64_decode_ctx_init (struct base64_decode_context *ctx);
+extern bool base64_decode_ctx_work (struct base64_decode_context *ctx,
+                                   const char *restrict in, size_t inlen,
+                                   char *restrict out, size_t *outlen);
+extern bool base64_decode_ctx_alloc (struct base64_decode_context *ctx,
+                                    const char *in, size_t inlen,
+                                    char **out, size_t *outlen);
+

As for going forward, perhaps we could commit the coreutils base64.{c,h}
into gnulib, and then discuss particular patches that achieve:

  1) make sure the old non-context API continue to work.  It would be
     implemented by calling the context-init and then the context-work
     functions.

  2) make the old API not remove \n's, and,

  3) add a new API that silently removes \n's.  It can either take a
     'int flags' variable which can hold a BASE64_REMOVE_LF flag, or 0,
     or that the new function always remove \n's.  If we chose the flags
     approach, the old API could be implemented by calling this new API
     with a flag of 0.

What do you think?

/Simon

Re: base64.c vs. newlines

Reply via email to