Jim Meyering <[EMAIL PROTECTED]> writes: > Hi Simon, > > coreutils' `base64 --decode's inability to decode its own encoded > output (without the sledgehammer of --ignore-garbage) finally got > to me :-)
Hi Jim! Finally, reviewing this got to me. :-) > So I bit the bullet and changed gnulib's base64.c and coreutils' > src/base64.c so they decode embedded newlines unconditionally. While that behaviour is likely better for src/base64.c (i.e., the command line interface), it is not a good thing for lib/base64.c, where the changes is done now. I would prefer that this behaviour was optional in the library part. Either a flags parameter, or a new API? > Note that it doesn't yet do anything about the examples > in comments at the top of lib/base64.c -- IMHO, examples > belong in test cases that can be checked automatically, so I'd > like to remove those. Ok with you? Do you refer to the following snippet? This was intended as documentation for using the API, and I think it is useful to have it somewhere. I don't regard it as an example. The error handling interaction is a bit tricky to get right. * Be careful with error checking. Here is how you would typically * use these functions: * * bool ok = base64_decode_alloc (in, inlen, &out, &outlen); * if (!ok) * FAIL: input was not valid base64 * if (out == NULL) * FAIL: memory allocation error * OK: data in OUT/OUTLEN * * size_t outlen = base64_encode_alloc (in, inlen, &out); * if (out == NULL && outlen == 0 && inlen != 0) * FAIL: input too long * if (out == NULL) * FAIL: memory allocation error * OK: data in OUT/OUTLEN. > Note also that I haven't yet updated tests/test-base64.c. > However, I have added quite a few to coreutils' tests/misc/base64. I don't think merging all of these back is important. But I've just noticed that there is no modules/base64-tests in gnulib, so the self test that we have is never invoked. I'm fixing that separately. > I've paid careful attention to the efficiency of the resulting code. > The new decoder is just as fast as the original, at 170MB/s when > the input contains no newlines. However, when the input contains > newlines, the new "base64 --decode" is more than 5 times faster than > the old "base64 --decode --ignore-garbage" (163.6MB/s vs. 29.0MB/s). Neat! The --ignore-garbage was quite inefficient, so I'm not surprised. > This is mainly to see if you're ok with the spirit of the change. > I can imagine you may prefer not to change the existing base64_decode > interfaces, and instead use a new pair of functions. Let me know, > and I'll finish the job. E.g., I haven't updated the base64 documentation > in coreutils yet, either. Indeed, I think being able to base64 decode with just one API call is important. So I'd prefer not to change the API. How about this approach: -extern void base64_decode_ctx_init (struct base64_decode_context *ctx); -extern bool base64_decode (struct base64_decode_context *ctx, - const char *restrict in, size_t inlen, +extern bool base64_decode (const char *restrict in, size_t inlen, char *restrict out, size_t *outlen); -extern bool base64_decode_alloc (struct base64_decode_context *ctx, - const char *in, size_t inlen, +extern bool base64_decode_alloc (const char *in, size_t inlen, char **out, size_t *outlen); +extern void base64_decode_ctx_init (struct base64_decode_context *ctx); +extern bool base64_decode_ctx_work (struct base64_decode_context *ctx, + const char *restrict in, size_t inlen, + char *restrict out, size_t *outlen); +extern bool base64_decode_ctx_alloc (struct base64_decode_context *ctx, + const char *in, size_t inlen, + char **out, size_t *outlen); + As for going forward, perhaps we could commit the coreutils base64.{c,h} into gnulib, and then discuss particular patches that achieve: 1) make sure the old non-context API continue to work. It would be implemented by calling the context-init and then the context-work functions. 2) make the old API not remove \n's, and, 3) add a new API that silently removes \n's. It can either take a 'int flags' variable which can hold a BASE64_REMOVE_LF flag, or 0, or that the new function always remove \n's. If we chose the flags approach, the old API could be implemented by calling this new API with a flag of 0. What do you think? /Simon