Hello Marc,

> >   (A) Using a pipe at the shell level:
> >         iconv -t UTF-8 | my-program
> >
> >   (B) Using a programming language that has a coroutines concept.
> >       This way, both the decoder and the consumer can be programmed in
> >       a straightforward manner.
> >
> >   (C) In C, with multiple threads.
> >
> >   (D) In C, with a decoder programmed in a straightforward manner
> >       and a consumer that is written as a callback with state.
> >
> >   (E) In C, with a decoder written as a callback with state
> >       and a consumer programmed in a straightforward manner.
> >
> > > Thus, I am wondering whether it makes sense to offer a stateful
> > > decoder that takes byte by byte and signals as soon as a decoded byte
> > > sequence is ready.
> >
> > It seems that you are thinking of approach (D).
> >
> > I think (D) is the worst, because writing application code in a callback
> > style with state is hard and error-prone. I would favour (E) instead,
> > if (A) is not possible.
>
> If I understand your classification correctly, I meant something more
> like (E) than (D), I think. As an interface, what I would propose would
> be something along the following lines:
>
> decoder_t d = decoder_create (iconveh_t *cd);
> switch (decoder_push (d, byte))
>   {
>   case DECODER_BYTE_READ:
>     char *res = decoder_result (d);
>     size_t len = decoder_length (d);
>     ...

What does the programmer do here with res and len? This is where things
get complex.

>   case DECODER_EOF:
>     ...
>   case DECODER_INCOMPLETE:
>     ...
>   case DECODER_ERROR:
>     ...
>   }
> ...
> decoder_destroy (d);

What you describe here is (D), in my view. (E) would look like this:

  extern decoder_t create_decoder_context (void);
  extern void push_bytes_into_decoder (const char *p, size_t n, decoder_t);
  extern void free_decoder_context (decoder_t);

> > (B) means to use a different programming language. I can't recommend C++
> > [1].
>
> The main problem I see with C++'s coroutines is that they are
> stackless coroutines; their expressiveness is tiny compared to
> languages with full coroutine support, to say nothing of programming
> languages like Scheme with its first-class continuations.

It doesn't surprise me. 'constexpr', another new addition to C++, similarly
does only a fraction of what would be useful.

> > (C) is possible, but complex. See e.g. gnulib's pipe-filter-ii.c or
> > pipe-filter-gi.c. Generally, threads are overkill when all you need are
> > coroutines.
>
> I agree. Unfortunately, Posix's response to dropping makecontext and
> friends seems to be to use threads. It would be great if C had a
> lightweight context-swapping mechanism.

Maybe. I think setcontext() has a severe problem; see
<https://www.gnu.org/software/gnulib/manual/html_node/setcontext.html>.

> By the way, libunistring's u??_conv_from_encoding does not seem to be
> adapted to consuming buffers. The problem is that one doesn't know in
> advance where boundaries of multi-byte sequences are so
> u??_conv_from_encoding will likely signal a decoding error.

Yes, u??_conv_from_encoding is made for converting entire strings. If you
want to restart conversion after some bytes that are part of a multibyte
character, you need the low-level iconv().

Bruno
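
P.S. To make the last point concrete, here is a minimal sketch of
restarting iconv() across chunk boundaries. It is only an illustration
under assumptions: struct chunk_decoder, chunk_decoder_init and
chunk_decoder_push are hypothetical names, not part of gnulib or
libunistring, and the 16-byte bound on an incomplete character is an
assumption as well. The idea is that iconv() fails with EINVAL when the
input ends inside a multibyte character and leaves the input pointer at
the start of that character, so the unconsumed tail can be kept and
completed by the next chunk.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib or libunistring.  */
struct chunk_decoder
{
  iconv_t cd;          /* descriptor obtained from iconv_open() */
  char pending[16];    /* unconsumed tail of the previous chunk: an
                          incomplete character (assumed to fit in 16 bytes) */
  size_t npending;     /* number of valid bytes in pending[] */
};

/* Prepares D for converting FROM_ENCODING to UTF-8.
   Returns 0 on success, -1 if the conversion is not supported.  */
static int
chunk_decoder_init (struct chunk_decoder *d, const char *from_encoding)
{
  d->npending = 0;
  d->cd = iconv_open ("UTF-8", from_encoding);
  return d->cd == (iconv_t) -1 ? -1 : 0;
}

/* Converts the N bytes at P and appends the UTF-8 result at *OUT, which
   has room for *OUTLEFT bytes.  Returns 0 on success, or -1 with errno
   set (EILSEQ, E2BIG).  After a failure, this sketch does not support
   resuming: the decoder should be discarded.  */
static int
chunk_decoder_push (struct chunk_decoder *d, const char *p, size_t n,
                    char **out, size_t *outleft)
{
  /* First complete the character left over from the previous chunk,
     feeding it one new byte at a time.  */
  while (d->npending > 0 && n > 0)
    {
      if (d->npending == sizeof d->pending)
        {
          errno = EILSEQ;
          return -1;
        }
      d->pending[d->npending++] = *p++;
      n--;

      char *in = d->pending;
      size_t inleft = d->npending;
      size_t ret = iconv (d->cd, &in, &inleft, out, outleft);
      int err = errno;
      /* Keep only the bytes that iconv() did not consume.  */
      memmove (d->pending, in, inleft);
      d->npending = inleft;
      if (ret == (size_t) -1 && err != EINVAL)
        {
          errno = err;
          return -1;
        }
    }

  /* Then convert the rest of the chunk directly from the caller's buffer.  */
  if (n > 0)
    {
      assert (d->npending == 0);
      char *in = (char *) p;   /* iconv() does not modify the input bytes */
      size_t inleft = n;
      if (iconv (d->cd, &in, &inleft, out, outleft) == (size_t) -1)
        {
          if (errno != EINVAL)
            return -1;
          /* EINVAL: the chunk ends inside a character; save its start.  */
          if (inleft > sizeof d->pending)
            {
              errno = EILSEQ;
              return -1;
            }
          memcpy (d->pending, in, inleft);
          d->npending = inleft;
        }
    }
  return 0;
}
```

For example, with UTF-16LE input, pushing the single byte 0xE9 produces
no output, and pushing 0x00 0x41 0x00 afterwards yields the UTF-8 bytes
C3 A9 41. The caller releases the descriptor with iconv_close (d.cd)
when done.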