Sorry ignore that, I hadn't checked your playground link.
On Tuesday, 14 January 2025 at 10:07:53 UTC Brian Candler wrote:
> > As I wrote earlier, I'm trying to avoid reading the entire email part
> > into memory to discover if I should use base64.StdEncoding or
> > base64.RawStdEncoding.
>
> As I asked before, why would you ever need to use RawStdEncoding? It just
> means the MIME part was invalid, most likely corrupted/truncated.
>
> > One odd thing is that I'm getting extraneous newlines (shown by stars in
> > the output), eg:
>
> You are feeding two different inputs which do not differ by truncation
> alone.
>
> % echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | base64 -D | hexdump -c
> 0000000 B o n j o u r , j o y e u x
> 0000010 l i o n \n
> 0000015
>
> % echo -n "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==" | base64 -D | hexdump -c
> 0000000 " B o n j o u r , j o y e u x
> 0000010 l i o n "
> 0000016
>
> The second one has encoded double-quotes before and after the content.
>
> On Monday, 13 January 2025 at 22:43:51 UTC Rory Campbell-Lange wrote:
>
>> As I wrote earlier, I'm trying to avoid reading the entire email part
>> into memory to discover if I should use base64.StdEncoding or
>> base64.RawStdEncoding.
>>
>> The following seems to work reasonably well:
>>
>> type B64Translator struct {
>> 	br *bufio.Reader
>> }
>>
>> func NewB64Translator(r io.Reader) *B64Translator {
>> 	return &B64Translator{
>> 		br: bufio.NewReader(r),
>> 	}
>> }
>>
>> // Read reads off the buffered reader expecting base64.StdEncoding bytes
>> // with (potentially) one or two '=' padding characters at the end.
>> // RawStdEncoding can be used for both StdEncoded and RawStdEncoded data
>> // if the padding is removed.
>> func (b *B64Translator) Read(p []byte) (n int, err error) {
>> 	// Read directly into p; no scratch buffer is needed.
>> 	n, err = b.br.Read(p)
>> 	// Strip only trailing padding (and line endings, which the base64
>> 	// decoder ignores anyway) from the bytes actually read. A Read may
>> 	// return data together with an error, so trim before returning.
>> 	n = len(bytes.TrimRight(p[:n], "=\r\n"))
>> 	return n, err
>> }
>>
>> https://go.dev/play/p/H6ii7Vy-8as
>>
>> One odd thing is that I'm getting extraneous newlines (shown by stars in
>> the output), eg:
>>
>> --
>> raw: Bonjour joyeux lion
>> Qm9uam91ciwgam95ZXV4IGxpb24K
>> ok: false
>> decoded: Bonjour, joyeux lion* <-------------------- e.g. here
>> --
>> std: "Bonjour, joyeux lion"
>> IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==
>> ok: true
>> decoded: "Bonjour, joyeux lion"
>> --
>>
>> Any thoughts on that would be gratefully received.
>>
>> Rory
>>
>>
>> On 13/01/25, Rory Campbell-Lange ([email protected]) wrote:
>> > Thanks very much for the playground link and thoughts.
>> >
>> > The use case is reading base64 email parts, which could be of a very
>> > large size. It is unclear when processing these parts if they are base64
>> > padded or not.
>> >
>> > I'm trying to avoid reading the entire email part into memory.
>> > Consequently I think your earlier idea of adding padding (or removing it)
>> > in a wrapper could work. Perhaps wrapping the reader with another using a
>> > bufio.Reader to track bytes read and detect EOF. At EOF the wrapper could
>> > add padding if needed.
>> >
>> > Rory
>> >
>> > On 13/01/25, Axel Wagner ([email protected]) wrote:
>> > > Just realized: If you twist the idea around, you get something easy to
>> > > implement and more correct.
>> > > Instead of stripping padding if it exists, you can ensure that the body
>> > > *is* padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>> > > You can then feed that to base64.StdEncoding. If the wrapped Reader
>> > > returns padded Base64, this does nothing. If it returns unpadded Base64,
>> > > it adds padding. If it returns incorrect Base64, it will create a padded
>> > > stream that will then get rejected by the Base64 decoder.
>> > >
>> > > On Mon, 13 Jan 2025 at 10:31, Axel Wagner <[email protected]> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > one way to solve your problem is to wrap the body into an io.Reader
>> > > > that strips off everything after the first `=` it finds. That can then
>> > > > be fed to base64.RawStdEncoding. This approach requires no extra
>> > > > buffering or copying and is easy to implement:
>> > > > https://go.dev/play/p/CwcVz7oietI
>> > > >
>> > > > The downside is that this will not verify that the body is *either*
>> > > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
>> > > > error if fed something like "AAA=garbage".
>> > > > That can be remedied by buffering up to four bytes and, when
>> > > > encountering an EOF, checking that there are at most two trailing `=`
>> > > > and that the total length of the stream is divisible by four. It's more
>> > > > finicky to implement, but it should also be possible without any extra
>> > > > copies and only requires a very small extra buffer.
>> > > >
>> > > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange <[email protected]> wrote:
>> > > >
>> > > >> Thanks very much for the links, pointers and possible solution.
>> > > >>
>> > > >> Trying to read base64 standard (padded) encoded data with
>> > > >> base64.RawStdEncoding can produce an error such as
>> > > >>
>> > > >> illegal base64 data at input byte <n>
>> > > >>
>> > > >> Reading base64 raw (unpadded) encoded data with base64.StdEncoding
>> > > >> produces the io.ErrUnexpectedEOF error.
>> > > >>
>> > > >> I'll go with trying to read the standard encoded data up to maybe 1MB
>> > > >> and then switch to base64.RawStdEncoding if I hit the "illegal base64
>> > > >> data" problem, maybe with reference to bufio.Reader which has most of
>> > > >> the methods suggested below.
>> > > >>
>> > > >> Yes, the use of a "Rewind" method would be crucial. I guess this would
>> > > >> need to:
>> > > >> 1. error if more than one buffer of data has been read
>> > > >> 2. else re-read from byte 0
>> > > >>
>> > > >> Thanks again very much for these suggestions.
>> > > >>
>> > > >> Rory
>> > > >>
>> > > >> On 12/01/25, robert engels ([email protected]) wrote:
>> > > >> > Also, see this
>> > > >> > https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>> > > >> > as I expected the error should be reported earlier than the end of
>> > > >> > stream if the chosen format is wrong.
>> > > >> >
>> > > >> > > On Jan 12, 2025, at 2:57 PM, robert engels <[email protected]> wrote:
>> > > >> > >
>> > > >> > > Also, this is what Gemini provided which looks basically correct -
>> > > >> > > but I think encapsulating it with a Rewind() method would be easier
>> > > >> > > to understand.
>> > > >> > >
>> > > >> > >
>> > > >> > >
>> > > >> > > While Go doesn't have a built-in PushbackReader like some other
>> > > >> > > languages (e.g., Java), you can implement similar functionality
>> > > >> > > using a custom struct and a buffer.
>> > > >> > >
>> > > >> > > Here's an example implementation:
>> > > >> > >
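>> > > >> > > package main
>> > > >> > >
>> > > >> > > import (
>> > > >> > > 	"bytes"
>> > > >> > > 	"errors"
>> > > >> > > 	"io"
>> > > >> > > )
>> > > >> > >
>> > > >> > > // PushbackReader wraps an io.Reader and lets callers push bytes back
>> > > >> > > // so that they are returned again by the next Read.
>> > > >> > > type PushbackReader struct {
>> > > >> > > 	reader  io.Reader
>> > > >> > > 	buffer  *bytes.Buffer // pushed-back bytes, served before reader
>> > > >> > > 	last    byte          // most recent byte returned by Read
>> > > >> > > 	hasLast bool          // whether last may be unread
>> > > >> > > }
>> > > >> > >
>> > > >> > > func NewPushbackReader(r io.Reader) *PushbackReader {
>> > > >> > > 	return &PushbackReader{
>> > > >> > > 		reader: r,
>> > > >> > > 		buffer: new(bytes.Buffer),
>> > > >> > > 	}
>> > > >> > > }
>> > > >> > >
>> > > >> > > // Read serves pushed-back bytes first, then the underlying reader.
>> > > >> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
>> > > >> > > 	if p.buffer.Len() > 0 {
>> > > >> > > 		n, err = p.buffer.Read(b)
>> > > >> > > 	} else {
>> > > >> > > 		n, err = p.reader.Read(b)
>> > > >> > > 	}
>> > > >> > > 	if n > 0 {
>> > > >> > > 		// Remember the last byte so UnreadByte can push it back.
>> > > >> > > 		p.last, p.hasLast = b[n-1], true
>> > > >> > > 	}
>> > > >> > > 	return n, err
>> > > >> > > }
>> > > >> > >
>> > > >> > > // UnreadByte pushes back the last byte returned by Read.
>> > > >> > > func (p *PushbackReader) UnreadByte() error {
>> > > >> > > 	if !p.hasLast {
>> > > >> > > 		return errors.New("pushback: no byte to unread")
>> > > >> > > 	}
>> > > >> > > 	p.hasLast = false
>> > > >> > > 	return p.Unread([]byte{p.last})
>> > > >> > > }
>> > > >> > >
>> > > >> > > // Unread pushes buf back so that it is read before anything that is
>> > > >> > > // already waiting in the buffer.
>> > > >> > > func (p *PushbackReader) Unread(buf []byte) error {
>> > > >> > > 	rest := append([]byte(nil), p.buffer.Bytes()...)
>> > > >> > > 	p.buffer.Reset()
>> > > >> > > 	p.buffer.Write(buf)
>> > > >> > > 	p.buffer.Write(rest)
>> > > >> > > 	return nil
>> > > >> > > }
>> > > >> > >
>> > > >> > > func main() {
>> > > >> > > 	// Example usage
>> > > >> > > 	r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
>> > > >> > > 	buf := make([]byte, 5)
>> > > >> > > 	r.Read(buf)    // reads "Hello"
>> > > >> > > 	r.UnreadByte() // pushes the final 'o' back
>> > > >> > > 	r.Read(buf)    // reads the single pushed-back byte
>> > > >> > > }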
>> > > >> > >
>> > > >> > > Explanation:
>> > > >> > > PushbackReader struct: This struct holds the underlying io.Reader
>> > > >> > > and a buffer to store the pushed-back bytes.
>> > > >> > > NewPushbackReader: This function creates a new PushbackReader from
>> > > >> > > an existing io.Reader.
>> > > >> > > Read method: This method reads bytes from either the buffer (if it
>> > > >> > > contains data) or the underlying reader.
>> > > >> > > UnreadByte method: This method pushes back a single byte into the
>> > > >> > > buffer.
>> > > >> > > Unread method: This method pushes back a slice of bytes into the
>> > > >> > > buffer.
>> > > >> > >
>> > > >> > > Important Considerations:
>> > > >> > > The buffer size is not managed automatically. You may need to adjust
>> > > >> > > the buffer size based on your use case.
>> > > >> > > This implementation does not handle pushing back beyond the
>> > > >> > > initially read data. If you need to support arbitrary pushback,
>> > > >> > > you'll need a more complex solution.
>> > > >> > >
>> > > >> > > Generative AI is experimental.
>> > > >> > >
>> > > >> > >> On Jan 12, 2025, at 2:53 PM, Robert Engels <[email protected]> wrote:
>> > > >> > >>
>> > > >> > >> You can see the two pass reader here
>> > > >> > >> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>> > > >> > >>
>> > > >> > >> But yea, the basic premise is that you buffer the data so you can
>> > > >> > >> rewind if needed.
>> > > >> > >>
>> > > >> > >> Are you certain it is reading to the end to return EOF? It may be
>> > > >> > >> returning EOF once the parsing fails.
>> > > >> > >>
>> > > >> > >> Otherwise I would expect this is being decoded wrong - e.g. the MIME
>> > > >> > >> type or encoding type should tell you the correct format before you
>> > > >> > >> start decoding.
>> > > >> > >>
>> > > >> > >>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange <[email protected]> wrote:
>> > > >> > >>>
>> > > >> > >>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>> > > >> > >>>
>> > > >> > >>> My google fu must be deserting me. I can find PushbackReader
>> > > >> > >>> implementations in Java, but the only similar thing for Go I could
>> > > >> > >>> find was https://gitlab.com/osaki-lab/iowrapper. If you have a
>> > > >> > >>> specific recommendation for a ReadSeeker wrapper to an io.Reader
>> > > >> > >>> that would be great to know.
>> > > >> > >>>
>> > > >> > >>> Since the base64 decoding error I'm looking for is an EOF, I guess
>> > > >> > >>> the wrapper approach will not work when the EOF byte position is
>> > > >> > >>> greater than the io.ReadSeeker buffer size.
>> > > >> > >>>
>> > > >> > >>> Rory
>> > > >> > >>>
>> > > >> > >>> On 12/01/25, robert engels ([email protected]) wrote:
>> > > >> > >>>> Create a ReadSeeker that wraps the Reader providing the buffering
>> > > >> > >>>> (mark & reset) - normally the buffer only needs to be large enough
>> > > >> > >>>> to detect the format contained in the Reader.
>> > > >> > >>>>
>> > > >> > >>>> You can search Google for PushbackReader in Go and you’ll get a
>> > > >> > >>>> basic implementation.
>> > > >> > >>>>
>> > > >> > >>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange <[email protected]> wrote:
>> > > >> > >>> ...
>> > > >> > >>>>> I'm attempting to rationalise the process [of avoiding reading
>> > > >> > >>>>> email parts into byte slices] by simply wrapping the provided
>> > > >> > >>>>> io.Reader with the necessary decoders to reduce memory usage and
>> > > >> > >>>>> unnecessary processing.
>> > > >> > >>>>>
>> > > >> > >>>>> The wrapping strategy seems to work ok. However there is a
>> > > >> > >>>>> particular issue in detecting base64.StdEncoding versus
>> > > >> > >>>>> base64.RawStdEncoding, which requires draining the io.Reader using
>> > > >> > >>>>> base64.StdEncoding and (based on the current implementation)
>> > > >> > >>>>> switching to base64.RawStdEncoding if an io.ErrUnexpectedEOF is
>> > > >> > >>>>> found.
>> > > >> > >>>>>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> --
>> > > >> > >> You received this message because you are subscribed to the
>> Google
>> > > >> Groups "golang-nuts" group.
>> > > >> > >> To unsubscribe from this group and stop receiving emails from
>> it,
>> > > >> send an email to [email protected] <mailto:
>> > > >> [email protected]>.
>> > > >> > >> To view this discussion visit
>> > > >>
>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com
>>
>> > > >> <
>> > > >>
>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com?utm_medium=email&utm_source=footer
>>
>> > > >> >.
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> >
>>
>>
>>
>