Re: [go-nuts] Efficiently switch io.Reader to another decoder on error

'Brian Candler' via golang-nuts Tue, 14 Jan 2025 02:10:35 -0800

Sorry, ignore that; I hadn't checked your playground link.

On Tuesday, 14 January 2025 at 10:07:53 UTC Brian Candler wrote:


> > As I wrote earlier, I'm trying to avoid reading the entire email part
> > into memory to discover if I should use base64.StdEncoding or
> > base64.RawStdEncoding.
>
> As I asked before, why would you ever need to use RawStdEncoding? It just 
> means the MIME part was invalid, most likely corrupted/truncated.
>
> > One odd thing is that I'm getting extraneous newlines (shown by stars in
> > the output), e.g.:
>
> You are feeding two different inputs which do not differ by truncation 
> alone.
>
> % echo -n "Qm9uam91ciwgam95ZXV4IGxpb24K" | base64 -D | hexdump -c
> 0000000   B   o   n   j   o   u   r   ,       j   o   y   e   u   x
> 0000010   l   i   o   n  \n
> 0000015
>
> % echo -n "IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==" | base64 -D | hexdump -c
> 0000000   "   B   o   n   j   o   u   r   ,       j   o   y   e   u   x
> 0000010       l   i   o   n   "
> 0000016
>
> The second one has encoded double-quotes before and after the content.
>
> On Monday, 13 January 2025 at 22:43:51 UTC Rory Campbell-Lange wrote:
>
>> As I wrote earlier, I'm trying to avoid reading the entire email part
>> into memory to discover if I should use base64.StdEncoding or
>> base64.RawStdEncoding.
>>
>> The following seems to work reasonably well: 
>>
>> import (
>> 	"bufio"
>> 	"io"
>> )
>>
>> type B64Translator struct {
>> 	br *bufio.Reader
>> }
>>
>> func NewB64Translator(r io.Reader) *B64Translator {
>> 	return &B64Translator{
>> 		br: bufio.NewReader(r),
>> 	}
>> }
>>
>> // Read reads from the buffered reader, expecting base64.StdEncoding bytes
>> // with (potentially) one or two '=' padding characters at the end, and
>> // drops every '='. RawStdEncoding can then decode both StdEncoded and
>> // RawStdEncoded data, because the padding has been removed.
>> func (b *B64Translator) Read(p []byte) (n int, err error) {
>> 	n, err = b.br.Read(p)
>> 	// Compact p[:n] in place, skipping '=' bytes. This avoids allocating a
>> 	// scratch buffer on every call and keeps any data returned with err.
>> 	w := 0
>> 	for _, c := range p[:n] {
>> 		if c != '=' {
>> 			p[w] = c
>> 			w++
>> 		}
>> 	}
>> 	return w, err
>> }
>>
>> https://go.dev/play/p/H6ii7Vy-8as 
>>
>> One odd thing is that I'm getting extraneous newlines (shown by stars in
>> the output), e.g.:
>>
>> -- 
>> raw: Bonjour joyeux lion 
>> Qm9uam91ciwgam95ZXV4IGxpb24K 
>> ok: false 
>> decoded: Bonjour, joyeux lion* <-------------------- e.g. here 
>> -- 
>> std: "Bonjour, joyeux lion" 
>> IkJvbmpvdXIsIGpveWV1eCBsaW9uIg== 
>> ok: true 
>> decoded: "Bonjour, joyeux lion" 
>> -- 
>>
>> Any thoughts on that would be gratefully received. 
>>
>> Rory 
>>
>>
>> On 13/01/25, Rory Campbell-Lange ([email protected]) wrote: 
>> > Thanks very much for the playground link and thoughts. 
>> > 
>> > The use case is reading base64 email parts, which could be very large.
>> > When processing these parts it is unclear whether they are base64
>> > padded or not.
>> > 
>> > I'm trying to avoid reading the entire email part into memory.
>> > Consequently I think your earlier idea of adding padding (or removing it)
>> > in a wrapper could work. Perhaps I could wrap the reader in another that
>> > uses a bufio.Reader to track bytes read and detect EOF. At EOF the wrapper
>> > could add padding if needed.
>> > 
>> > Rory 
>> > 
>> > On 13/01/25, Axel Wagner ([email protected]) wrote: 
>> > > Just realized: if you twist the idea around, you get something easy to
>> > > implement and more correct.
>> > > Instead of stripping padding if it exists, you can ensure that the body
>> > > *is* padded to a multiple of 4 bytes: https://go.dev/play/p/SsPRXV9ZfoS
>> > > You can then feed that to base64.StdEncoding. If the wrapped Reader
>> > > returns padded Base64, this does nothing. If it returns unpadded Base64,
>> > > it adds padding. If it returns incorrect Base64, it will create a padded
>> > > stream that will then get rejected by the Base64 decoder.
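Axel's playground code is not reproduced in the thread, so the following is only a sketch of the idea under stated assumptions. The hypothetical `padReader` counts the base64 characters it passes through (skipping CR and LF, which the decoder ignores anyway) and, once the wrapped reader reports io.EOF, emits enough `=` to reach a multiple of four. A leftover of a single character becomes `X===`, which base64.StdEncoding rejects.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// padReader is a hypothetical wrapper that pads a base64 stream to a
// multiple of four characters once the underlying reader hits EOF.
type padReader struct {
	r     io.Reader
	count int  // base64 characters seen so far (CR/LF not counted)
	pad   int  // '=' bytes still to emit after EOF
	eof   bool // underlying reader has returned io.EOF
}

func (p *padReader) Read(b []byte) (int, error) {
	if !p.eof {
		n, err := p.r.Read(b)
		for _, c := range b[:n] {
			if c != '\r' && c != '\n' {
				p.count++
			}
		}
		if err != io.EOF {
			return n, err
		}
		p.eof = true
		p.pad = (4 - p.count%4) % 4
		if n > 0 {
			return n, nil // emit the padding on the next call
		}
	}
	if p.pad == 0 {
		return 0, io.EOF
	}
	n := 0
	for n < len(b) && p.pad > 0 {
		b[n] = '='
		n++
		p.pad--
	}
	return n, nil
}

// decodePadded decodes s with base64.StdEncoding after padding it.
func decodePadded(s string) (string, error) {
	dec := base64.NewDecoder(base64.StdEncoding, &padReader{r: strings.NewReader(s)})
	out, err := io.ReadAll(dec)
	return string(out), err
}

func main() {
	for _, s := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // already padded: unchanged
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",   // unpadded: "==" appended
		"A",                                // invalid: becomes "A===", rejected
	} {
		out, err := decodePadded(s)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

Only the character count modulo four matters, so the wrapper needs constant memory regardless of how large the email part is.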
>> > > 
>> > > On Mon, 13 Jan 2025 at 10:31, Axel Wagner <[email protected]> wrote:
>> > > 
>> > > > Hi, 
>> > > > 
>> > > > one way to solve your problem is to wrap the body into an io.Reader
>> > > > that strips off everything after the first `=` it finds. That can then
>> > > > be fed to base64.RawStdEncoding. This approach requires no extra
>> > > > buffering or copying and is easy to implement:
>> > > > https://go.dev/play/p/CwcVz7oietI
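The playground code is not quoted in the thread; this is a minimal sketch of the described approach, with the hypothetical name `truncAtPad`. The wrapper passes bytes through until the first `=` and then reports io.EOF, so base64.RawStdEncoding never sees the padding. The last input shows the downside Axel points out: "AAA=garbage" decodes without any error.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// truncAtPad is a hypothetical wrapper that ends the stream at the first '='.
type truncAtPad struct {
	r    io.Reader
	done bool
}

func (t *truncAtPad) Read(b []byte) (int, error) {
	if t.done {
		return 0, io.EOF
	}
	n, err := t.r.Read(b)
	if i := bytes.IndexByte(b[:n], '='); i >= 0 {
		t.done = true
		return i, io.EOF // drop the '=' and everything after it
	}
	return n, err
}

// decodeRaw decodes s with base64.RawStdEncoding after truncating at '='.
func decodeRaw(s string) ([]byte, error) {
	dec := base64.NewDecoder(base64.RawStdEncoding, &truncAtPad{r: strings.NewReader(s)})
	return io.ReadAll(dec)
}

func main() {
	for _, s := range []string{
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==", // padded
		"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg",   // unpadded
		"AAA=garbage",                      // silently accepted
	} {
		out, err := decodeRaw(s)
		fmt.Printf("%q %v\n", out, err)
	}
}
```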
>> > > > 
>> > > > The downside is that this will not verify that the body is *either*
>> > > > correctly padded Base64 *or* unpadded Base64. So, it will not report an
>> > > > error if fed something like "AAA=garbage".
>> > > > That can be remedied by buffering up to four bytes and, when
>> > > > encountering an EOF, checking that there are at most two trailing `=`
>> > > > and that the total length of the stream is divisible by four. It's more
>> > > > finicky to implement, but it should also be possible without any extra
>> > > > copies and only requires a very small extra buffer.
>> > > > 
>> > > > On Sun, 12 Jan 2025 at 22:29, Rory Campbell-Lange <[email protected]> wrote:
>> > > > 
>> > > >> Thanks very much for the links, pointers and possible solution. 
>> > > >> 
>> > > >> Trying to read base64 standard (padded) encoded data with 
>> > > >> base64.RawStdEncoding can produce an error such as 
>> > > >> 
>> > > >> illegal base64 data at input byte <n> 
>> > > >> 
>> > > >> Reading base64 raw (unpadded) encoded data with base64.StdEncoding
>> > > >> produces an unexpected EOF error.
>> > > >> 
>> > > >> I'll go with trying to read the standard encoded data up to maybe 1MB
>> > > >> and then switch to base64.RawStdEncoding if I hit the "illegal base64
>> > > >> data" problem, maybe with reference to bufio.Reader, which has most of
>> > > >> the methods suggested below.
>> > > >> 
>> > > >> Yes, the use of a "Rewind" method would be crucial. I guess this would
>> > > >> need to:
>> > > >> 1. error if more than one buffer of data has been read
>> > > >> 2. else re-read from byte 0
>> > > >> 
>> > > >> Thanks again very much for these suggestions. 
>> > > >> 
>> > > >> Rory 
>> > > >> 
>> > > >> On 12/01/25, robert engels ([email protected]) wrote: 
>> > > >> > Also, see this
>> > > >> > https://stackoverflow.com/questions/69753478/use-base64-stdencoding-or-base64-rawstdencoding-to-decode-base64-string-in-go
>> > > >> > as, I expected, the error should be reported earlier than the end
>> > > >> > of stream if the chosen format is wrong.
>> > > >> > 
>> > > >> > > On Jan 12, 2025, at 2:57 PM, robert engels <[email protected]> wrote:
>> > > >> > > 
>> > > >> > > Also, this is what Gemini provided, which looks basically correct,
>> > > >> > > but I think encapsulating it with a Rewind() method would be easier
>> > > >> > > to understand.
>> > > >> > > 
>> > > >> > > 
>> > > >> > > 
>> > > >> > > While Go doesn't have a built-in PushbackReader like some other
>> > > >> > > languages (e.g., Java), you can implement similar functionality
>> > > >> > > using a custom struct and a buffer.
>> > > >> > > 
>> > > >> > > Here's an example implementation:
>> > > >> > > 
>> > > >> > > package main
>> > > >> > > 
>> > > >> > > import (
>> > > >> > > 	"bytes"
>> > > >> > > 	"errors"
>> > > >> > > 	"fmt"
>> > > >> > > 	"io"
>> > > >> > > )
>> > > >> > > 
>> > > >> > > // PushbackReader returns pushed-back bytes before reading more
>> > > >> > > // data from the underlying io.Reader.
>> > > >> > > type PushbackReader struct {
>> > > >> > > 	reader io.Reader
>> > > >> > > 	buffer []byte // pushed-back bytes, served first
>> > > >> > > 	last   int    // last byte returned by Read, or -1 if none
>> > > >> > > }
>> > > >> > > 
>> > > >> > > func NewPushbackReader(r io.Reader) *PushbackReader {
>> > > >> > > 	return &PushbackReader{reader: r, last: -1}
>> > > >> > > }
>> > > >> > > 
>> > > >> > > func (p *PushbackReader) Read(b []byte) (n int, err error) {
>> > > >> > > 	if len(p.buffer) > 0 {
>> > > >> > > 		n = copy(b, p.buffer)
>> > > >> > > 		p.buffer = p.buffer[n:]
>> > > >> > > 	} else {
>> > > >> > > 		n, err = p.reader.Read(b)
>> > > >> > > 	}
>> > > >> > > 	if n > 0 {
>> > > >> > > 		p.last = int(b[n-1])
>> > > >> > > 	}
>> > > >> > > 	return n, err
>> > > >> > > }
>> > > >> > > 
>> > > >> > > // UnreadByte pushes back the last byte returned by Read.
>> > > >> > > func (p *PushbackReader) UnreadByte() error {
>> > > >> > > 	if p.last < 0 {
>> > > >> > > 		return errors.New("pushback: no byte to unread")
>> > > >> > > 	}
>> > > >> > > 	p.Unread([]byte{byte(p.last)})
>> > > >> > > 	return nil
>> > > >> > > }
>> > > >> > > 
>> > > >> > > // Unread pushes buf back so it is returned by the next Reads,
>> > > >> > > // ahead of any bytes pushed back earlier.
>> > > >> > > func (p *PushbackReader) Unread(buf []byte) {
>> > > >> > > 	p.buffer = append(append([]byte(nil), buf...), p.buffer...)
>> > > >> > > 	p.last = -1
>> > > >> > > }
>> > > >> > > 
>> > > >> > > func main() {
>> > > >> > > 	// Example usage
>> > > >> > > 	r := NewPushbackReader(bytes.NewBufferString("Hello, World!"))
>> > > >> > > 	buf := make([]byte, 5)
>> > > >> > > 	r.Read(buf)    // reads "Hello"
>> > > >> > > 	r.UnreadByte() // pushes back 'o'
>> > > >> > > 	n, _ := r.Read(buf)
>> > > >> > > 	fmt.Println(string(buf[:n])) // prints "o"
>> > > >> > > }
>> > > >> > > 
>> > > >> > > Explanation:
>> > > >> > > PushbackReader struct: holds the underlying io.Reader and a buffer
>> > > >> > > of pushed-back bytes.
>> > > >> > > NewPushbackReader: creates a new PushbackReader from an existing
>> > > >> > > io.Reader.
>> > > >> > > Read method: reads bytes from the buffer if it contains data,
>> > > >> > > otherwise from the underlying reader.
>> > > >> > > UnreadByte method: pushes a single byte back into the buffer.
>> > > >> > > Unread method: pushes a slice of bytes back into the buffer.
>> > > >> > > Important Considerations:
>> > > >> > > The buffer is not bounded: it grows with whatever is pushed back,
>> > > >> > > so you may need to cap it for your use case.
>> > > >> > > Supporting arbitrary rewinds (rather than pushing back bytes the
>> > > >> > > caller still holds) needs a more complex solution.
>> > > >> > > 
>> > > >> > > 
>> > > >> > >> On Jan 12, 2025, at 2:53 PM, Robert Engels <[email protected]> wrote:
>> > > >> > >> 
>> > > >> > >> You can see the two-pass reader here:
>> > > >> > >> https://stackoverflow.com/questions/20666594/how-can-i-push-bytes-into-a-reader-in-go
>> > > >> > >> 
>> > > >> > >> But yeah, the basic premise is that you buffer the data so you can
>> > > >> > >> rewind if needed.
>> > > >> > >> 
>> > > >> > >> Are you certain it is reading to the end to return EOF? It may be
>> > > >> > >> returning EOF once the parsing fails.
>> > > >> > >> 
>> > > >> > >> Otherwise I would expect this is being decoded wrong, e.g. the MIME
>> > > >> > >> type or encoding type should tell you the correct format before you
>> > > >> > >> start decoding.
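To illustrate letting the headers decide, here is a sketch that reads a part with mime/multipart and wraps it in a base64 decoder only when its Content-Transfer-Encoding header says base64. The boundary `XYZ`, the sample body and the helper names `partReader` and `firstPart` are made up for the example. RFC 2045 specifies padded base64 for this transfer encoding, hence base64.StdEncoding; Go's multipart reader decodes quoted-printable parts itself but leaves base64 parts encoded.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"mime/multipart"
	"strings"
)

// partReader returns a reader that yields the decoded body of p,
// choosing the decoder from the part's Content-Transfer-Encoding header.
func partReader(p *multipart.Part) io.Reader {
	switch strings.ToLower(strings.TrimSpace(p.Header.Get("Content-Transfer-Encoding"))) {
	case "base64":
		return base64.NewDecoder(base64.StdEncoding, p)
	default: // 7bit, 8bit, binary; quoted-printable is already decoded
		return p
	}
}

// body is a hypothetical single-part message used for the example.
const body = "--XYZ\r\n" +
	"Content-Type: text/plain; charset=utf-8\r\n" +
	"Content-Transfer-Encoding: base64\r\n" +
	"\r\n" +
	"IkJvbmpvdXIsIGpveWV1eCBsaW9uIg==\r\n" +
	"--XYZ--\r\n"

func firstPart() (string, error) {
	mr := multipart.NewReader(strings.NewReader(body), "XYZ")
	p, err := mr.NextPart()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(partReader(p))
	return string(b), err
}

func main() {
	s, err := firstPart()
	fmt.Printf("%q %v\n", s, err)
}
```

The part is decoded as a stream, so nothing beyond the decoder's small internal buffer is held in memory.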
>> > > >> > >> 
>> > > >> > >>> On Jan 12, 2025, at 2:46 PM, Rory Campbell-Lange <[email protected]> wrote:
>> > > >> > >>> 
>> > > >> > >>> Thanks for the suggestion of a ReadSeeker to wrap an io.Reader.
>> > > >> > >>> 
>> > > >> > >>> My Google-fu must be deserting me. I can find PushbackReader
>> > > >> > >>> implementations in Java, but the only similar thing for Go I could
>> > > >> > >>> find was https://gitlab.com/osaki-lab/iowrapper. If you have a
>> > > >> > >>> specific recommendation for a ReadSeeker wrapper to an io.Reader,
>> > > >> > >>> that would be great to know.
>> > > >> > >>> 
>> > > >> > >>> Since the base64 decoding error I'm looking for is an EOF, I guess
>> > > >> > >>> the wrapper approach will not work when the EOF byte position is
>> > > >> > >>> beyond the io.ReadSeeker buffer size.
>> > > >> > >>> 
>> > > >> > >>> Rory 
>> > > >> > >>> 
>> > > >> > >>> On 12/01/25, robert engels ([email protected]) wrote: 
>> > > >> > >>>> Create a ReadSeeker that wraps the Reader, providing the buffering
>> > > >> > >>>> (mark & reset). Normally the buffer only needs to be large enough
>> > > >> > >>>> to detect the format contained in the Reader.
>> > > >> > >>>> 
>> > > >> > >>>> You can search Google for PushbackReader in Go and you'll get a
>> > > >> > >>>> basic implementation.
>> > > >> > >>>> 
>> > > >> > >>>>> On Jan 12, 2025, at 12:52 PM, Rory Campbell-Lange <[email protected]> wrote:
>> > > >> > >>> ... 
>> > > >> > >>>>> I'm attempting to rationalise the process [of avoiding reading
>> > > >> > >>>>> email parts into byte slices] by simply wrapping the provided
>> > > >> > >>>>> io.Reader with the necessary decoders to reduce memory usage and
>> > > >> > >>>>> unnecessary processing.
>> > > >> > >>>>> 
>> > > >> > >>>>> The wrapping strategy seems to work OK. However, there is a
>> > > >> > >>>>> particular issue in detecting base64.StdEncoding versus
>> > > >> > >>>>> base64.RawStdEncoding, which requires draining the io.Reader using
>> > > >> > >>>>> base64.StdEncoding and (based on the current implementation)
>> > > >> > >>>>> switching to base64.RawStdEncoding if an io.ErrUnexpectedEOF is
>> > > >> > >>>>> found.
>> > > >> > >>>>> 
>> > > >> > >> 
>> > > >> > >> 
>> > > >> > >> -- 
>> > > >> > >> You received this message because you are subscribed to the 
>> Google 
>> > > >> Groups "golang-nuts" group. 
>> > > >> > >> To unsubscribe from this group and stop receiving emails from 
>> it, 
>> > > >> send an email to [email protected] <mailto: 
>> > > >> [email protected]>. 
>> > > >> > >> To view this discussion visit 
>> > > >> 
>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com
>>  
>> > > >> < 
>> > > >> 
>> https://groups.google.com/d/msgid/golang-nuts/DD0C1480-D237-447A-B978-78FC8951FE05%40ix.netcom.com?utm_medium=email&utm_source=footer
>>  
>> > > >> >. 
>> > > >> > > 
>> > > >> > 
>> > > >> 
>> > > >> 
>> > > > 
>> > 
>>
>>
>

