>> In this patch, I chose to try to handle weird leb128 encodings by
>> preserving their values when possible; or returning the maximum value
>> on overflow. It isn't clear to me that this is necessarily the best
>> choice.
Mark> I think it is a good thing to accept "padded" leb128 numbers, but
Mark> probably up to a point. The padding will probably be either mod-4 or
Mark> mod-8 bytes. So let's at most read up to 16 bytes total.

Now I think all this is just overkill. Actually-existing leb128
relocations (see commit 7d81bc93 in binutils) seem to shrink the
representation. There was some talk in the Rust leb128 decoder issues
about this topic, but that was a wish-list thing, possibly for wasm --
so not exactly a need.

>> +  const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;

Mark> Shouldn't this be a const unsigned char?

The C "usual arithmetic conversions" will promote for the "&" anyway.

Mark> Is that really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1?
Mark> So we are really only interested in the lowest bit?

Yeah. We could hard-code that if you prefer, but I wrote it this way to
pretend that maybe the types in the code would be changed (widened) some
day.

>> +  if (unlikely (value == UINT64_MAX))
>> +    return INT64_MAX;

Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the
Mark> representation of -1?

Yes. Normally in sleb128 I guess -1 would be written {0x7f}. However
one could write it with any number of 0xff prefix bytes first.

This weirdness is one reason I changed my mind on handling over-long
leb128 sequences. I guess the only solution here is to write 2 copies
of each function.

Mark> But what I don't fully understand is how this works with a "padded"
Mark> leb128, won't we possibly forget to negate in that case?

It isn't clear to me what to do when a sleb128 specifies more than 64
bits. Another reason perhaps to avoid this area.

Tom
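
P.S. To make the -1 vs. UINT64_MAX problem concrete, here is a minimal
sleb128 decoder sketch (untested, not the code from the patch; the name
and signature are made up for illustration). All of {0x7f}, {0xff,0x7f},
{0xff,0xff,0x7f}, ... decode to -1, and the unsigned accumulator for -1
is exactly UINT64_MAX -- the same value being used above as the
saturation result on overflow. And once padding pushes the shift past
64 bits, the final sign-extension step has nothing left to do.

#include <stdint.h>

static int64_t
sleb128_sketch (const unsigned char *p, const unsigned char *end)
{
  uint64_t acc = 0;
  unsigned shift = 0;
  unsigned char b = 0;

  while (p < end)
    {
      b = *p++;
      /* Only the low 64 bits can be kept; the "padding" groups of a
	 well-formed over-long encoding just repeat the sign bit.  */
      if (shift < 64)
	acc |= (uint64_t) (b & 0x7f) << shift;
      shift += 7;
      if ((b & 0x80) == 0)
	break;
    }

  /* Sign-extend from the last 7-bit group.  If padding has already
     pushed the shift to 64 or more, there is nothing left to extend --
     the "won't we possibly forget to negate" corner above.  */
  if (shift < 64 && (b & 0x40) != 0)
    acc |= -((uint64_t) 1 << shift);

  return (int64_t) acc;
}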