>> In this patch, I chose to try to handle weird leb128 encodings by
>> preserving their values when possible; or returning the maximum value
>> on overflow.  It isn't clear to me that this is necessarily the best
>> choice.

Mark> I think it is a good thing to accept "padded" leb128 numbers, but
Mark> probably up to a point. The padding will probably be either mod-4 or
Mark> mod-8 bytes. So let's at most read up to 16 bytes total.

Now I think all this is just overkill.  Actually-existing leb128
relocations (see commit 7d81bc93 in binutils) seem to shrink the
representation.  There was some talk in the Rust leb128 decoder issues
about this topic, but that was a wish-list thing possibly for wasm -- so
not exactly a need.

>> +      const unsigned mask = (1u << (8 * sizeof (acc) - 7 * max)) - 1;

Mark> Shouldn't this be a const unsigned char?

The C "usual arithmetic conversions" will promote the operands of the
"&" anyway, so the narrower type wouldn't buy anything.

Mark> Is this really just (1u << (8 * 8 - 7 * 9)) - 1 = (1u << 1) - 1 = 1
Mark> So we are really only interested in the lowest bit?

Yeah.  We could hard-code that if you prefer, but I wrote it this way so
the expression stays correct if the types in the code are ever changed
(widened) some day.

>> +  if (unlikely (value == UINT64_MAX))
>> +    return INT64_MAX;

Mark> Here you need to help me out a bit, wouldn't UINT64_MAX also be the
Mark> representation of -1?

Yes.  Normally in sleb128 I guess -1 would be written {0x7f}.  However,
one could write it with any number of 0xff prefix bytes first.  This
weirdness is one reason I changed my mind about handling over-long
leb128 sequences.

I guess the only solution here is to write 2 copies of each function.

Mark> But what I don't fully understand is how this works with a "padded"
Mark> leb128, won't we possibly forget to negate in that case?

It isn't clear to me what to do when a sleb128 specifies more than 64
bits.  Another reason perhaps to avoid this area.
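For reference, here is a minimal sleb128 sketch -- again not the actual
code, with no overflow handling for encodings wider than 64 bits, which
is exactly the unclear part -- showing where the sign extension
happens.  A plain {0x7f} and a padded {0xff, 0x7f} both end on a byte
with the 0x40 bit set, so both come out as -1:

#include <stdint.h>

static int64_t
get_sleb128 (const unsigned char **addrp)
{
  uint64_t acc = 0;
  unsigned shift = 0;
  unsigned char b;

  do
    {
      b = *(*addrp)++;
      acc |= (uint64_t) (b & 0x7f) << shift;
      shift += 7;
    }
  while ((b & 0x80) != 0 && shift < 64);

  /* Sign-extend from the 0x40 bit of the last byte read.  */
  if (shift < 64 && (b & 0x40) != 0)
    acc |= -((uint64_t) 1 << shift);

  return (int64_t) acc;
}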

Tom
