jakubjelinek wrote:

I'm not suggesting to encode the number of limbs anywhere, I'm suggesting 
encoding the bit precision of a limb somewhere.  And the limb ordering.
On targets with little-endian bit ordering within a limb and little-endian 
ordering of limbs in the limb array, at least if the limbs are sane (their 
precision is a multiple of char precision and there are no padding bits in 
between), the actual limb precision might seem irrelevant: all you care about 
is the N from {,{un,}signed }_BitInt(N) and whether it is unsigned or signed. 
You can then treat the passed pointer as, say, an array of 8-bit limbs: N / 8 
limbs with all 8 bits significant and, if N % 8 is nonzero, one last limb 
containing the remaining bits (in some ABIs the padding bits above them will be 
required to be sign or zero extended, in other ABIs they will be undefined, but 
on the libubsan side you can always treat them as undefined and always extend 
manually).
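The 8-bit-limb view can be sketched as below; this is an illustration only, 
with a hypothetical helper name (not libubsan API), assuming little-endian limb 
ordering and no padding between limbs:

```c
#include <assert.h>

/* Hypothetical helper, not the libubsan API: interpret a little-endian
   _BitInt(N) stored at `p` as an array of 8-bit limbs and test whether
   its value is negative, reading the sign bit directly and ignoring any
   padding bits above it. */
static int bitint_is_negative(const unsigned char *p, unsigned n_bits,
                              int is_signed) {
  if (!is_signed)
    return 0;
  unsigned top = n_bits - 1;              /* bit index of the sign bit */
  unsigned byte = top / 8, bit = top % 8; /* locate it among 8-bit limbs */
  return (p[byte] >> bit) & 1;
}
```

Whether the bits above bit N-1 in the last limb are extended or garbage does 
not matter to such a helper, because it never reads them.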
Or treat it as 16-bit limbs, or 32-bit limbs, or 64-bit limbs, or 128-bit 
limbs, for the larger ones perhaps doing the limb reads using internal_memcpy 
so that you don't impose an alignment requirement the target perhaps doesn't 
have.
But on big-endian, I think knowing the limb precision/size is already essential 
(sure, just a theory for now, since GCC right now only supports _BitInt on 
little-endian targets because those are the only ones that have specified their 
ABI).
E.g. I believe _BitInt(513) big-endian with big-endian limb ordering would be:
- for 32-bit limbs, 17 limbs, the first one containing just one bit (the most 
significant bit of the whole number) and the remaining ones 32 bits each;
- for 64-bit limbs, 9 limbs, the first one containing just one bit and the 
remaining ones 64 bits each;
- for 128-bit limbs, 5 limbs, the first one just one bit, the remaining ones 
128 bits each.
You can't decode these without knowing the limb size; the data looks different 
in memory.
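Those limb counts follow from simple ceiling arithmetic; a small sketch (the 
helper names are illustrative):

```c
#include <assert.h>

/* Number of limbs needed for a _BitInt(n) with limbs of limb_bits bits. */
static unsigned limb_count(unsigned n, unsigned limb_bits) {
  return (n + limb_bits - 1) / limb_bits; /* ceiling division */
}

/* Bits carried by the first (most significant, possibly partial) limb
   under the big-endian limb ordering described above. */
static unsigned first_limb_bits(unsigned n, unsigned limb_bits) {
  unsigned r = n % limb_bits;
  return r ? r : limb_bits; /* a full limb when limb_bits divides n */
}
```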
And then there is the possibility of big-endian limbs with little-endian 
ordering of the limbs in the array.
As the 15 bits of the current precision field used e.g. for normal integers are 
clearly insufficient to express the supported BITINT_MAXWIDTH (8388608 in 
clang, 65535 right now in GCC), my suggestion is to use another bit for the 
limb ordering (say 0 little endian, 1 big endian) and the remaining 14 bits for 
the limb precision (whether log2 encoded or not doesn't matter that much).
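One possible packing of such a field could look like this; the exact bit 
positions (signedness in bit 0, ordering in bit 1, limb precision above) are 
assumptions for illustration, not the actual libubsan TypeInfo layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative packing (assumed layout, not the real encoding):
   bit 0      - signedness
   bit 1      - limb ordering (0 little endian, 1 big endian)
   bits 2..15 - limb precision in bits (14 bits, enough for 128-bit limbs) */
static uint16_t pack_bitint_info(int is_signed, int be_limbs,
                                 unsigned limb_bits) {
  return (uint16_t)((limb_bits << 2) | ((be_limbs & 1) << 1) |
                    (is_signed & 1));
}
static unsigned limb_bits_of(uint16_t info) { return info >> 2; }
static int be_limbs_of(uint16_t info) { return (info >> 1) & 1; }
static int is_signed_of(uint16_t info) { return info & 1; }
```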
As for the actual _BitInt precision after the type name, one option is what you 
currently implemented, i.e. always use a 32-bit integer in memory there, plus 
the extra '\0' termination if you really think it is needed (IMHO it is just a 
waste), and another option is to use, say, a uleb128 encoding of it.
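For reference, uleb128 stores 7 value bits per byte with the high bit marking 
continuation, so any width up to 16383 takes two bytes instead of a fixed four. 
A minimal encoder/decoder sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ULEB128: 7 value bits per byte, least significant group first,
   high bit set on every byte except the last. */
static size_t uleb128_encode(uint64_t v, unsigned char *out) {
  size_t n = 0;
  do {
    unsigned char b = v & 0x7f;
    v >>= 7;
    if (v)
      b |= 0x80; /* more bytes follow */
    out[n++] = b;
  } while (v);
  return n;
}

static uint64_t uleb128_decode(const unsigned char *in, size_t *len) {
  uint64_t v = 0;
  unsigned shift = 0;
  size_t n = 0;
  unsigned char b;
  do {
    b = in[n++];
    v |= (uint64_t)(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  if (len)
    *len = n;
  return v;
}
```

E.g. a precision of 513 encodes as the two bytes 0x81 0x04.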

https://github.com/llvm/llvm-project/pull/96240
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
