On 30/08/19 22:46 +0300, Antony Polukhin wrote:
On Fri, 30 Aug 2019 at 19:01, Jonathan Wakely <jwak...@redhat.com> wrote:
<...>
Have you tried comparing the improved code to libc++'s implementation?
I believe they use precomputed arrays of digits, but they use larger
arrays that allow 4 bytes to be written at once, which is considerably
faster (and those precomputed arrays live in libc++.so, not in the
header). Would we be better off keeping the precomputed arrays and
expanding them to do 4-byte writes?
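For readers unfamiliar with the technique, below is a minimal sketch of table-driven base-10 conversion. It is not libc++'s code and not the patch under discussion; the names `kDigitPairs`, `to_chars_pairs` and `format_u64` are hypothetical. It uses a 200-byte table of the pairs "00".."99", so each `% 100` / `/ 100` step emits two digits with a single 2-byte copy. A naive table that emits four digits per lookup would need 10,000 entries of 4 bytes each (about 40 KB), which is the cache-footprint trade-off at stake in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical 200-byte table: the two-character strings "00", "01", ..., "99".
static const char kDigitPairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

// Writes the decimal form of value into buf (at least 20 chars for a 64-bit
// value) and returns a pointer one past the last character written.
char* to_chars_pairs(char* buf, unsigned long long value) {
    char tmp[20];                 // enough for 18446744073709551615
    char* p = tmp + sizeof(tmp);  // digits are produced right to left
    while (value >= 100) {
        const std::size_t idx = static_cast<std::size_t>(value % 100) * 2;
        value /= 100;             // one division yields two digits
        p -= 2;
        std::memcpy(p, kDigitPairs + idx, 2);  // copy both digits at once
    }
    if (value >= 10) {            // two digits left: one more table copy
        p -= 2;
        std::memcpy(p, kDigitPairs + value * 2, 2);
    } else {                      // single digit left
        *--p = static_cast<char>('0' + value);
    }
    const std::size_t len = static_cast<std::size_t>(tmp + sizeof(tmp) - p);
    std::memcpy(buf, p, len);
    return buf + len;
}

// Convenience wrapper returning a std::string.
std::string format_u64(unsigned long long value) {
    char buf[20];
    char* end = to_chars_pairs(buf, value);
    return std::string(buf, end);
}
```

For example, `format_u64(1234567)` emits "67", "45" and "23" from the table and then the final digit '1'.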
This would not do any good for bases 2, 8 and 16. For power-of-two bases
there are no costly `mod` or `div` instructions, only bit operations.
Increasing the size of the digits table makes cache misses more
likely.
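To illustrate the point about power-of-two bases, here is a minimal sketch (not the actual patch; `to_chars_pow2` is a hypothetical name, and 64-bit values are assumed). When the base is 2^k, the lowest digit is `value & (base - 1)` and dropping it is `value >> k`, so no division or modulo is needed. The only lookup table is a 16-character string, far smaller than any multi-digit decimal table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts value to text in base 2^log2_base, for log2_base in [1, 4]
// (bases 2, 4, 8 and 16), using only masks and shifts.
std::string to_chars_pow2(std::uint64_t value, unsigned log2_base) {
    static const char kDigits[] = "0123456789abcdef";  // tiny 16-entry table
    const std::uint64_t mask = (std::uint64_t{1} << log2_base) - 1;
    char buf[64];                 // base 2 needs at most 64 digits
    char* p = buf + sizeof(buf);  // digits are produced right to left
    do {
        *--p = kDigits[value & mask];  // low bits select the current digit
        value >>= log2_base;           // shift replaces division by the base
    } while (value != 0);
    return std::string(p, buf + sizeof(buf));
}
```

For instance, `to_chars_pow2(255, 4)` yields "ff" and `to_chars_pow2(5, 1)` yields "101".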
OK, thanks. I'll try benchmarking your improved code against the
libc++ version next week.