On 08/09/19 16:44 +0300, Antony Polukhin wrote:
We've already beaten this topic to death, so let's put a final nail in
the coffin:
__to_chars_10_impl is quite fast. According to IACA, the main loop
takes only 6.0 cycles, and the whole function with one iteration takes
10.0 cycles. Replacing the __first[pos] and __first[pos - 1] with
__first[0]
On 30/08/19 17:08 +0100, Jonathan Wakely wrote:
On 30/08/19 17:01 +0100, Jonathan Wakely wrote:
On 30/08/19 17:27 +0300, Antony Polukhin wrote:
Bunch of micro optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with arithmetic
computations leads to the same CPU
On 30/08/19 22:46 +0300, Antony Polukhin wrote:
Fri, 30 Aug 2019 at 19:01, Jonathan Wakely:
<...>
> Have you tried comparing the improved code to libc++'s implementation?
> I believe they use precomputed arrays of digits, but they use larger
> arrays that allow 4 bytes to be written at once, which is considerably
> faster (and those precomput
On 30/08/19 11:03 -0600, Martin Sebor wrote:
On 8/30/19 8:27 AM, Antony Polukhin wrote:
Bunch of micro optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with arithmetic
computations leads to the same CPU cycles for a loop (exchanges two
movzx for 3 bit ops https://godbolt.org/z/RTui7m ). However this
sav
On 30/08/19 17:01 +0100, Jonathan Wakely wrote:
On 30/08/19 17:27 +0300, Antony Polukhin wrote:
Bunch of micro optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with arithmetic
computations leads to the same CPU cycles for a loop (exchanges two
movzx for 3 bit ops https://godbolt.org/z/RTui7m ). However thi
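
For readers following the base == 8 point, here is a minimal sketch of the two
ways to produce an octal digit: a table lookup versus plain arithmetic. This is
not the libstdc++ code or the patch itself; the names octal_width,
to_octal_lookup and to_octal_arith are hypothetical and only illustrate the
idea under discussion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from libstdc++): number of octal digits in val.
inline std::size_t octal_width(unsigned val) {
    std::size_t n = 1;
    while (val >>= 3) ++n;   // each octal digit covers 3 bits
    return n;
}

// Variant 1: each digit is fetched from a small lookup table.
inline std::string to_octal_lookup(unsigned val) {
    static constexpr char digits[] = "01234567";
    std::string out(octal_width(val), '0');
    for (std::size_t pos = out.size(); pos-- > 0; val >>= 3)
        out[pos] = digits[val & 7];
    return out;
}

// Variant 2: each digit is computed as '0' + (val & 7), no table needed.
inline std::string to_octal_arith(unsigned val) {
    std::string out(octal_width(val), '0');
    for (std::size_t pos = out.size(); pos-- > 0; val >>= 3)
        out[pos] = static_cast<char>('0' + (val & 7));
    return out;
}
```

Both variants give the same text for any input; the arithmetic one simply
avoids touching a digit table for bases where digits are contiguous in the
character set. Whether it is actually faster depends on the compiler and
target, which is what the measurements in the thread are about.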