Re: [PATCH] Optimize to_chars

2019-09-09 Thread Jonathan Wakely
On 08/09/19 16:44 +0300, Antony Polukhin wrote: We've already beaten this topic to death, so let's put a final nail in the coffin: __to_chars_10_impl is quite fast. According to the IACA the main loop takes only 6.0 cycles, the whole function with one iteration takes 10.0 cycles. Replacing the

Re: [PATCH] Optimize to_chars

2019-09-08 Thread Antony Polukhin
We've already beaten this topic to death, so let's put a final nail in the coffin: __to_chars_10_impl is quite fast. According to the IACA the main loop takes only 6.0 cycles, the whole function with one iteration takes 10.0 cycles. Replacing the __first[pos] and __first[pos - 1] with __first[0]

Re: [PATCH] Optimize to_chars

2019-09-02 Thread Jonathan Wakely
On 30/08/19 17:08 +0100, Jonathan Wakely wrote: On 30/08/19 17:01 +0100, Jonathan Wakely wrote: On 30/08/19 17:27 +0300, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Jonathan Wakely
On 30/08/19 22:46 +0300, Antony Polukhin wrote: Fri, 30 Aug 2019 at 19:01, Jonathan Wakely: <...> Have you tried comparing the improved code to libc++'s implementation? I believe they use precomputed arrays of digits, but they use larger arrays that allow 4 bytes to be written at once, which

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Antony Polukhin
Fri, 30 Aug 2019 at 19:01, Jonathan Wakely: <...> > Have you tried comparing the improved code to libc++'s implementation? I believe they use precomputed arrays of digits, but they use larger arrays that allow 4 bytes to be written at once, which is considerably faster (and those precomput
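
To make the "precomputed arrays of digits" idea in the quoted question concrete, here is a minimal sketch of the general digit-pair table technique. It is not libc++'s or libstdc++'s actual code: the function name `to_decimal_pairs` and the table name `pairs` are hypothetical, and this version emits two characters per table lookup rather than the four bytes mentioned in the message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the patch or from libc++): converts an
// unsigned value to decimal, producing two digits per table lookup.
inline std::string to_decimal_pairs(unsigned long long value) {
    // 100 two-character entries: "00", "01", ..., "99".
    static constexpr char pairs[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";

    char buf[20];                    // a 64-bit value has at most 20 decimal digits
    std::size_t pos = sizeof(buf);   // fill the buffer from the back

    // Peel off two digits at a time while at least three digits remain.
    while (value >= 100) {
        const unsigned idx = static_cast<unsigned>(value % 100) * 2;
        value /= 100;
        buf[--pos] = pairs[idx + 1];
        buf[--pos] = pairs[idx];
    }

    // One or two leading digits are left.
    if (value >= 10) {
        const unsigned idx = static_cast<unsigned>(value) * 2;
        buf[--pos] = pairs[idx + 1];
        buf[--pos] = pairs[idx];
    } else {
        buf[--pos] = static_cast<char>('0' + value);
    }
    return std::string(buf + pos, sizeof(buf) - pos);
}
```

The trade-off this illustrates is fewer divisions per output digit in exchange for a 200-byte read-only table. Whether that wins in practice depends on the benchmark, which is what the thread is debating.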

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Jonathan Wakely
On 30/08/19 11:03 -0600, Martin Sebor wrote: On 8/30/19 8:27 AM, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops ht

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Martin Sebor
On 8/30/19 8:27 AM, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this sav

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Jonathan Wakely
On 30/08/19 17:01 +0100, Jonathan Wakely wrote: On 30/08/19 17:27 +0300, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bi

Re: [PATCH] Optimize to_chars

2019-08-30 Thread Jonathan Wakely
On 30/08/19 17:27 +0300, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However thi
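
As a rough illustration of the base-8 point in the quoted patch description, the sketch below contrasts a digit-table lookup with computing each digit character arithmetically. It is not the patch itself: the function names `to_octal_lookup` and `to_octal_arith` and the local table `digits` are hypothetical stand-ins, and the functions return a `std::string` for simplicity instead of writing into a caller-supplied range as `std::to_chars` does.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: octal conversion where each digit is read from a table.
inline std::string to_octal_lookup(unsigned long long value) {
    static constexpr char digits[] = "01234567";
    char buf[32];                    // 64 bits need at most 22 octal digits
    std::size_t pos = sizeof(buf);   // fill the buffer from the back
    do {
        buf[--pos] = digits[value & 7u];   // memory load per digit
        value >>= 3;
    } while (value != 0);
    return std::string(buf + pos, sizeof(buf) - pos);
}

// Hypothetical helper: same conversion, but the digit character is computed
// as '0' + (value & 7), so no table is needed.
inline std::string to_octal_arith(unsigned long long value) {
    char buf[32];
    std::size_t pos = sizeof(buf);
    do {
        buf[--pos] = static_cast<char>('0' + (value & 7u));
        value >>= 3;
    } while (value != 0);
    return std::string(buf + pos, sizeof(buf) - pos);
}
```

Both functions produce identical output. The arithmetic form works because the octal digits '0' through '7' are contiguous in the character set, so no table access is needed.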