https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98076
Francois-Xavier Coudert <fxcoudert at gcc dot gnu.org> changed:

           What    |Removed |Added
           ----------------------------------------------------------------
           CC      |        |fxcoudert at gcc dot gnu.org

--- Comment #4 from Francois-Xavier Coudert <fxcoudert at gcc dot gnu.org> ---

Benchmark: formatting a medium-sized value (on the order of INT_MAX/2) 100 million times with gfc_itoa. On the 64-bit target I have at hand (aarch64-apple-darwin), depending on whether the function is implemented with:

  __int128 : 3.91 seconds
  int64_t  : 0.86 seconds
  int32_t  : 0.84 seconds

The __int128 version relies on a library call for the division (___divti3); the others do not.

This would allow for a very simple optimisation that does not require changing the current I/O workflow (i.e., passing all integer values as the largest type, usually int128_t):

- have a fast itoa64() function that takes a uint64_t argument
- have gfc_itoa() call itoa64() directly if the argument fits in 64 bits
- otherwise, divide by a large power of ten and apply itoa64() recursively (a sketch follows below)

For small values, itoa64() is called once per gfc_itoa() call. Worst-case behaviour (very large 128-bit values) is 2 divisions plus the corresponding itoa64() calls, which is still faster than doing the 38 128-bit divisions in the current version.
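A minimal, self-contained sketch of the scheme described above. The names itoa64/itoa128, the buffer handling, and the 10^19 constant are illustrative assumptions, not the actual libgfortran interface; the real gfc_itoa would keep its existing signature and simply dispatch to the 64-bit fast path when the value fits.

  #include <stdint.h>
  #include <stdio.h>

  #define TEN19 10000000000000000000ULL  /* 10^19, largest power of ten in a uint64_t */

  /* Fast path: format a 64-bit value using only 64-bit divisions.
     Writes digits backwards in front of *end and returns the new start.  */
  static char *
  itoa64 (uint64_t n, char *end)
  {
    do
      {
        *--end = '0' + (char) (n % 10);
        n /= 10;
      }
    while (n != 0);
    return end;
  }

  /* Slow path: peel off the low 19 decimal digits with one 128-bit
     division, format them with itoa64, then recurse on the (much
     smaller) quotient.  At most two 128-bit divisions are needed.  */
  static char *
  itoa128 (__uint128_t n, char *end)
  {
    if (n <= UINT64_MAX)
      return itoa64 ((uint64_t) n, end);

    __uint128_t q = n / TEN19;                  /* the expensive ___divti3 call */
    uint64_t r = (uint64_t) (n - q * TEN19);    /* remainder fits in 64 bits */

    char *p = itoa64 (r, end);
    while (end - p < 19)                        /* zero-pad the low block to 19 digits */
      *--p = '0';
    return itoa128 (q, p);
  }

  int
  main (void)
  {
    char buf[64];
    char *end = buf + sizeof buf - 1;
    *end = '\0';
    __uint128_t v = (__uint128_t) UINT64_MAX * UINT64_MAX;  /* a large 128-bit value */
    printf ("%s\n", itoa128 (v, end));
    return 0;
  }

For values up to UINT64_MAX this never touches 128-bit arithmetic, which is where the 3.91 s vs 0.86 s difference above comes from; the recursive case only matters for genuine INTEGER(16) output.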