[PATCH] Simplify integer output-related functions in libgfortran
Merry Christmas!

The code related to integer output in libgfortran has accumulated some… oddities over the years. I will soon post a finalized patch for faster integer-to-decimal conversion (see https://gcc.gnu.org/pipermail/fortran/2021-December/057201.html), but while working on that I found a couple of things we ought to fix that are not directly related.

So this patch is a simplification patch, functionally a no-op. It does the following things:

- gfc_itoa() is always called for nonnegative values, so make it take an unsigned arg. This allows us to handle negative signs in one place instead of two.
- fix undefined behaviour on possible overflow when negating large negative values (-HUGE-1)
- all callers of write_decimal() use gfc_itoa() as conversion function, so remove one layer of indirection: get rid of that argument, and call gfc_itoa() directly inside write_decimal()
- gfc_xtoa() is now used in only one file, so move it there and rename it to xtoa()
- ztoa_big() is renamed to xtoa_big(), following the convention of other ?toa_big() functions
- runtime/backtrace.c is the only user of gfc_itoa() outside the I/O system; add a comment so we remember in the future why we use gfc_itoa() there… and what its limits are

All this makes the code easier to understand, more consistent, probably marginally more efficient (no more gfc_itoa pointer indirection), and will make the future work on speeding up gfc_itoa() easier.

Bootstrapped and regtested on x86_64-pc-linux-gnu. OK to commit?

FX

Attachment: itoa.patch
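To illustrate the first two items of the list in the message above, here is a minimal sketch in C of negating a value into an unsigned type before conversion, so that the sign is handled in a single place. It is not the code from the patch: the names magnitude_sketch(), utoa_sketch() and itoa_signed_sketch() are hypothetical, and fixed 64-bit types stand in for libgfortran's largest integer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libgfortran function: return |n| as an
   unsigned value.  Evaluating -n in the signed type overflows for
   INT64_MIN, which is undefined behaviour.  Converting to unsigned
   first is safe, because unsigned negation wraps modulo 2^64.  */
uint64_t
magnitude_sketch (int64_t n)
{
  return n < 0 ? -(uint64_t) n : (uint64_t) n;
}

/* Hypothetical unsigned-only converter: writes the decimal digits of n
   at the end of buf and returns a pointer to the first digit.  */
char *
utoa_sketch (uint64_t n, char *buf, size_t len)
{
  assert (len >= 22);   /* room for a sign, 20 digits and the NUL */
  char *p = buf + len - 1;
  *p = '\0';
  do
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
    }
  while (n != 0);
  return p;
}

/* The caller deals with the sign in a single place.  */
char *
itoa_signed_sketch (int64_t n, char *buf, size_t len)
{
  char *p = utoa_sketch (magnitude_sketch (n), buf, len);
  if (n < 0)
    *--p = '-';
  return p;
}
```

With a 22-byte buffer, itoa_signed_sketch (INT64_MIN, buf, sizeof buf) yields the full negative value without ever negating in the signed type.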
[PATCH] Make integer output faster in libgfortran
Hi,

Integer output in libgfortran is done by passing values as the largest integer type available. Our gfc_itoa() function for conversion to decimal form uses that type as well, performing a series of divisions by 10. On targets with a 128-bit integer type (which is most targets, really, nowadays), division is slow, because it is implemented in software and requires a call to a libgcc function.

We can speed this up in two easy ways:

- If the value fits into 64 bits, use a simple 64-bit itoa() function, which does the series of divisions by 10 in hardware. Most I/O will actually fall into that case in real life, unless you’re printing very big 128-bit integers.
- If the value does not fit into 64 bits, perform only one slow division, by 10^19, and use two calls to the 64-bit function to output each part (the low part needing zero-padding).

What is the speed-up? It really depends on the exact nature of the I/O done. For the most common case, list-directed I/O with no special format, the patch does not speed up (or slow down!) things for values up to HUGE(KIND=4), but speeds things up for larger values. For very large 128-bit values, it can cut the I/O time in half. I attach my own timing code to this email.

Results before the patch (with previous itoa-patch applied, though):

Timing for INTEGER(KIND=1)
  Value 0, time: 0.191409990
  Value HUGE(KIND=1), time: 0.173687011
Timing for INTEGER(KIND=4)
  Value 0, time: 0.171809018
  Value 1049, time: 0.177439988
  Value HUGE(KIND=4), time: 0.217984974
Timing for INTEGER(KIND=8)
  Value 0, time: 0.178072989
  Value HUGE(KIND=4), time: 0.214841008
  Value HUGE(KIND=8), time: 0.276726007
Timing for INTEGER(KIND=16)
  Value 0, time: 0.175235987
  Value HUGE(KIND=4), time: 0.217689037
  Value HUGE(KIND=8), time: 0.280257106
  Value HUGE(KIND=16), time: 0.420036077

Results after the patch:

Timing for INTEGER(KIND=1)
  Value 0, time: 0.194633007
  Value HUGE(KIND=1), time: 0.172436997
Timing for INTEGER(KIND=4)
  Value 0, time: 0.167517006
  Value 1049, time: 0.176503003
  Value HUGE(KIND=4), time: 0.172892988
Timing for INTEGER(KIND=8)
  Value 0, time: 0.171101034
  Value HUGE(KIND=4), time: 0.174461007
  Value HUGE(KIND=8), time: 0.180289030
Timing for INTEGER(KIND=16)
  Value 0, time: 0.175765991
  Value HUGE(KIND=4), time: 0.181162953
  Value HUGE(KIND=8), time: 0.186082959
  Value HUGE(KIND=16), time: 0.207401991

Times are CPU times in seconds, for one million integer writes into a buffer string. With the patch, we see that integer decimal output is almost independent of the value written, meaning the I/O library overhead is dominant, not the decimal conversion. For this reason, I don’t think we really need a faster implementation of the 64-bit itoa, and can keep the current series-of-divisions-by-10 approach.

---

This patch applies on top of my previous itoa-related patch at https://gcc.gnu.org/pipermail/fortran/2021-December/057218.html

The patch has been bootstrapped and regtested on two 64-bit targets: aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would like it to be tested on a 32-bit target without 128-bit integer type. Does someone have access to that?

Once tested on a 32-bit target, OK to commit?

FX

Attachments: itoa-faster.patch, timing.f90
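The two-step strategy described in the message above (64-bit fast path, otherwise a single division by 10^19) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the names itoa64_sketch() and itoa128_sketch() are hypothetical, `unsigned __int128` is a GCC/Clang extension available on 64-bit targets, and the sketch assumes the value does not exceed 2^127 (the magnitude of -HUGE-1 for a signed 128-bit integer), so that the high part after the division still fits into 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumption) */

#define TEN19 UINT64_C (10000000000000000000)   /* 10^19, fits in 64 bits */

/* Write the decimal digits of n so that they end just before `end`,
   and return a pointer to the first digit.  All divisions are 64-bit,
   so they are done in hardware on 64-bit targets.  */
char *
itoa64_sketch (uint64_t n, char *end)
{
  char *p = end;
  do
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
    }
  while (n != 0);
  return p;
}

/* Convert n (assumed <= 2^127) to decimal at the end of buf.  */
char *
itoa128_sketch (u128 n, char *buf, size_t len)
{
  assert (len >= 40);             /* 39 digits plus the NUL */
  char *end = buf + len - 1;
  *end = '\0';

  /* Fast path: the value fits into 64 bits.  */
  if (n <= UINT64_MAX)
    return itoa64_sketch ((uint64_t) n, end);

  /* Slow path: one 128-bit division (and remainder) by 10^19.  */
  u128 high = n / TEN19;
  uint64_t low = (uint64_t) (n % TEN19);
  assert (high <= UINT64_MAX);    /* holds for all n <= 2^127 */

  /* The low part always occupies exactly 19 digits: pre-fill with
     zeros, then let the digits overwrite the rightmost positions.  */
  char *low_start = end - 19;
  memset (low_start, '0', 19);
  itoa64_sketch (low, end);

  /* The high part goes directly in front of the padded low part.  */
  return itoa64_sketch ((uint64_t) high, low_start);
}
```

For example, 10^19 + 7 splits into a high part of 1 and a low part of 7, and the zero-padding is what keeps the 18 zeros between them.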
Re: [PATCH] Simplify integer output-related functions in libgfortran
First, merry Christmas to all!

> Bootstrapped and regtested on x86_64-pc-linux-gnu. OK to commit?

OK. Thanks for the (preliminary) patch!
Re: [PATCH] Make integer output faster in libgfortran
Hi FX,

> The patch has been bootstrapped and regtested on two 64-bit targets: aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would like it to be tested on a 32-bit target without 128-bit integer type. Does someone have access to that?

There are two possibilities: either use gcc45 on the compile farm, or run it with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

which is the magic incantation to also use -m32 binaries. You'll need the 32-bit support on your Linux system, of course (which you can check quickly with a "hello world" kind of program compiled with -m32).

Regards
Thomas
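A "hello world" kind of check along the lines suggested above might look like the following sketch. The function names are hypothetical. It relies on GCC and Clang predefining `__SIZEOF_INT128__` only on targets where `__int128` exists, which makes it easy to see whether a given build is the "32-bit target without 128-bit integer type" case.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of a pointer in bits: expected to be 32 when built with -m32.  */
int
pointer_bits (void)
{
  return (int) (sizeof (void *) * CHAR_BIT);
}

/* 1 if the compiler provides __int128 for this target, 0 otherwise.  */
int
has_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

/* Print a one-line summary of the target to the given stream.  */
void
report_target (FILE *out)
{
  assert (out != NULL);
  fprintf (out, "hello world: %d-bit pointers, __int128 %s\n",
           pointer_bits (), has_int128 () ? "available" : "not available");
}
```

Calling report_target (stdout) from main() and building the file once with -m32 and once with -m64 shows both configurations; if the -m32 build fails to compile or link, the 32-bit support is most likely missing from the system.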
Re: [PATCH] Make integer output faster in libgfortran
Hi Thomas,

> There are two possibilities: Either use gcc45 on the compile farm, or
> run it with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

Thanks, right now I don’t have a Linux system with 32-bit support. I’ll see how I can connect to gcc45, but if someone who is already set up to do so can fire off a quick regtest, that would be great ;)

FX
Re: [PATCH] Make integer output faster in libgfortran
Hi FX,

> right now I don’t have a Linux system with 32-bit support. I’ll see how I can connect to gcc45, but if someone who is already set up to do can fire a quick regtest, that would be great ;)

I tested this on x86_64-pc-linux-gnu with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

and didn't see any problems. So, OK for trunk.

(We could also do something like that for a 32-bit system, but that is another kettle of fish.)

Thanks for taking this up!

Best regards
Thomas