[PATCH] Simplify integer output-related functions in libgfortran

2021-12-25 Thread FX via Fortran
Merry Christmas!

The code related to integer output in libgfortran has accumulated some… 
oddities over the years. I will soon post a finalized patch for faster 
integer-to-decimal conversion (see 
https://gcc.gnu.org/pipermail/fortran/2021-December/057201.html), but while 
working on that I found a couple of things that we ought to fix but that are 
not directly related to it.

So this is a simplification patch, functionally a no-op. It does the following things:

 - gfc_itoa() is now always called with nonnegative values, so make it take an 
unsigned argument. This lets us simplify the handling of the negative sign 
(which was done in two places).
 - fix undefined behaviour from a possible overflow when negating the most 
negative value (-HUGE-1)
 - all callers of write_decimal() pass gfc_itoa() as the conversion function, 
so remove one layer of indirection: drop that argument and call gfc_itoa() 
directly inside write_decimal()
 - gfc_xtoa() is now only used in one file, so move it there and rename it 
to xtoa()
 - ztoa_big() is renamed to xtoa_big(), following the convention of other 
?to_big() functions
 - runtime/backtrace.c is the only user of gfc_itoa() outside the I/O system; 
add a comment so we remember in the future why we use gfc_itoa() there… and 
what its limits are
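To illustrate the first two points, here is a minimal sketch, assuming 64-bit types as a stand-in for libgfortran's largest integer type. The names utoa_sketch() and format_signed_sketch() are hypothetical; this is not the actual gfc_itoa()/write_decimal() code. The conversion routine only ever sees an unsigned magnitude, the sign is emitted in a single place, and the magnitude is computed in unsigned arithmetic so that the most negative value (-HUGE-1) never triggers signed overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: uint64_t/int64_t stand in for the largest
   integer type that libgfortran actually uses.  */

/* Convert an unsigned value to decimal, writing digits backwards from
   the end of BUF (LEN >= 21 bytes).  Returns a pointer to the first
   digit.  No sign handling is needed here.  */
static char *
utoa_sketch (uint64_t n, char *buf, size_t len)
{
  char *p = buf + len - 1;
  *p = '\0';
  do
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
    }
  while (n != 0);
  return p;
}

/* Format a signed value: take the magnitude in the unsigned type, then
   emit the sign once.  LEN must be >= 21 (19 digits, '-', NUL).  */
static char *
format_signed_sketch (int64_t v, char *buf, size_t len)
{
  /* -v is undefined behaviour for INT64_MIN; unsigned negation is
     well defined (modulo 2^64) and yields the correct magnitude.  */
  uint64_t mag = v < 0 ? (uint64_t) 0 - (uint64_t) v : (uint64_t) v;
  char *p = utoa_sketch (mag, buf, len);
  if (v < 0)
    *--p = '-';
  return p;
}
```

With this split, format_signed_sketch(INT64_MIN, ...) produces "-9223372036854775808" without ever computing -INT64_MIN in signed arithmetic.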

All this makes the code easier to understand and more consistent, probably 
marginally more efficient (by removing the gfc_itoa function-pointer 
indirection), and will make future work on speeding up gfc_itoa() easier.

Bootstrapped and regtested on x86_64-pc-linux-gnu.
OK to commit?

FX



itoa.patch


[PATCH] Make integer output faster in libgfortran

2021-12-25 Thread FX via Fortran
Hi,

Integer output in libgfortran is done by passing values as the largest integer 
type available. Our gfc_itoa() function for conversion to decimal form uses 
that type as well, performing a series of divisions by 10. On targets with a 
128-bit integer type (which is most targets nowadays, really), this division is 
slow, because it is implemented in software and requires a call to a libgcc 
function.

We can speed this up in two easy ways:
- If the value fits into 64 bits, use a simple 64-bit itoa() function, which 
does the series of divisions by 10 in hardware. In real life, most I/O 
actually falls into this case, unless you’re printing very big 128-bit integers.
- If the value does not fit into 64 bits, perform only one slow division, by 
10^19, and use two calls to the 64-bit function to output each part (the low 
part needing zero-padding).
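The two paths above can be sketched as follows. This is a minimal sketch, not the code in itoa-faster.patch: it assumes a GCC/Clang-style unsigned __int128, and that the input is the magnitude of a signed 128-bit value (at most 2^127), so the quotient by 10^19 always fits in 64 bits. The helper names itoa64_back() and itoa128_sketch() are hypothetical. The remainder is recovered with a multiply-and-subtract so that only one 128-bit division is performed.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch assumes a compiler providing unsigned __int128"
#endif

#define TEN19 UINT64_C (10000000000000000000) /* 10^19 */

/* 64-bit path (hardware division).  Writes digits backwards, ending
   just before P, and returns the new start.  If PAD is nonzero, emit
   at least PAD digits, zero-filling on the left.  */
static char *
itoa64_back (uint64_t n, char *p, int pad)
{
  int ndigits = 0;
  do
    {
      *--p = (char) ('0' + n % 10);
      n /= 10;
      ndigits++;
    }
  while (n != 0);
  while (ndigits < pad)
    {
      *--p = '0';
      ndigits++;
    }
  return p;
}

/* Assumes N <= 2^127 (magnitude of a signed 128-bit value).
   BUF must hold at least 40 bytes (39 digits + NUL).  */
static char *
itoa128_sketch (unsigned __int128 n, char *buf, size_t len)
{
  char *p = buf + len - 1;
  *p = '\0';

  /* Common case: the value fits in 64 bits.  */
  if (n <= UINT64_MAX)
    return itoa64_back ((uint64_t) n, p, 0);

  /* One slow (software) 128-bit division; since n <= 2^127, the
     quotient is below 2^64 and fits in a uint64_t.  */
  uint64_t hi = (uint64_t) (n / TEN19);
  uint64_t lo = (uint64_t) (n - (unsigned __int128) hi * TEN19);

  p = itoa64_back (lo, p, 19);   /* low part, zero-padded to 19 digits */
  return itoa64_back (hi, p, 0); /* high part, no padding */
}
```

For example, HUGE(KIND=16) = 170141183460469231731687303715884105727 splits into the high part 17014118346046923173 and the low part 1687303715884105727, each then converted with 64-bit hardware division. The padding matters when the low part has fewer than 19 digits: 5·10^19 must come out as "5" followed by nineteen zeros.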


What is the speed-up? It really depends on the exact nature of the I/O done. 
For the most common case, list-directed I/O with no special format, the patch 
does not speed up (or slow down!) things for values up to HUGE(KIND=4), but 
speeds things up for larger values. For very large 128-bit values, it can cut 
the I/O time in half.

I attach my own timing code to this email. Results before the patch (with the 
previous itoa patch applied, though):

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.191409990
 Value HUGE(KIND=1), time:  0.173687011
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.171809018
 Value 1049, time:  0.177439988
 Value HUGE(KIND=4), time:  0.217984974
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.178072989
 Value HUGE(KIND=4), time:  0.214841008
 Value HUGE(KIND=8), time:  0.276726007
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175235987
 Value HUGE(KIND=4), time:  0.217689037
 Value HUGE(KIND=8), time:  0.280257106
 Value HUGE(KIND=16), time:  0.420036077

Results after the patch:

 Timing for INTEGER(KIND=1)
 Value 0, time:  0.194633007
 Value HUGE(KIND=1), time:  0.172436997
 Timing for INTEGER(KIND=4)
 Value 0, time:  0.167517006
 Value 1049, time:  0.176503003
 Value HUGE(KIND=4), time:  0.172892988
 Timing for INTEGER(KIND=8)
 Value 0, time:  0.171101034
 Value HUGE(KIND=4), time:  0.174461007
 Value HUGE(KIND=8), time:  0.180289030
 Timing for INTEGER(KIND=16)
 Value 0, time:  0.175765991
 Value HUGE(KIND=4), time:  0.181162953
 Value HUGE(KIND=8), time:  0.186082959
 Value HUGE(KIND=16), time:  0.207401991

Times are CPU times in seconds, for one million integer writes into a buffer 
string. With the patch, we see that integer decimal output is almost 
independent of the value written, meaning the I/O library overhead is dominant, 
not the decimal conversion. For this reason, I don’t think we really need a 
faster implementation of the 64-bit itoa, and we can keep the current 
series-of-divisions-by-10 approach.

---

This patch applies on top of my previous itoa-related patch at 
https://gcc.gnu.org/pipermail/fortran/2021-December/057218.html

The patch has been bootstrapped and regtested on two 64-bit targets: 
aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would 
like it to be tested on a 32-bit target without a 128-bit integer type. Does 
anyone have access to one?

Once tested on a 32-bit target, OK to commit?

FX



itoa-faster.patch


timing.f90


Re: [PATCH] Simplify integer output-related functions in libgfortran

2021-12-25 Thread Thomas Koenig via Fortran

First merry Christmas to all!


> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> OK to commit?


OK.

Thanks for the (preliminary) patch!


Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread Thomas Koenig via Fortran

Hi FX,


> The patch has been bootstrapped and regtested on two 64-bit targets: 
> aarch64-apple-darwin21 (development branch) and x86_64-pc-linux-gnu. I would 
> like it to be tested on a 32-bit target without 128-bit integer type. Does 
> someone have access to that?


There are two possibilities: either use gcc45 on the compile farm, or
run the testsuite with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

which is the magic incantation to also test with -m32 binaries.  You'll need
32-bit support on your Linux system, of course (which you can quickly
check by compiling a "hello world" kind of program with -m32).

Regards

Thomas


Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread FX via Fortran
Hi Thomas,

> There are two possibilities: Either use gcc45 on the compile farm, or
> run it with
> make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

Thanks. Right now I don’t have a Linux system with 32-bit support. I’ll see how 
I can connect to gcc45, but if someone who is already set up to do so can fire 
off a quick regtest, that would be great ;)

FX

Re: [PATCH] Make integer output faster in libgfortran

2021-12-25 Thread Thomas Koenig via Fortran

Hi FX,


> right now I don’t have a Linux system with 32-bit support. I’ll see how I can 
> connect to gcc45, but if someone who is already set up to do can fire a quick 
> regtest, that would be great;)


I tested this on x86_64-pc-linux-gnu with

make -k -j8 check-fortran RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

and didn't see any problems.

So, OK for trunk.

(We could also do something like that for a 32-bit system, but
that is another kettle of fish).

Thanks for taking this up!

Best regards

Thomas