Re: [R-pkg-devel] Use of long double in configuration
By "short double" I simply meant the 64 bit double without the extended precision. Wikipedia has a "long double" article, and is quite good in showing the diversity of interpretations. I've not spent much time looking at the standards since the 80s, except to note the move to make the extended bits optional recently. As far as I am aware, any precision above 64 bit storage but 80 bits in registers required special software and hardware, but the 64/80 combination seemed to be more or less a given until the M? chips came along. For a while I recall there were all sorts of things with gate arrays to make special hardware. They needed special compilers and were a nuisance to use. Fortunately, not a great many computations require more than 64 bit doubles, but there are some situations that need the extra precision for some of the time. I remember Kahan saying it was like passing lanes added to two lane highways to clear out the lines behind a slow truck. The widespread adoption of IEEE 754 with the 80 bit extended was a great step forward. For a while most hardware implemented it, but the M? chips don't have registers, hence the push for changing the standard. It's a pity mechanical calculators are no longer common, like the Monroe manual I gave away to a charity last year. They had double width accumulator carriages for a reason, and make the situation very obvious. The burden for R developers is that a lot of the behind the screen computations are done with pretty old codes that very likely were developed with extended "thinking". Particularly if there are accumulations e.g., of inner products, we can expect that there will be some surprises. My guess is that there will need to be some adjustment of the internal routines, most likely to define somewhat lower standards. My experience has been that results that are a bit different cause lots of confusion, even if the reality isn't of great import, and such confusion will be the main waste of time. I built the "Compact numerical methods" on Data General machines, which had 32 bit floating point with I recall 24 bit mantissa. Later (worse!) the DG Eclipse used 6 hex digits mantissa. The special functions were pretty cruddy too. So my codes were incredibly defensive, making them quite reliable but pretty slow. Also I had to fit program and data in 4K bytes, so the codes left out bells and whistles. Then we got IEEE 754, which really helped us out of the bog of weird and not wonderful floating point. Look up "Fuzz" on Tektronix BASIC. Messed up a lot of my codes. Cheers, JN On 2025-05-04 19:25, Simon Urbanek wrote: John, it's sort of the other way around: because neither the implementation, format nor precision of "long double" are defined by the C standard (it's not even required to be based on IEEE 754/IEC 60559 at all), it is essentially left to the compilers+runtimes to do whatever they choose, making it a bit of a wild card. Historically, anything beyond double precision was emulated since most hardware was unable to natively deal with it, so that’s why you had to think hard if you wanted to use it as the penalty could be an order of magnitude or more. It wasn't until Intel’s math co-processor and its 80-bit extended precision format which reduced the penalty for such operations on that CPU and was mapped to long double - at the cost of results being hardware-specific, somewhat arbitrary and only 2.5x precision (while occupying 4x the space). So long double is just a simple way to say "do your best" without defining any specifics. 
On 2025-05-04 19:25, Simon Urbanek wrote:
Re: [R-pkg-devel] Use of long double in configuration
John, it's sort of the other way around: because neither the implementation, the format, nor the precision of "long double" is defined by the C standard (it's not even required to be based on IEEE 754/IEC 60559 at all), it is essentially left to the compilers and runtimes to do whatever they choose, making it a bit of a wild card. Historically, anything beyond double precision was emulated, since most hardware was unable to deal with it natively, so you had to think hard before using it: the penalty could be an order of magnitude or more. It wasn't until Intel's math co-processor, with its 80-bit extended precision format, that the penalty for such operations on that CPU was reduced; that format was mapped to long double - at the cost of results being hardware-specific, somewhat arbitrary, and only 2.5x the precision (while occupying 4x the space). So long double is just a simple way to say "do your best" without defining any specifics.

I'm not sure what you mean by "short double", as double precision is defined as 64-bit, so "short double" would simply be 32-bit = single precision. The more recent introduction of varying floating-point precisions such as fp8, fp16, etc. (assuming that's what you meant by "short double" on M1) is a performance and memory-usage optimization for use cases where precision is less important than memory usage, such as in large NNs. M1 is just one of the modern chips that added co-processors specifically for matrix operations at different precisions like fp16, fp32, fp64 - with great performance gains (e.g., using AMX via Apple's BLAS/LAPACK with double precision in R on M1 is over 100x faster than the CPU-based reference version for some operations). As for precision beyond doubles, Apple asked the scientific community a few years ago whether there was interest in fp128 (quad precision), and the response was that it is not a priority, so I would assume that's why it has been left to emulation (interesting in retrospect, because at that point we had no idea that they were designing what became Apple Silicon). I presume it would be possible to leverage the Apple matrix co-processor for fp128 operations (e.g., PowerPC's double-double arithmetic implementation of long double is a precedent), but given the low priority I have not seen it yet.

Cheers, Simon
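To make the double-double idea above concrete, here is a minimal, illustrative C sketch (the type and function names are invented for this example; this is not the PowerPC or Apple implementation). A value is carried as an unevaluated sum of two doubles, hi + lo, which roughly doubles the effective mantissa at the cost of several extra operations per addition.

    /* Assumes strict IEEE double arithmetic (no x87 excess precision,
       no FMA contraction); otherwise the error terms are not exact. */
    typedef struct { double hi, lo; } dd_t;

    /* Knuth's TwoSum: s + err == a + b exactly, with s = fl(a + b). */
    static void two_sum(double a, double b, double *s, double *err) {
        *s = a + b;
        double bb = *s - a;
        *err = (a - (*s - bb)) + (b - bb);
    }

    /* Add an ordinary double to a double-double value. */
    static dd_t dd_add(dd_t x, double y) {
        double s, e;
        two_sum(x.hi, y, &s, &e);
        e += x.lo;
        double hi = s + e;              /* renormalize so |lo| stays small */
        double lo = e - (hi - s);
        dd_t r = { hi, lo };
        return r;
    }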
> On Apr 30, 2025, at 10:29 PM, J C Nash wrote:
>
> As one of the original 30-some members of 1985 IEEE 754, I find it discouraging
> that we are treating long double as the exception, when it is the
> introduction of "short double" in M1 etc. chips that has forced the issue.
> There are strong commercial reasons, but they aren't computational ones.
>
> JN
>
> On 2025-04-30 05:08, Tim Taylor wrote:
>> Thank you all!
>> Everything is clear.
>> Tim
>>
>> On Wed, 30 Apr 2025, at 10:07 AM, Tomas Kalibera wrote:
>>> On 4/30/25 10:43, Tim Taylor wrote:
>>>> Cheers for the quick response. To clarify my question: Is it correct to
>>>> say that as long as packages do not assume greater precision than
>>>> 'double' provides, there is no reason they cannot use 'long double' to
>>>> get *possible* advantages (e.g. in summations)? AFAICT 'long double' is
>>>> (and has always been) part of the C standard, so its use as a type
>>>> should be unproblematic (this is the query relevant to matrixStats).
>>>
>>> Probably already clear from previous answers, but yes, packages can use
>>> the long double type.
>>>
>>> Whenever using a long double type, one needs to be careful about making
>>> sure the algorithms work, and the tests pass (so have reasonable
>>> tolerances), even when the long double type happens to be just the same
>>> as double. This is the case on aarch64, and macOS/aarch64 is one of the
>>> platforms where packages have to work anyway, so this shouldn't be too
>>> limiting anymore - but one really needs to test on such a platform.
>>>
>>> R itself has an option to disable the use of long double, to make such
>>> testing in R itself possible also on other platforms. In principle one
>>> could do something similar in a package, with some ifdefs to disable
>>> long doubles, but this is not required. And I probably wouldn't do that;
>>> I'd just test on aarch64 regularly.
>>>
>>> See Writing R Extensions for more details.
>>>
>>> Best
>>> Tomas
>>>
>>>> Apologies if this does not make much sense.
>>>> Tim
>>>>
>>>> On Wed, 30 Apr 2025, at 9:33 AM, Uwe Ligges wrote:
>>>>> On 30.04.2025 10:25, Tim Taylor wrote:
>>>>>> Is it correct to say that R's conditional use of long double is around
>>>>>> ensuring things work on platforms which have 'long double' identical to
>>>>>> 'double' types, as opposed to there being an odd compiler targeted that
>>>>>> does not even have any concept of 'long double' type?
>>>>>
>>>>> a double is 64 bit and stored that way on all platforms, the concept of
>>>>> long doubles is CPU specific. x86 chips have 80bit in the floating point
>>>>> units for calculations befo
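Returning to Tomas Kalibera's remark above about package-level ifdefs: below is a minimal, hypothetical C sketch of what such a switch could look like (the PKG_NO_LONG_DOUBLE macro and the function name are invented for illustration; they are not part of R's API, and as Tomas notes such a switch is not required).

    /* Hypothetical switch: build with -DPKG_NO_LONG_DOUBLE to force the
       accumulator down to plain double, mimicking platforms where
       long double is the same type as double. */
    #ifdef PKG_NO_LONG_DOUBLE
    typedef double pkg_acc_t;
    #else
    typedef long double pkg_acc_t;
    #endif

    double pkg_sum(const double *x, int n) {
        pkg_acc_t s = 0;
        for (int i = 0; i < n; i++)
            s += x[i];
        return (double) s;   /* the result handed back to R is always a double */
    }

Either way, test tolerances should accept the plain-double result, since on macOS/arm64 the long double branch behaves exactly like the double one.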