Re: [R-pkg-devel] Use of long double in configuration
By "short double" I simply meant the 64 bit double without the extended precision. Wikipedia has a "long double" article, and is quite good in showing the diversity of interpretations. I've not spent much time looking at the standards since the 80s, except to note the move to make the extended bits optional recently. As far as I am aware, any precision above 64 bit storage but 80 bits in registers required special software and hardware, but the 64/80 combination seemed to be more or less a given until the M? chips came along. For a while I recall there were all sorts of things with gate arrays to make special hardware. They needed special compilers and were a nuisance to use. Fortunately, not a great many computations require more than 64 bit doubles, but there are some situations that need the extra precision for some of the time. I remember Kahan saying it was like passing lanes added to two lane highways to clear out the lines behind a slow truck. The widespread adoption of IEEE 754 with the 80 bit extended was a great step forward. For a while most hardware implemented it, but the M? chips don't have registers, hence the push for changing the standard. It's a pity mechanical calculators are no longer common, like the Monroe manual I gave away to a charity last year. They had double width accumulator carriages for a reason, and make the situation very obvious. The burden for R developers is that a lot of the behind the screen computations are done with pretty old codes that very likely were developed with extended "thinking". Particularly if there are accumulations e.g., of inner products, we can expect that there will be some surprises. My guess is that there will need to be some adjustment of the internal routines, most likely to define somewhat lower standards. My experience has been that results that are a bit different cause lots of confusion, even if the reality isn't of great import, and such confusion will be the main waste of time. I built the "Compact numerical methods" on Data General machines, which had 32 bit floating point with I recall 24 bit mantissa. Later (worse!) the DG Eclipse used 6 hex digits mantissa. The special functions were pretty cruddy too. So my codes were incredibly defensive, making them quite reliable but pretty slow. Also I had to fit program and data in 4K bytes, so the codes left out bells and whistles. Then we got IEEE 754, which really helped us out of the bog of weird and not wonderful floating point. Look up "Fuzz" on Tektronix BASIC. Messed up a lot of my codes. Cheers, JN On 2025-05-04 19:25, Simon Urbanek wrote: John, it's sort of the other way around: because neither the implementation, format nor precision of "long double" are defined by the C standard (it's not even required to be based on IEEE 754/IEC 60559 at all), it is essentially left to the compilers+runtimes to do whatever they choose, making it a bit of a wild card. Historically, anything beyond double precision was emulated since most hardware was unable to natively deal with it, so that’s why you had to think hard if you wanted to use it as the penalty could be an order of magnitude or more. It wasn't until Intel’s math co-processor and its 80-bit extended precision format which reduced the penalty for such operations on that CPU and was mapped to long double - at the cost of results being hardware-specific, somewhat arbitrary and only 2.5x precision (while occupying 4x the space). So long double is just a simple way to say "do your best" without defining any specifics. 
On 2025-05-04 19:25, Simon Urbanek wrote:
Re: [R-pkg-devel] Use of long double in configuration
John, it's sort of the other way around: because neither the implementation, the format, nor the precision of "long double" is defined by the C standard (it's not even required to be based on IEEE 754/IEC 60559 at all), it is essentially left to the compilers and runtimes to do whatever they choose, making it a bit of a wild card. Historically, anything beyond double precision was emulated, since most hardware was unable to deal with it natively, so you had to think hard before using it: the penalty could be an order of magnitude or more. It wasn't until Intel's math co-processor, with its 80-bit extended precision format, that the penalty for such operations on that CPU was reduced; that format was mapped to long double - at the cost of results being hardware-specific, somewhat arbitrary, and only 2.5x the precision (while occupying 4x the space). So long double is just a simple way to say "do your best" without defining any specifics.

I'm not sure what you mean by "short double", as double precision is defined as 64-bit, so "short double" would simply be 32-bit = single precision. The more recent introduction of varying floating-point precisions such as fp8, fp16, etc. (assuming that's what you meant by "short double" on M1) is a performance and memory-usage optimization for use cases where precision is less important than memory usage, such as in large NNs. M1 is just one of the modern chips that added co-processors specifically for matrix operations at different precisions like fp16, fp32, fp64 - with great performance gains (e.g., using AMX via Apple's BLAS/LAPACK with double precision in R on M1 is over 100x faster than the CPU-based reference version for some operations). As for precision beyond doubles, Apple asked the scientific community a few years ago whether there was interest in fp128 (quad precision), and the response was that it is not a priority, so I would assume that's why it has been left to emulation (interesting in retrospect, because at that point we had no idea that they were designing what became Apple Silicon). I presume it would be possible to leverage the Apple matrix co-processor for fp128 operations (e.g., PowerPC's double-double arithmetic implementation of long double is a precedent), but given the low priority I have not seen it yet.

Cheers, Simon
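To make the double-double idea above concrete, here is a minimal, illustrative C sketch (the type and function names are invented for this example; this is not the PowerPC or Apple implementation). A value is carried as an unevaluated sum of two doubles, hi + lo, which roughly doubles the effective mantissa at the cost of several extra operations per addition.

    /* Assumes strict IEEE double arithmetic (no x87 excess precision,
       no FMA contraction); otherwise the error terms are not exact. */
    typedef struct { double hi, lo; } dd_t;

    /* Knuth's TwoSum: s + err == a + b exactly, with s = fl(a + b). */
    static void two_sum(double a, double b, double *s, double *err) {
        *s = a + b;
        double bb = *s - a;
        *err = (a - (*s - bb)) + (b - bb);
    }

    /* Add an ordinary double to a double-double value. */
    static dd_t dd_add(dd_t x, double y) {
        double s, e;
        two_sum(x.hi, y, &s, &e);
        e += x.lo;
        double hi = s + e;              /* renormalize so |lo| stays small */
        double lo = e - (hi - s);
        dd_t r = { hi, lo };
        return r;
    }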
> On Apr 30, 2025, at 10:29 PM, J C Nash wrote:
>
> As one of the original 30-some members of 1985 IEEE 754, I find it discouraging
> that we are treating long double as the exception, when it is the
> introduction of "short double" in M1 etc. chips that has forced the issue.
> There are strong commercial reasons, but they aren't computational ones.
>
> JN
>
> On 2025-04-30 05:08, Tim Taylor wrote:
>> Thank you all!
>> Everything is clear.
>> Tim
>>
>> On Wed, 30 Apr 2025, at 10:07 AM, Tomas Kalibera wrote:
>>> On 4/30/25 10:43, Tim Taylor wrote:
>>>> Cheers for the quick response. To clarify my question: Is it correct to
>>>> say that as long as packages do not assume greater precision than
>>>> 'double' provides, there is no reason they cannot use 'long double' to
>>>> get *possible* advantages (e.g. in summations)? AFAICT 'long double' is
>>>> (and has always been) part of the C standard, so its use as a type
>>>> should be unproblematic (this is the query relevant to matrixStats).
>>>
>>> Probably already clear from previous answers, but yes, packages can use
>>> the long double type.
>>>
>>> Whenever using a long double type, one needs to be careful about making
>>> sure the algorithms work, and the tests pass (so have reasonable
>>> tolerances), even when the long double type happens to be just the same
>>> as double. This is the case on aarch64, and macOS/aarch64 is one of the
>>> platforms where packages have to work anyway, so this shouldn't be too
>>> limiting anymore - but one really needs to test on such a platform.
>>>
>>> R itself has an option to disable the use of long double, to make such
>>> testing in R itself possible also on other platforms. In principle one
>>> could do something similar in a package, with some ifdefs to disable
>>> long doubles, but this is not required. And I probably wouldn't do that;
>>> I'd just test on aarch64 regularly.
>>>
>>> See Writing R Extensions for more details.
>>>
>>> Best
>>> Tomas
>>>
>>>> Apologies if this does not make much sense.
>>>> Tim
>>>>
>>>> On Wed, 30 Apr 2025, at 9:33 AM, Uwe Ligges wrote:
>>>>> On 30.04.2025 10:25, Tim Taylor wrote:
>>>>>> Is it correct to say that R's conditional use of long double is around
>>>>>> ensuring things work on platforms which have 'long double' identical to
>>>>>> 'double' types, as opposed to there being an odd compiler targeted that
>>>>>> does not even have any concept of 'long double' type?
>>>>>
>>>>> a double is 64 bit and stored that way on all platforms, the concept of
>>>>> long doubles is CPU specific. x86 chips have 80bit in the floating point
>>>>> units for calculations befo
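Returning to Tomas Kalibera's remark above about package-level ifdefs: below is a minimal, hypothetical C sketch of what such a switch could look like (the PKG_NO_LONG_DOUBLE macro and the function name are invented for illustration; they are not part of R's API, and as Tomas notes such a switch is not required).

    /* Hypothetical switch: build with -DPKG_NO_LONG_DOUBLE to force the
       accumulator down to plain double, mimicking platforms where
       long double is the same type as double. */
    #ifdef PKG_NO_LONG_DOUBLE
    typedef double pkg_acc_t;
    #else
    typedef long double pkg_acc_t;
    #endif

    double pkg_sum(const double *x, int n) {
        pkg_acc_t s = 0;
        for (int i = 0; i < n; i++)
            s += x[i];
        return (double) s;   /* the result handed back to R is always a double */
    }

Either way, test tolerances should accept the plain-double result, since on macOS/arm64 the long double branch behaves exactly like the double one.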