Re: [TCWG CI] 447.dealII:libstdc++.so.6.0.29 grew in size by 12% after gcc: libstdc++: Add floating-point std::to_chars implementation

Maxim Kuvyrkov Tue, 21 Sep 2021 07:04:29 -0700

Hi Patrick,

Is it expected that libstdc++.so grew by 12% from this one patch?
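
For reference, the new exports are the floating-point std::to_chars
overloads declared in <charconv>.  A minimal sketch of calling them,
assuming a libstdc++ that already contains this commit; the helper name
format_double and the 128-byte buffer are illustrative choices, not part of
the patch:

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper (not part of the patch): formats a double through the
// floating-point std::to_chars overloads.  A negative precision selects the
// shortest round-trippable form.  Returns an empty string when the result
// does not fit into the fixed-size buffer.
std::string format_double(double value,
                          std::chars_format fmt = std::chars_format::general,
                          int precision = -1)
{
    char buf[128];
    const std::to_chars_result res = precision < 0
        ? std::to_chars(buf, buf + sizeof buf, value, fmt)
        : std::to_chars(buf, buf + sizeof buf, value, fmt, precision);
    if (res.ec != std::errc{})
        return {};                      // buffer too small for this value
    return std::string(buf, res.ptr);   // res.ptr is one past the last char
}
```

With this helper, format_double(0.1) yields "0.1" and
format_double(3.14159, std::chars_format::fixed, 2) yields "3.14".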


Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 21 Sep 2021, at 13:01, ci_not...@linaro.org wrote:
> 
> After gcc commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
> Author: Patrick Palka <ppa...@redhat.com>
> 
>    libstdc++: Add floating-point std::to_chars implementation
> 
> the following hot functions grew in size by more than 10% (but their 
> benchmarks grew in size by less than 1%):
> - 447.dealII:libstdc++.so.6.0.29 grew in size by 12% from 1245370 to 1391240 
> bytes
> 
> The reproducer instructions below can be used to re-build both the "first_bad" 
> and "last_good" cross-toolchains used in this bisection.  Naturally, the 
> scripts will fail when triggering benchmarking jobs if you don't have access 
> to Linaro TCWG CI.
> 
> For your convenience, we have uploaded tarballs with pre-processed source and 
> assembly files at:
> - First_bad save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-3c57e692357c79ee7623dfc1586652aee2aefb8f/save-temps/
> - Last_good save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-5033506993ef92589373270a8e8dbbf50e3ebef1/save-temps/
> - Baseline save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-baseline/save-temps/
> 
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: Clang + Glibc + LLVM Linker
> - Version: all components were built from their latest release branch
> - Target: arm-linux-gnueabihf
> - Compiler flags: -Os -mthumb
> - Hardware: APM Mustang 8x X-Gene1
> 
> This benchmarking CI is a work in progress, and we welcome feedback and 
> suggestions at linaro-toolchain@lists.linaro.org .  Our improvement plans 
> include adding support for SPEC CPU2017 benchmarks and providing "perf 
> report/annotate" data behind these reports.
> 
> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
> 
> This commit has regressed these CI configurations:
> - tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Os
> 
> First_bad build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-3c57e692357c79ee7623dfc1586652aee2aefb8f/
> Last_good build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-5033506993ef92589373270a8e8dbbf50e3ebef1/
> Baseline build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/build-baseline/
> Even more details: 
> https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/
> 
> Reproduce builds:
> <cut>
> mkdir investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
> cd investigate-gcc-3c57e692357c79ee7623dfc1586652aee2aefb8f
> 
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/manifests/build-baseline.sh --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/manifests/build-parameters.sh --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release-arm-spec2k6-Os/10/artifact/artifacts/test.sh --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
> 
> cd gcc
> 
> # Reproduce first_bad build
> git checkout --detach 3c57e692357c79ee7623dfc1586652aee2aefb8f
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach 5033506993ef92589373270a8e8dbbf50e3ebef1
> ../artifacts/test.sh
> 
> cd ..
> </cut>
> 
> Full commit (up to 1000 lines):
> <cut>
> commit 3c57e692357c79ee7623dfc1586652aee2aefb8f
> Author: Patrick Palka <ppa...@redhat.com>
> Date:   Thu Dec 17 23:11:34 2020 -0500
> 
>    libstdc++: Add floating-point std::to_chars implementation
> 
>    This implements the floating-point std::to_chars overloads for float,
>    double and long double.  We use the Ryu library to compute the shortest
>    round-trippable fixed and scientific forms for float, double and long
>    double.  We also use Ryu for performing explicit-precision fixed and
>    scientific formatting for float and double. For explicit-precision
>    formatting for long double we fall back to using printf.  Hexadecimal
>    formatting for float, double and long double is implemented from
>    scratch.
> 
>    The supported long double binary formats are binary64, binary80 (x86
>    80-bit extended precision), binary128 and ibm128.
> 
>    Much of the complexity of the implementation is in computing the exact
>    output length before handing it off to Ryu (which doesn't do bounds
>    checking).  In some cases it's hard to compute the output length
>    beforehand, so in these cases we instead compute an upper bound on the
>    output length and use a sufficiently-sized intermediate buffer only if
>    necessary.
> 
>    Another source of complexity is in the general-with-precision formatting
>    mode, where we need to do zero-trimming of the string returned by Ryu,
>    and where we also take care to avoid having to format the number through
>    Ryu a second time when the general formatting mode resolves to fixed
>    (which we determine by doing a scientific formatting first and
>    inspecting the scientific exponent).  We avoid going through Ryu twice
>    by instead transforming the scientific form to the corresponding fixed
>    form via in-place string manipulation.
> 
>    This implementation is non-conforming in a couple of ways:
> 
>    1. For the shortest hexadecimal formatting, we currently follow the
>       Microsoft implementation's decision to be consistent with the
>       output of printf's '%a' specifier at the expense of sometimes not
>       printing the shortest representation.  For example, the shortest hex
>       form for the number 1.08p+0 is 2.1p-1, but we output the former
>       instead of the latter, as does printf.
> 
>    2. The Ryu routine generic_binary_to_decimal that we use for performing
>       shortest formatting for large floating point types is implemented
>       using the __int128 type, but some targets with a large long double
>       type lack __int128 (e.g. i686), so we can't perform shortest
>       formatting of long double on such targets through Ryu.  As a
>       temporary stopgap this patch makes the long double to_chars overloads
>       just dispatch to the double overloads on these targets, which means
>       we lose precision in the output.  (We could potentially fix this by
>       writing a specialized version of Ryu's generic_binary_to_decimal
>       routine that uses uint64_t instead of __int128.)  [Though I wonder if
>       there's a better way to work around the lack of __int128 on i686
>       specifically?]
> 
>    3. Our shortest formatting for __ibm128 doesn't guarantee the round-trip
>       property if the difference between the high- and low-order exponent
>       is large.  This is because we treat __ibm128 as if it has a
>       contiguous 105-bit mantissa by merging the mantissas of the high-
>       and low-order parts (using code extracted from glibc), so we
>       potentially lose precision from the low-order part.  This seems to be
>       consistent with how glibc printf formats __ibm128.
> 
>    libstdc++-v3/ChangeLog:
> 
>            * config/abi/pre/gnu.ver: Add new exports.
>            * include/std/charconv (to_chars): Declare the floating-point
>            overloads for float, double and long double.
>            * src/c++17/Makefile.am (sources): Add floating_to_chars.cc.
>            * src/c++17/Makefile.in: Regenerate.
>            * src/c++17/floating_to_chars.cc: New file.
>            (to_chars): Define for float, double and long double.
>            * testsuite/20_util/to_chars/long_double.cc: New test.
> ---
> libstdc++-v3/config/abi/pre/gnu.ver                |    7 +
> libstdc++-v3/include/std/charconv                  |   24 +
> libstdc++-v3/src/c++17/Makefile.am                 |    1 +
> libstdc++-v3/src/c++17/Makefile.in                 |    3 +-
> libstdc++-v3/src/c++17/floating_to_chars.cc        | 1563 ++++++++++++++++++++
> .../testsuite/20_util/to_chars/long_double.cc      |  199 +++
> 6 files changed, 1796 insertions(+), 1 deletion(-)
> 
> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
> index 4b4bd8ab6da..05e0a512247 100644
> --- a/libstdc++-v3/config/abi/pre/gnu.ver
> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> @@ -2393,6 +2393,13 @@ GLIBCXX_3.4.29 {
>     # std::once_flag::_M_finish(bool)
>     _ZNSt9once_flag9_M_finishEb;
> 
> +    # std::to_chars(char*, char*, [float|double|long double])
> +    _ZSt8to_charsPcS_[defg];
> +    # std::to_chars(char*, char*, [float|double|long double], chars_format)
> +    _ZSt8to_charsPcS_[defg]St12chars_format;
> +    # std::to_chars(char*, char*, [float|double|long double], chars_format, int)
> +    _ZSt8to_charsPcS_[defg]St12chars_formati;
> +
> } GLIBCXX_3.4.28;
> 
> # Symbols in the support library (libsupc++) have their own tag.
> diff --git a/libstdc++-v3/include/std/charconv 
> b/libstdc++-v3/include/std/charconv
> index dd1ebdf8322..b57b0a16db2 100644
> --- a/libstdc++-v3/include/std/charconv
> +++ b/libstdc++-v3/include/std/charconv
> @@ -702,6 +702,30 @@ namespace __detail
>            chars_format __fmt = chars_format::general) noexcept;
> #endif
> 
> +  // Floating-point std::to_chars
> +
> +  // Overloads for float.
> +  to_chars_result to_chars(char* __first, char* __last, float __value) noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, float __value,
> +                        chars_format __fmt) noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, float __value,
> +                        chars_format __fmt, int __precision) noexcept;
> +
> +  // Overloads for double.
> +  to_chars_result to_chars(char* __first, char* __last, double __value) noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, double __value,
> +                        chars_format __fmt) noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, double __value,
> +                        chars_format __fmt, int __precision) noexcept;
> +
> +  // Overloads for long double.
> +  to_chars_result to_chars(char* __first, char* __last, long double __value)
> +    noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, long double __value,
> +                        chars_format __fmt) noexcept;
> +  to_chars_result to_chars(char* __first, char* __last, long double __value,
> +                        chars_format __fmt, int __precision) noexcept;
> +
> _GLIBCXX_END_NAMESPACE_VERSION
> } // namespace std
> #endif // C++14
> diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
> index 37cdb53c076..2ec5ed621ca 100644
> --- a/libstdc++-v3/src/c++17/Makefile.am
> +++ b/libstdc++-v3/src/c++17/Makefile.am
> @@ -51,6 +51,7 @@ endif
> 
> sources = \
>       floating_from_chars.cc \
> +     floating_to_chars.cc \
>       fs_dir.cc \
>       fs_ops.cc \
>       fs_path.cc \
> diff --git a/libstdc++-v3/src/c++17/Makefile.in b/libstdc++-v3/src/c++17/Makefile.in
> index ccae721ab3f..9b36b7a916c 100644
> --- a/libstdc++-v3/src/c++17/Makefile.in
> +++ b/libstdc++-v3/src/c++17/Makefile.in
> @@ -124,7 +124,7 @@ LTLIBRARIES = $(noinst_LTLIBRARIES)
> libc__17convenience_la_LIBADD =
> @ENABLE_DUAL_ABI_TRUE@am__objects_1 = cow-fs_dir.lo cow-fs_ops.lo \
> @ENABLE_DUAL_ABI_TRUE@        cow-fs_path.lo
> -am__objects_2 = floating_from_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
> +am__objects_2 = floating_from_chars.lo floating_to_chars.lo fs_dir.lo fs_ops.lo fs_path.lo \
>       memory_resource.lo $(am__objects_1)
> @ENABLE_DUAL_ABI_TRUE@am__objects_3 = cow-string-inst.lo
> @ENABLE_EXTERN_TEMPLATE_TRUE@am__objects_4 = ostream-inst.lo \
> @@ -440,6 +440,7 @@ headers =
> 
> sources = \
>       floating_from_chars.cc \
> +     floating_to_chars.cc \
>       fs_dir.cc \
>       fs_ops.cc \
>       fs_path.cc \
> diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc b/libstdc++-v3/src/c++17/floating_to_chars.cc
> new file mode 100644
> index 00000000000..dd83f5eea93
> --- /dev/null
> +++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
> @@ -0,0 +1,1563 @@
> +// std::to_chars implementation for floating-point types -*- C++ -*-
> +
> +// Copyright (C) 2020 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// <http://www.gnu.org/licenses/>.
> +
> +// Activate __glibcxx_assert within this file to shake out any bugs.
> +#define _GLIBCXX_ASSERTIONS 1
> +
> +#include <charconv>
> +
> +#include <bit>
> +#include <cfenv>
> +#include <cassert>
> +#include <cmath>
> +#include <cstdio>
> +#include <cstring>
> +#include <langinfo.h>
> +#include <optional>
> +#include <string_view>
> +#include <type_traits>
> +
> +// Determine the binary format of 'long double'.
> +
> +// We support the binary64, float80 (i.e. x86 80-bit extended precision),
> +// binary128, and ibm128 formats.
> +#define LDK_UNSUPPORTED 0
> +#define LDK_BINARY64    1
> +#define LDK_FLOAT80     2
> +#define LDK_BINARY128   3
> +#define LDK_IBM128      4
> +
> +#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
> +# define LONG_DOUBLE_KIND LDK_BINARY64
> +#elif defined(__SIZEOF_INT128__)
> +// The Ryu routines need a 128-bit integer type in order to do shortest
> +// formatting of types larger than 64-bit double, so without __int128 we can't
> +// support any large long double format.  This is the case for e.g. i386.
> +# if __LDBL_MANT_DIG__ == 64
> +#  define LONG_DOUBLE_KIND LDK_FLOAT80
> +# elif __LDBL_MANT_DIG__ == 113
> +#  define LONG_DOUBLE_KIND LDK_BINARY128
> +# elif __LDBL_MANT_DIG__ == 106
> +#  define LONG_DOUBLE_KIND LDK_IBM128
> +# endif
> +#endif
> +#if !defined(LONG_DOUBLE_KIND)
> +# define LONG_DOUBLE_KIND LDK_UNSUPPORTED
> +#endif
> +
> +namespace
> +{
> +  namespace ryu
> +  {
> +#include "ryu/common.h"
> +#include "ryu/digit_table.h"
> +#include "ryu/d2s_intrinsics.h"
> +#include "ryu/d2s_full_table.h"
> +#include "ryu/d2fixed_full_table.h"
> +#include "ryu/f2s_intrinsics.h"
> +#include "ryu/d2s.c"
> +#include "ryu/d2fixed.c"
> +#include "ryu/f2s.c"
> +
> +#ifdef __SIZEOF_INT128__
> +    namespace generic128
> +    {
> +      // Put the generic Ryu bits in their own namespace to avoid name conflicts.
> +# include "ryu/generic_128.h"
> +# include "ryu/ryu_generic_128.h"
> +# include "ryu/generic_128.c"
> +    } // namespace generic128
> +
> +    using generic128::floating_decimal_128;
> +    using generic128::generic_binary_to_decimal;
> +
> +    int
> +    to_chars(const floating_decimal_128 v, char* const result)
> +    { return generic128::generic_to_chars(v, result); }
> +#endif
> +  } // namespace ryu
> +
> +  // A traits class that contains pertinent information about the binary
> +  // format of each of the floating-point types we support.
> +  template<typename T>
> +    struct floating_type_traits
> +    { };
> +
> +  template<>
> +    struct floating_type_traits<float>
> +    {
> +      // We (and Ryu) assume float has the IEEE binary32 format.
> +      static_assert(__FLT_MANT_DIG__ == 24);
> +      static constexpr int mantissa_bits = 23;
> +      static constexpr int exponent_bits = 8;
> +      static constexpr bool has_implicit_leading_bit = true;
> +      using mantissa_t = uint32_t;
> +      using shortest_scientific_t = ryu::floating_decimal_32;
> +
> +      static constexpr uint64_t pow10_adjustment_tab[]
> +     = { 0b0000000000011101011100110101100101101110000000000000000000000000 };
> +    };
> +
> +  template<>
> +    struct floating_type_traits<double>
> +    {
> +      // We (and Ryu) assume double has the IEEE binary64 format.
> +      static_assert(__DBL_MANT_DIG__ == 53);
> +      static constexpr int mantissa_bits = 52;
> +      static constexpr int exponent_bits = 11;
> +      static constexpr bool has_implicit_leading_bit = true;
> +      using mantissa_t = uint64_t;
> +      using shortest_scientific_t = ryu::floating_decimal_64;
> +
> +      static constexpr uint64_t pow10_adjustment_tab[]
> +     = { 0b0000000000000000000000011000110101110111000001100101110000111100,
> +         0b0111100011110101011000011110000000110110010101011000001110011111,
> +         0b0101101100000000011100100100111100110110110100010001010101110000,
> +         0b0011110010111000101111110101100011101100010001010000000101100111,
> +         0b0001010000011001011100100001010000010101101000001101000000000000 };
> +    };
> +
> +#if LONG_DOUBLE_KIND == LDK_BINARY64
> +  // When long double is equivalent to double, we just forward the long double
> +  // overloads to the double overloads, so we don't need to define a
> +  // floating_type_traits<long double> specialization in this case.
> +#elif LONG_DOUBLE_KIND == LDK_FLOAT80
> +  template<>
> +    struct floating_type_traits<long double>
> +    {
> +      static constexpr int mantissa_bits = 64;
> +      static constexpr int exponent_bits = 15;
> +      static constexpr bool has_implicit_leading_bit = false;
> +      using mantissa_t = uint64_t;
> +      using shortest_scientific_t = ryu::floating_decimal_128;
> +
> +      static constexpr uint64_t pow10_adjustment_tab[]
> +     = { 0b0000000000000000000000000000110101011111110100010100110000011101,
> +         0b1001100101001111010011011111101000101111110001011001011101110000,
> +         0b0000101111111011110010001000001010111101011110111111010100011001,
> +         0b0011100000011111001101101011111001111100100010000101001111101001,
> +         0b0100100100000000100111010010101110011000110001101101110011001010,
> +         0b0111100111100010100000010011000010010110101111110101000011110100,
> +         0b1010100111100010011110000011011101101100010110000110101010101010,
> +         0b0000001111001111000000101100111011011000101000110011101100110010,
> +         0b0111000011100100101101010100001101111110101111001000010011111111,
> +         0b0010111000100110100100100010101100111010110001101010010111001000,
> +         0b0000100000010110000011001001000111000001111010100101101000001111,
> +         0b0010101011101000111100001011000010011101000101010010010000101111,
> +         0b1011111011101101110010101011010001111000101000101101011001100011,
> +         0b1010111011011011110111110011001010000010011001110100101101000101,
> +         0b0011000001110110011010010000011100100011001011001100001101010110,
> +         0b0100011111011000111111101000011110000010111110101001000000001001,
> +         0b1110000001110001001101101110011000100000001010000111100010111010,
> +         0b1110001001010011101000111000001000010100110000010110100011110000,
> +         0b0000011010110000110001111000011111000011001101001101001001000110,
> +         0b1010010111001000101001100101010110100100100010010010000101000010,
> +         0b1011001110000111100010100110000011100011111001110111001100000101,
> +         0b0110101001001000010110001000010001010101110101100001111100011001,
> +         0b1111100011110101011110011010101001010010100011000010110001101001,
> +         0b0100000100001000111101011100010011011111011001000000001100011000,
> +         0b1110111111000111100101110111110000000011001110011100011011011001,
> +         0b1100001100100000010001100011011000111011110000110011010101000011,
> +         0b1111111011100111011101001111111000010000001111010111110010000100,
> +         0b1110111001111110101111000101000000001010001110011010001000111010,
> +         0b1000010001011000101111111010110011111101110101101001111000111010,
> +         0b0100000111101001000111011001101000001010111011101001101111000100,
> +         0b0000011100110001000111011100111100110001101111111010110111100000,
> +         0b0000011101011100100110010011110101010100010011110010010111010000,
> +         0b0011011001100111110101111100001001101110101101001110110011110110,
> +         0b1011000101000001110100111001100100111100110011110000000001101000,
> +         0b1011100011110100001001110101010110111001000000001011101001011110,
> +         0b1111001010010010100000010110101010101011101000101000000000001100,
> +         0b1000001111100100111001110101100001010011111111000001000011110000,
> +         0b0001011101001000010000101101111000001110101100110011001100110111,
> +         0b1110011100000010101011011111001010111101111110100000011100000011,
> +         0b1001110110011100101010011110100010110001001110110000101011100110,
> +         0b1001101000100011100111010000011011100001000000110101100100001001,
> +         0b1010111000101000101101010111000010001100001010100011111100000100,
> +         0b0111101000100011000101101011111011100010001101110111001111001011,
> +         0b1110100111010110001110110110000000010110100011110000010001111100,
> +         0b1100010100011010001011001000111001010101011110100101011001000000,
> +         0b0000110001111001100110010110111010101101001101000000000010010101,
> +         0b0001110111101000001111101010110010010000111110111100000111110100,
> +         0b0111110111001001111000110001101101001010101110110101111110000100,
> +         0b0000111110111010101111100010111010011100010110011011011001000001,
> +         0b1010010100100100101110111111111000101100000010111111101101000110,
> +         0b1000100111111101100011001101000110001000000100010101010100001101,
> +         0b1100101010101000111100101100001000110001110010100000000010110101,
> +         0b1010000100111101100100101010010110100010000000110101101110000100,
> +         0b1011111011110001110000100100000000001010111010001101100000100100,
> +         0b0111101101100011001110011100000001000101101101111000100111011111,
> +         0b0100111010010011011001010011110100001100111010010101111111100011,
> +         0b0010001001011000111000001100110111110111110010100011000110110110,
> +         0b0101010110000000010000100000110100111011111101000100000111010010,
> +         0b0110000011011101000001010100110101101110011100110101000000001001,
> +         0b1101100110100000011000001111000100100100110001100110101010101100,
> +         0b0010100101010110010010001010101000011111111111001011001010001111,
> +         0b0111001010001111001100111001010101001000110101000011110000001000,
> +         0b0110010011001001001111110001010010001011010010001101110110110011,
> +         0b0110010100111011000100111000001001101011111001110010111110111111,
> +         0b0101110111001001101100110100101001110010101110011001101110001000,
> +         0b0100110101010111011010001100010111100011010011111001010100111000,
> +         0b0111000110110111011110100100010111000110000110110110110001111110,
> +         0b1000101101010100100100111110100011110110110010011001110011110101,
> +         0b1001101110101001010100111101101011000101000010110101101111110000,
> +         0b0100100101001011011001001011000010001101001010010001010110101000,
> +         0b0010100001001011100110101000010110000111000111000011100101011011,
> +         0b0110111000011001111101101011111010001000000010101000101010011110,
> +         0b1000110110100001111011000001111100001001000000010110010100100100,
> +         0b1001110100011111100111101011010000010101011100101000010010100110,
> +         0b0001010110101110100010101010001110110110100011101010001001111100,
> +         0b1010100101101100000010110011100110100010010000100100001110000100,
> +         0b0001000000010000001010000010100110000001110100111001110111101101,
> +         0b1100000000000000000000000000000000000000000000000000000000000000 };
> +    };
> +#elif LONG_DOUBLE_KIND == LDK_BINARY128
> +  template<>
> +    struct floating_type_traits<long double>
> +    {
> +      static constexpr int mantissa_bits = 112;
> +      static constexpr int exponent_bits = 15;
> +      static constexpr bool has_implicit_leading_bit = true;
> +      using mantissa_t = unsigned __int128;
> +      using shortest_scientific_t = ryu::floating_decimal_128;
> +
> +      static constexpr uint64_t pow10_adjustment_tab[]
> +     = { 0b0000000000000000000000000000000000000000000000000100000010000000,
> +         0b1011001111110100000100010101101110011100100110000110010110011000,
> +         0b1010100010001101111111000000001101010010100010010000111011110111,
> +         0b1011111001110001111000011111000010110111000111110100101010100101,
> +         0b0110100110011110011011000011000010011001110001001001010011100011,
> +         0b0000011111110010101111101011101010000110011111100111001110100111,
> +         0b0100010101010110000010111011110100000010011001001010001110111101,
> +         0b1101110111000010001101100000110100000111001001101011000101011011,
> +         0b0100111011101101010000001101011000101100101110010010110000101011,
> +         0b0100000110111000000110101000010011101000110100010110000011101101,
> +         0b1011001101001000100001010001100100001111011101010101110001010110,
> +         0b1000000001000000101001110010110010001111101101010101001100000110,
> +         0b0101110110100110000110000001001010111110001110010000111111010011,
> +         0b1010001111100111000100011100100100111100100101000001011001000111,
> +         0b1010011000011100110101100111001011100101111111100001110100000100,
> +         0b1100011100100010100000110001001010000000100000001001010111011101,
> +         0b0101110000100011001111101101000000100110000010010111010001111010,
> +         0b0100111100011010110111101000100110000111001001101100000001111100,
> +         0b1100100100111110101011000100000101011010110111000111110100110101,
> +         0b0110010000010111010100110011000000111010000010111011010110000100,
> +         0b0101001001010010110111010111000101011100000111100111000001110010,
> +         0b1101111111001011101010110001000111011010111101001011010110100100,
> +         0b0001000100110000011111101011001101110010110110010000000011100100,
> +         0b0001000000000101001001001000000000011000100011001110101001001110,
> +         0b0010010010001000111010011011100001000110011011011110110100111000,
> +         0b0000100110101100000111100010100100011100110111011100001111001100,
> +         0b1011111010001110001100000011110111111111100000001011111111101100,
> +         0b0000011100001111010101110000100110111100101101110111101001000001,
> +         0b1100010001110110111100001001001101101000011100000010110101001011,
> +         0b0100101001101011111001011110101101100011011111011100101010101111,
> +         0b0001101001111001110000101101101100001011010001011110011101000010,
> +         0b1111000000101001101111011010110011101110100001011011001011100010,
> +         0b0101001010111101101100001111100010010110001101001000001101100100,
> +         0b0101100101011110001100101011111000111001111001001001101101100001,
> +         0b1111001101010010100100011011000110110010001111000111010001001101,
> +         0b0001110010011000000001000110110111011000011100001000011001110111,
> +         0b0100001011011011011011110011101100100101111111101100101000001110,
> +         0b0101011110111101010111100111101111000101111111111110100011011010,
> +         0b1110101010001001110100000010110111010111111010111110100110010110,
> +         0b1010001111100001001100101000110100001100011100110010000011010111,
> +         0b1111111101101111000100111100000101011000001110011011101010111001,
> +         0b1111101100001110100101111101011001000100000101110000110010100011,
> +         0b1001010110110101101101000101010001010000101011011111010011010000,
> +         0b0111001110110011101001100111000001000100001010110000010000001101,
> +         0b0101111100111110100111011001111001111011011110010111010011101010,
> +         0b1110111000000001100100111001100100110001011011001110101111110111,
> +         0b0001010001001101010111101010011111000011110001101101011001111111,
> +         0b0101000011100011010010001101100001011101011010100110101100100010,
> +         0b0001000101011000100101111100110110000101101101111000110001001011,
> +         0b0101100101001011011000010101000000010100011100101101000010011111,
> +         0b1000010010001011101001011010100010111011110100110011011000100111,
> +         0b1000011011100001010111010111010011101100100010010010100100101001,
> +         0b1001001001010111110101000010111010000000101111010100001010010010,
> +         0b0011011110110010010101111011000001000000000011011111000011111011,
> +         0b1011000110100011001110000001000100000001011100010111010010011110,
> +         0b0111101110110101110111110000011000000100011100011000101101101110,
> +         0b1001100101111011011100011110101011001111100111101010101010110111,
> +         0b1100110010010001100011001111010000000100011101001111011101001111,
> +         0b1000111001111010100101000010000100000001001100101010001011001101,
> +         0b0011101011110000110010100101010100110010100001000010101011111101,
> +         0b1100000000000110000010101011000000011101000110011111100010111111,
> +         0b0010100110000011011100010110111100010110101100110011101110001101,
> +         0b0010111101010011111000111001111100110111111100100011110001101110,
> +         0b1001110111001001101001001001011000010100110001000000100011010110,
> +         0b0011110101100111011011111100001000011001010100111100100101111010,
> +         0b0010001101000011000010100101110000010101101000100110000100001010,
> +         0b0010000010100110010101100101110011101111000111111111001001100001,
> +         0b0100111111011011011011100111111011000010011101101111011111110110,
> +         0b1111111111010110101011101000100101110100001110001001101011100111,
> +         0b1011111101000101110000111100100010111010100001010000010010110010,
> +         0b1111010101001011101011101010000100110110001110111100100110111111,
> +         0b1011001101000001001101000010101010010110010001100001011100011010,
> +         0b0101001011011101010001110100010000010001111100100100100001001101,
> +         0b0010100000111001100011000101100101000001111100111001101000000010,
> +         0b1011001111010101011001000100100110100100110111110100000110111000,
> +         0b0101011111010011100011010010111101110010100001111111100010001001,
> +         0b0010111011101100100000000000001111111010011101100111100001001101,
> +         0b1101000000000000000000000000000000000000000000000000000000000000 };
> +    };
> +#elif LONG_DOUBLE_KIND == LDK_IBM128
> +  template<>
> +    struct floating_type_traits<long double>
> +    {
> +      static constexpr int mantissa_bits = 105;
> +      static constexpr int exponent_bits = 11;
> +      static constexpr bool has_implicit_leading_bit = true;
> +      using mantissa_t = unsigned __int128;
> +      using shortest_scientific_t = ryu::floating_decimal_128;
> +
> +      static constexpr uint64_t pow10_adjustment_tab[]
> +     = { 0b0000000000000000000000000000000000000000000000001000000100000000,
> +         0b0000000000000000000100000000000000000000001000000000000000000010,
> +         0b0000100000000000000000001001000000000000000001100100000000000000,
> +         0b0011000000000000000000000000000001110000010000000000000000000000,
> +         0b0000100000000000001000000000000000000000000000100000000000000000 };
> +    };
> +#endif
> +
> +  // An IEEE-style decomposition of a floating-point value of type T.
> +  template<typename T>
> +    struct ieee_t
> +    {
> +      typename floating_type_traits<T>::mantissa_t mantissa;
> +      uint32_t biased_exponent;
> +      bool sign;
> +    };
> +
> +  // Decompose the floating-point value into its IEEE components.
> +  template<typename T>
> +    ieee_t<T>
> +    get_ieee_repr(const T value)
> +    {
> +      constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
> +      constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
> +      constexpr int total_bits = mantissa_bits + exponent_bits + 1;
> +
> +      constexpr auto get_uint_t = [] {
> +     if constexpr (total_bits <= 32)
> +       return uint32_t{};
> +     else if constexpr (total_bits <= 64)
> +       return uint64_t{};
> +#ifdef __SIZEOF_INT128__
> +     else if constexpr (total_bits <= 128)
> +       return (unsigned __int128){};
> +#endif
> +      };
> +      using uint_t = decltype(get_uint_t());
> +      uint_t value_bits = 0;
> +      memcpy(&value_bits, &value, sizeof(value));
> +
> +      ieee_t<T> ieee_repr;
> +      ieee_repr.mantissa = value_bits & ((uint_t{1} << mantissa_bits) - 1u);
> +      ieee_repr.biased_exponent
> +     = (value_bits >> mantissa_bits) & ((uint_t{1} << exponent_bits) - 1u);
> +      ieee_repr.sign = (value_bits >> (mantissa_bits + exponent_bits)) & 1;
> +      return ieee_repr;
> +    }
> +
> +#if LONG_DOUBLE_KIND == LDK_IBM128
> +  template<>
> +    ieee_t<long double>
> +    get_ieee_repr(const long double value)
> +    {
> +      // The layout of __ibm128 isn't compatible with the standard IEEE format.
> +      // So we transform it into an IEEE-compatible format, suitable for
> +      // consumption by the generic Ryu API, with an 11-bit exponent and 105-bit
> +      // mantissa (plus an implicit leading bit).  We use the exponent and sign
> +      // of the high part, and we merge the mantissa of the high part with the
> +      // mantissa (and the implicit leading bit) of the low part.
> +      using uint_t = unsigned __int128;
> +      uint_t value_bits = 0;
> +      memcpy(&value_bits, &value, sizeof(value_bits));
> +
> +      const uint64_t value_hi = value_bits;
> +      const uint64_t value_lo = value_bits >> 64;
> +
> +      uint64_t mantissa_hi = value_hi & ((1ull << 52) - 1);
> +      unsigned exponent_hi = (value_hi >> 52) & ((1ull << 11) - 1);
> +      const int sign_hi = (value_hi >> 63) & 1;
> +
> +      uint64_t mantissa_lo = value_lo & ((1ull << 52) - 1);
> +      const unsigned exponent_lo = (value_lo >> 52) & ((1ull << 11) - 1);
> +      const int sign_lo = (value_lo >> 63) & 1;
> +
> +     {
> +       // The following code for adjusting the low-part mantissa to combine
> +       // it with the high-part mantissa is taken from the glibc source file
> +       // sysdeps/ieee754/ldbl-128ibm/printf_fphex.c.
> +       mantissa_lo <<= 7;
> +       if (exponent_lo != 0)
> +         mantissa_lo |= (1ull << (52 + 7));
> +       else
> +         mantissa_lo <<= 1;
> +
> +       const int ediff = exponent_hi - exponent_lo - 53;
> +       if (ediff > 63)
> +         mantissa_lo = 0;
> +       else if (ediff > 0)
> +         mantissa_lo >>= ediff;
> +       else if (ediff < 0)
> +         mantissa_lo <<= -ediff;
> +
> +       if (sign_lo != sign_hi && mantissa_lo != 0)
> +         {
> +           mantissa_lo = (1ull << 60) - mantissa_lo;
> +           if (mantissa_hi == 0)
> +             {
> +               mantissa_hi = 0xffffffffffffeLL | (mantissa_lo >> 59);
> +               mantissa_lo = 0xfffffffffffffffLL & (mantissa_lo << 1);
> +               exponent_hi--;
> +             }
> +           else
> +             mantissa_hi--;
> +         }
> +     }
> +
> +      ieee_t<long double> ieee_repr;
> +      ieee_repr.mantissa = ((uint_t{mantissa_hi} << 64)
> +                         | (uint_t{mantissa_lo} << 4)) >> 11;
> +      ieee_repr.biased_exponent = exponent_hi;
> +      ieee_repr.sign = sign_hi;
> +      return ieee_repr;
> +    }
> +#endif
> +
> +  // Invoke Ryu to obtain the shortest scientific form for the given
> +  // floating-point number.
> +  template<typename T>
> +    typename floating_type_traits<T>::shortest_scientific_t
> +    floating_to_shortest_scientific(const T value)
> +    {
> +      if constexpr (std::is_same_v<T, float>)
> +     return ryu::floating_to_fd32(value);
> +      else if constexpr (std::is_same_v<T, double>)
> +     return ryu::floating_to_fd64(value);
> +#ifdef __SIZEOF_INT128__
> +      else if constexpr (std::is_same_v<T, long double>)
> +     {
> +       constexpr int mantissa_bits
> +         = floating_type_traits<T>::mantissa_bits;
> +       constexpr int exponent_bits
> +         = floating_type_traits<T>::exponent_bits;
> +       constexpr bool has_implicit_leading_bit
> +         = floating_type_traits<T>::has_implicit_leading_bit;
> +
> +       const auto [mantissa, exponent, sign] = get_ieee_repr(value);
> +       return ryu::generic_binary_to_decimal(mantissa, exponent, sign,
> +                                             mantissa_bits, exponent_bits,
> +                                             !has_implicit_leading_bit);
> +     }
> +#endif
> +    }
> +
> +  // This subroutine returns true if the shortest scientific form fd is a
> +  // positive power of 10, and the floating-point number that has this shortest
> +  // scientific form is smaller than this power of 10.
> +  //
> +  // For instance, the exactly-representable 64-bit number
> +  // 99999999999999991611392.0 has the shortest scientific form 1e23, so its
> +  // exact value is smaller than its shortest scientific form.
> +  //
> +  // For these powers of 10 the length of the fixed form is one digit less
> +  // than what the scientific exponent suggests.
> +  //
> +  // This subroutine inspects a lookup table to detect when fd is such a
> +  // "rounded up" power of 10.
> +  template<typename T>
> +    bool
> +    is_rounded_up_pow10_p(const typename
> +                       floating_type_traits<T>::shortest_scientific_t fd)
> +    {
> +      if (fd.exponent < 0 || fd.mantissa != 1) [[likely]]
> +     return false;
> +
> +      constexpr auto& pow10_adjustment_tab
> +     = floating_type_traits<T>::pow10_adjustment_tab;
> +      __glibcxx_assert(fd.exponent/64 < (int)std::size(pow10_adjustment_tab));
> +      return (pow10_adjustment_tab[fd.exponent/64]
> +           & (1ull << (63 - fd.exponent%64)));
> +    }
> +
> +  int
> +  get_mantissa_length(const ryu::floating_decimal_32 fd)
> +  { return ryu::decimalLength9(fd.mantissa); }
> +
> +  int
> +  get_mantissa_length(const ryu::floating_decimal_64 fd)
> +  { return ryu::decimalLength17(fd.mantissa); }
> +
> +#ifdef __SIZEOF_INT128__
> +  int
> +  get_mantissa_length(const ryu::floating_decimal_128 fd)
> +  { return ryu::generic128::decimalLength(fd.mantissa); }
> +#endif
> +} // anon namespace
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +// This subroutine of __floating_to_chars_* handles writing nan, inf and 0 in
> +// all formatting modes.
> +template<typename T>
> +  static optional<to_chars_result>
> +  __handle_special_value(char* first, char* const last, const T value,
> +                      const chars_format fmt, const int precision)
> +  {
> +    __glibcxx_assert(precision >= 0);
> +
> +    string_view str;
> +    switch (__builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
> +                              FP_ZERO, value))
> +      {
> +      case FP_INFINITE:
> +     str = "-inf";
> +     break;
> +
> +      case FP_NAN:
> +     str = "-nan";
> +     break;
> +
> +      case FP_ZERO:
> +     break;
> +
> +      default:
> +      case FP_SUBNORMAL:
> +      case FP_NORMAL: [[likely]]
> +     return nullopt;
> +      }
> +
> +    if (!str.empty())
> +      {
> +     // We're formatting +-inf or +-nan.
> +     if (!__builtin_signbit(value))
> +       str.remove_prefix(strlen("-"));
> +
> +     if (last - first < (int)str.length())
> +       return {{last, errc::value_too_large}};
> +
> +     memcpy(first, &str[0], str.length());
> +     first += str.length();
> +     return {{first, errc{}}};
> +      }
> +
> +    // We're formatting 0.
> +    __glibcxx_assert(value == 0);
> +    const auto orig_first = first;
> +    const bool sign = __builtin_signbit(value);
> +    int expected_output_length;
> +    switch (fmt)
> +      {
> +      case chars_format::fixed:
> +      case chars_format::scientific:
> +      case chars_format::hex:
> +     expected_output_length = sign + 1;
> +     if (precision)
> +       expected_output_length += strlen(".") + precision;
> +     if (fmt == chars_format::scientific)
> +       expected_output_length += strlen("e+00");
> +     else if (fmt == chars_format::hex)
> +       expected_output_length += strlen("p+0");
> +     if (last - first < expected_output_length)
> +       return {{last, errc::value_too_large}};
> +
> +     if (sign)
> +       *first++ = '-';
> +     *first++ = '0';
> +     if (precision)
> +       {
> +         *first++ = '.';
> +         memset(first, '0', precision);
> +         first += precision;
> +       }
> +     if (fmt == chars_format::scientific)
> +       {
> +         memcpy(first, "e+00", 4);
> +         first += 4;
> +       }
> +     else if (fmt == chars_format::hex)
> +       {
> +         memcpy(first, "p+0", 3);
> +         first += 3;
> +       }
> +     break;
> +
> +      case chars_format::general:
> +      default: // case chars_format{}:
> +     expected_output_length = sign + 1;
> +     if (last - first < expected_output_length)
> +       return {{last, errc::value_too_large}};
> +
> +     if (sign)
> +       *first++ = '-';
> +     *first++ = '0';
> +     break;
> +      }
> +    __glibcxx_assert(first - orig_first == expected_output_length);
> +    return {{first, errc{}}};
> +  }
> +
> +// This subroutine of the floating-point to_chars overloads performs
> +// hexadecimal formatting.
> +template<typename T>
> +  static to_chars_result
> +  __floating_to_chars_hex(char* first, char* const last, const T value,
> +                       const optional<int> precision)
> +  {
> +    if (precision.has_value() && precision.value() < 0) [[unlikely]]
> +      // A negative precision argument is treated as if it were omitted.
> +      return __floating_to_chars_hex(first, last, value, nullopt);
> +
> +    __glibcxx_requires_valid_range(first, last);
> +
> +    constexpr int mantissa_bits = floating_type_traits<T>::mantissa_bits;
> +    constexpr bool has_implicit_leading_bit
> +      = floating_type_traits<T>::has_implicit_leading_bit;
> +    constexpr int exponent_bits = floating_type_traits<T>::exponent_bits;
> +    constexpr int exponent_bias = (1u << (exponent_bits - 1)) - 1;
> +    using mantissa_t = typename floating_type_traits<T>::mantissa_t;
> +    constexpr int mantissa_t_width = sizeof(mantissa_t) * __CHAR_BIT__;
> +
> +    if (auto result = __handle_special_value(first, last, value,
> +                                          chars_format::hex,
> +                                          precision.value_or(0)))
> +      return *result;
> +
> +    // Extract the sign, mantissa and exponent from the value.
> +    const auto [ieee_mantissa, biased_exponent, sign] = get_ieee_repr(value);
> +    const bool is_normal_number = (biased_exponent != 0);
> +
> +    // Calculate the unbiased exponent.
> +    const int32_t unbiased_exponent = (is_normal_number
> +                                    ? biased_exponent - exponent_bias
> +                                    : 1 - exponent_bias);
> +
> +    // Shift the mantissa so that its bitwidth is a multiple of 4.
> +    constexpr unsigned rounded_mantissa_bits = (mantissa_bits + 3) / 4 * 4;
> +    static_assert(mantissa_t_width >= rounded_mantissa_bits);
> +    mantissa_t effective_mantissa
> +      = ieee_mantissa << (rounded_mantissa_bits - mantissa_bits);
> +    if (is_normal_number)
> +      {
> +     if constexpr (has_implicit_leading_bit)
> +       // Restore the mantissa's implicit leading bit.
> +       effective_mantissa |= mantissa_t{1} << rounded_mantissa_bits;
> +     else
> +       // The explicit mantissa bit should already be set.
> +       __glibcxx_assert(effective_mantissa & (mantissa_t{1} << (mantissa_bits
> +                                                                - 1u)));
> +      }
> +
> +    // Compute the shortest precision needed to print this value exactly,
> +    // disregarding trailing zeros.
> +    constexpr int full_hex_precision = (has_implicit_leading_bit
> +                                     ? (mantissa_bits + 3) / 4
> +                                     // With an explicit leading bit, we
> +                                     // use the four leading nibbles as the
> +                                     // hexit before the decimal point.
> +                                     : (mantissa_bits - 4 + 3) / 4);
> +    const int trailing_zeros = __countr_zero(effective_mantissa) / 4;
> +    const int shortest_full_precision = full_hex_precision - trailing_zeros;
> +    __glibcxx_assert(shortest_full_precision >= 0);
> +
> +    int written_exponent = unbiased_exponent;
> +    const int effective_precision = precision.value_or(shortest_full_precision);
> +    if (effective_precision < shortest_full_precision)
> +      {
> +     // When limiting the precision, we need to determine how to round the
> +     // least significant printed hexit.  The following branchless
> +     // bit-level-parallel technique computes whether to round up the
> +     // mantissa bit at index N (according to round-to-nearest rules) when
> +     // dropping N bits of precision, for each index N in the bit vector.
> +     // This technique is borrowed from the MSVC implementation.
> +     using bitvec = mantissa_t;
> +     const bitvec round_bit = effective_mantissa << 1;
> +     const bitvec has_tail_bits = round_bit - 1;
> +     const bitvec lsb_bit = effective_mantissa;
> +     const bitvec should_round = round_bit & (has_tail_bits | lsb_bit);
> +
> +     const int dropped_bits = 4*(full_hex_precision - effective_precision);
> +     // Mask out the dropped nibbles.
> +     effective_mantissa >>= dropped_bits;
> +     effective_mantissa <<= dropped_bits;
> +     if (should_round & (mantissa_t{1} << dropped_bits))
> +       {
> +         // Round up the least significant nibble.
> +         effective_mantissa += mantissa_t{1} << dropped_bits;
> +         // Check and adjust for overflow of the leading nibble.  When the
> +         // type has an implicit leading bit, then the leading nibble
> +         // before rounding is either 0 or 1, so it can't overflow.
> +         if constexpr (!has_implicit_leading_bit)
> +           {
> +             // The only supported floating-point type with explicit
> +             // leading mantissa bit is LDK_FLOAT80, i.e. x86 80-bit
> +             // extended precision, and so we hardcode the below overflow
> +             // check+adjustment for this type.
> +             static_assert(mantissa_t_width == 64
> +                           && rounded_mantissa_bits == 64);
> +             if (effective_mantissa == 0)
> +               {
> +                 // We rounded up the least significant nibble and the
> +                 // mantissa overflowed, e.g f.fcp+10 with precision=1
> +                 // became 10.0p+10.  Absorb this extra hexit into the
> +                 // exponent to obtain 1.0p+14.
> +                 effective_mantissa
> +                   = mantissa_t{1} << (rounded_mantissa_bits - 4);
> +                 written_exponent += 4;
> +               }
> +           }
> +       }
> +      }
> +
> +    // Compute the leading hexit and mask it out from the mantissa.
> +    char leading_hexit;
> +    if constexpr (has_implicit_leading_bit)
> +      {
> +     const unsigned nibble = effective_mantissa >> rounded_mantissa_bits;
> +     __glibcxx_assert(nibble <= 2);
> +     leading_hexit = '0' + nibble;
> +     effective_mantissa &= ~(mantissa_t{0b11} << rounded_mantissa_bits);
> +      }
> +    else
> +      {
> +     const unsigned nibble = effective_mantissa >> (rounded_mantissa_bits-4);
> +     __glibcxx_assert(nibble < 16);
> +     leading_hexit = "0123456789abcdef"[nibble];
> +     effective_mantissa &= ~(mantissa_t{0b1111} << (rounded_mantissa_bits-4));
> +     written_exponent -= 3;
> +      }
> +
> +    // Now before we start writing the string, determine the total length of
> +    // the output string and perform a single bounds check.
> +    int expected_output_length = sign + 1;
> +    if (effective_precision != 0)
> +      expected_output_length += strlen(".") + effective_precision;
> +    const int abs_written_exponent = abs(written_exponent);
> +    expected_output_length += (abs_written_exponent >= 10000 ? strlen("p+ddddd")
> +                            : abs_written_exponent >= 1000 ? strlen("p+dddd")
> +                            : abs_written_exponent >= 100 ? strlen("p+ddd")
> +                            : abs_written_exponent >= 10 ? strlen("p+dd")
> +                            : strlen("p+d"));
> +    if (last - first < expected_output_length)
> +      return {last, errc::value_too_large};
> +
> +    const auto saved_first = first;
> +    // Write the negative sign and the leading hexit.
> +    if (sign)
> +      *first++ = '-';
> +    *first++ = leading_hexit;
> +
> +    if (effective_precision > 0)
> +      {
> +     *first++ = '.';
> +     int written_hexits = 0;
> +     // Extract and mask out the leading nibble after the decimal point,
> +     // write its corresponding hexit, and repeat until the mantissa is
> +     // empty.
> +     int nibble_offset = rounded_mantissa_bits;
> +     if constexpr (!has_implicit_leading_bit)
> +       // We already printed the entire leading hexit.
> +       nibble_offset -= 4;
> +     while (effective_mantissa != 0)
> +       {
> +         nibble_offset -= 4;
> +         const unsigned nibble = effective_mantissa >> nibble_offset;
> +         __glibcxx_assert(nibble < 16);
> +         *first++ = "0123456789abcdef"[nibble];
> +         ++written_hexits;
> +          effective_mantissa &= ~(mantissa_t{0b1111} << nibble_offset);
> +       }
> +     __glibcxx_assert(nibble_offset >= 0);
> +     __glibcxx_assert(written_hexits <= effective_precision);
> +     // Since the mantissa is now empty, every hexit hereafter must be '0'.
> +     if (int remaining_hexits = effective_precision - written_hexits)
> +       {
> +         memset(first, '0', remaining_hexits);
> +         first += remaining_hexits;
> +       }
> +      }
> +
> +    // Finally, write the exponent.
> +    *first++ = 'p';
> +    if (written_exponent >= 0)
> +      *first++ = '+';
> +    const to_chars_result result = to_chars(first, last, written_exponent);
> +    __glibcxx_assert(result.ec == errc{}
> +                  && result.ptr == saved_first + expected_output_length);
> +    return result;
> +  }
> +
> +template<typename T>
> +  static to_chars_result
> +  __floating_to_chars_shortest(char* first, char* const last, const T value,
> +                            chars_format fmt)
> +  {
> +    if (fmt == chars_format::hex)
> +      return __floating_to_chars_hex(first, last, value, nullopt);
> +
> +    __glibcxx_assert(fmt == chars_format::fixed
> +                  || fmt == chars_format::scientific
> </cut>
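
For readers skimming the quoted patch: the generic get_ieee_repr() above
copies the value's bytes into a same-width unsigned integer with memcpy and
then masks out the mantissa, biased exponent and sign.  A minimal standalone
sketch of that idea for double only is below.  It is not code from the
patch: the names double_parts and decompose are hypothetical, and it assumes
double is IEEE binary64 (52 stored mantissa bits, 11 exponent bits).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration type; the patch uses ieee_t<T> instead.
struct double_parts
{
  std::uint64_t mantissa;        // low 52 bits, implicit leading bit excluded
  std::uint32_t biased_exponent; // 11-bit exponent field, bias not removed
  bool sign;                     // true for negative values, including -0.0
};

// Assumption: double is a 64-bit IEEE binary64 value.
inline double_parts
decompose(const double value)
{
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  constexpr int mantissa_bits = 52;
  constexpr int exponent_bits = 11;

  // memcpy reads the object representation without violating aliasing rules.
  std::uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof(value));

  double_parts parts;
  parts.mantissa = bits & ((std::uint64_t{1} << mantissa_bits) - 1u);
  parts.biased_exponent = static_cast<std::uint32_t>(
    (bits >> mantissa_bits) & ((std::uint64_t{1} << exponent_bits) - 1u));
  parts.sign = ((bits >> (mantissa_bits + exponent_bits)) & 1) != 0;
  return parts;
}
```

Under those assumptions, decompose(1.0) yields mantissa 0, biased exponent
1023 and sign false; a biased exponent of 0 marks zero or a subnormal, which
is what the is_normal_number test in __floating_to_chars_hex relies on.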

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain
