On Wed, 2 Apr 2025, Tomasz Kamiński wrote:
> This patch implements part of P2372R3 that specified debug (escaped)
> format for strings and character sequences. This includes both
> handling of the '?' format specifier and the set_debug_format member.
>
> To indicate partial support we define __glibcxx_format_ranges macro
> value 1, without defining __cpp_lib_format_ranges.
>
> We provide two separate escaping routines depending on the literal
> encoding for the corresponding character types. If the character
> encoding is Unicode, we follow the specification from the standard
> (__format::__write_escaped_unicode).
> For other encodings, we escape only characters in range [0x00, 0x80),
> interpreting them as ASCII values: [0x00, 0x20), 0x7f and '\t', '\r',
> '\n', '\\', '"', '\'' are escaped. We assume every character outside
> this range is printable (__format::__write_escaped_ascii).
> In particular we do not yet implement special handling of shift
> sequences.
>
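The non-Unicode rules described above can be sketched roughly as follows. This is only an illustration, not the routine from the patch: escape_ascii and its delim parameter are hypothetical names, the \u{hex} spelling for control characters follows the standard's escaped format, and the sketch escapes a quote only when it matches the surrounding delimiter.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not the libstdc++ routine: escapes only ASCII-range
// bytes and passes every byte >= 0x80 through unchanged.
std::string escape_ascii(std::string_view in, char delim = '"')
{
  static constexpr char hex[] = "0123456789abcdef";
  std::string out(1, delim);
  for (unsigned char c : in)
    {
      switch (c)
        {
        case '\t': out += "\\t"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\\': out += "\\\\"; break;
        case '"':
        case '\'':
          // Assumption: a quote is escaped only if it is the delimiter.
          if (c == static_cast<unsigned char>(delim))
            out += '\\';
          out += static_cast<char>(c);
          break;
        default:
          if (c < 0x20 || c == 0x7f)
            {
              // Control characters become \u{hex} without leading zeros.
              out += "\\u{";
              if (c >= 0x10)
                out += hex[c >> 4];
              out += hex[c & 0xf];
              out += '}';
            }
          else
            out += static_cast<char>(c);
        }
    }
  out += delim;
  return out;
}
```

Bytes at or above 0x80 are copied verbatim, which matches the "assume printable" behaviour described for non-Unicode encodings.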
> For Unicode escaping a new __escape_edges table is introduced,
> that encodes whether a character belongs to a General_Category
> that is escaped by the standard (Control or Other). This table
> is generated from DerivedGeneralCategory.txt provided by Unicode.
> Only a boolean flag is preserved to reduce the number of entries.
> The additional rules for escaping are handled by __should_escape_unicode.
>
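To make the edge-table idea concrete, here is a minimal sketch of a sorted table of run boundaries carrying one boolean flag each, queried by binary search. The layout (first code point of a run shifted left by one, flag in the low bit) and the names edge, toy_edges and toy_should_escape are assumptions for illustration, not taken from the patch. The table is a toy excerpt that stops at U+00AE, so results for larger code points are not meaningful.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical encoding: (first code point of a run << 1) | escape flag.
constexpr std::uint32_t edge(char32_t first, bool escaped)
{ return (static_cast<std::uint32_t>(first) << 1) | escaped; }

// Toy excerpt only; a real table would be generated from
// DerivedGeneralCategory.txt and cover the whole code space.
constexpr std::uint32_t toy_edges[] = {
  edge(0x00, true),   // C0 controls
  edge(0x20, false),  // printable ASCII
  edge(0x7f, true),   // DEL and C1 controls
  edge(0xa0, false),
  edge(0xad, true),   // SOFT HYPHEN (Format)
  edge(0xae, false),
};

bool toy_should_escape(char32_t c)
{
  // The largest key a run starting at c could have; the run containing c
  // is the last entry that does not exceed it.
  const std::uint32_t key = (static_cast<std::uint32_t>(c) << 1) | 1u;
  auto it = std::upper_bound(std::begin(toy_edges), std::end(toy_edges), key);
  return it != std::begin(toy_edges) && (*(it - 1) & 1u) != 0;
}
```

Storing only the points where the flag changes keeps the table small, since long runs of code points sharing the same answer collapse into one entry.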
> When width or precision is specified, we emit the escaped string
> to a temporary buffer and format the resulting string according
> to the format spec. For characters we use a fixed size stack buffer,
> for which a new _Fixedbuf_sink is introduced.
>
> Finally this patch corrects handling of UTF-32LE and UTF-32BE
> in __unicode::__literal_encoding_is_unicode<_CharT>, and now they
> are properly recognized as Unicode.
>
> contrib/ChangeLog:
>
> * unicode/README:
> Mentioned `DerivedGeneralCategory.txt`
> * unicode/gen_libstdcxx_unicode_data.py:
> Generate __escape_edges table from DerivedGeneralCategory.txt.
> Update file name in comments.
> * unicode/DerivedGeneralCategory.txt:
> Copy of file distributed by Unicode Consortium
>
> ftp://ftp.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/chrono_io.h (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN)
> (__detail::_Widen): Moved to std/format file.
> * include/bits/unicode-data.h:
> Regenerate using contrib/unicode/gen_libstdcxx_unicode_data.py.
> * include/bits/unicode.h (__unicode::_Utf_iterator::_M_units)
> (__unicode::__should_escape_category): Define.
> (__unicode::__literal_encoding_is_unicode<_CharT>):
> Corrected handling for UTF-16 and UTF-32 with "LE" or "BE" suffix.
> * include/bits/version.def:
> Define __glibcxx_format_ranges without corresponding std name.
> * include/bits/version.h: Regenerate.
> * include/std/format (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN):
> Moved from include/bits/chrono_io.h.
> (__format::_Term_char, __format::_Escapes, __format::_Separators)
> (__format::__should_escape_ascii, __format::__should_escape_unicode)
> (__format::__write_escape_seq, __format::__write_escaped_char)
> (__format::__write_escaped_ascii, __format::__write_escaped_unicode)
> (__format::__write_escaped): Define.
> (__formatter_str::_M_format): Extracted non-escaped formatting.
> (__formatter_str::format): Handle _Pres_esc.
> (__formatter_int::_M_do_parse): Parse '?' if __glibcxx_format_ranges
> is set.
> (__formatter_int::_M_format_character_escaped): Define.
> (formatter<_CharT, _CharT>::format, formatter<char, wchar_t>::format):
> Handle _Pres_esc.
> (__formatter_str::set_debug_format, formatter<...>::set_debug_format):
> Guard with __glibcxx_format_ranges.
> (__format::_Fixedbuf_sink): Define.
> * testsuite/std/format/debug.cc: New test.
> * testsuite/std/format/parse_ctx.cc (escaped_strings_supported):
> Define to true if __glibcxx_format_ranges is defined.
> * testsuite/std/format/string.cc (escaped_strings_supported):
> Define to true if __glibcxx_format_ranges is defined.
> ---
> Testing on x86_64-linux. OK for trunk?
>
> For dg-options could I configure a run with Unicode and non-Unicode
> encodings in the same file? If so, what encoding would be supported
> on most of the platforms we run tests on (value for -fexec-charset=)?
Not sure if you resolved this already, but one way to run the same
test with a different set of flags is to create a new test file that
#includes the original one and sets different dg-options (and
duplicates the other directives as appropriate), e.g.
libstdc++-v3/testsuite/std/format/debug_nonunicode.cc:
// { dg-options "-fexec-charset=... -fwide-exec-charset=..." }
// { dg-do run { target c++23 } }
// { dg-add-options no_pch }
#include "debug.cc"
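DejaGnu directives are parsed before the preprocessor is run, so the
directives inside the #included debug.cc are not picked up and need to
be repeated in the wrapper.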