Re: [PATCH] libstdc++: Implement debug format for strings and charcters formatters [PR109162]

Patrick Palka Tue, 08 Apr 2025 15:18:23 -0700

On Wed, 2 Apr 2025, Tomasz Kamiński wrote:

> This patch implements the part of P2372R3 that specified the debug (escaped)
> format for strings and character sequences. This includes both
> handling of the '?' format specifier and the set_debug_format member.
> 
> To indicate partial support we define __glibcxx_format_ranges macro
> value 1, without defining __cpp_lib_format_ranges.
> 
> We provide two separate escaping routines depending on the literal
> encoding for the corresponding character types. If the character
> encoding is Unicode, we follow the specification from the standard
> (__format::__write_escaped_unicode).
> For other encodings, we escape only characters in range [0x00, 0x80),
> interpreting them as ASCII values: [0x00, 0x20), 0x7f and '\t', '\r',
> '\n', '\\', '"', '\'' are escaped. We assume every character outside
> this range is printable (__format::__write_escaped_ascii).
> In particular we do not yet implement special handling of shift
> sequences.
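
A minimal sketch of the non-Unicode rule described above, for illustration only. The function names are hypothetical and this is not the code from the patch. It assumes that only the quote matching the enclosing terminator is escaped (which is what the _Term_char name in the ChangeLog suggests) and that the remaining control characters are written as \u{hex}:

```cpp
#include <string>
#include <string_view>

// Hypothetical helpers; not the libstdc++ implementation.
// `term` is the enclosing quote: '"' for strings, '\'' for characters.
inline bool should_escape_ascii_sketch(unsigned char c, char term)
{
  if (c >= 0x80)
    return false;                 // outside ASCII: assumed printable
  if (c < 0x20 || c == 0x7f)
    return true;                  // control characters, incl. \t \r \n
  return c == '\\' || c == static_cast<unsigned char>(term);
}

inline std::string
write_escaped_ascii_sketch(std::string_view s, char term = '"')
{
  static constexpr char hex[] = "0123456789abcdef";
  std::string out(1, term);       // opening quote
  for (unsigned char c : s)
    {
      if (!should_escape_ascii_sketch(c, term))
        {
          out += static_cast<char>(c);
          continue;
        }
      switch (c)
        {
        case '\t': out += "\\t";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\\': out += "\\\\"; break;
        case '"':  out += "\\\""; break;
        case '\'': out += "\\'";  break;
        default:                  // other controls: \u{hex}, no leading zero
          out += "\\u{";
          if (c >= 0x10)
            out += hex[c >> 4];
          out += hex[c & 0xf];
          out += '}';
        }
    }
  out += term;                    // closing quote
  return out;
}
```

With these definitions, write_escaped_ascii_sketch("a\tb") yields the seven characters "a\tb" (quotes included, the tab spelled as backslash-t), and a byte such as 0xE9 is copied through unchanged.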
> 
> For Unicode escaping a new __escape_edges table is introduced,
> that encodes whether a character belongs to a General_Category
> that is escaped by the standard (Control or Other). This table
> is generated from DerivedGeneralCategory.txt provided by Unicode.
> Only a boolean flag is preserved to reduce the number of entries.
> The additional rules for escaping are handled by __should_escape_unicode.
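
To illustrate the idea of an "edges" table, here is a sketch under assumptions: each entry marks a code point at which the boolean flag changes, packed as (code_point << 1) | flag, and a lookup is a binary search for the last edge at or below the character. The packing, the names, and the four-entry toy table (covering only the C0/C1 control ranges) are made up for this example and are not taken from the generated unicode-data.h:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iterator>

// Toy data, NOT the real table: each entry is (code_point << 1) | flag,
// where flag == 1 means "escape from this code point onwards".
inline constexpr std::array<std::uint32_t, 4> toy_escape_edges = {
  (0x00u << 1) | 1u,   // U+0000..U+001F: escaped
  (0x20u << 1) | 0u,   // U+0020..U+007E: not escaped
  (0x7fu << 1) | 1u,   // U+007F..U+009F: escaped
  (0xa0u << 1) | 0u,   // U+00A0 and up: not escaped (in this toy table)
};

inline bool toy_should_escape_category(char32_t c)
{
  // Key that compares greater than or equal to any edge located at `c`.
  const std::uint32_t key = (static_cast<std::uint32_t>(c) << 1) | 1u;
  // First edge strictly after `c`; the edge before it governs `c`.
  auto it = std::upper_bound(toy_escape_edges.begin(),
                             toy_escape_edges.end(), key);
  if (it == toy_escape_edges.begin())
    return false;      // unreachable here: the table starts at U+0000
  return (*std::prev(it) & 1u) != 0;
}
```

Storing only the points where the flag flips keeps the table proportional to the number of contiguous ranges rather than to the number of code points.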
> 
> When width or precision is specified, we emit the escaped string
> to a temporary buffer and format the resulting string according
> to the format spec. For characters, a fixed-size stack buffer is used,
> for which a new _Fixedbuf_sink is introduced.
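
The two-step approach can be sketched as follows. This is a simplified illustration with hypothetical names: the escaper only handles '\n' and '\t', width and precision are counted in chars rather than estimated display width, and output is always left-aligned:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, deliberately tiny escaper: quotes the string and
// rewrites only '\n' and '\t'.
inline std::string escape_minimal_sketch(std::string_view s)
{
  std::string out = "\"";
  for (char c : s)
    {
      if (c == '\n')
        out += "\\n";
      else if (c == '\t')
        out += "\\t";
      else
        out += c;
    }
  out += '"';
  return out;
}

// Step 1: escape into a temporary buffer.
// Step 2: apply precision (truncate) and width (pad) to the escaped text.
inline std::string
format_debug_sketch(std::string_view s, std::size_t width,
                    std::size_t prec = std::string::npos, char fill = ' ')
{
  std::string tmp = escape_minimal_sketch(s);
  if (prec < tmp.size())
    tmp.resize(prec);
  if (tmp.size() < width)
    tmp.append(width - tmp.size(), fill);
  return tmp;
}
```

The point of the temporary is that the padding cannot be computed until the escaped length is known.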
> 
> Finally, this patch corrects handling of UTF-32LE and UTF-32BE
> in __unicode::__literal_encoding_is_unicode<_CharT>, and now they
> are properly recognized as Unicode.
> 
> contrib/ChangeLog:
> 
>       * unicode/README:
>       Mentioned `DerivedGeneralCategory.txt`
>       * unicode/gen_libstdcxx_unicode_data.py:
>       Generate __escape_edges table from DerivedGeneralCategory.txt.
>       Update file name in comments.
>       * unicode/DerivedGeneralCategory.txt:
>       Copy of file distributed by Unicode Consortium
>       
> ftp://ftp.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt.
> 
> libstdc++-v3/ChangeLog:
> 
>       * include/bits/chrono_io.h (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN)
>       (__detail::_Widen): Moved to std/format file.
>       * include/bits/unicode-data.h:
>       Regenerate using contrib/unicode/gen_libstdcxx_unicode_data.py.
>       * include/bits/unicode.h (__unicode::_Utf_iterator::_M_units)
>       (__unicode::__should_escape_category): Define.
>       (__unicode::__literal_encoding_is_unicode<_CharT>):
>       Corrected handling for UTF-16 and UTF-32 with "LE" or "BE" suffix.
>       * include/bits/version.def:
>       Define __glibcxx_format_ranges without corresponding std name.
>       * include/bits/version.h: Regenerate.
>       * include/std/format (_GLIBCXX_WIDEN_, _GLIBCXX_WIDEN):
>       Moved from include/bits/chrono_io.h.
>       (__format::_Term_char, __format::_Escapes, __format::_Separators)
>       (__format::__should_escape_ascii, __format::__should_escape_unicode)
>       (__format::__write_escape_seq, __format::__write_escaped_char)
>       (__format::__write_escaped_ascii, __format::__write_escaped_unicode)
>       (__format::__write_escaped): Define.
>       (__formatter_str::_M_format): Extracted non-escaped formatting.
>       (__formatter_str::format): Handle _Pres_esc.
>       (__formatter_int::_M_do_parse): Parse '?' if __glibcxx_format_ranges
>       is set.
>       (__formatter_int::_M_format_character_escaped): Define.
>       (formatter<_CharT, _CharT>::format, formatter<char, wchar_t>::format):
>       Handle _Pres_esc.
>       (__formatter_str::set_debug_format, formatter<...>::set_debug_format):
>       Guard with __glibcxx_format_ranges.
>       (__format::_Fixedbuf_sink): Define.
>       * testsuite/std/format/debug.cc: New test.
>       * testsuite/std/format/parse_ctx.cc (escaped_strings_supported):
>       Define to true if __glibcxx_format_ranges is defined.
>       * testsuite/std/format/string.cc (escaped_strings_supported):
>       Define to true if __glibcxx_format_ranges is defined.
> ---
> Testing on x86_64-linux. OK for trunk?
> 
> For dg-options, could I configure a run with Unicode and non-Unicode
> encodings in the same file? If so, what encoding is likely to be supported
> on most of the platforms we run tests on (as the value for -fexec-charset=)?


Not sure if you resolved this already but one way to generally run the
same test with a different set of flags is to create a new test file
that #includes the original one and sets different dg-options (and
duplicates the other directives as appropriate), e.g.

libstdc++-v3/testsuite/std/format/debug_nonunicode.cc:

// { dg-options "-fexec-charset=... -fwide-exec-charset=..." }
// { dg-do run { target c++23 } }
// { dg-add-options no_pch }

#include "debug.cc"

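DejaGnu directives are parsed before the preprocessor is run, so the
directives inside the #included debug.cc are not picked up and must be
repeated in the new file.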
