https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976
--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I have some new code for handling UTF-8 for std::print, and using that code your relaxed u8str gets converted to 12 U+FFFD code points when printed to a terminal, which I think is correct.

#include <print>

int main()
{
  char u8str[] = "\uC800\uCBFF\uCC00\uCFFF";
  std::println("valid UTF-8: {}", u8str);
  u8str[0] = u8str[3] = u8str[6] = u8str[9] = 0xED; // turn the C into D.
  // now the string is D800, DBFF, DC00 and DFFF encoded in relaxed UTF-8
  // that allows surrogate code points.
  std::vprint_nonunicode("invalid UTF-8 printed raw: {}\n",
                         std::make_format_args(u8str));
  std::println("invalid UTF-8 printed safely: {}", u8str);
}

$ g++ -std=c++23 surr.cc && ./a.out && ./a.out | xxd
valid UTF-8: 저쯿찀쿿
invalid UTF-8 printed raw: ������������
invalid UTF-8 printed safely: ������������
00000000: 7661 6c69 6420 5554 462d 383a 20ec a080  valid UTF-8: ...
00000010: ecaf bfec b080 ecbf bf0a 696e 7661 6c69  ..........invali
00000020: 6420 5554 462d 3820 7072 696e 7465 6420  d UTF-8 printed 
00000030: 7261 773a 20ed a080 edaf bfed b080 edbf  raw: ...........
00000040: bf0a 696e 7661 6c69 6420 5554 462d 3820  ..invalid UTF-8 
00000050: 7072 696e 7465 6420 7361 6665 6c79 3a20  printed safely: 
00000060: efbf bdef bfbd efbf bdef bfbd efbf bdef  ................
00000070: bfbd efbf bdef bfbd efbf bdef bfbd efbf  ................
00000080: bdef bfbd 0a                             .....

The new code is also much faster, so I'm thinking of rewriting some of the src/c++11/codecvt.cc facets to use it. But that's a longer term project, we should fix this bug first.
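
For readers wondering why the count is 12 rather than 4: the sketch below is not the libstdc++ code mentioned above, only a minimal illustration of the Unicode Standard's "substitution of maximal subparts" practice, and it assumes the new std::print code follows that practice. The names decode_utf8_lossy and count_replacements are hypothetical. Under this scheme, ED must be followed by a byte in 80..9F, so in ED A0 80 the ED is replaced on its own, and A0 and 80 are then each replaced as stray continuation bytes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8, emitting one U+FFFD for each maximal
// subpart of an ill-formed sequence (Unicode Standard, chapter 3).
std::u32string decode_utf8_lossy(std::string_view in)
{
    constexpr char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i++]);
        if (b0 < 0x80) { out.push_back(b0); continue; }

        int extra = 0;                       // continuation bytes expected
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the 2nd byte
        char32_t cp = 0;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            extra = 1; cp = b0 & 0x1F;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            extra = 2; cp = b0 & 0x0F;
            if (b0 == 0xE0) lo = 0xA0;       // rejects overlong forms
            if (b0 == 0xED) hi = 0x9F;       // rejects surrogates D800..DFFF
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            extra = 3; cp = b0 & 0x07;
            if (b0 == 0xF0) lo = 0x90;       // rejects overlong forms
            if (b0 == 0xF4) hi = 0x8F;       // rejects values above U+10FFFF
        } else {
            // Stray continuation byte, or a lead byte that is never valid.
            out.push_back(replacement);
            continue;
        }

        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }   // truncated input
            const unsigned char b = static_cast<unsigned char>(in[i]);
            // The offending byte is not consumed; it is re-examined as the
            // start of the next sequence.
            if (b < lo || b > hi) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
            ++i;
            lo = 0x80; hi = 0xBF;            // later bytes use the full range
        }
        out.push_back(ok ? cp : replacement);
    }
    return out;
}

// Hypothetical helper: number of U+FFFD code points produced for the input.
std::size_t count_replacements(std::string_view in)
{
    std::size_t n = 0;
    for (char32_t c : decode_utf8_lossy(in))
        if (c == 0xFFFD) ++n;
    return n;
}
```

Passing the twelve bytes from the "printed raw" line of the hexdump (ED A0 80 ED AF BF ED B0 80 ED BF BF) to count_replacements yields 12, matching the twelve EF BF BD sequences in the "printed safely" line.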