Peppe had said:
>> I'm not following this. If I do
>>
>>     std::format("{} {}", utf8string, latin1string)
>>
>> what am I supposed to get out? A string which is a mix of two different
>> encodings? I don't think that's ever possibly wanted.

Ivan Solovev (7 June 2024 10:53) replied:
> Yes, that's exactly what I mean. And, by the way, that's exactly how
> std::format is working now.
> If you write something like this:
>
>     std::string utf8{"\xC3\x84\xC3\x96\xC3\x9C"}; // ÄÖÜ in UTF-8
>     std::string latin1{"\xC4\xD6\xDC"}; // ÄÖÜ in Latin1
>     std::string buffer;
>     std::format_to(std::back_inserter(buffer), "{} {}", utf8, latin1);
>
> Then the resulting buffer will simply contain
> "\xC3\x84\xC3\x96\xC3\x9C \xC4\xD6\xDC".
>
> So, std::format does not care about the encodings.

To be fair, std::format is given no information above as to the encoding
of the two strings - it doesn't know the names you've given the
variables, only their values - so this isn't really comparable to the
case where Qt code uses QLatin1StringView and QUtf8StringView; their
types tell Qt what encoding they're using. My guess is that std::string
is really the equivalent of QByteArray, not a String in Qt's use of the
term (i.e. it doesn't know its encoding).

If the standard does not address the question of encodings, I suggest we
(via Ville) poke them about that. For my part, I'd say The Right Thing
To Do is to use UTF-8 consistently in all use of std::string so that
client code that needs to do something with the result that needs a
different encoding can do its own conversion, knowing what the baseline
is for what it's getting from Qt.

At least some uses of std::format shall be to produce data to send to a
database or over a socket, where the current user's settings are
irrelevant, so using the user's local encoding strikes me as A Bad Plan.
It probably makes sense for std::print but, even then, we'll be better
off if we know the content being passed to std::print is in UTF-8,
because then print knows it always has to do to-local conversion, rather
than having to muck about determining which encoding it's being fed.
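
To make the "UTF-8 as the baseline" suggestion concrete for the Latin-1 string in Ivan's example, here is a minimal sketch of converting Latin-1 bytes to UTF-8 before they reach the formatter. The helper name latin1_to_utf8 is made up for illustration; it is neither a Qt nor a standard-library function, and real Qt code would presumably go through its own string views and converters instead.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of Qt or the standard library): re-encode
// ISO 8859-1 (Latin-1) bytes as UTF-8. Each Latin-1 byte has the same value
// as its Unicode code point, so the conversion is lossless.
std::string latin1_to_utf8(std::string_view latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2); // worst case: two bytes per character
    for (char c : latin1) {
        const auto b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            // ASCII range: identical in both encodings.
            utf8.push_back(static_cast<char>(b));
        } else {
            // U+0080..U+00FF: two-byte sequence 110000xx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (b >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return utf8;
}
```

With such a helper, std::format("{} {}", utf8, latin1_to_utf8(latin1)) would, for the two strings in Ivan's example, give "\xC3\x84\xC3\x96\xC3\x9C \xC3\x84\xC3\x96\xC3\x9C": one encoding throughout, since std::format copies the argument bytes as-is.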

Other consumers of std::format's results (such as a database or socket)
may need some other encoding: whatever's interacting with those has, in
any case, to work out the right encoding to use for it and convert to it
if needed; its life shall be easier if it knows the std::format result
is always UTF-8, rather than having to query the user's settings to
determine whether they need to convert (whether it's even possible) and
how to do the conversion.

And, of course, using the user's native encoding may be broken because
it simply lacks a representation for some of the content you're
formatting, meaning that no down-stream processing can recover from the
fact that std::format didn't succeed in faithfully representing the data
it was given.

As usual, the only even remotely sane answer for 8-bit is UTF-8,

	Eddy.
--
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development