Hi,

I'm now working on introducing std::format support for some of the Qt types. I decided to start with the variety of Qt string types, and I have some open questions regarding the implementation that I want to discuss.
First, I'd like to give a very short summary of my understanding of how std::format works in plain C++ when it comes to string formatting. Basically, we have two types of formatters:

* std::formatter<T, char>, which handles the std::string, const char *, and const char (&)[N] overloads.
* std::formatter<T, wchar_t>, which handles the std::wstring, const wchar_t *, and const wchar_t (&)[N] overloads.

The encoding for the wide char strings is usually known: it's either UTF-16 on Windows or UTF-32 on Linux and macOS. But what is the encoding for the char strings? The answer is that std::format does not care. It just tries to format the characters according to the format string. What you see in the terminal fully depends on your terminal encoding.

So, back to the main question: how should we format Qt string types?

The support for wide char formatters is straightforward: we can use QString::toStdWString() and be sure that we do not get any unreadable characters in the formatted output. I already have a WIP patch implementing it [0].

But what to do with the char formatters? Should we aim for the formatted strings to be always readable, or should we just not care, like std::formatter<char> does? I see several options here:

1. Treat everything as UTF-8

Traditionally all QString(View) constructors taking char arrays or std::string treat the data as UTF-8. Also, QString::toStdString() provides a UTF-8 encoded std::string. So this would be sort of an expected behavior for Qt users. With this approach QLatin1StringView should also be converted to UTF-8 before being processed by the formatter.

2. Treat everything as Local8Bit

Basically similar to the previous approach, but use toLocal8Bit() instead of toUtf8() when passing the data to the formatter. On Linux and macOS that would actually be equivalent to the first approach, because toLocal8Bit() simply assumes UTF-8 as an encoding. On Windows it would use CP_ACP to do the conversion.
In this case the behavior would be similar to what qDebug() does. The drawback is that the formatted string might be different from the original one. For example, `Ü` might be replaced with `U`, and some other symbols might be replaced with `?`, depending on the currently selected code page. Similarly to the previous option, QLatin1StringView and QUtf8StringView should also be converted to Local8Bit before formatting.

3. Try to not guess the encoding for the user

Basically, for QUtf8StringView and QLatin1StringView the encoding is explicitly mentioned in the names of the classes, so we can just assume that users who pass these classes to std::format expect UTF-8 or Latin1 output respectively. The question here is how to deal with QString(View):

3a. Convert it to UTF-8, because that's the pre-existing behavior, which should be known to the users.
3b. Do not implement std::formatter<QString(View), char> at all and let the users explicitly convert QString to something else first.

Option 3b is inconvenient and defeats the purpose of std::format support for Qt types, so I'd personally prefer 3a here.

The WIP patch [1] currently implements approach 2, but I'm actually leaning towards updating it to approach 3 (with 3a for QString(View)).

[0]: https://codereview.qt-project.org/c/qt/qtbase/+/563859
[1]: https://codereview.qt-project.org/c/qt/qtbase/+/559758

I'd like to hear more opinions on how to proceed here, so please share your ideas!

Best regards,
Ivan

------------------------------
Ivan Solovev
Senior Software Engineer

The Qt Company GmbH
Erich-Thilo-Str. 10
12489 Berlin, Germany
ivan.solo...@qt.io
www.qt.io

Geschäftsführer: Mika Pälsi, Juha Varelius, Jouni Lintunen
Sitz der Gesellschaft: Berlin, Registergericht: Amtsgericht Charlottenburg, HRB 144331 B
--
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development