So to summarize, it sounds like the recommendation is to use QString and QTextStream by default unless it turns out to be too slow. In that case one can optimize by using QByteArray or non-Qt alternatives like re2c if you have control over the encoding.
If the data read in is later put into QStrings, I guess you can just as
well use QString during parsing, as the strings will be converted to
UTF-16 at some point anyway. Is that right?

I've written code for reading lots of different formats, some for files
up to several hundred MB, and each time I wonder whether I am doing it
the best way or not. Such a common task and so many ways to do it...

Harald

2017-03-15 11:15 GMT+01:00 Konstantin Tokarev <annu...@yandex.ru>:

>
>
> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelm...@qt.io>:
> > On 14.03.2017 10:50, Konstantin Tokarev wrote:
> >> 14.03.2017, 12:44, "Harald Vistnes" <harald.vist...@gmail.com>:
> >>> Hi,
> >>>
> >>> I'm currently working on reading and parsing large ASCII-based
> >>> text files and I am wondering what the current best practice is.
> >>> There are so many classes and macros available, so it can be a
> >>> bit confusing to know what to use when.
> >>>
> >>> QString, QLatin1String, QByteArray, QStringLiteral,
> >>> QLatin1Literal, QByteArrayLiteral, plain C++ string literal,
> >>> QStringRef, QStringBuilder and so on. And then std::string and
> >>> raw const char* strings.
> >>>
> >>> In my case I want to read a large ASCII file line by line, so I
> >>> don't need Unicode. I need to compare a string with a literal,
> >>> extract substrings and convert some strings to numbers.
> >>>
> >>> Should I just use QString all the way, or is it faster to use
> >>> some other classes when you know you don't need Unicode?
> >> You should use QByteArray here, which is what
> >> QIODevice::readLine() returns. Avoid using QString as long as
> >> possible because that will trigger conversion of your text to
> >> UTF-16 encoding, which may be totally useless in your use case.
> >
> > If the program is small and you don't want it to ever grow beyond
> > ASCII, using byte arrays is okay, but in my experience, if you want
> > to be future-proof, you should interpret byte arrays *as soon as
> > possible*.
> >
> > Then you have an object with a controlled format and you can use
> > that throughout your program, without worrying about encodings.
>
> In the modern world there is one portable encoding used for
> exchanging data between systems: UTF-8. So in a wide range of
> applications one can safely assume all textual (!) byte array data
> to be UTF-8 or ASCII, and it causes no confusion. YMMV though.
>
> Things change if you intermix textual and non-textual QByteArrays
> close together in your code; in this case it's better to store text
> strings in objects of a different class.
>
> > Keeping the data raw will increase the probability that some
> > module does something wrong because it assumes a wrong encoding
> > and breaks your results (e.g. using bytewise comparison for string
> > comparison, which works for ASCII, but not for Unicode - even if
> > both have the same encoding, because there are letters that have
> > multiple different Unicode code point representations).
> >
> > --
> >
> > Viktor Engelmann
> > Software Engineer
> >
> > The Qt Company GmbH
> > Rudower Chaussee 13
> > D-12489 Berlin
> >
> > viktor.engelm...@qt.io
> > +49 151 26784521
> >
> > http://qt.io
> > Geschäftsführer: Mika Pälsi, Juha Varelius, Mika Harjuaho
> > Sitz der Gesellschaft: Berlin
> > Registergericht: Amtsgericht Charlottenburg, HRB 144331 B
> >
> > _______________________________________________
> > Interest mailing list
> > Interest@qt-project.org
> > http://lists.qt-project.org/mailman/listinfo/interest
>
> --
> Regards,
> Konstantin