15.03.2017, 14:09, "Harald Vistnes" <harald.vist...@gmail.com>:
> So to summarize, it sounds like the recommendation is to use QString and
> QTextStream by default unless it turns out to be too slow. In that case one
> can optimize by using QByteArray or non-Qt alternatives like re2c if you have
> control over the encoding.

For the record, recent versions of re2c support UTF-16, so you can process
QTextStream output with it as well.

>
> If the data read in is later put into QStrings, I guess you can just as well
> use QString during parsing, as the strings will be converted to UTF-16 at some
> point anyway. Is that right?

That largely depends on your data. If your input contains lots of fixed
"control words" and numbers that won't end up as QStrings in the data
structures resulting from parsing, this may be wrong. For other workloads it
may be perfectly reasonable. For small amounts of data it just doesn't matter.

> I've written code for reading lots of different formats, some for files up to
> several hundred MBs, and each time I wonder if I am doing it the best way or
> not. Such a common task and so many ways to do it...

Run your code under a profiler (e.g. callgrind) and see where most of the time
is spent. The results may surprise you.

> Harald
>
> 2017-03-15 11:15 GMT+01:00 Konstantin Tokarev <annu...@yandex.ru>:
>
>> 15.03.2017, 12:59, "Viktor Engelmann" <viktor.engelm...@qt.io>:
>>> On 14.03.2017 10:50, Konstantin Tokarev wrote:
>>>> 14.03.2017, 12:44, "Harald Vistnes" <harald.vist...@gmail.com>:
>>>>> Hi,
>>>>>
>>>>> I'm currently working on reading and parsing large ASCII based text
>>>>> files and I am wondering what is the current best practice. There are so
>>>>> many classes and macros available, so it can be a bit confusing to know
>>>>> what to use when.
>>>>>
>>>>> QString, QLatin1String, QByteArray, QStringLiteral, QLatin1Literal,
>>>>> QByteArrayLiteral, plain C++ string literal, QStringRef, QStringBuilder
>>>>> and so on. And then std::string and raw const char* strings.
>>>>>
>>>>> In my case I want to read a large ASCII file line by line, so I don't
>>>>> need Unicode. I need to compare a string with a literal, extract
>>>>> substrings and convert some strings to numbers.
>>>>>
>>>>> Should I just use QString all the way, or is it faster to use some other
>>>>> classes when you know you don't need Unicode?
>>>> You should use QByteArray here, which is what QIODevice::readLine()
>>>> returns. Avoid using QString for as long as possible because that will
>>>> trigger conversion of your text to UTF-16 encoding, which may be totally
>>>> useless in your use case.
>>>
>>> If the program is small and you don't want it to ever grow beyond ASCII,
>>> using byte arrays is okay, but in my experience, if you want to be
>>> future-proof, you should interpret byte arrays *as soon as possible*.
>>>
>>> Then you have an object with a controlled format and you can use that
>>> throughout your program, without worrying about encodings.
>>
>> In the modern world there is one portable encoding used for exchanging data
>> between systems: UTF-8. So in a wide range of applications one can safely
>> assume all textual (!) byte array data to be UTF-8 or ASCII, and it causes no
>> confusion. YMMV though.
>>
>> Things change if you intermix textual and non-textual QByteArrays close
>> together in your code; in this case it's better to store text strings in
>> objects of a different class.
>>
>>> Keeping the
>>> data raw will increase the probability that some module does something
>>> wrong because it assumes a wrong encoding and breaks your results (e.g.
>>> using bytewise comparison for string comparison, which works for ASCII,
>>> but not for Unicode - even if both have the same encoding, because there
>>> are letters that can be written as several different sequences of Unicode
>>> code points).
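
To make the point about bytewise comparison concrete, here is a small sketch in plain C++ (no Qt, so it stays self-contained). It assumes UTF-8 input and uses the letter "é" as an example; the names precomposed, decomposed and bytewiseEqual are made up for this illustration.

```cpp
#include <string>

// "é" as the single precomposed code point U+00E9, encoded as UTF-8.
inline const std::string precomposed = "\xC3\xA9";

// The same letter as "e" (U+0065) followed by the combining acute accent
// U+0301, also encoded as UTF-8.
inline const std::string decomposed = "e\xCC\x81";

// Plain byte-by-byte comparison, which is what comparing raw byte arrays does.
// (Hypothetical helper, named only for this illustration.)
bool bytewiseEqual(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Both strings are valid UTF-8 and render as the same letter, yet bytewiseEqual(precomposed, decomposed) returns false. For pure ASCII data this cannot happen, since each character has exactly one byte representation. In Qt code, normalizing both sides first (for instance with QString::normalized()) is one way to get a meaningful comparison.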
>>>
>>> --
>>>
>>> Viktor Engelmann
>>> Software Engineer
>>>
>>> The Qt Company GmbH
>>> Rudower Chaussee 13
>>> D-12489 Berlin
>>>
>>> viktor.engelm...@qt.io
>>> +49 151 26784521
>>>
>>> http://qt.io
>>> Geschäftsführer: Mika Pälsi, Juha Varelius, Mika Harjuaho
>>> Sitz der Gesellschaft: Berlin
>>> Registergericht: Amtsgericht Charlottenburg, HRB 144331 B
>>>
>>> _______________________________________________
>>> Interest mailing list
>>> Interest@qt-project.org
>>> http://lists.qt-project.org/mailman/listinfo/interest
>>
>> --
>> Regards,
>> Konstantin

--
Regards,
Konstantin
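
For illustration, the byte-level approach discussed in this thread (compare against a literal, extract substrings, convert to numbers, all without re-encoding the text) might look roughly like the sketch below. It is written in plain C++17 rather than against Qt, so it compiles on its own. The "POINT x y" record format and the names Point, parseInt and parsePointLine are assumptions made up for this example, not something from the original question.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical record: a line such as "POINT 12 -7".
struct Point {
    int x;
    int y;
};

// Convert the whole of `text` to a base-10 int. Works directly on the bytes:
// no locale lookup, no allocation, no conversion to UTF-16.
std::optional<int> parseInt(std::string_view text)
{
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last)
        return std::nullopt; // not a number, out of range, or trailing junk
    return value;
}

// Parse one line that has already had its line terminator stripped.
std::optional<Point> parsePointLine(std::string_view line)
{
    constexpr std::string_view keyword = "POINT ";

    // Compare the start of the line with a literal.
    if (line.substr(0, keyword.size()) != keyword)
        return std::nullopt;
    line.remove_prefix(keyword.size());

    // Extract the two fields as substrings (views, not copies).
    const auto space = line.find(' ');
    if (space == std::string_view::npos)
        return std::nullopt;

    // Convert the substrings to numbers.
    const auto x = parseInt(line.substr(0, space));
    const auto y = parseInt(line.substr(space + 1));
    if (!x || !y)
        return std::nullopt;
    return Point{*x, *y};
}
```

In Qt code the same steps would map onto the QByteArray returned by QIODevice::readLine(), using calls such as QByteArray::startsWith(), QByteArray::mid() and QByteArray::toInt(). Whether this is actually faster than going through QString for a given file is something only a profiler run can tell.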