Hi Josh,

> My primary question is: is there a way to avoid doing a copy for the utf8
> data (the char *buffer in operator>> and the QByteArray utf8 in
> operator<<)?

QString stores its data as UTF-16, so you won't be able to avoid a copy if you want to convert the data to or from UTF-8.

I haven't looked at the implementation, but one thing to note is that, IIUC, defining your own serialization operators for QString would be an ODR violation, because Qt already provides them. So you may need to wrap QString in a custom struct first.

Best regards,
Ivan

________________________________________
From: Interest <interest-boun...@qt-project.org> on behalf of Josh <jnf...@grauman.com>
Sent: Tuesday, April 15, 2025 7:39 AM
To: interest@qt-project.org
Subject: [Interest] Smaller QString Serialization

Hello all,

It looks like the standard QString serialization writes a 32-bit size (using 0xFFFFFFFF for a null string, and 0xFFFFFFFE followed by a 64-bit size if 32 bits isn't big enough), followed by 16 bits for each character.

To save space, I want to use a scheme that writes an 8-bit size and reserves the last four values (255, 254, 253, 252) for a null QString, or for a following 16-bit, 32-bit, or 64-bit size depending on how large the QString is, followed by the utf8 data. This will make a null QString 1 byte instead of 4, a single Latin character like 'a' 2 bytes instead of 6, and most long Latin1 strings around half the size...

I wrote a quick implementation, and it seems to work well. My primary question is: is there a way to avoid doing a copy for the utf8 data (the char *buffer in operator>> and the QByteArray utf8 in operator<<)? Any other obvious ways to speed it up, or other suggestions?

Josh

QDataStream &operator<<(QDataStream &out, const QString &str)
{
    if(str.isNull())
        out << (quint8)255; //null marker
    else
    {
        QByteArray utf8=str.toUtf8();
        qsizetype size=utf8.size();
        if(size<252)
        {
            out << (quint8)size;
            out.writeRawData(utf8.data(), size);
        }
        else if(size<65536)
        {
            out << (quint8)254 << (quint16)size;
            out.writeRawData(utf8.data(), size);
        }
        else if(size<4294967296)
        {
            out << (quint8)253 << (quint32)size;
            out.writeRawData(utf8.data(), size);
        }
        else
        {
            out << (quint8)252 << (qsizetype)size;
            out.writeRawData(utf8.data(), size);
        }
    }
    return(out);
}

QDataStream &operator>>(QDataStream &in, QString &str)
{
    quint8 firstSize;
    in >> firstSize;
    if(firstSize==255) //null marker
        str = QString();
    else if(firstSize<252)
    {
        char* buffer = new char[firstSize];
        in.readRawData(buffer, firstSize);
        str = QString::fromUtf8(buffer, firstSize);
        delete[] buffer;
    }
    else if(firstSize==254)
    {
        quint16 secondSize;
        in >> secondSize;
        char* buffer = new char[secondSize];
        in.readRawData(buffer, secondSize);
        str = QString::fromUtf8(buffer, secondSize);
        delete[] buffer;
    }
    else if(firstSize==253)
    {
        quint32 secondSize;
        in >> secondSize;
        char* buffer = new char[secondSize];
        in.readRawData(buffer, secondSize);
        str = QString::fromUtf8(buffer, secondSize);
        delete[] buffer;
    }
    else if(firstSize==252)
    {
        qsizetype secondSize;
        in >> secondSize;
        char* buffer = new char[secondSize];
        in.readRawData(buffer, secondSize);
        str = QString::fromUtf8(buffer, secondSize);
        delete[] buffer;
    }
    return(in);
}

_______________________________________________
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest
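
For reference, the size-prefix scheme described in the original message can be sketched without Qt. This is only an illustration under stated assumptions, not the posted implementation: the names encode, decode and putBigEndian are hypothetical, std::optional<std::string> stands in for a nullable QString that has already been converted to UTF-8, and a std::vector of bytes stands in for the stream. The size fields are written as fixed-width big-endian integers (big-endian being QDataStream's default byte order) so the layout does not depend on the platform, and the reader checks the declared size against the remaining input before it allocates anything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Marker bytes taken from the scheme in the original message.
constexpr std::uint8_t kNull   = 255; // null string, no payload follows
constexpr std::uint8_t kSize16 = 254; // a 16-bit size follows
constexpr std::uint8_t kSize32 = 253; // a 32-bit size follows
constexpr std::uint8_t kSize64 = 252; // a 64-bit size follows

// Hypothetical helper: append `value` as `bytes` big-endian bytes.
static void putBigEndian(std::vector<std::uint8_t> &out, std::uint64_t value, int bytes)
{
    assert(bytes == 2 || bytes == 4 || bytes == 8);
    for (int shift = 8 * (bytes - 1); shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Encode a UTF-8 payload; std::nullopt plays the role of a null string.
std::vector<std::uint8_t> encode(const std::optional<std::string> &utf8)
{
    std::vector<std::uint8_t> out;
    if (!utf8) {
        out.push_back(kNull);
        return out;
    }
    const std::uint64_t size = utf8->size();
    if (size < 252) {
        out.push_back(static_cast<std::uint8_t>(size)); // size fits in the first byte
    } else if (size <= 0xFFFFu) {
        out.push_back(kSize16);
        putBigEndian(out, size, 2);
    } else if (size <= 0xFFFFFFFFull) {
        out.push_back(kSize32);
        putBigEndian(out, size, 4);
    } else {
        out.push_back(kSize64);
        putBigEndian(out, size, 8);
    }
    out.insert(out.end(), utf8->begin(), utf8->end());
    return out;
}

// Decode one string from the start of `in`.
// Returns false if the input is truncated; `result` is then left unchanged.
bool decode(const std::vector<std::uint8_t> &in, std::optional<std::string> &result)
{
    if (in.empty())
        return false;
    std::size_t pos = 0;
    const std::uint8_t first = in[pos++];
    if (first == kNull) {
        result.reset();
        return true;
    }
    std::uint64_t size = first;
    if (first >= kSize64) { // 252, 253 or 254: an extended size field follows
        const std::size_t width = first == kSize16 ? 2 : first == kSize32 ? 4 : 8;
        if (in.size() - pos < width)
            return false;
        size = 0;
        for (std::size_t i = 0; i < width; ++i)
            size = (size << 8) | in[pos++];
    }
    if (in.size() - pos < size) // validate before allocating the payload
        return false;
    result = std::string(reinterpret_cast<const char *>(in.data() + pos),
                         static_cast<std::size_t>(size));
    return true;
}
```

With this sketch, encode(std::nullopt) yields a single byte and encode(std::string("a")) yields two bytes, matching the sizes quoted in the original message. In a Qt version the same branching would presumably live in the operators of a wrapper struct around QString, as suggested in the reply.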