Hi Josh,

> My primary question is: is there a way to avoid doing a copy for the utf8
> data (the char *buffer in operator>> and the QByteArray utf8 in
> operator<<)?

QString stores the data as UTF-16, so you won't be able to avoid a copy
if you want to convert the data to/from UTF-8.

I haven't looked at the implementation, but one thing to note is that,
IIUC, defining your own serialization operators for QString would be
an ODR violation, since Qt already provides QDataStream operators for
QString. So you may need to wrap QString in a custom struct first.
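
To illustrate the wrapping idea, here is a minimal sketch. It is an analogy in standard C++, not Qt code: std::string and std::iostream stand in for QString and QDataStream so that it compiles without Qt, and CompactString is a hypothetical name. The point is only that the operators are declared for a type you own, so they are new overloads and not second definitions of the library's operators.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper type: the stream operators below are declared for
// CompactString, not for the library's own string type.
struct CompactString {
    std::string value;
};

// Writes a one-byte length followed by the bytes. To keep the sketch short,
// this assumes the string is shorter than 256 bytes.
inline std::ostream &operator<<(std::ostream &out, const CompactString &s)
{
    out.put(static_cast<char>(s.value.size()));
    return out.write(s.value.data(), static_cast<std::streamsize>(s.value.size()));
}

// Reads the one-byte length, then that many bytes.
inline std::istream &operator>>(std::istream &in, CompactString &s)
{
    const int size = in.get();
    if (size == std::char_traits<char>::eof())
        return in; // nothing to read; the stream's fail state is already set
    s.value.resize(static_cast<std::size_t>(size));
    return in.read(s.value.data(), size);
}
```

With Qt the same shape would presumably be a struct holding a QString, with the two operators taking QDataStream.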

Best regards,
Ivan

________________________________________
From: Interest <interest-boun...@qt-project.org> on behalf of Josh <jnf...@grauman.com>
Sent: Tuesday, April 15, 2025 7:39 AM
To: interest@qt-project.org
Subject: [Interest] Smaller QString Serialization

Hello all,

It looks like the standard QString serialization writes a 32-bit size
(uses 0xFFFFFFFF for Null String, and uses 0xFFFFFFFE if 32-bit isn't big
enough and then writes a 64-bit size), followed by 16 bits for each
character.

To save space, I want to use a scheme that writes an 8-bit size, and
reserves the last four values (255, 254, 253, 252) for a null QString, or
another 16-bit, 32-bit, or 64-bit size depending on how large the QString
is, followed by utf8 data.
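
The scheme described above can be sketched without Qt as follows. This is only an illustration under assumptions: a byte vector stands in for the stream, std::optional<std::string> stands in for a possibly-null QString already converted to UTF-8, sizes are written big-endian (QDataStream's default byte order), and the names encodeCompact, decodeCompact and putBigEndian are made up for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Marker values reserved in the first byte, as in the scheme described above.
constexpr std::uint8_t kNull = 255, kSize16 = 254, kSize32 = 253, kSize64 = 252;

// Append `value` as `bytes` big-endian bytes.
inline void putBigEndian(std::vector<std::uint8_t> &out, std::uint64_t value, int bytes)
{
    for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Encode an optional UTF-8 payload; std::nullopt plays the role of a null string.
inline std::vector<std::uint8_t> encodeCompact(const std::optional<std::string> &utf8)
{
    std::vector<std::uint8_t> out;
    if (!utf8) {
        out.push_back(kNull);
        return out;
    }
    const std::uint64_t size = utf8->size();
    if (size < 252) {
        out.push_back(static_cast<std::uint8_t>(size)); // size fits in the first byte
    } else if (size <= 0xFFFF) {
        out.push_back(kSize16);
        putBigEndian(out, size, 2);
    } else if (size <= 0xFFFFFFFFu) {
        out.push_back(kSize32);
        putBigEndian(out, size, 4);
    } else {
        out.push_back(kSize64);
        putBigEndian(out, size, 8);
    }
    out.insert(out.end(), utf8->begin(), utf8->end());
    return out;
}

// Decode what encodeCompact produced; throws std::out_of_range on truncated input.
inline std::optional<std::string> decodeCompact(const std::vector<std::uint8_t> &in)
{
    std::size_t pos = 0;
    const std::uint8_t first = in.at(pos++);
    if (first == kNull)
        return std::nullopt;
    std::uint64_t size = first;
    const int extra = first == kSize16 ? 2 : first == kSize32 ? 4 : first == kSize64 ? 8 : 0;
    if (extra != 0) {
        size = 0;
        for (int i = 0; i < extra; ++i)
            size = (size << 8) | in.at(pos++);
    }
    if (size > in.size() - pos)
        throw std::out_of_range("truncated payload");
    const auto begin = in.begin() + static_cast<std::ptrdiff_t>(pos);
    return std::string(begin, begin + static_cast<std::ptrdiff_t>(size));
}
```

Deriving the number of extra size bytes from the marker keeps the payload handling in one place for all four cases.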

This will make a null QString 1 byte instead of 4, a single Latin
character like 'a' 2 bytes instead of 6, and most long Latin1 strings
around half the size...
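
The size figures above can be checked with a small calculation. This is a sketch under assumptions: std::u16string stands in for QString's UTF-16 data, the standard format is taken to be a 32-bit size field plus 2 bytes per UTF-16 code unit as described earlier in this message, and standardSize, utf8Length and compactSize are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>

// Standard format as described above: 32-bit size field, then 2 bytes per code unit.
inline std::size_t standardSize(const std::u16string &s)
{
    return 4 + 2 * s.size();
}

// Number of UTF-8 bytes needed for a UTF-16 string (assumes well-formed surrogate pairs).
inline std::size_t utf8Length(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            n += 1; // ASCII
        } else if (c < 0x800) {
            n += 2; // includes Latin-1 characters above 0x7F
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            n += 4; // high surrogate: the pair encodes one 4-byte character
            ++i;    // skip the low surrogate
        } else {
            n += 3; // rest of the Basic Multilingual Plane
        }
    }
    return n;
}

// Compact format: 1 marker byte, an optional wider size field, then the UTF-8 payload.
inline std::size_t compactSize(const std::u16string &s)
{
    const std::size_t payload = utf8Length(s);
    const std::size_t header = payload < 252 ? 1
                             : payload <= 0xFFFF ? 3
                             : payload <= 0xFFFFFFFFu ? 5 : 9;
    return header + payload;
}
```

For ASCII text the compact form is about half the size (101 versus 204 bytes for 100 characters). Latin-1 characters above 0x7F take 2 bytes each in UTF-8, so 100 of them need 201 bytes, nearly the same as the standard 204. Characters that need 3 UTF-8 bytes make the compact form larger than the standard one.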

I wrote a quick implementation, and it seems to work well.

My primary question is: is there a way to avoid doing a copy for the utf8
data (the char *buffer in operator>> and the QByteArray utf8 in
operator<<)?

Any other obvious ways to speed it up, or other suggestions?

Josh

QDataStream &operator<<(QDataStream &out, const QString &str)
{
   if(str.isNull())
     out << (quint8)255; //null marker
   else
   {
     QByteArray utf8=str.toUtf8();
     qsizetype size=utf8.size();
     if(size<252)
     {
       out << (quint8)size;
       out.writeRawData(utf8.data(), size);
     }
     else if(size<65536)
     {
       out << (quint8)254 << (quint16)size;
       out.writeRawData(utf8.data(), size);
     }
     else if(size<4294967296)
     {
       out << (quint8)253 << (quint32)size;
       out.writeRawData(utf8.data(), size);
     }
     else
     {
       out << (quint8)252 << (quint64)size; //fixed-width 64-bit size, independent of the platform's qsizetype
       out.writeRawData(utf8.data(), size);
     }
   }
   return(out);
}

QDataStream &operator>>(QDataStream &in, QString &str)
{
   quint8 firstSize;
   in >> firstSize;
   if(firstSize==255) //null marker
     str = QString();
   else if(firstSize<252)
   {
     char* buffer = new char[firstSize];
     in.readRawData(buffer, firstSize);
     str = QString::fromUtf8(buffer, firstSize);
     delete[] buffer;
   }
   else if(firstSize==254)
   {
     quint16 secondSize;
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   else if(firstSize==253)
   {
     quint32 secondSize;
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   else if(firstSize==252)
   {
     quint64 secondSize; //fixed-width 64-bit size, independent of the platform's qsizetype
     in >> secondSize;
     char* buffer = new char[secondSize];
     in.readRawData(buffer, secondSize);
     str = QString::fromUtf8(buffer, secondSize);
     delete[] buffer;
   }
   return(in);
}
_______________________________________________
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest