On Wednesday 14 October 2015 12:16:47 Marc Mutz wrote:
> > >But as a condition to be even considered, it needs to be only for the
> > >methods that do not hold a copy of the string. That is, methods that
> > >immediately consume the string and no longer need to reference its
> > >contents.
>
> Thiago, I think it would help the discussion if you quickly summarised your
> planned changes to QString in Qt 6.
>
> AFAIK, the size and offset will move into the object, so I expected that
> Q6String would subsume QStringRef, because each QString could provide a
> separate view on the shared underlying data. I also was led to believe that
> Q6String would use SSO, which, given its increased sizeof(), would make a
> lot of sense, imo.

Indeed, that's the biggest gain. QString will contain a QStringPrivate, which
is

    struct QStringPrivate
    {
        QArrayData *d;  // shared, reference-counted header
        ushort *b;      // beginning of this string's characters
        qsize size;     // let's bikeshed what qsize is later
    };

My current code initialises a QStringLiteral like so:

    # define QStringLiteral(str) \
        ([]() -> QString { \
            QStringPrivate holder = { \
                QArrayData::sharedStatic(), \
                reinterpret_cast<ushort *>(const_cast<qunicodechar *>(QT_UNICODE_LITERAL(str))), \
                sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
            return QString(holder); \
        }())

The separation of the string itself from the size and the d pointer allows the
compiler, if it wants to, to share strings. In fact, disassembly of

    f(QStringLiteral("foo"), QStringLiteral("foo"))

produces one copy of u"foo" only.

Like you said, QString can become its own QStringView/QStringRef/QSubString.
QString::left/mid/right can simply copy the d pointer, increment the refcount,
then adjust b and size. This solves the issue I had with your proposal: passing
a QStringView to a method that decides to copy it, so it wouldn't participate
in reference counting. The drawback with this is the pathological case where a
short substring is holding a large data block hostage.

My next objective, not yet achieved due to lack of time, is to make that
QArrayData::sharedStatic() actually be a null pointer. That is, for anything
that we didn't allocate memory for, the d pointer should be null. That implies
a much faster loading of constant QStringLiterals and much faster handling of
the decrement case. The biggest pain point in the code above in my current
version is what happens after the call to f(): the compiler generates 2x bit
testing of d->flags and calls to QArrayData::deallocate(), which are dead code
and will never be run.

After that, implement SSO, which should hold 11 UTF-16 characters, including
the null terminator.
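
To make the 11-character figure above concrete, here is a rough sketch of how a
short string could share the 24 bytes that d, b and size occupy on a 64-bit
target. This is not the actual Qt 6 code: HeapRep, ShortRep, makeShort, isShort
and shortLength are hypothetical names, the use of std::ptrdiff_t for qsize is
a guess, and spending 2 bytes on a combined tag-and-length field is only one
possible way to arrive at 11 UTF-16 code units (10 characters plus the
terminator).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the three members named in the mail.
struct HeapRep {
    void *d;               // would be QArrayData * in Qt
    char16_t *b;           // beginning of the characters
    std::ptrdiff_t size;   // assumption: qsize is pointer-sized
};

// Hypothetical short-string layout occupying the same number of bytes.
struct ShortRep {
    std::uint16_t tagAndLength;  // bit 0: "short" marker, upper bits: length
    char16_t data[11];           // up to 10 characters plus the terminator
};

static_assert(sizeof(HeapRep) == 24, "sketch assumes a 64-bit target");
static_assert(sizeof(ShortRep) == sizeof(HeapRep),
              "the short form must fit exactly in the same storage");

// Number of UTF-16 code units available inline, terminator included.
constexpr std::size_t shortCapacity = sizeof(ShortRep::data) / sizeof(char16_t);

// Stores s inline if len characters plus the terminator fit.
// Returns false when the caller would have to allocate instead.
inline bool makeShort(ShortRep &out, const char16_t *s, std::size_t len)
{
    assert(s != nullptr);
    if (len + 1 > shortCapacity)
        return false;
    out.tagAndLength = static_cast<std::uint16_t>((len << 1) | 1u);
    std::memcpy(out.data, s, len * sizeof(char16_t));
    out.data[len] = u'\0';
    return true;
}

inline bool isShort(const ShortRep &r) { return (r.tagAndLength & 1u) != 0; }
inline std::size_t shortLength(const ShortRep &r) { return r.tagAndLength >> 1; }
```

With this layout a 10-character string is stored inline, while an 11-character
one no longer leaves room for the terminator and has to take the allocating
path.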
If we benchmark and find that we could use more, we can simply artificially
increase sizeof(QString) to 32, which may have some extra benefits of its own.
One of them is avoiding the way the 24-byte short QString is at odds with the
null d pointer: there, the if (d) check instead becomes
if (quintptr(d) & ~quintptr(1)).

[also note how the order of the members in QStringPrivate needs to change for
big-endian architectures]

[and note everything I say about QString also applies to QByteArray and
QVector]

> And then I thought, QString would be converted to hold UTF-8. I saw
> wip/qstring-utf8 fly by on gerrit, but ok, that hasn't received any updates
> since 2012.

That was when we converted the QString methods taking const char* from Latin1
to UTF-8. The backing store has never changed.

My version of QString stores an extra flag that indicates whether the string
is US-ASCII, in which case we can run the unchecked to-Latin1 algorithm in
both toLatin1 and toUtf8. Another idea I had but haven't investigated is to
cache that result, which requires the returned QByteArray to share the d
pointer with the QString.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center