On 2020-05-13 09:04, Lars Knoll wrote:
On 12 May 2020, at 18:59, Marc Mutz <marc.m...@kdab.com> wrote:
[...]
Most other classes:
* Only take and return QString

This is wrong. By taking a QString in the API of non-string-related APIs, you expose QString as an implementation detail of the class. Think QRegion, which, before I added begin()/end(), had a SSO-like internal container (1 QRect or QVector<QRect>) which forced most users to code around the owning-container API like this:

I don’t think you can compare those cases. QString is the container
our users use in 99% of the cases to hold a string. And this won’t
change (and I don’t think we should advocate any changes here).

So then, an API taking and returning a QString is the most logical,
easiest and most convenient to use.

There are two independent issues here: taking and returning.

For taking I don't see how "taking QString is most logical, easiest, most convenient" follows. It's simply not true that 99% of the cases to hold a string use QString. In the common case of applications which are not localized, all the nice QString APIs are fed with const char* literals, causing QString creation and destruction in user code. Even if you change all these to const char16_t* and a QString that doesn't allocate in this case, you still have all these complex (in the QTypeInfo sense) QString objects created and destroyed at the call site. This bloats user code. If all those shiny QString APIs would take QStringView instead, the construction and destruction of which is trivial (in the C++ sense), the compiler can remove all those calls and the QString construction (if any) happens centrally inside the library function (O(1) instead of O(N), N = #callers).

In the not so hypothetical case that Qt is used to visualize results of some business calculations, chances are that thrid-party libraries will use std::string or std::u16string, and not QString, requiring the use of QString::fromStdString() to pass these to a QString API. Had the API taken QStringView, no extra code would have been necessary.

So, I have shown that taking QString is neither the most convenient, nor the most easy, choice, as it requires users with other string types as data sources to jump through hoops to pass their data. Only if you employ a very narrow focus where both efficiency and the existence of 3rd-party code are all ignored, can you still maintain that a function taking an owning container is more convenient and easier than one taking a view. The only logic in such an API that _I_ can find, however, is "MUST ... NOT ... BREAK ... COW logic", which has been proven as flawed over twenty years ago:

http://www.gotw.ca/gotw/043.htm
http://www.gotw.ca/gotw/044.htm
http://www.gotw.ca/gotw/045.htm

(and CoW isn't even correctly implemented in Qt, ever since unsharable data was removed).

In case the internal storage _is_ QString, then providing a QString&& overload to avoid the copy is a good idea, if you're willing to impart the details of the implementation that way.

Second, returning.

You talk about QString getting an SSO buffer, maybe. Then returning a QString will become even more expensive. It already got more expensive, since instead of sizeof(void*) it's now two or three (didn't check) words, but adding SSO will not make it better. And if we don't get SSO, classes can decide to store u16string instead, which _has_ SSO already. So, efficiency wins here, too.

  if (r.rectCount() == 1) {
     use(r.boundingRect());
  } else {
     const auto rects = r.rects();
     for (const QRect &rect : rects)
        use(rect);
  }

If rects() had been a view (today, we'd use gsl/std::span<const QRect>), all users could just do

I don’t get that argument. The region is a list of rectangles today,
so you could simply add a rects() method that returns them and the
code below would work.

I think you should take a look at QRegionPrivate::begin() (in Qt 6 or Qt 5). And in Qt 5's version of QRegion::rects().

  for (const QRect &rect : r.rects())
     use(rect);

This is objectively a better API, for two reasons: a) the user doesn't need to care about some weird idiosyncrasies of the class to avoid performance penalties and b) the class author is now free to extend the SSO buffer from one to, say, four, without changing the API, not even those affected by Hyrum.

It _seems_ your solution is to fold views into owning containers, and while that may seem to work, it's dangerous:

Assume QRegion::rects() returned a QVector-acting-as-view. Then this would silently fail:

   QRegion region();
   for (const QRect &rect : region().rects())
       use(rect);

because, clearly, QVector is an owning container, so we don't care that the QRegion went out of scope. Whereas with a view, it will be immediately obvious (to a tool like Clazy, at least) that this can't work.

The above example is rather weirdly constructed.

But anyhow, those data lifetime issues when using views as return
values extensively are a lot less obvious to spot when a human reads
the code. APIs should be safe to use wherever possible, so that our
users don’t have to worry about those things. This will lead to fewer
bugs in the resulting code and faster development times. Trading that
for some saved CPU cycles is in almost all cases (and yes there are
exceptions in low level classes) a very bad deal.

You didn't get my point: If I return a view, it's clear what's going on (to user and tools) and that the data will only be valid until a non-const member function on the source object is called. So far so simple. This is what we have with QString::data() and a ton of other APIs, and it's easy to understand.

Now change that to an owning container. Say QString, for the sake of argument. Now, users (and tools) expect that QString to own the data, but that's far from guaranteed. We had the problem with QStringLiteral in plugins, where the data just went away on plugin unloading. Solution? Don't actually unload plugins on unloading. Wow! What we do to save QString-as-a-view! Tell me how that's convenient and easy API? Had all those functions not returned QStringLiterals through QString, but through QStringView, it would have been more suggestive to copy the data than it was with QString.

And this will just become more pronounced if every construction from a char16_t* will create a QString that doesn't actually own the data. By choosing QString (owning container or view) over QStringView (view only) in APIs, you (deliberately, if I may say so) blurred the line between owning container and view, and in doing so you kill the raison d'étre for returning owning containers: avoiding dangling references. This is neither convenient nor easy. Actually, you're forcing all users to take a deep copy, as per

   QString deepCopy(QStringView v) { return v.toString(); }

(and hope that suggestions for QStringView carrying the d-pointer in a vein attempt at "optimizing" QStringView::toString() will not be implemented).

So, I'd argue for the complete opposite here: we would increase encapsulation of our APIs if they stopped trafficking in owning container types. I call this the NOI pattern (Non-Owning Interface). By not having to serve QString everywhere, we'll be much more free to use alternative storage types in the implementation (e.g. QVarLengthArray<char16_t>, or - the horror! - std::pmr::u16string). Handling-wise, QStringView makes all these choices equal, so the implementation of a class can use whatever is objectively optimal instead of being bound to QString.

You can just as well argue the other way round. Returning a view
requires the class to store the data in continuous memory. It can’t
create it on the fly if asked.

That's not true. You can always do the construction lazily and then return a view. Thread-safety is an issue, yes, but it's not terribly difficult to fix (and not fixed in many other classes, either, QFileInfo comes to mind, so it can't be that important). Compared to the 'usual' implementation in these cases, repeated calls to the function won't have to re-calculate the result anew each time (see https://codereview.qt-project.org/c/qt/qtbase/+/299986 for a recent example).

AsidE: If you think that CoW is still a thing today: no. SSO is a thing these days, and it seems that QString will not have it in Qt 6, either. NOI favours SSO, QString-everywhere cements the naïve CoW world of the 1990s for yet another decade.

Let’s see if we can get SSO working for QString and QBA in time. It
should not be very difficult to implement with the new structure we’re
having.

Even if you enable it for QString and QBA, you can't implement it for QVector without breaking tons of code that relies on iterator stability (which is why std::vector can't do it, either).

You might call CoW naive, but I do believe that the fact that Qt does
use containers that own their data in most of our APIs is what makes
Qt significantly simpler and safer to use than many other APIs.

Agreed. But you're in the process of blurring the line between owning containers and views. This already started in Qt 5 with QStringLiteral and even earlier with QString::fromRawData(). At least the former was statically-allocated and presented a problem in only a a very limited circumstances and the latter is explicit. But you're making QString(const char16_t*) the equivalent of QString::fromRawData() now and so the cases where QStrings are actually views that doesn't own the data is going to explode.

   QLineEdit *le = ...;
   {
       std::u16string u16s = businessResult();
le->setText(u16s.data()); // = setText(QString(const char16_t*)) = setText(QString::fromRawData())
   } // BOOM

No such API exists in std, so a std::vector or a std::string _always_ own the data. For everything else, there's string_view and span, which _never_ own.

Using views in our API would make it in many cases harder to track
lifetime, esp. if they are combined with the use of auto. Yes, tools
like clazy can help, but I’d rather have inherent safety than rely on
additional tools.

I said in a review the other days is that APIs are easy to use _not_ when the number of classes is minimized, but when the responsibilities of any given class are minimal and to-the-point. For a given domain area, that means more small classes are better than fewer large ones.

You don't want to port APIs to QStringView, because it's a ton of work, but you want to the benefits, so you're folding QStringView into QString and make the use of QString that much harder (btw: the same happened to QList/QVector).

That's legitimate, esp. as a step in the direction of fully view-enabling the library.

It's _not_ legitimate to claim that this makes the API "easier to use, more logical and convenient". It doesn't. It's also _not_ legitimate to use this as an argument against more QStringView-overloads around the library.

To spell it out: Just like QList-is-QVector, what you're doing to QString is a (hopefully stop-gap) measure to avoid rewriting all the APIs and classes to take and accept views at the expense of making QString even use harder to reason about than it already has been in Qt 5.

Learn from QRegion!

I have spoken on many conferences (at least QtWS, Meeting C++, emBO++) on this, if anyone wants to learn more.

Not everybody agrees with your opinions, and we need to remember that
most of our users are not necessarily people knowing the C++ standard
inside out. And they *shouldn’t* have to be.

I fail to see what this has to do with "knowing the C++ standard inside out". Maybe you can enlighten me? AFAICT, all I'm doing is applying sound engineering principles, incl., but not limited to, "make an API easy to use and hard to misuse" and "minimal, _efficient_ basis of operations" to Qt.

Thanks,
Marc
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development

Reply via email to