We have the following that a pattern in our code base: 1) SetCapacity(newCapacity) is called on an XPCOM string. 2) A pointer obtained from BeginWriting() is used for writing more than Length() but no more than newCapacity code units to the XPCOM string. 3) SetLength(actuallyWritten) is called on the XPCOM string such that actuallyWritten > Length() but actuallyWritten <= newCapacity.
This is an encapsulation violation, because the string implementation doesn't know that the content past Length() is there. The caller assumes that step #3 will not reallocate and will only write a zero-terminator at actuallyWritten and set mLength to actuallyWritten. (The pattern is common enough that clearly people have believed it to be a valid pattern. However, I haven't seen any in-tree or on-wiki string documentation endorsing the pattern.) It should be non-controversial that this is an encapsulation violation, but does the encapsulation violation matter? It matters if we want SetLength() to be able to conserve memory by allocating a smaller buffer when actuallyWritten code units would fit in a smaller mozjemalloc bucket. In order for the above pattern to work if SetLength() can reallocate in such case, SetLength() has to memcpy the whole buffer in case someone has written content that the string implementation is unaware of instead of just memcpying the part of the buffer that the string implementation knows to be in use. Pessimizing the number of code units to memcpy is bad for performance. It's unclear if trying to use a smaller mozjemalloc bucket it is a worthwhile thing. It obviously is for large long-lived strings and it obviously isn't for short-lived strings even if they are large. SetLength() doesn't know what the future holds for the string. :-( But in any case, it's bad if we can't make changes that are sound from the perspective of the string encapsulation, because we have other code violating the encapsulation. After the soft freeze, I'd like to change things so that we memcpy only the part of the buffer that the string implementation knows is in use. To that end, we should stop using the above pattern that is an encapsulation violation. For m-c, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1472113 and worked on fixing the bugs that block it. For c-c, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1486706 but don't intend to do the work of investigating or fixing the string usage in c-c. As for fixing the above pattern, there are two alternatives. The first one is: 1) SetLength(newCapacity) 2) BeginWriting() 3) Truncate(actuallyWritten) (or SetLength(actuallyWritten), Truncate simply tells to the reader that the string isn't being made longer) With this pattern, writing happens to the part of the buffer that the string implementation believes to be in use. This has the downside that the first SetLength() call (like, counter-intuitively, SetCapacity() currently!) writes the zero terminator, which from the point of view of CPU caches is an out-of-the-blue write that's not part of a well-behaved forward-only linear write pattern and not necessarily near recently-accessed locations. The second alternative is BulkWrite() in C++ and bulk_write() in Rust. This is new API that is well-behaved in terms of the cache access pattern and is also more versatile in the sense that it lets the caller know how newCapacity was rounded up, which is relevant to callers that ask for best-case capacity and then ask more capacity if there turns out to be more to write. When the caller is made aware of the rounding, a second request for added capacity can be avoided if the amount that actually needs to be written exceeds the best case estimate but fits within the rounded-up capacity. In Rust, bulk_write() is rather nicely misuse-resistant. However, on the C++ side the lack of a borrow checker as well as mozilla::Result not working with move-only types (https://bugzilla.mozilla.org/show_bug.cgi?id=1418624) pushes more things to documentation. The documentation can be found at https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Bulk_Write https://searchfox.org/mozilla-central/rev/2fe43133dbc774dda668d4597174d73e3969181a/xpcom/string/nsTSubstring.h#1190 https://searchfox.org/mozilla-central/rev/2fe43133dbc774dda668d4597174d73e3969181a/xpcom/string/nsTSubstring.h#32 P.S. GetMutableData() is redundant with BeginWriting() and SetLength(). It's used very rarely and I'd like to remove it as redundant, so please don't use GetMutableData(). -- Henri Sivonen hsivo...@mozilla.com _______________________________________________ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform