https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334

--- Comment #53 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to tlknv from comment #44)
> Thanks Marc.
> I have posted my patch at
> http://gcc.gnu.org/ml/gcc-patches/2012-05/msg02086.html

This is just a bandaid and not sufficient to avoid the problem. If instead of
one "reader" and one "writer" you have multiple readers making copies of the
string and one writer leaking it, then your new !_M_is_shared() check in
_M_refcopy() will not detect the problem, and so the same bug is still present.

A more correct approach would be for _M_leak_hard() to use compare-and-swap to
set _M_refcount to -1, looping and retrying if the string became shared (or
leaked!) by another thread.

A CAS loop would also ensure that the store to _M_refcount is atomic, which is
necessary if you want other threads to be able to read it concurrently. But it
would make _M_leak_hard() larger and slower for all multithreaded programs,
even ones that are already correctly synchronized to avoid copying a string
while it's being leaked.

But setting _M_refcount to -1 atomically wouldn't do anything to address the
problem of changes to _M_dataplus._M_p being non-atomic. The string that calls
_M_leak_hard() might need to make a new clone (via _M_mutate(0,0,0)) and then
set that to leaked, but if other threads are concurrently accessing the object
then they will make non-atomic loads of the _M_dataplus._M_p pointer while
_M_mutate is doing a non-atomic store to it. That's also a data race, i.e.
undefined behaviour. To mitigate that, all loads and stores of _M_dataplus._M_p
would need to be atomic, which would pessimize correct programs even more. (And
this ignores the fact that not all our supported targets even have native
atomic instructions for pointer-sized variables.)

And if your code is concurrently accessing strings while another thread does
the copy-on-write step, none of the changes above would help unless you
recompile *all* code using std::string. Otherwise some old objects can still
contain inlined versions of the member functions using non-atomic accesses. So
if you have to recompile the entire world to mitigate your unsynchronized uses
of COW strings, you might as well just stop using it and recompile everything
to use the new SSO string instead.

tl;dr making this work would be very invasive and is not going to happen for
the legacy COW strings.

Reply via email to