https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120806

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
This deviation from the standard is intentional, and was part of the original
design, and was present in the reference implementation:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1132r8.html#perf

However the wording in the standard does not permit this optimization. I sent
the following email to the C++ committee's library working group in November
2023 but nobody responded, so I should open an LWG issue:

<quote>
tl;dr I don't think the current spec for std::out_ptr and std::inout_ptr allows
any optimizations for std::unique_ptr and std::shared_ptr, even though that was
intended by the design.

The https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1132r8.html paper
seems pretty clear that implementations should try to optimize std::out_ptr for
at least std::unique_ptr. One strategy is for std::out_ptr_t::operator
Pointer*() to return the address of the unique_ptr's internal pointer, e.g.

operator Pointer*() const noexcept {
  return std::addressof(m_uniqptr.m_ptr);
}

The ~out_ptr destructor then doesn't need to use reset(p) to store the result
into the unique_ptr, because it's already there. However, this optimization
would be observable and so doesn't seem to be permitted. Consider:

#include <memory>
#include <cassert>

std::unique_ptr<int> global;

void c_api_func(int** p)
{
  // 1) std::out_ptr ensures this since LWG 3734
 assert(global == nullptr);
  // 2) and ensures this:
  assert(*p == nullptr);
  // 3) but nothing says I can't do this:
  global.reset(new int(1));
  // 4) is this still true?
  assert(*p == nullptr);
  *p = new int(2);
}

int main()
{
  c_api_func(std::out_ptr(global));
}

If the implementation gives direct access to the unique_ptr's member then 4)
will fail, because resetting the global unique_ptr obviously stores to that
member.

N.B. ztd::out_ptr fails the assertion at 1) under all conditions, presumably
LWG 3734 isn't supported.
ztd::out_ptr fails assertions 2) and 4) when using
-DZTD_OUT_PTR_USE_CLEVER_OUT_PTR=1 to optimize the unique_ptr case, and with
NDEBUG it results in a double-free.

My current implementation has a similar clever optimization, but using
friendship instead of UB aliasing. It also fails 4) and instead of a
double-free it leaks memory.

I can make 4) still hold by making out_ptr always have its own pointer member
and return that from operator Pointer*(), and then assigning that pointer to
the unique_ptr's member in the ~out_ptr_t destructor. But if doing
global.reset(...) is still allowed at 3) then the ~out_ptr_t dtor **must** use
s.reset(p) to update the smart pointer. It can't just do s.m_ptr = p; because
that would leak the previous value of s.m_ptr that was set at 3), as my impl
does.
But my understanding from P1132 is that avoiding s.reset(p) is the entire point
of the clever optimizations. So it seems no optimization is currently possible
at all.

Should the standard say users are not allowed to access the smart_ptr while
it's "owned" by out_ptr_t?
I think the implementation should be allowed to assume that Smart& is a unique
reference, similar to the blanket permission given for rvalue references in
[res.on.arguments].
That would disallow 3) and so 4) would still hold.


P1132 also hints at optimizing std::out_ptr_t<std::shared_ptr<T>, T*, D> by
preallocating a shared_ptr control block in the out_ptr_t constructor, so that
~out_ptr_t doesn't allocate when doing s.reset(p, args...). But that would be
observable. Consider std::out_ptr(shptr, my_deleter, my_alloc) which must use
my_alloc to allocate the control block, so it's observable whether that happens
during out_ptr_t construction, or during destruction. The standard is quite
explicit about the effects and when they happen ("Effects: Equivalent to..."
doesn't leave much left to the imagination).

An implementation could choose to only do the pre-allocation optimization when
no custom allocator is provided, but even operator new is observable. I think
out_ptr_t is missing some normative allowance for allocating memory (as hinted
at by the note about the ctor not being noexcept).

Apart from the question of when allocations are allowed to happen, the standard
also guarantees two copies of the deleter: one into tuple<Args...> during
construction and one into the control block when doing s.reset(p, args...)
during destruction. If we preallocate the control block it seems like it would
make sense to construct the deleter (and any allocator) into the control block
right away, so we don't need to store the tuple<Args...> at all. But that seems
to be disallowed by the precise number of initializations implied by the
effects.

I think we can maybe argue that making fewer copies is allowed, but I think we
need to say something about when allocations happen.
</quote>

Reply via email to