On Mon, Oct 13, 2025 at 12:22 PM Jonathan Wakely <[email protected]> wrote:
> > > On Mon, 13 Oct 2025 at 10:08, Tomasz Kaminski <[email protected]> wrote: > >> >> >> On Sat, Oct 11, 2025 at 3:33 PM Jonathan Wakely <[email protected]> >> wrote: >> >>> We need to ensure that the memory allocated for a path::_List::_Impl is >>> at least 4-byte aligned, so that we can use the two least significant >>> bits to store a _Type value. Use __STDCPP_DEFAULT_NEW_ALIGNMENT__ to >>> detect whether we need to use aligned new/delete to get memory with >>> unused low bits. >>> >>> Allocation and deallocation of path::_List::_Impl objects is refactored >>> into a new _Impl::create function so that the memory allocation is done >>> in one place and can ensure the required alignment. >>> >> If I understand correctly we do not have a target >> where __STDCPP_DEFAULT_NEW_ALIGNMENT__ is >> smaller than 4. If so, wouldn't static_assert be better instead of >> effectively dead code? >> > > > I think it's dead code in all targets supported by upstream GCC. It's > possible there are out-of-tree ports for other microprocessors > where __STDCPP_DEFAULT_NEW_ALIGNMENT__ < 4, and if we use a static_assert > they would need to do additional work in their port. But we can add a > comment to the static_assert, and maybe deal with that problem later if it > ever happens (YAGNI). > As an alternative, I would be fine if we only had one code path, and always used an aligned version of the new. And we just add if (align <= __STDCPP_DEFAULT_NEW_ALIGNMENT__) return ::operator new(sz) in the align_val_t operator new implementation. > > > >> >>> libstdc++-v3/ChangeLog: >>> >>> PR libstdc++/122255 >>> * src/c++17/fs_path.cc (path::_List::_Impl::copy): Use create >>> function. >>> (path::_List::_Impl::required_alignment): New static data >>> member. >>> (path::_List::_Impl::use_aligned_new): New static data member. >>> (path::_List::_Impl::create): New static member function. >>> (path::_List::_Impl_deleter::operator()): Use aligned delete if >>> needed. >>> --- >>> >>> Tested x86_64-linux. >>> >> >> >>> >>> libstdc++-v3/src/c++17/fs_path.cc | 80 ++++++++++++++++++++++--------- >>> 1 file changed, 58 insertions(+), 22 deletions(-) >>> >>> diff --git a/libstdc++-v3/src/c++17/fs_path.cc >>> b/libstdc++-v3/src/c++17/fs_path.cc >>> index 215afa08ad25..aa47cfa2a655 100644 >>> --- a/libstdc++-v3/src/c++17/fs_path.cc >>> +++ b/libstdc++-v3/src/c++17/fs_path.cc >>> @@ -34,6 +34,7 @@ >>> #include <filesystem> >>> #include <algorithm> >>> #include <array> >>> +#include <new> >>> #include <bits/stl_uninitialized.h> >>> #include <ext/numeric_traits.h> // __gnu_cxx::__int_traits >>> >>> @@ -207,6 +208,8 @@ struct path::_List::_Impl >>> >>> _Impl(int cap) : _M_size(0), _M_capacity(cap) { } >>> >>> + // Align the first member like the value_type so that we can store >>> one or >>> + // more objects of that type immediately after the memory occupied by >>> *this. >>> alignas(value_type) int _M_size; >>> int _M_capacity; >>> >>> @@ -246,8 +249,7 @@ struct path::_List::_Impl >>> unique_ptr<_Impl, _Impl_deleter> copy() const >>> { >>> const auto n = size(); >>> - void* p = ::operator new(sizeof(_Impl) + n * sizeof(value_type)); >>> - unique_ptr<_Impl, _Impl_deleter> newptr(::new (p) _Impl{n}); >>> + auto newptr = create(n, false); // Skip overflow checks, n is a >>> good size. >>> std::uninitialized_copy_n(begin(), n, newptr->begin()); >>> newptr->_M_size = n; >>> return newptr; >>> @@ -259,16 +261,67 @@ struct path::_List::_Impl >>> constexpr uintptr_t mask = ~(uintptr_t)0x3; >>> return reinterpret_cast<_Impl*>(reinterpret_cast<uintptr_t>(p) & >>> mask); >>> } >>> + >>> + // Require memory aligned to 4 bytes so we can store _Type in the low >>> bits. >>> + static constexpr size_t required_alignment >>> + = std::max(size_t(4), alignof(value_type)); >>> >> This max seem superfluous here, we use this variable only if default new >> alignment >> is lower than 4, >> > > It occurred to me that it's theoretically possible that > alignof(_Impl::value_type) might also be higher than guaranteed by operator > new. We need the memory to be suitable for storing _Impl *and* to have two > free bits for tagging. > > If we just did `new _Impl` then the compiler would do the same > alignof(_Impl) check and use aligned-new if needed, so this is replicating > that (we can't just do `new _Impl` because we need the additional memory > following it for the components). > > Again, it's probably redundant for all targets supported in-tree by GCC > ... but then that means PR122255 is not a bug at all and this entire change > is unnecessary. The refactoring to add _Impl::create still seems useful > though, so I can adjust the patch to do that refactoring, and just add a > static assert to ensure we don't silently under-align anything. > > > >> + >>> + // Only use aligned new if needed, because it might be less efficient. >>> + static constexpr bool use_aligned_new >>> + = __STDCPP_DEFAULT_NEW_ALIGNMENT__ < required_alignment; >>> + >>> + // Create a new _Impl with capacity for n components. >>> + static unique_ptr<_Impl, _Impl_deleter> >>> + create(int n, bool check_overflow = true) >>> + { >>> + if (check_overflow) >>> + { >>> + using __gnu_cxx::__int_traits; >>> + // Nobody should need paths with this many components. >>> + if (n >= __int_traits<int>::__max / 4) >>> + std::__throw_bad_alloc(); >>> + >>> + if constexpr (__int_traits<int>::__max >= >>> __int_traits<size_t>::__max) >>> + { >>> + // Check that the calculation below won't overflow. >>> + size_t bytes; >>> + if (__builtin_mul_overflow(n, sizeof(value_type), &bytes) >>> + || __builtin_add_overflow(sizeof(_Impl), bytes, >>> &bytes)) >>> + std::__throw_bad_alloc(); >>> + } >>> + // Otherwise, it can't overflow, even for 20-bit size_t on >>> msp430. >>> + } >>> + >>> + void* p; >>> + if constexpr (use_aligned_new) >>> + p = ::operator new(sizeof(_Impl) + n * sizeof(value_type), >>> + align_val_t{required_alignment}); >>> + else >>> + p = ::operator new(sizeof(_Impl) + n * sizeof(value_type)); >>> + >>> + return std::unique_ptr<_Impl, _Impl_deleter>(::new(p) _Impl{n}); >>> + } >>> }; >>> >>> -void path::_List::_Impl_deleter::operator()(_Impl* p) const noexcept >>> +// Destroy and deallocate an _Impl. >>> +void >>> +path::_List::_Impl_deleter::operator()(_Impl* p) const noexcept >>> { >>> p = _Impl::notype(p); >>> if (p) >>> { >>> __glibcxx_assert(p->_M_size <= p->_M_capacity); >>> p->clear(); >>> - ::operator delete(p, sizeof(*p) + p->_M_capacity * >>> sizeof(value_type)); >>> + >>> + const auto n = p->_M_capacity; >>> + // Destructor is trivial, so this isn't really necessary: >>> + p->~_Impl(); >>> + >>> + if constexpr (_Impl::use_aligned_new) >>> + ::operator delete(p, sizeof(_Impl) + n * >>> sizeof(_Impl::value_type), >>> + align_val_t{_Impl::required_alignment}); >>> + else >>> + ::operator delete(p, sizeof(_Impl) + n * >>> sizeof(_Impl::value_type)); >>> } >>> } >>> >>> @@ -455,24 +508,7 @@ path::_List::reserve(int newcap, bool exact = false) >>> newcap = nextcap; >>> } >>> >>> - using __gnu_cxx::__int_traits; >>> - // Nobody should need paths with this many components. >>> - if (newcap >= __int_traits<int>::__max / 4) >>> - std::__throw_bad_alloc(); >>> - >>> - size_t bytes; >>> - if constexpr (__int_traits<int>::__max >= >>> __int_traits<size_t>::__max) >>> - { >>> - size_t components; >>> - if (__builtin_mul_overflow(newcap, sizeof(value_type), >>> &components) >>> - || __builtin_add_overflow(sizeof(_Impl), components, >>> &bytes)) >>> - std::__throw_bad_alloc(); >>> - } >>> - else // This won't overflow, even for 20-bit size_t on msp430. >>> - bytes = sizeof(_Impl) + newcap * sizeof(value_type); >>> - >>> - void* p = ::operator new(bytes); >>> - std::unique_ptr<_Impl, _Impl_deleter> newptr(::new(p) >>> _Impl{newcap}); >>> + auto newptr = _Impl::create(newcap); >>> const int cursize = curptr ? curptr->size() : 0; >>> if (cursize) >>> { >>> -- >>> 2.51.0 >>> >>>
