On Tuesday, 15 November 2022 00:52:24 PST Marc Mutz via Development wrote:
> That remains to be proven. A rule of thumb for atomics is that they're
> two orders of magnitude slower than a normal int. They also still act as
> optimizer firewalls. With that rule of thumb, copying 50 char16_t's is
> faster than one ref-count update. What really is the deciding point is
> whether or not there's a memory allocation involved. I mentioned that
> for many use-cases, therefore, a non-CoW SBO container is preferable over a
> CoW non-SBO one.

That's irrelevant so long as we don't have SBO containers. So what we really
need to compare are memory allocations versus the atomics.

A locked operation on a cacheline on x86 will take on the order of 20 cycles
of latency on top of any memory delays[1], but do note the CPU keeps running
meanwhile (read: an atomic inc has a much smaller impact than an atomic dec
that uses the result). A memory allocation, even for a single byte, will have
an impact bigger than this: hundreds of cycles. Therefore, in the case of CoW
versus deep copy, CoW always wins.

[1] https://uops.info/html-instr/INC_LOCK_M32.html says 23 cycles on an
11-year-old Sandy Bridge, 19 on Haswell, 18 on everything since Skylake.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
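
For illustration only, here is a minimal C++ sketch of the two copy strategies being compared: a CoW copy that costs one atomic increment, and a deep copy that costs an allocation plus a copy of the characters. Payload, make_payload, copy_cow, copy_deep, release and ns_per_op are hypothetical names made up for this sketch and are not Qt API; the allocation counter assumes single-threaded use.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical shared payload: a reference count plus the character data.
struct Payload {
    std::atomic<int> ref{1};
    std::size_t size;
    char16_t *data;
};

// Counts payload allocations; plain counter, so single-threaded use only.
static std::size_t g_allocations = 0;

// Allocates a new payload and copies n characters into it.
Payload *make_payload(const char16_t *src, std::size_t n)
{
    ++g_allocations;
    Payload *p = new Payload;
    p->size = n;
    p->data = new char16_t[n];
    std::copy(src, src + n, p->data);
    return p;
}

// CoW copy: one atomic increment whose result is not used, no allocation.
// On x86 this typically compiles to a single lock-prefixed instruction.
Payload *copy_cow(Payload *p)
{
    p->ref.fetch_add(1, std::memory_order_relaxed);
    return p;
}

// Deep copy: a fresh allocation plus a copy of every character.
Payload *copy_deep(const Payload *p)
{
    return make_payload(p->data, p->size);
}

// Drops one reference. This is the atomic dec that uses its result:
// the branch cannot be resolved until the locked operation completes.
void release(Payload *p)
{
    const int old = p->ref.fetch_sub(1, std::memory_order_acq_rel);
    assert(old >= 1);
    if (old == 1) {
        delete[] p->data;
        delete p;
    }
}

// Rough timing helper: average wall-clock nanoseconds per call of op.
template <typename F>
double ns_per_op(F &&op, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        op();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}
```

Timing `release(copy_cow(p))` against `release(copy_deep(p))` with ns_per_op would give a rough feel for the gap on a given machine, but the numbers depend heavily on the allocator, the compiler and cache contention, so treat this as a sketch rather than a benchmark.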