https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #26 from Dmitriy Ovdienko <dmitriy.ovdienko at gmail dot com> ---
Created attachment 49201
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49201&action=edit
Modified solution (thread per iteration)

Attached is code similar to what the Rust sample does (it parallelizes the iterations loop rather than the depth loop).

What I'd like to improve is to reuse the allocated memory rather than allocate it again on every iteration. According to the task requirements, I cannot implement my own memory arena, so I have to find a way to use the STL to achieve the same effect.
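
One possible way to get that reuse with standard-library facilities only is sketched below. It is not the code from the attachment. It assumes a hypothetical `Node` type and hypothetical helpers `make_tree`, `count_nodes` and `run_iterations`, and it assumes the tree depth is known up front so the buffer can be sized once. A `std::pmr::monotonic_buffer_resource` is placed over a buffer that is allocated a single time, and `release()` rewinds the resource to the start of that buffer after each iteration.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical node type standing in for the benchmark's tree node.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a complete binary tree of the given depth out of `mem`.
Node* make_tree(std::pmr::memory_resource& mem, int depth) {
    void* raw = mem.allocate(sizeof(Node), alignof(Node));
    Node* node = new (raw) Node{};
    if (depth > 0) {
        node->left = make_tree(mem, depth - 1);
        node->right = make_tree(mem, depth - 1);
    }
    return node;
}

// Count the nodes reachable from `node`.
std::size_t count_nodes(const Node* node) {
    if (node == nullptr) return 0;
    return 1 + count_nodes(node->left) + count_nodes(node->right);
}

// Run `iterations` build-and-check rounds, reusing one buffer for all of them.
std::size_t run_iterations(int iterations, int depth) {
    // A complete tree of this depth has 2^(depth+1) - 1 nodes.
    const std::size_t nodes = (std::size_t{1} << (depth + 1)) - 1;

    // Allocated once; a little slack is added for alignment.
    std::vector<std::byte> buffer(nodes * sizeof(Node) + alignof(Node));

    // null_memory_resource() as upstream makes the resource throw
    // std::bad_alloc instead of silently falling back to the heap
    // if the buffer turns out to be too small.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        Node* root = make_tree(pool, depth);
        total += count_nodes(root);
        // Node is trivially destructible, so no destructors need to run.
        // release() resets the resource to the start of `buffer`.
        pool.release();
    }
    return total;
}
```

For example, `run_iterations(3, 4)` builds a 31-node tree three times inside the same buffer. `std::pmr::monotonic_buffer_resource` is not thread-safe, so in a thread-per-iteration design each thread would need its own buffer and resource.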