https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110319
Bug ID: 110319
Summary: Performance slowdown using a pointer to perform a reduction vs. using a normal variable
Product: gcc
Version: 11.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: lorien.lopez at unizar dot es
CC: jakub at gcc dot gnu.org
Target Milestone: ---

When performing an OpenMP sum reduction into a regular variable, GCC uses a "lock cmpxchg" instruction. In contrast, when the reduction is performed into a pointer, it uses an OpenMP atomic region. The second version is several times slower on an Intel Skylake CPU.

The original report can be found on Stack Overflow:
https://stackoverflow.com/questions/76480632/performance-slowdown-using-a-pointer-to-perform-a-reduction-vs-using-a-normal-v
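
For illustration, a minimal sketch of the two variants being compared. This is not the reporter's code, which is only available at the Stack Overflow link. It assumes a double-precision sum and assumes the pointer case is written as a one-element array-section reduction; the function names sum_scalar and sum_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A: reduction into an ordinary local variable. */
static double sum_scalar(const double *a, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Variant B: reduction through a pointer.
 * Assumption: expressed as a one-element array section (OpenMP 4.5);
 * the report itself does not show how the pointer reduction was spelled. */
static void sum_pointer(const double *a, int n, double *out)
{
    assert(out != NULL);
    *out = 0.0;
    #pragma omp parallel for reduction(+:out[0:1])
    for (int i = 0; i < n; i++)
        out[0] += a[i];
}
```

Both functions compute the same sum; only the way each thread's partial result is merged into the final value may differ. Building with something like `gcc -O2 -fopenmp -S` and comparing the generated assembly for the two functions should show whether the difference described above reproduces on a given GCC version.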