https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87077
Bug ID: 87077
Summary: missed optimization for horizontal add for x86 SSE
Product: gcc
Version: 7.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: trashyankes at wp dot pl
Target Milestone: ---

During some experiments with toy programs I found that GCC does not generate any horizontal adds for xmm registers.

Some benchmark code:
http://quick-bench.com/HhZPnOtb9SYYK8z4IMKb_XAWYCI

If I'm not mistaken, both functions do the same work, and the hand-written one is faster. IIRC `_mm_hadd_ps` is considered a slow way to do this, but it is still faster than the standard function.

Is my finding correct, or am I simply missing some important detail that explains why GCC does not do this?
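
The linked benchmark is not reproduced here. As a rough illustration of the kind of comparison described, here is a minimal sketch; the function names `sum4_scalar` and `sum4_hadd` are hypothetical and not taken from the benchmark, and the `target("sse3")` attribute is an assumption made so the code builds with GCC or Clang on x86 without extra flags.

```cpp
#include <immintrin.h>  // x86 only; _mm_hadd_ps is an SSE3 intrinsic

// Plain scalar reduction over four floats.
// Evaluates ((v[0] + v[1]) + v[2]) + v[3] in that order.
float sum4_scalar(const float* v) {
    float s = 0.0f;
    for (int i = 0; i < 4; ++i) {
        s += v[i];
    }
    return s;
}

// Hand-written reduction using two horizontal adds.
// Evaluates (v[0] + v[1]) + (v[2] + v[3]), i.e. a different
// association order than the scalar loop.
__attribute__((target("sse3")))
float sum4_hadd(const float* v) {
    __m128 x = _mm_loadu_ps(v);   // [a, b, c, d]
    x = _mm_hadd_ps(x, x);        // [a+b, c+d, a+b, c+d]
    x = _mm_hadd_ps(x, x);        // every lane holds (a+b)+(c+d)
    return _mm_cvtss_f32(x);      // extract lane 0
}
```

Both functions return the same value for inputs where rounding does not matter, but they add the elements in a different order, so the results can differ in the last bits for general floating-point inputs.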