https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87077

            Bug ID: 87077
           Summary: missed optimization for horizontal add for x86 SSE
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: trashyankes at wp dot pl
  Target Milestone: ---

During some experiments with toy programs I found that GCC does not generate
any horizontal add instructions for xmm registers.

Some benchmark code:
http://quick-bench.com/HhZPnOtb9SYYK8z4IMKb_XAWYCI

If I'm not mistaken, both functions do the same work, and the hand-written one
is faster. And IIRC `_mm_hadd_ps` is considered a slow way to do this, but it
is still faster than the standard function.
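
For readers without access to the benchmark link, here is a minimal sketch of
the kind of comparison described above. It is not the code from the benchmark:
the function names `sum4_scalar` and `sum4_hadd` and the choice of a four-float
sum are assumptions made for illustration only. The `target("sse3")` attribute
(a GCC/Clang extension) is used so the intrinsic is available without passing
`-msse3`.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_hadd_ps */

/* Hypothetical "standard" version: plain scalar sum of four floats. */
float sum4_scalar(const float *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Hypothetical hand-written version: two horizontal adds (HADDPS). */
__attribute__((target("sse3")))
float sum4_hadd(const float *v)
{
    __m128 x = _mm_loadu_ps(v);   /* x = (v0, v1, v2, v3)               */
    x = _mm_hadd_ps(x, x);        /* x = (v0+v1, v2+v3, v0+v1, v2+v3)   */
    x = _mm_hadd_ps(x, x);        /* every lane = (v0+v1) + (v2+v3)     */
    return _mm_cvtss_f32(x);      /* extract the lowest lane            */
}
```

Note that the two functions add the elements in a different order
(`((v0+v1)+v2)+v3` versus `(v0+v1)+(v2+v3)`), so for general floating-point
inputs the results may differ in rounding. This may be relevant to the
question, since GCC does not reorder floating-point additions unless options
such as `-ffast-math` or `-fassociative-math` are given.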

Is my finding correct, or am I simply missing some important detail that
explains why GCC does not do this?
