https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

            Bug ID: 111905
           Summary: -O3 vectorization terribly pessimizes the code for an
                    already unrolled loop
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kamkaz at windowslive dot com
  Target Milestone: ---

https://godbolt.org/z/rK4nEWovc

With -O2, the generated code is what you'd expect: the "manually unrolled"
version, which processes the data in chunks (relying on the assumption that
w > 16), shows a performance benefit.
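
The actual test case is only available behind the Compiler Explorer link above.
For readers without access to it, here is a hypothetical sketch of the kind of
code being described. It is not the reporter's code: the function names, the
element type and the chunk size of 16 are all assumptions made for illustration.
It contrasts a plain loop with a version that processes the data in fixed
chunks of 16, followed by a scalar tail loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only; NOT the code from the bug report.

// Plain loop: sums w bytes one at a time.
std::uint32_t sum_simple(const std::uint8_t* p, std::size_t w) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < w; ++i) {
        s += p[i];
    }
    return s;
}

// Chunked version: handles 16 elements per outer iteration,
// then finishes the remaining (fewer than 16) elements in a tail loop.
std::uint32_t sum_unrolled(const std::uint8_t* p, std::size_t w) {
    std::uint32_t s = 0;
    std::size_t i = 0;
    for (; i + 16 <= w; i += 16) {
        // Constant trip count, so the compiler can fully unroll this.
        for (std::size_t k = 0; k < 16; ++k) {
            s += p[i + k];
        }
    }
    for (; i < w; ++i) {
        s += p[i];
    }
    return s;
}
```

Comparing the assembly for such a function at `-O2`, `-O3` and
`-O3 -fno-tree-vectorize` is one way to check whether the vectorizer is
responsible for the extra code.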

With -O3, however, auto-vectorization kicks in for the already unrolled loop,
and the results are abysmal: a lot of unnecessary checks and (what I assume
to be) dead code.

Clang doesn't seem to have this problem.
