https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719

--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> So with 2 bytes we get

Try 3 bytes (the worst case).

> Are you sure performance isn't dominated by the
> first init loop (both GCC and clang vectorize it)?

Replacing memcpy(,,block) with memcpy(,,8) (the next line masks off the other bytes
anyway) made it 8 times faster when I tried the other day.
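A minimal sketch of the two variants being compared. This does not reproduce the benchmark from the report. The function names `load_variable` and `load_fixed_masked` are hypothetical, and the sketch makes three assumptions: `block` is at most 8, at least 8 bytes are readable at `p`, and byte order is detected with the GCC/clang `__BYTE_ORDER__` macros (little-endian on x86-64). The first variant uses a memcpy whose length is only known at run time. The second always copies 8 bytes, which the compiler can turn into a single load, and then clears the bytes beyond `block`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variable-length copy: the length is only known at run time, so the
   compiler must handle every possible size (e.g. 3 = 2 + 1 bytes). */
static uint64_t load_variable(const unsigned char *p, size_t block)
{
    uint64_t x = 0;
    assert(block <= 8);
    memcpy(&x, p, block);
    return x;
}

/* Fixed-length copy followed by a mask. Assumes (hypothetically) that at
   least 8 bytes are readable at p, even when block < 8. */
static uint64_t load_fixed_masked(const unsigned char *p, size_t block)
{
    uint64_t x;
    assert(block <= 8);
    memcpy(&x, p, 8);
    if (block == 8)
        return x;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big-endian: the first `block` bytes are the high-order bits. */
    return x & ~(UINT64_MAX >> (8 * block));
#else
    /* Little-endian: the first `block` bytes are the low-order bits. */
    return x & ((UINT64_C(1) << (8 * block)) - 1);
#endif
}
```

Both functions should return the same value for every `block` from 0 to 8. The second one simply trades a variable-size copy for a possible over-read of the source buffer, so the caller has to guarantee that padding.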
