https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68482
--- Comment #2 from lvqcl.mail at gmail dot com ---
(In reply to Marc Glisse from comment #1)
> The extra cast so 32-bit unsigned and 64-bit pointers can interact confuses
> the compiler. Trunk (gcc-6) seems to work fine though, can you confirm?

I never compiled GCC myself (and I use Windows), so I found and downloaded
"gcc version 6.0.0 20151121 (experimental)" from
https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/dongsheng-daily/

The loop is vectorized:

.L4:
        movdqu  (%rbx), %xmm0
        addl    $1, %r9d
        addq    $16, %rbx
        cmpl    %r9d, %edx
        paddd   %xmm0, %xmm1
        ja      .L4
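
For readers without the original testcase at hand, here is a minimal sketch of the kind of loop that can compile to code of this shape. It is an assumption for illustration, not the testcase attached to this bug: the function name `sum_u32` and its signature are hypothetical. The point is the 32-bit unsigned counter indexing through a 64-bit pointer, which is the interaction comment #1 describes. In the listing, `movdqu` loads 16 bytes (four 32-bit values) and `paddd` adds four 32-bit lanes at once.

```c
#include <stdint.h>

/* Hypothetical example, not the testcase from the bug report.
   Sums n 32-bit values using a 32-bit unsigned index on a target
   with 64-bit pointers. Unsigned arithmetic is used so that
   wrap-around on overflow is well defined. */
uint32_t sum_u32(const uint32_t *a, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum += a[i];   /* candidate for a paddd-based vector reduction */
    return sum;
}
```

Whether a given compiler actually vectorizes this depends on its version and flags; with GCC, building at -O3 (or -O2 -ftree-vectorize) and adding -fopt-info-vec reports which loops were vectorized.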