The full testcase is attached. Jakub, you are right, we have to convert signed ints into something a bit more tricky. BTW, here is the output for this case from the Intel compiler:

        vpxor     %ymm1, %ymm1, %ymm1                    #184.23
        vmovdqu   .L_2il0floatpacket.12(%rip), %ymm0     #184.23
        movslq    %ecx, %rdi                             #182.7
                                # LOE rax rbx rdi edx ecx ymm0 ymm1
..B1.82:                        # Preds ..B1.82 ..B1.81
        vmovdqu   2132(%rsp,%rax,4), %ymm3               #183.14
        vmovdqu   2100(%rsp,%rax,4), %ymm2               #183.27
        vpsrad    $8, %ymm3, %ymm11                      #183.27
        vpsrad    $8, %ymm2, %ymm5                       #183.27
        vpcmpgtd  %ymm5, %ymm1, %ymm4                    #184.23
        vpcmpgtd  %ymm11, %ymm1, %ymm10                  #184.23
        vpand     %ymm0, %ymm4, %ymm6                    #184.23
        vpand     %ymm0, %ymm10, %ymm12                  #184.23
        vpaddd    %ymm6, %ymm5, %ymm7                    #184.23
        vpaddd    %ymm12, %ymm11, %ymm13                 #184.23
        vpsrad    $1, %ymm7, %ymm8                       #184.23
        vpsrad    $1, %ymm13, %ymm14                     #184.23
        vpaddd    %ymm0, %ymm8, %ymm9                    #184.23
        vpaddd    %ymm0, %ymm14, %ymm15                  #184.23
        vpslld    $8, %ymm9, %ymm2                       #185.27
        vpslld    $8, %ymm15, %ymm3                      #185.27
        vmovdqu   %ymm2, 2100(%rsp,%rax,4)               #185.10
        vmovdqu   %ymm3, 2132(%rsp,%rax,4)               #185.10
        addq      $16, %rax                              #182.7
        cmpq      %rdi, %rax                             #182.7
        jb        ..B1.82       # Prob 99%               #182.7

Thanks,
K

On Tue, Dec 13, 2011 at 5:21 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Dec 13, 2011 at 02:07:11PM +0100, Richard Guenther wrote:
>> > Hi guys,
>> > While looking at Spec2006/401.bzip2 I found such a loop:
>> > for (i = 1; i <= alphaSize; i++) {
>> >   j = weight[i] >> 8;
>> >   j = 1 + (j / 2);
>> >   weight[i] = j << 8;
>> > }
>
> It would be helpful to have a self-contained testcase, because we don't know
> the types of the variables in question.  Is j signed or unsigned?
> Signed divide by 2 is unfortunately not equivalent to >> 1.
> If j is signed int, on x86_64 we expand j / 2 as (j + (j >> 31)) >> 1.
> Sure, the pattern recognizer could try that if vector division isn't
> supported.
> If j is unsigned int, then I'd expect it to be already canonicalized into
> >> 1 by the time we enter the vectorizer.
>
> Jakub

int weight [ 258 * 2 ];

void foo(int alphaSize)
{
  int j, i;

  for (i = 1; i <= alphaSize; i++) {
    j = weight[i] >> 8;
    j = 1 + (j / 2);
    weight[i] = j << 8;
  }
}