The full testcase is attached.
Jakub, you are right, we have to convert the signed int division into
something a bit trickier.
BTW, here is the output for this case from the Intel compiler:
vpxor %ymm1, %ymm1, %ymm1 #184.23
vmovdqu .L_2il0floatpacket.12(%rip), %ymm0 #184.23
movslq %ecx, %rdi #182.7
# LOE rax rbx rdi edx ecx ymm0 ymm1
..B1.82: # Preds ..B1.82 ..B1.81
vmovdqu 2132(%rsp,%rax,4), %ymm3 #183.14
vmovdqu 2100(%rsp,%rax,4), %ymm2 #183.27
vpsrad $8, %ymm3, %ymm11 #183.27
vpsrad $8, %ymm2, %ymm5 #183.27
vpcmpgtd %ymm5, %ymm1, %ymm4 #184.23
vpcmpgtd %ymm11, %ymm1, %ymm10 #184.23
vpand %ymm0, %ymm4, %ymm6 #184.23
vpand %ymm0, %ymm10, %ymm12 #184.23
vpaddd %ymm6, %ymm5, %ymm7 #184.23
vpaddd %ymm12, %ymm11, %ymm13 #184.23
vpsrad $1, %ymm7, %ymm8 #184.23
vpsrad $1, %ymm13, %ymm14 #184.23
vpaddd %ymm0, %ymm8, %ymm9 #184.23
vpaddd %ymm0, %ymm14, %ymm15 #184.23
vpslld $8, %ymm9, %ymm2 #185.27
vpslld $8, %ymm15, %ymm3 #185.27
vmovdqu %ymm2, 2100(%rsp,%rax,4) #185.10
vmovdqu %ymm3, 2132(%rsp,%rax,4) #185.10
addq $16, %rax #182.7
cmpq %rdi, %rax #182.7
jb ..B1.82 # Prob 99% #182.7
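
Read lane by lane, the sequence above appears to add 1 to negative values before the arithmetic shift, which is what makes the shift round toward zero like the division does. Below is a minimal scalar sketch of one 32-bit lane next to the source loop body. It assumes that .L_2il0floatpacket.12 holds the integer 1 in every lane (inferred from its use in both the vpand and the final vpaddd; the listing does not show the constant). The names sra, icc_lane, source_lane and lanes_agree are made up for illustration.

```c
#include <stdint.h>

/* Arithmetic right shift written so it does not rely on how the C
   implementation shifts negative values. */
static int32_t sra(int32_t x, int n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* One lane of the vector sequence.
   ASSUMPTION: the constant loaded into ymm0 is 1 in every lane. */
int32_t icc_lane(int32_t w)
{
    int32_t t    = sra(w, 8);         /* vpsrad $8 */
    int32_t mask = (0 > t) ? -1 : 0;  /* vpcmpgtd: all ones where t < 0 */
    int32_t bias = mask & 1;          /* vpand with the vector of ones */
    int32_t half = sra(t + bias, 1);  /* vpaddd, then vpsrad $1 */
    int32_t r    = half + 1;          /* vpaddd with the vector of ones */
    return r * 256;                   /* vpslld $8, written as a multiply to
                                         avoid shifting a negative value */
}

/* The loop body from the testcase, for one element. */
int32_t source_lane(int32_t w)
{
    int32_t j = sra(w, 8);
    j = 1 + (j / 2);                  /* C division truncates toward zero */
    return j * 256;
}

/* Returns 1 if both versions agree for every w in [lo, hi].
   hi must be smaller than INT32_MAX. */
int lanes_agree(int32_t lo, int32_t hi)
{
    for (int32_t w = lo; w <= hi; ++w)
        if (icc_lane(w) != source_lane(w))
            return 0;
    return 1;
}
```

Calling lanes_agree over a range that includes negative values is a quick way to check this reading against the source loop.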
Thanks, K
On Tue, Dec 13, 2011 at 5:21 PM, Jakub Jelinek wrote:
> On Tue, Dec 13, 2011 at 02:07:11PM +0100, Richard Guenther wrote:
>> > Hi guys,
>> > While looking at Spec2006/401.bzip2 I found such a loop:
>> > for (i = 1; i <= alphaSize; i++) {
>> >   j = weight[i] >> 8;
>> >   j = 1 + (j / 2);
>> >   weight[i] = j << 8;
>> > }
>
> It would be helpful to have a self-contained testcase, because we don't know
> the types of the variables in question. Is j signed or unsigned?
> Signed divide by 2 is unfortunately not equivalent to >> 1.
> If j is signed int, on x86_64 we expand j / 2 as (j + (j >> 31)) >> 1.
> Sure, the pattern recognizer could try that if vector division isn't
> supported.
> If j is unsigned int, then I'd expect it to be already canonicalized into >>
> 1 by the time we enter the vectorizer.
>
> Jakub
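
To make the point about signed division concrete, here is a small sketch in C. It reads the inner shift of the quoted expansion as a logical shift that extracts the sign bit (1 for negative j, 0 otherwise); that reading is my assumption, since a plain arithmetic >> 31 in C would give -1 instead. The function names shift_half and div2_expanded are hypothetical.

```c
#include <stdint.h>

/* Arithmetic shift by 1: rounds toward negative infinity.
   Written so it does not rely on how the C implementation shifts
   negative values. */
int32_t shift_half(int32_t j)
{
    return j < 0 ? ~(~j >> 1) : j >> 1;
}

/* Division by 2 expanded as "add the sign bit, then shift".
   The sign bit is taken with an unsigned (logical) shift. */
int32_t div2_expanded(int32_t j)
{
    int32_t sign = (int32_t)((uint32_t)j >> 31);  /* 1 if j < 0, else 0 */
    return shift_half(j + sign);
}
```

For j = -3, shift_half gives -2 while j / 2 and div2_expanded both give -1, which is why the plain shift cannot be used for signed operands.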
int weight[258 * 2];
void foo(int alphaSize)
{
  int j, i;
  for (i = 1; i <= alphaSize; i++) {
    j = weight[i] >> 8;
    j = 1 + (j / 2);
    weight[i] = j << 8;
  }
}