The full testcase is attached.

Jakub, you are right: we have to convert signed ints into something a
bit more tricky.
BTW, here is the Intel compiler's output for this case:

        vpxor     %ymm1, %ymm1, %ymm1                           #184.23
        vmovdqu   .L_2il0floatpacket.12(%rip), %ymm0            #184.23
        movslq    %ecx, %rdi                                    #182.7
                                # LOE rax rbx rdi edx ecx ymm0 ymm1
..B1.82:                        # Preds ..B1.82 ..B1.81
        vmovdqu   2132(%rsp,%rax,4), %ymm3                      #183.14
        vmovdqu   2100(%rsp,%rax,4), %ymm2                      #183.27
        vpsrad    $8, %ymm3, %ymm11                             #183.27
        vpsrad    $8, %ymm2, %ymm5                              #183.27
        vpcmpgtd  %ymm5, %ymm1, %ymm4                           #184.23
        vpcmpgtd  %ymm11, %ymm1, %ymm10                         #184.23
        vpand     %ymm0, %ymm4, %ymm6                           #184.23
        vpand     %ymm0, %ymm10, %ymm12                         #184.23
        vpaddd    %ymm6, %ymm5, %ymm7                           #184.23
        vpaddd    %ymm12, %ymm11, %ymm13                        #184.23
        vpsrad    $1, %ymm7, %ymm8                              #184.23
        vpsrad    $1, %ymm13, %ymm14                            #184.23
        vpaddd    %ymm0, %ymm8, %ymm9                           #184.23
        vpaddd    %ymm0, %ymm14, %ymm15                         #184.23
        vpslld    $8, %ymm9, %ymm2                              #185.27
        vpslld    $8, %ymm15, %ymm3                             #185.27
        vmovdqu   %ymm2, 2100(%rsp,%rax,4)                      #185.10
        vmovdqu   %ymm3, 2132(%rsp,%rax,4)                      #185.10
        addq      $16, %rax                                     #182.7
        cmpq      %rdi, %rax                                    #182.7
        jb        ..B1.82       # Prob 99%                      #182.7
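Read lane by lane, the Intel loop does the division without a divide. vpsrad $8 is the weight[i] >> 8. vpcmpgtd against the zeroed %ymm1 yields -1 in the lanes where j < 0, and vpand with %ymm0 turns that into 1. The vpaddd/vpsrad $1 pair then computes (j + (j < 0)) >> 1, i.e. j / 2 rounded toward zero. The second vpaddd with %ymm0 is the "1 +", and vpslld $8 is the final << 8. Each iteration handles two ymm registers (16 ints, hence addq $16). The sketch below mirrors one register's worth of that with GCC's generic vector extension (vector_size types with element-wise operators, also accepted by Clang). It assumes %ymm0 holds eight 1s, since the constant .L_2il0floatpacket.12 itself is not shown, and that >> on a negative int is an arithmetic shift. loop_ref, loop_vec and check_vec are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang generic vectors: 8 x 32-bit lanes, like one ymm register. */
typedef int      v8si __attribute__((vector_size(32)));
typedef unsigned v8su __attribute__((vector_size(32)));

/* Scalar reference: the testcase loop on a caller-supplied array.
   The << is done on unsigned so negative j stays well-defined in C. */
static void loop_ref(int *weight, int alphaSize)
{
    for (int i = 1; i <= alphaSize; i++) {
        int j = weight[i] >> 8;
        j = 1 + (j / 2);
        weight[i] = (int)((unsigned)j << 8);
    }
}

/* Hypothetical vector form: j / 2 computed as (j + (j < 0 ? 1 : 0)) >> 1. */
static void loop_vec(int *weight, int alphaSize)
{
    const v8si one  = {1, 1, 1, 1, 1, 1, 1, 1};  /* assumed %ymm0 */
    const v8si zero = {0, 0, 0, 0, 0, 0, 0, 0};  /* vpxor %ymm1   */
    int i = 1;
    for (; i + 7 <= alphaSize; i += 8) {
        v8si w;
        memcpy(&w, &weight[i], sizeof w);     /* vmovdqu load         */
        v8si j   = w >> 8;                    /* vpsrad $8            */
        v8si neg = j < zero;                  /* vpcmpgtd: -1 if j<0  */
        j = ((j + (neg & one)) >> 1) + one;   /* vpand, vpaddd, vpsrad $1, vpaddd */
        w = (v8si)((v8su)j << 8);             /* vpslld $8            */
        memcpy(&weight[i], &w, sizeof w);     /* vmovdqu store        */
    }
    for (; i <= alphaSize; i++) {             /* scalar epilogue      */
        int j = weight[i] >> 8;
        j = 1 + (j / 2);
        weight[i] = (int)((unsigned)j << 8);
    }
}

/* Run both versions on the same data, including negative weights. */
static void check_vec(int alphaSize)
{
    int a[258 * 2], b[258 * 2];
    assert(alphaSize >= 0 && alphaSize < 258 * 2);
    for (int k = 0; k < 258 * 2; k++)
        a[k] = b[k] = (k - 258) * 777;
    loop_ref(a, alphaSize);
    loop_vec(b, alphaSize);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Negative odd j (e.g. weight -768, so j = -3) is the case where a plain >> 1 would give a different result from the source.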

Thanks, K

On Tue, Dec 13, 2011 at 5:21 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Dec 13, 2011 at 02:07:11PM +0100, Richard Guenther wrote:
>> > Hi guys,
>> > While looking at Spec2006/401.bzip2 I found such a loop:
>> >     for (i = 1; i <= alphaSize; i++) {
>> >       j = weight[i] >> 8;
>> >       j = 1 + (j / 2);
>> >       weight[i] = j << 8;
>> >     }
>
> It would be helpful to have a self-contained testcase, because we don't know
> the types of the variables in question.  Is j signed or unsigned?
> Signed divide by 2 is unfortunately not equivalent to >> 1.
> If j is signed int, on x86_64 we expand j / 2 as (j + (j >> 31)) >> 1.
> Sure, the pattern recognizer could try that if vector division isn't
> supported.
> If j is unsigned int, then I'd expect it to be already canonicalized into >>
> 1 by the time we enter the vectorizer.
>
>        Jakub
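A small sketch of Jakub's point follows. It assumes, as GCC does, that >> on a negative int is an arithmetic shift. C's / truncates toward zero, while >> 1 rounds toward minus infinity, so the two differ for negative odd j: -3 / 2 is -1, but -3 >> 1 is -2. Adding the sign bit before the shift fixes the rounding. For that to work, the sign term in the expansion quoted above has to be 0 or 1, i.e. a logical shift of j by 31. An arithmetic j >> 31 gives -1 for negative j and would map -3 to -2. div2_by_shift and check_div2 are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: signed j / 2 without a divide.
   (unsigned)j >> 31 is the sign bit as 0 or 1; adding it before the
   arithmetic >> 1 turns rounding toward minus infinity into C's
   rounding toward zero. Assumes >> on a negative int is arithmetic. */
static int div2_by_shift(int j)
{
    int sign = (int)((unsigned)j >> 31);
    return (j + sign) >> 1;
}

/* Check the expansion against the real division over [lo, hi]. */
static void check_div2(int lo, int hi)
{
    for (int j = lo; j <= hi; j++)
        assert(div2_by_shift(j) == j / 2);
}
```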
int weight [ 258 * 2 ];

void foo(int alphaSize) {
  int j, i;
  for (i = 1; i <= alphaSize; i++) {
    j = weight[i] >> 8;
    j = 1 + (j / 2);
    weight[i] = j << 8;
  }
}
