Re: speed of double-precision divide

Richard Guenther Sat, 23 Jan 2010 08:52:35 -0800

On Sat, Jan 23, 2010 at 5:47 PM, Steve White <swh...@aip.de> wrote:
> Hi,
>
> I recently revised some speed tests of basic CPU operations.
> There were a few surprises. One was that a test of double-precision
> divide was a factor of ten slower when compiled with gcc than with the
> Intel compiler icc.
>
> This was with full optimization turned on, with an Intel Duo (Yonah)
> processor.
>
> I figured gcc was simply not using SSE2, and icc was.
>
> But that is not the case at all.  While gcc produces apparently SSE2
> assembler, icc does something quite different.
>
> What's going on?


        for( j = 0; j < ITERATIONS; j++ )
                for( i = 0; i < size; i++ )
                        dvec1[i] /= dvec2[i];

It seems that icc performed loop interchange and replaced the repeated
divide with a multiply by the reciprocal, so it effectively computes

   for (i = 0; i < size; i++)
     {
        double one_over_dv2 = 1.0/dvec2[i];
        for (j = 0; j < ITERATIONS; j++)
           dvec1[i] *= one_over_dv2;
     }
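
To make the difference concrete, here is a minimal sketch of both loop
forms as standalone functions, plus a helper to compare their results.
The function names and signatures are made up for illustration and are
not taken from dt.c. The point is that the two forms are not guaranteed
to be bit-identical: x / d and x * (1.0 / d) can round differently
unless 1.0 / d is exact. That is presumably why gcc keeps the divides
under plain -O3, and would typically need something like -ffast-math
(or -freciprocal-math) before it is allowed to make this substitution.

```c
#include <assert.h>
#include <stddef.h>

/* The loop nest as written in the test: one divide per element per pass. */
void divide_as_written(double *dvec1, const double *dvec2,
                       size_t size, int iterations)
{
    for (int j = 0; j < iterations; j++)
        for (size_t i = 0; i < size; i++)
            dvec1[i] /= dvec2[i];
}

/* Interchanged form: one divide per element, then only multiplies.
   This mirrors what the icc output appears to do; it is a sketch,
   not a reconstruction of icc's actual transformation. */
void divide_interchanged(double *dvec1, const double *dvec2,
                         size_t size, int iterations)
{
    for (size_t i = 0; i < size; i++) {
        const double one_over_dv2 = 1.0 / dvec2[i];
        double x = dvec1[i];
        for (int j = 0; j < iterations; j++)
            x *= one_over_dv2;
        dvec1[i] = x;
    }
}

/* Largest relative difference between two result vectors.
   Assumes every b[i] is nonzero. */
double max_rel_diff(const double *a, const double *b, size_t n)
{
    assert(n > 0);
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (a[i] - b[i]) / b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Running both functions on copies of the same input and passing the
results to max_rel_diff shows how far apart they drift. With
power-of-two divisors the reciprocal is exact and the results match;
with other divisors a small rounding difference is to be expected.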

Richard.

> Find the .c file attached.  Assembler snippets follow.
>
> ------------
> gcc has this (gcc -std=c99 -O3 -msse2 -mfpmath=sse -lm -S dt.c)
> ------------
> .L27:
>        movapd  (%esi,%eax), %xmm3       ;move 2 dbls at *(esi+eax) to xmm3
>        divpd   192(%esp,%eax), %xmm3    ;(192 is xmm2) *(esp+eax), result->xmm3
>        movapd  %xmm3, (%esi,%eax)       ;move 2 dbls from xmm3 back
>        addl    $16, %eax                ;add 16 (len of 2 doubles) to eax
>        cmpl    $16384, %eax             ;compare eax to 1024 * 16
>        jne     .L27                     ;if not equal, do it again
>
> ------------
> icc has this (icc -Wall -w2 -fast -c dt.c)
> ------------
>                                # LOE eax xmm2
> ..B1.69:                        # Preds ..B1.71 ..B1.68
>        movsd     8336(%esp,%eax,8), %xmm1                      #108.30
>        movsd     _2il0floatpacket.13, %xmm0                    #108.2
>        divsd     24720(%esp,%eax,8), %xmm0                     #108.2
>        unpcklpd  %xmm2, %xmm1                                  #108.30
>        xorl      %edx, %edx                                    #
>        movddup   %xmm0, %xmm0                                  #108.2
>        movddup   %xmm0, %xmm0                                  #108.2
>                                # LOE eax edx xmm0 xmm1 xmm2
> ..B1.70:                        # Preds ..B1.70 ..B1.69
>        mulpd     %xmm0, %xmm1                                  #108.2
>        mulpd     %xmm0, %xmm1                                  #108.2
>        mulpd     %xmm0, %xmm1                                  #108.2
>        mulpd     %xmm0, %xmm1                                  #108.2
>        addl      $8, %edx                                      #
>        cmpl      $131072, %edx                                 #108.2
>        jb        ..B1.70       # Prob 99%                      #108.2
>
>
> --
> | -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
> | Steve White                                             +49(331)7499-202
> | e-Science / AstroGrid-D                                   Zi. 35  Bg. 20
> | -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
> | Astrophysikalisches Institut Potsdam (AIP)
> | An der Sternwarte 16, D-14482 Potsdam
> |
> | Vorstand: Prof. Dr. Matthias Steinmetz, Peter A. Stolz
> |
> | Stiftung privaten Rechts, Stiftungsverzeichnis Brandenburg: III/7-71-026
> | -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -
>
