Re: Calculating cosinus/sinus

Ondřej Bílka Sun, 12 May 2013 07:01:20 -0700

On Sun, May 12, 2013 at 02:14:31PM +0200, David Brown wrote:
> On 11/05/13 17:20, jacob navia wrote:
> >Le 11/05/13 16:01, Ondřej Bílka a écrit :
> >>As for 1), the only way is to measure it. Compile the following and we
> >>will see who is right.
> >>
> >>cat > sin.c << 'EOF'
> >>#include <math.h>
> >>
> >>int main(void){
> >>   int i;
> >>   double x=0;
> >>   double ret=0;
> >>
> >>   for(i=0;i<10000000;i++){
> >>      ret+=sin(x);
> >>      x+=0.3;
> >>   }
> >>   /* Use the result so the loop cannot be optimized away. */
> >>   return (int)ret;
> >>}
> >>EOF
> >OK I did a similar thing. I just compiled sin(argc) in main.
> >The results prove that you were right. The single fsin instruction
> >takes longer than several HUNDRED instructions (calls, jumps,
> >table lookups, what have you).
> >
> >Gone are the times when an fsin would take 30 cycles or so.
> >Intel has destroyed the FPU.
> >
> 
> What makes you so sure that it takes more than 30 cycles to execute
> hundreds of instructions in the library?  Modern CPUs can issue and
> execute several instructions per cycle (I am not considering multiple
> cores here), and correctly predicted jumps can often be eliminated
> entirely in the decode stages.
>
To clarify the numbers here: a 30-cycle library call is unrealistic. The
latency of the call itself plus the overhead of saving and restoring xmm
registers is often more than 30 cycles on its own. A library sin takes
around 150 cycles for normal inputs.


fsin is slower for several reasons. First, its performance depends on the
input: according to http://www.agner.org/optimize/instruction_tables.pdf,
fsin takes about 20-100 cycles.

Second, the xmm->memory->fpu->memory->xmm round trip is expensive: there
is a performance penalty when switching between x87 FPU and xmm (SSE)
instructions.

> The moral here is that /you/ need to benchmark /your/ code on /your/
> processor - don't jump to conclusions, or accept other benchmarks as
> giving the complete picture.
> 
Agreed.
