On 19 Mar 2007 12:43:49 -0000, dominiq at lps dot ens dot fr <[EMAIL PROTECTED]> wrote:
Since sin() and cos() are non-trivial functions, I am very surprised that a wrong API makes a 50% difference.
Well, here is how it can make a 50% difference (at least on the Cell; the 970 has less of a restriction, and only the dispatch group is rejected).

Modern PowerPC processors do not like storing something to the stack and then loading it again within a small number of cycles (on the Cell the window is around 50 cycles, while on the 970 it is just within a dispatch group). Transferring between the integer register set and the floating-point register set can only be done via memory, so you will get an LHS or an LRU reject (depending on which processor you are on). This can cause either a 50-cycle delay or a reject of the dispatch group (the latter can cause multiple rejects). The cycles lost to this can add up, since both sides of the function have this hazard.

Thanks,
Andrew Pinski
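
To make the store-then-load pattern described above concrete, here is a minimal sketch in C. The function names are hypothetical and are not taken from the code under discussion. The assumption is a PowerPC target with no direct move between the integer and floating-point register files, where a compiler would typically implement both functions by storing to a stack slot and immediately loading the value back into the other register file.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a double.
 * The value starts in a floating-point register and must end up in an
 * integer register. Without a direct move instruction, the compiler
 * typically stores it to the stack and loads it straight back, which is
 * the store-followed-by-load hazard described above. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to reinterpret the bits */
    return bits;
}

/* Hypothetical helper: convert an integer to a double.
 * This goes the other way: the integer typically has to pass through
 * memory to reach a floating-point register before it is converted. */
double int_to_double(int64_t n)
{
    return (double)n;
}
```

If a function both receives its arguments and returns its result through the wrong register set, a transfer like this can happen on the way in and again on the way out, so the penalty may be paid twice per call.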