Andrew,

> This is NOT a win on thunderX at least for single precision because you have 
> to do the divide and sqrt in the same time as it takes 5 multiplies (estimate 
> and step are multiplies in the thunderX pipeline).  Doubles is 10 multiplies 
> which is just the same as what the patch does (but it is really slightly less 
> than 10, I rounded up). So in the end this is NOT a win at all for thunderX 
> unless we do one less step for both single and double.

Yes, the expected benefit from the rsqrt estimate is implementation-specific.
For an implementation with a more accurate initial rsqrte, or an application
that can trade precision for execution time, we could offer a command-line
option to do only 2 steps for double and 1 step for float, similar to
-mrecip-precision for PowerPC.
What are your thoughts on that?

Best regards,
Benedikt
