Andrew,

> This is NOT a win on thunderX at least for single precision because you have
> to do the divide and sqrt in the same time as it takes 5 multiples (estimate
> and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies
> which is just the same as what the patch does (but it is really slightly less
> than 10, I rounded up). So in the end this is NOT a win at all for thunderX
> unless we do one less step for both single and double.

Yes, the expected benefit from rsqrt estimation is implementation-specific. If an implementation has a better initial rsqrte, or an application can trade precision for execution time, we could offer a command-line option to do only 2 steps for double and 1 step for float, similar to -mrecip-precision for PowerPC.

What are your thoughts on that?

Best regards,
Benedikt