On 15.04.2013 02:10, Nozomi Kodama wrote:
Hello
thanks for the review.
I don't think that calling defines is the way to go. Indeed, I tested my patch
and yours. Yours is about 12% slower than mine in my computer.
And now, we try to take care of efficiency of this dll. So, it is not the time
to increase latency.
You are right. I'm not really sure why the code should be slower though.
The #defines shouldn't have an impact on the performance, but it might
be because it is translated to:
ta = 0.28209479f * a[0] + -0.12615662f * a[6] + -0.21850968f * a[8];
tb = 0.28209479f * b[0] + -0.12615662f * b[6] + -0.21850968f * b[8];
out[1] = 0.0f + ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] = out[0] + 0.28209479f * t;
out[6] = 0.0f + -0.12615662f * t;
out[8] = 0.0f + -0.21850968f * t;
instead of:
ta = 0.28209479f * a[0] - 0.12615662f * a[6] - 0.21850968f * a[8];
tb = 0.28209479f * b[0] - 0.12615662f * b[6] - 0.21850968f * b[8];
out[1] = ta * b[1] + tb * a[1];
t = a[1] * b[1];
out[0] += 0.28209479f * t;
out[6] = -0.12615662f * t;
out[8] = -0.21850968f * t;
May be due to "out[8] = -0.21850968f * t;" vs "out[8] = 0.0f +
-0.21850968f * t;":
1. the extra 0.0f - Shouldn't the 0.0f get optimized away? Imho the
macro could be fixed (e.g. use an if / separate macro / don't use a
macro for this cases).
2. "+ -0.21850968f * t;" should be as fast as "-0.21850968f * t" isn't it?
Does anyone else have an objection about this? I just feel that the code
size and the always reused constants could be written a little bit
nicer. I'm not sure if the macro usage is really better, it was just a
quick idea to avoid that much code duplication. Also changes like the
suggested below are really error prone, because they change a lot of
code mostly using copy and paste. Though I vote for the same style and
precision usage in all cases.
Hope this helps a bit
Rico
I used 10 digits since there are a lot of computation, I want to avoid as much
as possible big rounding errors. If we want to uniformize, then we should
uniformize d3dxshmultiply 2,3,4 with 10 digits.
But that is for an another patch.
Nozomi.