On 08/03/2015 09:50 AM, Ilya Enkovich wrote:
The original code looks better: its tree height is just 2, so it can be executed in 2 cycles. The new code has more dependencies, and its tree height becomes 5. It is always hard to say for all x86 targets, but as generic code the original version is better.
Agreed. Reducing tree height is definitely a good thing as a general rule.

jeff
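
For readers unfamiliar with the term, here is a minimal sketch of what "tree height" means for a dependency chain. This is not the code from the patch under discussion. The function names `sum_chain` and `sum_balanced` are hypothetical, and the heights shown (3 and 2) are for this toy example only. Both functions compute the same sum, but the second exposes two independent additions that a target able to issue them in parallel could execute in the same cycle.

```c
/* Illustrative only: hypothetical functions, not the code under review. */

/* Serial chain: every addition consumes the previous result,
   so the dependency tree has height 3. */
int sum_chain(int a, int b, int c, int d)
{
    int t1 = a + b;   /* level 1 */
    int t2 = t1 + c;  /* level 2, depends on t1 */
    return t2 + d;    /* level 3, depends on t2 */
}

/* Balanced form: t1 and t2 do not depend on each other,
   so the dependency tree has height 2. */
int sum_balanced(int a, int b, int c, int d)
{
    int t1 = a + b;   /* level 1 */
    int t2 = c + d;   /* level 1, independent of t1 */
    return t1 + t2;   /* level 2 */
}
```

Whether the shorter tree actually runs faster depends on the target's issue width and latencies, which is why this is only a general rule.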