On Aug 25, 2005, at 2:09 PM, Fariborz Jahanian wrote:
Compiled with -O1 -mdynamic-no-pic -march=pentium4 produces:
pxor %xmm0, %xmm0
movsd %xmm0, 16(%eax)
movsd %xmm0, 8(%eax)
movsd %xmm0, (%eax)
But the following code results in a 7% performance gain in eon, as
reported by one of Apple's performance people:
movl $0, 16(%eax)
movl $0, 20(%eax)
movl $0, 8(%eax)
movl $0, 12(%eax)
movl $0, (%eax)
movl $0, 4(%eax)
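For reference, both sequences clear the same 24 bytes at offsets 0, 8 and 16 from %eax. The C source is not shown in this message, so the following is only a guess at the kind of code involved: a hypothetical struct of three doubles (the names vec3 and vec3_clear are made up for illustration) whose fields are set to zero. A compiler is free to lower this to either three 8-byte stores or six 4-byte stores.

```c
#include <stddef.h>

/* Hypothetical layout, not taken from the original source:
   three 8-byte fields at offsets 0, 8 and 16. */
struct vec3 {
    double x;
    double y;
    double z;
};

/* Zero all three fields. Depending on target flags, a compiler may
   emit three 8-byte stores from a zeroed SSE register, or six
   4-byte integer stores of the immediate 0. */
void vec3_clear(struct vec3 *v)
{
    v->x = 0.0;
    v->y = 0.0;
    v->z = 0.0;
}
```

Comparing the output of the compiler's -S option for a function like this under different -march settings is one way to see which of the two sequences is chosen.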
Actually, that does not make sense for a real processor, but then again
this is x86 we are talking about. After all, the first sequence has less
complex instructions, and fewer instructions for that matter.
-- Pinski