On Wed, Jul 30, 2008 at 5:57 PM, Denys Vlasenko <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 25, 2008 at 9:08 AM, Agner Fog <[EMAIL PROTECTED]> wrote:
>> Raksit Ashok wrote:
>>> There is a more optimized version for 64-bit:
>>> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s
>>> I think this looks similar to your implementation, Agner.
>>
>> Yes, it is similar to my code.
>
> A 3164-line source file which implements memcpy().
> You've got to be kidding.
> How much of the L1 icache does it blow away in the process?
> I bet it performs wonderfully on microbenchmarks, though.
>
> 2991         .balign 16                          # sadistic alignment strikes again
> 2992 L(bkPxQx):   .int L(bkP0Q0)-L(bkPxQx)       # why use two bytes when we can use four?
>
> Seriously, what possible reason can there be to align
> a randomly accessed data table to 16 bytes?
> 4 bytes I can understand, but 16?
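For what it's worth, here is a rough sketch of the kind of table I mean. It is
my own toy illustration, not code taken from the Solaris file: the same
relative-offset dispatch idea, but with 2-byte .short entries and 4-byte
alignment, on the assumption that every case label sits within +/-32KB of the
table (for a single function that is almost certainly the case). The table
shrinks by half and no padding is wasted on 16-byte alignment.

        .text
        .globl  dispatch_example
        .type   dispatch_example, @function
# int dispatch_example(int idx) -- toy dispatcher, idx must be 0..2
dispatch_example:
        movl    %edi, %edi              # zero-extend the index to 64 bits
        leaq    .Ltbl(%rip), %r10       # table base doubles as offset base
        movswq  (%r10,%rdi,2), %rax     # load sign-extended 2-byte offset
        addq    %r10, %rax              # absolute address of the case label
        jmp     *%rax
.Lcase0:
        movl    $0, %eax
        ret
.Lcase1:
        movl    $1, %eax
        ret
.Lcase2:
        movl    $2, %eax
        ret

        .balign 4                       # 4-byte alignment is plenty here
.Ltbl:
        .short  .Lcase0-.Ltbl           # 2-byte entries instead of .int
        .short  .Lcase1-.Ltbl
        .short  .Lcase2-.Ltbl

        .section .note.GNU-stack,"",@progbits

This assembles with plain GNU as and can be called from C. Whether the
original table really needs 4-byte entries I don't know without reading the
whole file, but a single function's labels cannot be that far from its own
table.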
I'm afraid I sounded a bit confrontational above, so here is a clarification.

I have nothing against making code faster. But there should be some balance
between the -O999 mindset and the -Os mindset. If you have just found a tweak
that gives you a 1.2% speedup in a microbenchmark but the code grew four times
bigger, *stop*. Think about it. "We unrolled the loop two gazillion times and
it's 3% faster now" is a similarly bad idea.

I must admit that I didn't look too closely at
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/memcpy.s
but at first glance it sure looks like someone got carried away a bit.
--
vda