Re: Performance degradation on g++ 4.6

Oleg Smolsky Mon, 22 Aug 2011 18:34:37 -0700

On 2011/8/22 18:09, Oleg Smolsky wrote:

Both compilers fully inline the templated function and the emitted code looks very similar. I am puzzled as to why one of these loops is significantly slower than the other. I've attached disassembled listings - perhaps someone could have a look please? (the body of the loop starts at 0000000000400FD for gcc41 and at 0000000000400D90 for gcc46)

The difference, theoretically, should be due to the inner loop:


v4.6:
.text:0000000000400DA0 loc_400DA0:
.text:0000000000400DA0                 add     eax, 0Ah
.text:0000000000400DA3                 add     al, [rdx]
.text:0000000000400DA5                 add     rdx, 1
.text:0000000000400DA9                 cmp     rdx, 5034E0h
.text:0000000000400DB0                 jnz     short loc_400DA0

v4.1:
.text:0000000000400FE0 loc_400FE0:
.text:0000000000400FE0                 movzx   eax, ds:data8[rdx]
.text:0000000000400FE7                 add     rdx, 1
.text:0000000000400FEB                 add     eax, 0Ah
.text:0000000000400FEE                 cmp     rdx, 1F40h
.text:0000000000400FF5                 lea     ecx, [rax+rcx]
.text:0000000000400FF8                 jnz     short loc_400FE0

However, I cannot see how the first version would be slow... The custom templated "shifter" degenerates into "add 0xa", which is the point of the test... Hmm...
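
For context, the kind of loop being discussed can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual benchmark source: the names add_constant, sum_shifted and SIZE are made up for illustration, and the 8-bit accumulator is only an assumption based on the "add al, [rdx]" in the v4.6 listing. Only data8 and the 0x1F40 bound come from the listings.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the custom "shifter": adds the constant 10 (0xa).
template <typename T>
struct add_constant {
    static T apply(T value) { return static_cast<T>(value + 10); }
};

// 0x1F40 == 8000, the loop bound visible in the v4.1 listing.
const std::size_t SIZE = 0x1F40;
std::uint8_t data8[SIZE];  // zero-initialized global

// Applies Shifter to every element and accumulates in T, so for
// T = uint8_t the sum wraps modulo 256 (assumed, see note above).
template <typename T, typename Shifter>
T sum_shifted(const T* first, std::size_t count) {
    T result = 0;
    for (std::size_t i = 0; i < count; ++i) {
        result = static_cast<T>(result + Shifter::apply(first[i]));
    }
    return result;
}
```

Once Shifter::apply is inlined, the loop body reduces to a load, an add of 0xa and an accumulate, which is the shape both listings show.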


Oleg.
