On 3/12/06, Nickolay Kolchin <[EMAIL PROTECTED]> wrote:
> During performance analysis of the "bashmark" memory benchmark, I found
> a 100x performance regression between gcc 3.4.5 and gcc 4.X.
>
> ------ test_cmd.cpp (simplified bashmark memory RW test) -------
> #include <stdint.h>
> #include <cstring>
>
> template <const uint8_t Block_Size, const uint32_t Loops>
> static void int_membench(uint8_t* mb1, uint8_t* mb2)
> {
>     for(uint32_t i = 0; i < Loops; i+=1)
>     {
> #define T memcpy(mb1, mb2, Block_Size); memset(mb2, i, Block_Size);
>         T T T T T
>         T T T T T
> #undef T
>     }
> }
>
> template <const uint32_t Buf_Size, const uint32_t Loops>
> static void membench()
> {
>     static uint8_t mb1[Buf_Size];
>     static uint8_t mb2[Buf_Size];
>     for(uint32_t i = 0; i < 10000; i+=1)
>         int_membench<Buf_Size, Loops>(mb1, mb2);
> }
>
> int main()
> {
>     membench<128, 4000>();
>     return 0;
> }
>
> ---------------------------------------------------------------
> GCC 3.4.5: 0.43user 0.00system 0:00.44elapsed
> GCC 4.0.2: 34.83user 0.68system 0:36.09elapsed
> GCC 4.1.0: 33.86user 0.58system 0:34.96elapsed
>
> Compiler options:
> -march=athlon-xp
> -O3
> -fomit-frame-pointer
> -mfpmath=sse -msse
> -ftracer -fweb
> -maccumulate-outgoing-args
> -ffast-math
>
> I've played with various settings (-O2, -O1, without march, without
> tracer and web, etc.) without any serious difference. I.e. GCC4 is
> always many times slower than GCC 3.4.5.
>
> Looking at the generated assembly showed that GCC4 doesn't inline the
> memcpy and memset calls.
>
> ------ test.c (uber simplified problem demonstration) ---------
> #include <string.h>
>
> char* f(char* b)
> {
>     static char a[64];
>     memcpy(a, b, 64);
>     memset(a, 0, 64);
>     return a;
> }
> ----------------------------------------------------------------
>
> GCC4 will generate calls to memcpy and memset in this example. GCC3
> will inline all calls.
>
> So, it looks like the GCC4 inliner is broken at some point.

Inlining of memcpy/memset is architecture dependent (I see calls on ppc
for gcc 3.4, too).  This is a stupid benchmark and as such not worth
optimizing for.

Richard.