This sequence makes no fewer than two calls to memmove for 0 bytes:

  vector<char> content;
  content.resize(12);
From address 0x0, no less! On a 64-bit system the effect is far less pronounced (25%), but on 32 bits this loop takes 2.3 seconds with a vanilla g++:

  for(int n=0; n < 10000000; ++n) {
    vector<char> content;
    content.resize(12);
  }

But if we add

  if(__last != __first)

before the __builtin_memmove on line 377 of stl_algobase.h, it takes 0.9 seconds!

On a real 32-bit benchmark of PowerDNS, this single line gave a 9% performance boost. Seems worthwhile.

Andrew Pinski suggested that _M_fill_insert may need to do this check instead.

Backtrace of such a call:

#0  *__GI_memmove (dest=0x8b8dc18, src=0x0, len=0) at memmove.c:47
#1  0x080557f0 in std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<unsigned char> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_algobase.h:378
#2  0x08055a3f in std::__copy_move_a<false, unsigned char*, unsigned char*> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_algobase.h:397
#3  0x08055a7e in std::__copy_move_a2<false, unsigned char*, unsigned char*> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_algobase.h:436
#4  0x08055ab9 in std::copy<unsigned char*, unsigned char*> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_algobase.h:467
#5  0x08055ade in std::__uninitialized_copy<true>::uninitialized_copy<unsigned char*, unsigned char*> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_uninitialized.h:98
#6  0x08055aff in std::uninitialized_copy<unsigned char*, unsigned char*> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_uninitialized.h:122
#7  0x08055b20 in std::__uninitialized_copy_a<unsigned char*, unsigned char*, unsigned char> (__first=0x0, __last=0x0, __result=0x8b8dc18 "") at /usr/include/c++/4.3/bits/stl_uninitialized.h:262
#8  0x08055b48 in std::__uninitialized_move_a<unsigned char*, unsigned char*, std::allocator<unsigned char> > (__first=0x0, __last=0x0, __result=0x8b8dc18 "", __all...@0xff815c2c) at /usr/include/c++/4.3/bits/stl_uninitialized.h:272
#9  0x0805af27 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_fill_insert (this=0xff815c2c, __position={_M_current = 0x0}, __n=12, _...@0xff815ab0) at /usr/include/c++/4.3/bits/vector.tcc:399
#10 0x0805b034 in std::vector<unsigned char, std::allocator<unsigned char> >::insert (this=0xff815c2c, __position={_M_current = 0x0}, __n=12, _...@0xff815ab0) at /usr/include/c++/4.3/bits/stl_vector.h:792
#11 0x0805b0b7 in std::vector<unsigned char, std::allocator<unsigned char> >::resize (this=0xff815c2c, __new_size=12, __x=0 '\0') at /usr/include/c++/4.3/bits/stl_vector.h:509
#12 0x08083057 in DNSPacketWriter (this=0xff815b9c, conte...@0xff815c2c, qna...@0xff815c04, qtype=6, qclass=1, opcode=0 '\0') at dnswriter.cc:21
#13 0x0804d449 in makeRootReferral () at speedtest.cc:259
#14 0x080601e9 in RootRefTest::operator() (this=0xff815e87) at speedtest.cc:329
#15 0x080602cc in doRun<RootRefTest> (c...@0xff815e87, mseconds=100) at speedtest.cc:37
#16 0x0804da3e in main () at speedtest.cc:443

-- 
           Summary: vector<>::resize() from an empty vector calls memmove for 0 bytes, wasting a lot of cpu time in a production PowerDNS
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ahu at ds9a dot nl
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41267
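To make the proposed one-line change concrete, here is a minimal sketch of a trivially-copyable copy path with the "__last != __first" guard in front of the memmove. The names guarded_copy and memmove_calls are hypothetical and for illustration only; the real code in stl_algobase.h is structured differently, and per Andrew Pinski's suggestion the check might instead belong in _M_fill_insert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustration only: counts how often the guarded path actually reaches memmove.
static std::size_t memmove_calls = 0;

// Hypothetical helper, NOT libstdc++'s actual implementation: a sketch of the
// memmove fast path for trivially copyable T, with the guard proposed in this
// report so that empty ranges never reach the libc call.
template <typename T>
T* guarded_copy(const T* first, const T* last, T* result) {
    // Subtracting two null pointers is well-defined in C++ and yields 0,
    // which is exactly the __first == __last == 0x0 case in the backtrace.
    const std::ptrdiff_t n = last - first;
    if (n != 0) {  // the proposed check: skip memmove when there is nothing to copy
        ++memmove_calls;
        std::memmove(result, first, sizeof(T) * static_cast<std::size_t>(n));
    }
    return result + n;  // one past the last element written, like std::copy
}
```

With an empty source range, as in frames #1 through #8 where __first and __last are both 0x0, the helper returns result unchanged without making a memmove call at all; non-empty ranges still take the single memmove.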