http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55812
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2012-12-26 15:43:50 UTC ---
More precisely, the following seems equivalent to me and gets back all the performance, so it would be good if gcc could turn the original code into this one.

#include <vector>
thread_local std::vector<int> v;
int main(){
  std::vector<int>*vp=&v;
  for(long i=0;i<400000000;++i){
    vp->push_back(i);
  }
  return vp->size();
}
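
For comparison, here is a minimal sketch that puts the two access patterns side by side. It is not taken from the report: the names tls_vec, fill_direct and fill_hoisted are hypothetical, and the direct variant is only an assumed reconstruction of what the original code in the report does (naming the thread_local variable inside the loop). The assumption is that each access by name may go through a per-access initialization check, while the hoisted variant takes the address once before the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thread_local vector in the comment above.
thread_local std::vector<int> tls_vec;

// Assumed shape of the slow variant: the thread_local variable is named
// inside the loop, so the compiler may re-resolve it on every iteration.
std::size_t fill_direct(long n) {
    tls_vec.clear();
    for (long i = 0; i < n; ++i) {
        tls_vec.push_back(static_cast<int>(i));
    }
    return tls_vec.size();
}

// The variant from the comment: take the address once, then use the pointer.
std::size_t fill_hoisted(long n) {
    std::vector<int>* vp = &tls_vec;  // single access to the thread_local name
    vp->clear();
    for (long i = 0; i < n; ++i) {
        vp->push_back(static_cast<int>(i));
    }
    return vp->size();
}
```

Both functions leave the vector in the same state for a given n, so timing them against each other (for example with std::chrono::steady_clock) would show whether a particular compiler version already hoists the address on its own.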