Curious. I ran both g++ variants under oprofile and then compared the generated assembler code for the most critical functions.
The top function in both cases is pointer_set_insert, and there the assembler code is 100% identical (modulo one choice between r14 and r15).

The second most critical function in the gcc-in-cxx build is walk_tree_1, which ranks only fourth in mainline gcc. There the code seems to be identical, too, except for code layout: the compiler arranges the code in a different order and apparently predicts branches differently. The non-branching code is nearly identical, too. The "hottest" assembler instructions in walk_tree_1 are memory accesses, so perhaps the mainline version causes slightly fewer cache misses or gets better branch prediction? (My interpretation, not measured yet.)

I am a bit unsure how to proceed. The gcc-in-cxx assembler code looks OK, as it is nearly identical to the mainline code. The main differences are in the code/branch layout, and I wouldn't know how to debug this.

Thomas