On Tue, Aug 07, 2007 at 12:08:48PM -0500, Ben White wrote: > From the GCC Wiki, Speedup Areas page, Strings/identifiers section: > > /3. Replace identifier hash table with a better data structure (have > already tried a ternary tree, it's not faster; could try to code it even > cleverer than it already was; B* trees might be worth looking into)/ > > A different approach: > [ two pass approach, see parent ]
It's not clear to me that the two passes separate as cleanly as you describe. In C-like languages, you can't determine what identifiers refer to without doing the full parse, together with semantic analysis, to know if a given string is an identifier, a typedef, the name of a class, etc., and to which of a number of objects the identifier refers. A fast scan that just finds the identifiers doesn't work (especially for C++), you need to do the full parse. So I think that your first pass will be roughly as expensive as what gcc does already. But you are welcome to conduct an experiment to prove me wrong. Before you start, however, you might want to make certain that the part you want to speed up is really a significant cost.