------- Additional Comments From steven at gcc dot gnu dot org 2004-12-23 12:19 -------
The initial CP lexer buffer size is 10000:
#define CP_LEXER_BUFFER_SIZE 10000

That came in with the lex-all-ahead patch from Matt and Zack, on 2004-09-20
(parser.c rev. 1.250 for the CVS history diggers) but it seems a bit low to
me if you're going to lex the whole file up front.  I would not be surprised
if the average C++ code with lots of templates has several hundred thousand
tokens...

Let me see:
- preprocessed sources for generate.ii from PR8361, blank and pound lines
  stripped: 36200 lines
- an average of 7 tokens per line in the first 500 lines, let's assume
  that's a reasonable average for the whole file (it's easy to instrument
  g++ to get the exact number of tokens, if you want more accurate
  numbers ;-)

That makes it >250,000 tokens for this file.  Since we double the buffer,
we have:

10,000 + 20,000 + 40,000 + 80,000 + 160,000 + 320,000 = 630,000

That is the number of tokens we have to allocate room for, with no
ggc_collect in the middle.  With ggc-page, which has power-of-2 based page
sizes, it's safe to assume that each previous buffer is too small to be
reallocated in place, so a full new buffer is allocated and the old one is
memcpy-ed to the new one.  With checking off, we ggc_free the old buffer,
but with checking enabled we don't, so after finishing the whole lexing
process, we have to keep around a buffer of ~380,000*sizeof(cp_token), so
that's roughly 10MB of memory we can't reclaim until the first ggc_collect
call.

Maybe the buffer should not be in GC memory at all?  We know the exact
lifetime of buffer, and as far as I can tell we never ggc_collect while it
is live.  According to the comments for cp_lexer, "Tokens are never added to
the cp_lexer after it is created."  So it may be cheaper to have the buffer
xmalloced, and memcpy-ed to a buffer in GC space just before saving it in
the new cp_lexer object.

So two suggestions for a person who wants to make g++ a little faster here:
- make CP_LEXER_BUFFER_SIZE larger.  To make it use pages more efficiently,
  look for some ratio of pagesize/(sizeof (cp_token))
- see if buffer in parser.c:cp_lexer_new_main can be moved out of GC space
  as suggested above.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12850
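
For anyone who wants to replay the doubling arithmetic above with other starting sizes, here is a minimal sketch. It is not GCC code: `struct growth` and `simulate_doubling` are hypothetical names made up for this illustration, and it only counts token slots, assuming every doubling allocates a fresh buffer as described for ggc-page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: bookkeeping for a buffer that
   starts at some size and doubles until it is large enough.  */
struct growth
{
  size_t final_size;       /* slots in the last (live) buffer */
  size_t total_allocated;  /* slots summed over every buffer allocated */
  unsigned reallocs;       /* number of times the buffer was doubled */
};

/* Simulate growing from INITIAL slots by doubling until NEEDED tokens
   fit.  Assumes each doubling allocates a completely new buffer, so
   every intermediate buffer counts towards TOTAL_ALLOCATED.  */
struct growth
simulate_doubling (size_t initial, size_t needed)
{
  struct growth g;

  assert (initial > 0);  /* doubling zero would never terminate */

  g.final_size = initial;
  g.total_allocated = initial;
  g.reallocs = 0;

  while (g.final_size < needed)
    {
      g.final_size *= 2;
      g.total_allocated += g.final_size;
      g.reallocs++;
    }
  return g;
}
```

Calling `simulate_doubling (10000, 36200 * 7)` models the generate.ii estimate: the buffer is doubled 5 times, ends at 320,000 slots, and 630,000 slots are allocated in total. Trying a larger initial size with the same token count shows how many of those intermediate buffers a bigger CP_LEXER_BUFFER_SIZE would avoid.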