------- Additional Comments From steven at gcc dot gnu dot org  2004-12-23 12:19 -------
The initial cp_lexer buffer size is 10000:

#define CP_LEXER_BUFFER_SIZE 10000

That came in with the lex-all-ahead patch from Matt and Zack
on 2004-09-20 (parser.c rev. 1.250 for the CVS history diggers),
but it seems a bit low to me if you're going to lex the whole
file up front.  I would not be surprised if an average C++ file
with lots of templates has several hundred thousand tokens...
Let me see:
- preprocessed sources for generate.ii from PR8361, blank and
  pound lines stripped:  36200 lines
- an average of 7 tokens per line in the first 500 lines; let's
  assume that's a reasonable average for the whole file (it's
  easy to instrument g++ to get the exact number of tokens if
  you want more accurate numbers ;-)

That makes it about 36200 * 7 = 253,400, so >250,000 tokens for
this file.

Since we double the buffer each time it fills up, we have:
10,000 + 20,000 + 40,000 + 80,000 + 160,000 + 320,000 = 630,000

That is the total number of tokens we have to allocate room for,
with no ggc_collect in the middle.  With ggc-page, which has
power-of-2 based page sizes, it's safe to assume that each
previous buffer is too small to be grown in place, so a full new
buffer is allocated and the old one is memcpy-ed into it.  With
checking off we ggc_free the old buffer, but with checking
enabled we don't, so after the whole lexing process has finished
we keep around ~380,000*sizeof(cp_token) of dead or unused buffer
space (the 630,000 slots allocated minus the ~250,000 that
actually hold tokens).  That's roughly 10MB of memory we can't
reclaim until the first ggc_collect call.
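
For the curious, here is a small standalone program (not GCC
code, just a sketch of the growth pattern under the assumptions
above: a 10,000-token initial buffer, doubling on overflow, and
~253,400 tokens in the file) that reproduces these numbers:

#include <stdio.h>

int
main (void)
{
  long capacity = 10000;    /* CP_LEXER_BUFFER_SIZE */
  long needed = 253400;     /* ~36200 lines * 7 tokens/line */
  long total = 0;           /* token slots ever allocated */

  while (capacity < needed)
    {
      total += capacity;    /* old buffer, dead after the copy */
      capacity *= 2;        /* buffer doubles when it fills up */
    }
  total += capacity;        /* the final, live buffer */

  printf ("final buffer %ld tokens, %ld token slots allocated\n",
          capacity, total);
  return 0;
}

It prints a final buffer of 320,000 tokens and 630,000 token
slots allocated in total, which is where the numbers above come
from.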

Maybe the buffer should not be in GC memory at all?  We know the
exact lifetime of the buffer, and as far as I can tell we never
ggc_collect while it is live.  According to the comments for
cp_lexer, "Tokens are never added to the cp_lexer after it is
created."  So it may be cheaper to xmalloc the buffer while
lexing and memcpy it into a buffer in GC space just before saving
it in the new cp_lexer object.
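
Roughly what I have in mind, as a sketch only (not a patch; the
locals and the cp_lexer_get_preprocessor_token call below are
paraphrased from memory, and the real cp_lexer_new_main has more
details to take care of, e.g. the PCH handling for the first
token):

  size_t alloc = CP_LEXER_BUFFER_SIZE;
  size_t count = 0;
  cp_token *buffer = xmalloc (alloc * sizeof (cp_token));
  cp_token *gc_buffer;

  /* Lex the whole translation unit into plain malloc-ed memory,
     doubling the scratch buffer whenever it fills up.  */
  do
    {
      if (count == alloc)
        {
          alloc *= 2;
          buffer = xrealloc (buffer, alloc * sizeof (cp_token));
        }
      cp_lexer_get_preprocessor_token (lexer, &buffer[count]);
    }
  while (buffer[count++].type != CPP_EOF);

  /* Only now copy the exact number of tokens into GC space,
     throw the scratch buffer away, and hand gc_buffer to the
     new cp_lexer object.  */
  gc_buffer = ggc_alloc (count * sizeof (cp_token));
  memcpy (gc_buffer, buffer, count * sizeof (cp_token));
  free (buffer);

That way the only GC allocation is the final, exactly-sized one,
and no dead buffers are ever left behind in GC pages.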

So, two suggestions for someone who wants to make g++ a little
faster here:
- make CP_LEXER_BUFFER_SIZE larger.  To make it use GC pages more
  efficiently, derive it from some ratio of pagesize / sizeof (cp_token)
  (see the sketch after this list)
- see if the buffer in parser.c:cp_lexer_new_main can be moved out
  of GC space as suggested above.
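
For the first suggestion, something along these lines could do
(sketch only; the 4096-byte page size and the 256-page multiplier
are made-up numbers, and whoever tries this should check what
ggc-page actually rounds such allocations to):

/* Fill a whole number of pages with tokens instead of using the
   arbitrary 10000.  Page size and page count are examples only.  */
#define CP_LEXER_PAGE_SIZE 4096
#define CP_LEXER_BUFFER_PAGES 256
#define CP_LEXER_BUFFER_SIZE \
  ((CP_LEXER_BUFFER_PAGES * CP_LEXER_PAGE_SIZE) / sizeof (cp_token))

With the ~28 bytes per token implied by the 10MB figure above,
that gives an initial buffer of roughly 37,000 tokens, i.e. both
larger than today and sized in whole pages.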



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12850
