https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114563

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andi Kleen from comment #10)
> It doesn't really help for the PR119387 test case, perhaps not surprising
> because it optimizes freeing not allocation:
> 
> Summary
>   ./gcc/cc1plus-opt -w -std=c++20 ~/gcc/git/tsrc/119387-formatted.ii  
> -quiet ran
>     1.02 ± 0.03 times faster than ./gcc/cc1plus -w -std=c++20
> ~/gcc/git/tsrc/119387-formatted.ii   -quiet
> 
> 
> So still need a real test case.

Hmm.

@@ -782,8 +836,10 @@ alloc_page (unsigned order)
   entry = NULL;
   page = NULL;

+  free_list = find_free_list_order (order, entry_size);
+
   /* Check the list of free pages for one we can use.  */
-  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
+  for (pp = &free_list->free_pages, p = *pp; p; pp = &p->next, p = *pp)
     if (p->bytes == entry_size)
       break;


so my idea was to have multiple freelists so that p->bytes == entry_size
and this list walk, which is the bottleneck for PR119387 I think, is
improved.

I'm testing the PR119387 testcase again with -O2 -g -std=c++20 -fno-checking;
it is the above loop that shows up as follows in a cc1plus built with -O0:

Samples: 2M of event 'cycles:Pu', Event count (approx.): 2787093272529
  61.29%       1373199  cc1plus  cc1plus               [.] alloc_page

Using your patch this changes to

Samples: 1M of event 'cycles:Pu', Event count (approx.): 1053172130606
   0.02%           234  cc1plus  cc1plus  [.] alloc_page(unsigned int)

so the patch works as intended!

Btw, I think you can avoid adding the freelist pointer to page_entry
by recomputing it with the O(1) lookup from page_entry->order and
page_entry->bytes?  The struct's size doesn't seem very optimized,
so I'm not sure how important that is.
