Re: How to debug/improve excessive compiler memory usage and compile times

2024-10-01 Thread Richard Biener via Gcc



> On 01.10.2024 at 17:11, Matthias Kretz via Gcc wrote:
> 
> Hi,
> 
> the std::simd unit tests are my long-standing pain point of excessive
> compiler memory usage and compile times. I've always worked around the
> memory usage problem by splitting the test matrix into multiple
> translations (with different -D flags) of the same source file. I.e., I
> pay with a huge number of compiler invocations to be able to compile at
> all. OOM kills / thrashing isn't fun.
> 
> Recently, the GNU Radio 4 implementation hit a similar issue of excessive
> compiler memory usage and compile times. Worst case example I have tested (a
> single TU on a Xeon @ 4.50 GHz, 64 GB RAM (no swapping while compiling)):
> 
> GCC 15: 13m03s, 30.413 GB (checking enabled)
> GCC 14: 12m03s, 15.248 GB
> GCC 13: 11m40s, 14.862 GB
> Clang 18: 8m10s, 10.811 GB
> 
> That's supposed to be a unit test, but it's nothing one can use for
> test-driven development, obviously. So how do mere mortals optimize code
> for better compile times? -ftime-report is interesting but not really
> helpful. -Q has interesting information, but its output format is
> unusable for C++ and really hard to post-process.
> 
> When compiler memory usage goes through the roof it's fairly obvious that
> compile times have to suffer. So I was wondering whether there are any low-
> hanging fruit to pick. I've managed to come up with a small torture test that
> shows interesting behavior. I put it at 
> https://github.com/mattkretz/template-torture-test. Simply do
> 
> git clone https://github.com/mattkretz/template-torture-test
> cd template-torture-test
> make STRESS=7
> make TORTURE=1 STRESS=5
> 
> These numbers can already "kill" smaller machines. Be prepared to kill cc1plus
> before things get out of hand.
> 
> The bit I find interesting in this test is switched with the -D GO_FAST macro
> (the 'all' target always compiles with and without GO_FAST). With the macro,
> template arguments to 'Operand' are tree-like and the resulting
> type name is *longer*. But GGC usage is only at 442M. Without GO_FAST,
> template arguments to 'Operand' are a flat list. But GGC usage is
> at 22890M. The latter variant needs 24x longer to compile.
> 
> Are long flat template argument/parameter lists a special problem? Why do
> they make overload resolution *so much more* expensive?
> 
> Beyond that torture test (should I turn it into a PR?), what can I do to help?

Analyze where the compile time is spent and where memory is spent.  Identify 
ill-suited data structures and algorithms causing the issue.  Replace them 
with better ones.  That’s what I do for this kind of issue in the middle end.

Richard 

> Thanks,
>  Matthias
> 
> --
> ──
> Dr. Matthias Kretz   https://mattkretz.github.io
> GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
> std::simd
> ──
> 


Re: How to debug/improve excessive compiler memory usage and compile times

2024-10-01 Thread Ben Boeckel via Gcc
On Tue, Oct 01, 2024 at 18:06:35 +0200, Richard Biener via Gcc wrote:
> Analyze where the compile time is spent and where memory is spent.
> Identify ill-suited data structures and algorithms causing the issue.
> Replace them with better ones.  That’s what I do for this kind of issue
> in the middle end.

To this end, are there any plans for, or progress toward, implementing
`-ftime-trace` profiling output? (It's a Clang thing, but there is
tooling that would be able to use GCC's reports if it produced them.)


https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

Build-wide perf visualization is also available (when using `ninja`) with
this tool, which reads `-ftime-trace` files when they're available:

https://github.com/nico/ninjatracing

(FWIW, I have a dream of CI builds saving out this information so that
longer-term perf trends can be investigated. If that leads to project
changes that do less silly template magic, or to compiler improvements
that make things faster, all the better.)

Thanks,

--Ben


Re: GCC devroom at FOSDEM 2025?

2024-10-01 Thread Jose E. Marchesi via Gcc


Hi Thomas.

> FOSDEM have recently posted the "FOSDEM 2025 call for devrooms".
> Given the great success of last year's GCC devroom, I'd like to apply again
> for a GCC devroom (and related toolchain parts) at FOSDEM 2025.
>
> ..., but before I do that, I need two or three people as co-organizers:
> for evaluating submissions (before 2024-12-15), and/or for working
> with suitable students and the GNU Toolchain Fund trustees regarding
> travel grants, and/or for helping on-site (2025-02-01/02)?

I can help.  I have co-organized FOSDEM devrooms several times in the
past.

>
>
> Regards
>  Thomas


GCC devroom at FOSDEM 2025?

2024-10-01 Thread Thomas Schwinge
Hi!

FOSDEM have recently posted the "FOSDEM 2025 call for devrooms".
Given the great success of last year's GCC devroom, I'd like to apply again
for a GCC devroom (and related toolchain parts) at FOSDEM 2025.

..., but before I do that, I need two or three people as co-organizers:
for evaluating submissions (before 2024-12-15), and/or for working
with suitable students and the GNU Toolchain Fund trustees regarding
travel grants, and/or for helping on-site (2025-02-01/02)?


Regards
 Thomas


How to debug/improve excessive compiler memory usage and compile times

2024-10-01 Thread Matthias Kretz via Gcc
Hi,

the std::simd unit tests are my long-standing pain point of excessive
compiler memory usage and compile times. I've always worked around the
memory usage problem by splitting the test matrix into multiple
translations (with different -D flags) of the same source file. I.e., I
pay with a huge number of compiler invocations to be able to compile at
all. OOM kills / thrashing isn't fun.
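
For readers who haven't seen this pattern: a minimal sketch of the workaround
(hypothetical macros and type names, not the actual simd test code) could look
like this, with each -D flag selecting one slice of the test matrix so that a
single compiler invocation only instantiates part of it:

// slice_demo.cpp -- hypothetical sketch of splitting a test matrix via -D.
// Compile several times, e.g.:
//   g++ -std=c++17 -c slice_demo.cpp -DSLICE_FP
//   g++ -std=c++17 -c slice_demo.cpp -DSLICE_INT
#include <tuple>

template <class T>
void run_tests()
{
  // ... heavily templated test body, one instantiation per element type ...
}

#if defined(SLICE_FP)
using test_types = std::tuple<float, double>;              // floating-point slice
#elif defined(SLICE_INT)
using test_types = std::tuple<int, long>;                   // integer slice
#else
using test_types = std::tuple<float, double, int, long>;    // full matrix
#endif

template <class... Ts>
void run_all(std::tuple<Ts...>*)
{
  (run_tests<Ts>(), ...);  // fold expression: run every selected instantiation
}

int main()
{
  run_all(static_cast<test_types*>(nullptr));
}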

Recently, the GNU Radio 4 implementation hit a similar issue of excessive 
compiler memory usage and compile times. Worst case example I have tested (a 
single TU on a Xeon @ 4.50 GHz, 64 GB RAM (no swapping while compiling)):

GCC 15: 13m03s, 30.413 GB (checking enabled)
GCC 14: 12m03s, 15.248 GB
GCC 13: 11m40s, 14.862 GB
Clang 18: 8m10s, 10.811 GB

That's supposed to be a unit test, but it's nothing one can use for
test-driven development, obviously. So how do mere mortals optimize code
for better compile times? -ftime-report is interesting but not really
helpful. -Q has interesting information, but its output format is
unusable for C++ and really hard to post-process.
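
For reference, -ftime-report (and GCC's -fmem-report, which prints memory
allocation statistics) are ordinary driver options and can be added to a
single-TU compile; an illustrative invocation (the file name is hypothetical):

g++ -c torture.cpp -ftime-report -fmem-report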

When compiler memory usage goes through the roof it's fairly obvious that 
compile times have to suffer. So I was wondering whether there are any low-
hanging fruit to pick. I've managed to come up with a small torture test that 
shows interesting behavior. I put it at 
https://github.com/mattkretz/template-torture-test. Simply do

git clone https://github.com/mattkretz/template-torture-test
cd template-torture-test
make STRESS=7
make TORTURE=1 STRESS=5

These numbers can already "kill" smaller machines. Be prepared to kill cc1plus 
before things get out of hand.

The bit I find interesting in this test is switched with the -D GO_FAST macro 
(the 'all' target always compiles with and without GO_FAST). With the macro, 
template arguments to 'Operand' are tree-like and the resulting 
type name is *longer*. But GGC usage is only at 442M. Without GO_FAST, 
template arguments to 'Operand' are a flat list. But GGC usage is 
at 22890M. The latter variant needs 24x longer to compile.
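
To make the tree-like vs. flat distinction concrete, here is a hypothetical
sketch (stand-in names, not the actual template-torture-test code) of the two
shapes of template argument lists being compared:

// shape_demo.cpp -- hypothetical illustration of the two argument-list shapes.
#include <type_traits>

template <class... Ts>
struct Operand {};

// "tree-like" (with GO_FAST): nested Operand instantiations; the full type
// name is long, but each individual template argument list stays short.
using tree_args = Operand<Operand<Operand<int, int>, Operand<int, int>>,
                          Operand<Operand<int, int>, Operand<int, int>>>;

// "flat" (without GO_FAST): a single Operand with one long, flat pack.
using flat_args = Operand<int, int, int, int, int, int, int, int>;

static_assert(!std::is_same_v<tree_args, flat_args>, "different shapes");

int main() {}

In the real test both variants grow with the STRESS level; the sketch only
shows the structural difference, not the size at which it starts to hurt.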

Are long flat template argument/parameter lists a special problem? Why do
they make overload resolution *so much more* expensive?

Beyond that torture test (should I turn it into a PR?), what can I do to help?
 
Thanks,
  Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──

