https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95348

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |WAITING

--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
Ok, I spent some time thinking about your workload and I would recommend the
following steps:

1) You should not write the profile data of each process into a different
folder; instead, let all processes merge their counters into a shared one.

The GCC PGO bootstrap produces ~500 .gcda files while the compiler process is
executed ~2000x.
Note that .gcda merging happens per file, and the file is locked only for the
duration of the merge. That should be a reasonably small window in which one
process can delay another running in parallel.
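A minimal sketch of the shared-directory setup, using GCC's runtime environment variables GCOV_PREFIX and GCOV_PREFIX_STRIP (the paths and the strip depth are hypothetical; adjust them to your build layout):

```shell
# GCOV_PREFIX redirects all .gcda output of an instrumented binary into
# one shared tree; GCOV_PREFIX_STRIP drops leading components of the
# absolute compile-time path so every process writes, locks, and merges
# the SAME .gcda files instead of scattering them per process.
export GCOV_PREFIX=/var/cache/pgo        # hypothetical shared profile dir
export GCOV_PREFIX_STRIP=3               # hypothetical: strips /home/user/build

# Each training run now merges its counters into /var/cache/pgo/...:
# ./your-instrumented-binary
```

With this in place, no post-run merge step is needed; libgcov performs the locked per-file merge automatically at process exit.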

2) I would like to know how long one process runs and what portion of that
time is spent merging (and dumping) the profile.

3) You may consider shrinking the training run; 10,000 executions seems like a
massive training run to me.

4) The GCDA file format is not ideal in size, and it can be shrunk simply and
quickly with e.g. gzip. For GCC's own PGO data, that shrinks it ~10x.
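A sketch of compressing a profile tree after training. The directory and the stand-in file are fabricated for the demo; a real run would point PROFILE_DIR at the training output tree (note the ~10x figure above is for GCC's bootstrap profiles, not this synthetic data):

```shell
# Demo directory with a stand-in .gcda file; a real profile tree would
# already contain the counters written by the training run.
PROFILE_DIR=$(mktemp -d)
head -c 65536 /dev/zero > "$PROFILE_DIR/demo.gcda"

# Compress every .gcda under the profile directory in place.
find "$PROFILE_DIR" -name '*.gcda' -exec gzip -9 {} +

ls -l "$PROFILE_DIR"
```

Remember to gunzip the files again before feeding them back to `-fprofile-use`; GCC reads raw .gcda, not gzipped ones.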

Please provide as much information as possible about the workload so that we
can find a feasible solution.