https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99105
--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #3)
> > A small improvement can be achieved by the removal of libgcov I/O buffering:
> > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=5a17015c096012b9e43a8dd45768a8d5fb3a3aee
>
> So it effectively replaces gcov's own buffered I/O by stdio. First I am
> not sure how safe it is (as we had a lot of fun about using malloc)

Why is it not safe? We use filesystem locking for .gcda files.

> also it adds dependency on stdio that is not necessarily good idea for
> embedded targets. Not sure how often it is used there.

It was motivated by PR97834.
Well, I think it's better to rely on the system C library, as it provides a
faster implementation of buffered I/O.
For embedded targets, I plan to implement hooks that can be used instead of
I/O:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559342.html

>
> But why glibc stdio is more effective? Is it because our buffer size of
> 1k is way too small (as it seems judging from the profile that is
> dominated by fread calls rather than open/lock/close)?

It behaved the same on my machine, but the impact on BSD was more significant.

> >
> > But the key thing is likely the ability to omit profile modifications
> > (read/modify/write) for parts of a binary that are not trained.
> Problem there are the per-program summaries that needs to be updated
> even for files never visited.
>
> It seems that producing one file with tar-like format that can be
> expanded to gcda files by gcov-tool would be good idea. Even if we need
> to lock whole file it is probably faster than a lot of small I/Os.

I'm planning to collect more detailed statistics about why a lot of small I/Os
is slower.

In the case of Clang, I would expect 100s (or even 1000s) of object files.
During a profiling run (using all cores), I would expect each run to take
100ms (or even seconds), so waiting for a file lock of an object file should
not block it much.

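As a way to probe the buffer-size question quoted above, here is a minimal sketch that reads a file in small chunks through stdio with a caller-chosen buffer size. It is not libgcov code: `sum_bytes_buffered` is a hypothetical helper, and the only library facility it relies on is the standard `setvbuf`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libgcov): sum all bytes of PATH,
   reading 4 bytes at a time through a stdio buffer of BUFSIZE bytes.
   Returns the sum, or -1 on error.  BUFSIZE must be non-zero.  */
long
sum_bytes_buffered (const char *path, size_t bufsize)
{
  assert (bufsize > 0);

  FILE *f = fopen (path, "rb");
  if (!f)
    return -1;

  /* setvbuf must be called before any other operation on the stream.  */
  char *buf = malloc (bufsize);
  if (!buf || setvbuf (f, buf, _IOFBF, bufsize) != 0)
    {
      fclose (f);
      free (buf);
      return -1;
    }

  long sum = 0;
  unsigned char word[4];
  size_t n;
  /* Many tiny freads, loosely mimicking word-sized counter reads.  */
  while ((n = fread (word, 1, sizeof word, f)) > 0)
    for (size_t i = 0; i < n; i++)
      sum += word[i];

  int failed = ferror (f);
  fclose (f);
  free (buf); /* The buffer may only be released after fclose.  */
  return failed ? -1 : sum;
}
```

Calling this with 1024 and with a larger size on the same file, for example under `strace -c`, would show how the number of underlying read calls changes with the buffer size.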
> To avoid waiting for lock one can simply allow multiple profile files to
> be created and teach libgcov to acquire unlocked file in pseudorandom
> order.
>
> Honza
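
The idea of acquiring an unlocked profile file in pseudorandom order could be sketched roughly as follows. This is only an illustration under stated assumptions, not how libgcov works: `acquire_profile_slot`, the `base.N` naming scheme and the fixed slot count are hypothetical, and the sketch uses non-blocking `flock` (available on Linux and the BSDs) in place of whatever locking libgcov itself uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not part of libgcov): probe the files
   BASE.0 ... BASE.(NSLOTS-1), starting at index SEED % NSLOTS, and
   return a descriptor holding an exclusive lock on the first file that
   is not locked by anybody else.  The chosen index is stored in
   *SLOT_OUT if it is non-NULL.  Returns -1 if every slot is busy, in
   which case a caller could fall back to a blocking lock.  */
int
acquire_profile_slot (const char *base, unsigned nslots, unsigned seed,
                      unsigned *slot_out)
{
  char path[4096];
  assert (nslots > 0);

  unsigned start = seed % nslots;
  for (unsigned i = 0; i < nslots; i++)
    {
      unsigned slot = (start + i) % nslots;
      int len = snprintf (path, sizeof path, "%s.%u", base, slot);
      if (len < 0 || (size_t) len >= sizeof path)
        return -1; /* Path does not fit.  */

      int fd = open (path, O_RDWR | O_CREAT, 0666);
      if (fd < 0)
        continue;

      /* LOCK_NB makes flock fail immediately instead of waiting.  */
      if (flock (fd, LOCK_EX | LOCK_NB) == 0)
        {
          if (slot_out)
            *slot_out = slot;
          return fd;
        }
      close (fd); /* Slot is busy (or locking failed); try the next.  */
    }
  return -1;
}
```

The seed could come from the process ID or a timer so that concurrent runs tend to start at different slots; closing the returned descriptor releases the lock. The per-slot files would still have to be merged afterwards, which this sketch does not cover.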