Hi Samuel,

On Tue, Mar 22, 2022 at 3:15 PM Samuel Henrique <samuel...@debian.org> wrote:
>
> I believe there could be noticeable performance gains from using all
> the threads available.

I share your hope and have implemented two attempts to parallelize the
300 or so checks.

My first attempt used IO::Async but failed. That module is probably the
best one currently available, but it replaces the SIGCHLD handler.
Lintian uses dozens of other modules that call external programs via
other means. Unfortunately, those do not interact well with IO::Async,
which causes the parallel execution to freeze or otherwise experience
strange bugs. A particularly serious problem for Lintian was the
interaction with Path::Tiny. [1] You may be able to find some details by
searching the Git log for "Heisenbug" (capital H, please).

My current implementation uses MCE [2] which works okay, but does not
yet yield the performance gains you and I are hoping for. That is why
the experimental branch has not been merged.

As far as I can tell, the degradation relates to the serializations Perl
performs between parent and child processes. It is possible to "close"
on the in-memory file indexes as part of the fork() but it's not enough
to explain the difference. (The indexes are large and also being
transitioned to disk for unrelated reasons.) Memory usage is higher, as
well.

I may have to implement better profiling before we make significant
progress. That is because at least half the time is spent generating the
file indexes, which require a different parallelization strategy than
the checks.

One long-term plan could be to have a data interchange format between
the parent and the child processes. It would also allow checks to be
written in other programming languages, such as Haskell, but I would
seek further community input before proceeding with anything like that.

[1] https://github.com/dagolden/Path-Tiny/issues/224
[2] https://metacpan.org/pod/MCE

> Although I don't know how feasible that is with
> lintian+perl.

Perl performs surprisingly well for an interpreted language, but I am
not sure true "threading" works well. In Lintian, we use multiple
processes, if at all.
That is how I interpreted your use of the word "threads".

> Note that I didn't go all the way to debugging lintian to confirm it's
> single-threaded

You are right. For the purposes of your analysis, Lintian uses a single
process.

Thank you for your valuable suggestions!

Kind regards,
Felix Lechner
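
P.S. To make the data interchange idea a little more concrete, here is a
minimal sketch of a parent handing a file index to a check in a child
process as JSON and reading findings back the same way. It is written in
Python rather than Perl only to show that the child need not share the
parent's language. Nothing here is taken from Lintian: the JSON layout,
the check name "hypothetical/empty-file" and the run_check function are
all made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical check that runs in a child process. It reads one JSON
# document from stdin and writes one JSON document to stdout. Any
# language with a JSON library could play this role.
CHILD_SOURCE = r"""
import json, sys
request = json.load(sys.stdin)
findings = [
    {"check": "hypothetical/empty-file", "path": entry["path"]}
    for entry in request["index"]
    if entry["size"] == 0
]
json.dump({"findings": findings}, sys.stdout)
"""


def run_check(index):
    """Send a file index to a child process and return its findings."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps({"index": index}),  # explicit serialization step
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return json.loads(completed.stdout)["findings"]


if __name__ == "__main__":
    # Made-up index entries; a real index would carry far more detail.
    index = [
        {"path": "usr/bin/tool", "size": 1024},
        {"path": "usr/share/doc/pkg/empty", "size": 0},
    ]
    print(run_check(index))
```

The serialization cost is still there, of course. The difference is that
it becomes explicit and measurable, which might help with the profiling
mentioned above.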