Hi, Richi. Thank you for your review!
On 03/16, Richard Biener wrote:
> On Fri, 13 Mar 2020, Giuliano Belinassi wrote:
>
> > Hi, all
> >
> > I want to propose and apply for the following GSoC project: Automatic
> > Detection of Parallel Compilation Viability.
> >
> > Here is the proposal, and I am attaching a pdf file for better
> > readability:
> >
> > **Automatic Detection of Parallel Compilation Viability**
> >
> > Giuliano Belinassi
> > Timezone: GMT-3:00
> > University of São Paulo -- Brazil
> > IRC: giulianob in #gcc
> > Email: giuliano.belina...@usp.br
> > Github: <https://github.com/giulianobelinassi/>
> >
> > About Me: Bachelor in Computer Science (University of São Paulo),
> > currently pursuing a Master's degree in Computer Science at the same
> > institution. I have always been fascinated by topics such as
> > High-Performance Computing and Code Optimization, having worked on a
> > GPU-parallel implementation of a Boundary Element Method software. I am
> > currently conducting research on compiler parallelization, developing the
> > [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, which I
> > presented at [GNU Cauldron 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> >
> > **Skills**: Strong knowledge of C, concurrency, shared-memory
> > parallelism, multithreaded debugging, and the usual programming tools.
> >
> > Brief Introduction
> >
> > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that
> > parallelizing the intra-procedural optimizations speeds up the
> > compilation of huge files by a factor of 1.8x on a 4-core machine, and
> > also that these optimizations account for about 75% of compilation time.
> >
> > In this project we plan to use the LTO infrastructure to improve
> > compilation performance in the non-LTO case, while still generating
> > binaries as good as when LTO is disabled. We will automatically detect
> > when a single file will benefit from parallelism, and proceed with the
> > compilation in parallel if so.
> >
> > Use of LTO
> >
> > Link Time Optimization (LTO) is a compilation technique that allows the
> > compiler to analyse the program as a whole, instead of analysing and
> > compiling one file at a time. Therefore, LTO can collect more
> > information about the program and generate a better optimization plan.
> > LTO is divided into three stages:
> >
> > - *LGEN (Local Generation)*: Each file is translated to GIMPLE. This
> >   stage runs sequentially within each file and, therefore, in parallel
> >   across the files of a project.
> >
> > - *WPA (Whole Program Analysis)*: Runs the Inter Procedural Analysis
> >   (IPA) on the entire program. This stage runs serially.
> >
> > - *LTRANS (Local Transformation)*: Executes all intra-procedural
> >   optimizations on each partition. This stage runs in parallel.
> >
> > Since WPA runs serially on the entire project, it can bottleneck the
> > compilation; LTO was designed to produce faster binaries, not to
> > produce binaries fast.
> >
> > Here, the proposed use of LTO to address this problem is to run the IPA
> > on each Translation Unit (TU) separately, instead of on the whole
> > program, and to automatically detect when to partition the TU into
> > multiple LTRANS partitions to improve performance. The advantages of
> > this approach are:
>
> "to improve compilation performance"
>
> > - It can generate binaries as good as when LTO is disabled.
> >
> > - It is faster, as we can partition big files into multiple partitions
> >   and compile these partitions in parallel.
> >
> > - It can interact with the GNU Make jobserver, improving CPU
> >   utilization.
>
> The previous already improves CPU utilization, I guess GNU make jobserver
> integration avoids CPU overcommit.
>
> > Planned Tasks
> >
> > I plan to use the GSoC time to develop the following topics:
> >
> > - Week [1, 3] -- April 27 to May 15:
> >   Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after IPA
> >   analysis directly into multiple LTRANS partitions, instead of
> >   generating a temporary GIMPLE file.
>
> To summarize in my own words:
>
> After IPA analysis, partition the CU into possibly multiple LTRANS
> partitions even for non-LTO compilations. Invoke LTRANS compilation for
> partitions 2..n without writing intermediate IL, through mechanisms like
> forking.
>
> I might say that you could run into "issues" here with asm_out_file
> already opened and partially written to. Possibly easier (but harder on
> the driver side) would be to stream LTO LTRANS IL for partitions 2..n and
> handle those like with regular LTO operation. But I guess I'd try w/o
> writing IL first and only if it turns out too difficult go the IL
> writing way.

Ok. I changed the application text based on that.
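Just to make this part of the updated text more concrete: the process
structure I have in mind is roughly the sketch below. This is plain
illustrative C, not actual GCC code -- the fixed partition count and the
`compile_partition`/`partition_asm_file_name` helpers are hypothetical
stand-ins for whatever the real cc1 interfaces end up being -- but it shows
the forking scheme and how every forked partition would write to its own
assembler file, sidestepping the already-opened asm_out_file problem you
pointed out.

```c
/* Rough sketch of compiling N partitions of a single TU after IPA by
   forking.  Illustration only: PARTITION_COUNT, compile_partition and
   partition_asm_file_name are hypothetical stand-ins, not GCC internals.  */

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#define PARTITION_COUNT 4	/* In reality decided after IPA analysis.  */

/* Stand-in for "run the intra-procedural passes on partition I and write
   the resulting assembly to OUT".  */
static void
compile_partition (int i, FILE *out)
{
  fprintf (out, "# assembly for partition %d\n", i);
}

/* Stand-in for the per-partition assembler output file name.  */
static const char *
partition_asm_file_name (int i)
{
  static char buf[32];
  snprintf (buf, sizeof buf, "partition-%d.s", i);
  return buf;
}

int
main (void)
{
  /* Partitions 1..n-1 are forked off; each child opens its own assembler
     output file, so the parent's already-open and partially written asm
     output is never touched by a child.  */
  for (int i = 1; i < PARTITION_COUNT; i++)
    if (fork () == 0)
      {
	FILE *out = fopen (partition_asm_file_name (i), "w");
	compile_partition (i, out);
	fclose (out);
	_exit (0);
      }

  /* Partition 0 stays in the current process, which in cc1 would keep
     using the asm output file that is already open.  */
  FILE *out = fopen (partition_asm_file_name (0), "w");
  compile_partition (0, out);
  fclose (out);

  /* Wait for all children; the driver is then responsible for assembling
     each .s file and merging the resulting objects into one.  */
  while (wait (NULL) > 0)
    ;
  return 0;
}
```

If this turns out to be too fragile because of global state in cc1, I will
fall back to streaming LTRANS IL for partitions 2..n, as you suggested.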
> > - Week [4, 7] -- May 18 to June 12:
> >   Update the `gcc` driver to take these multiple LTRANS partitions,
> >   then call the compiler and assembler for each of them, and merge the
> >   results into one object file. Here I will use the LTO LTRANS object
> >   streaming, so it should interact with the GNU Make jobserver.
>
> Hmm, so if you indeed want to do that as a second step, the first step
> would still need driver modifications to invoke the assembler. I think
> in previous discussions I suggested having the driver signal cc1 and
> friends via a special -fsplit-tu-to-asm-outputs=<tempfile> argument that
> splitting is desirable and that the used output assembler files should
> be written to <tempfile> so the driver can pick them up for assembling
> and linking.
>
> You also miss the fact that the driver needs to invoke the linker to
> merge the N LTRANS objects back into one.

Actually, I tried to avoid going into such technical detail here, but it
indeed makes the proposal more concrete.

> I suggest you first ignore the jobserver and try doing without LTRANS IL
> streaming. I think meanwhile lto1 got jobserver support for the WPA ->
> LTRANS streaming, so you can reuse that for jobserver-aware "forking"
> (and later assembling in the driver). Using a named pipe or some other
> mechanism might also allow picking up assembler output for the
> individual units as it becomes ready, rather than waiting for the
> slowest LTRANS unit to finish compiling.

Ok. I updated the proposal with this information.
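For when I do get to the jobserver integration, my understanding of the
client side is the protocol sketched below: make advertises a pipe through
MAKEFLAGS (`--jobserver-auth=R,W`, or `--jobserver-fds=R,W` on older
makes), every child starts with one implicit job slot, and each extra slot
is acquired by reading one token byte from the read end and released by
writing the byte back. This is a minimal illustration under that
assumption, with helper names of my own, not code I intend to commit as-is.

```c
/* Minimal sketch of a GNU make jobserver client.  Illustration only; the
   function names are hypothetical, not existing GCC interfaces.  */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int jobserver_rfd = -1, jobserver_wfd = -1;

/* Find the jobserver file descriptors in $MAKEFLAGS, if any.  */
static int
jobserver_init (void)
{
  const char *flags = getenv ("MAKEFLAGS");
  const char *p = flags ? strstr (flags, "--jobserver-auth=") : NULL;
  if (!p && flags)
    p = strstr (flags, "--jobserver-fds=");
  if (!p)
    return 0;			/* Not running under make -jN.  */
  p = strchr (p, '=') + 1;
  return sscanf (p, "%d,%d", &jobserver_rfd, &jobserver_wfd) == 2;
}

/* Block until one extra job slot is available and take it.  */
static char
jobserver_acquire (void)
{
  char token;
  while (read (jobserver_rfd, &token, 1) != 1)
    ;				/* Retry, e.g. on EINTR.  */
  return token;
}

/* Return the slot so make and sibling jobs can reuse it.  */
static void
jobserver_release (char token)
{
  while (write (jobserver_wfd, &token, 1) != 1)
    ;
}

int
main (void)
{
  if (!jobserver_init ())
    {
      puts ("no jobserver detected; using a single job");
      return 0;
    }

  char token = jobserver_acquire ();
  /* ... fork one extra partition or spawn one assembler job here ...  */
  jobserver_release (token);
  return 0;
}
```

Each forked partition (or, later, each assembler job spawned by the driver)
beyond the first would be bracketed by such an acquire/release pair, so a
parallelized cc1 running under `make -j8` never uses more job slots than
the user asked for, which I believe addresses the CPU overcommit point.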
> > - Week 8 -- June 15 to 19: **First Evaluation**
> >   Deliver a non-optimized version of the project. Some programs ought
> >   to be compiled correctly, but there will probably be a large
> >   overhead, because at this point there will not be any criteria for
> >   when to partition. Some tests are also planned for this evaluation.
> >
> > - Week [9, 11] -- June 22 to July 10:
> >   Implement criteria for when to partition, and iteratively improve
> >   them based on data.
>
> I think this should be already there (though we err on the side of
> generating "more" partitions). For non-LTO parallelizing operation we
> maybe want to tune the various --params that are available though
> (lto-min-partition and lto-partitions).

This is interesting. Here at the lab we have a student who developed an
experimental design model to predict how certain parameters impact the
final result. Could you please give me more details about these parameters?

> So I'd suggest to concentrate on the jobserver integration for the
> second phase?

I just changed the proposal to focus on jobserver integration at this stage.

> Otherwise the proposal looks good and I'm confident we can deliver
> something that will be ready for real-world usage for GCC 11!

Thank you :)

Giuliano.

> Thanks,
> Richard.
>
> > - Week 12 -- July 13 to 17: **Second Evaluation**
> >   Deliver a more optimized version of the project. Here we should
> >   separate files that compile quickly from files that require
> >   partitioning, and therefore we should see some speedup.
> >
> > - Week [13, 15] -- July 20 to August 10:
> >   Develop adequate test coverage and address unexpected issues so that
> >   this feature can be merged into trunk for the next GCC release.
> >
> > - Week 16: **Final evaluation**
> >   Deliver the final product as a series of patches for trunk.
> >
> > Thank you,
> > Giuliano.
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)