Re: GSOC 2018 - Textual LTO dump tool project

Martin Liška Wed, 28 Feb 2018 02:36:40 -0800

On 02/25/2018 10:46 AM, Martin Jambor wrote:
> Hello Hrishikesh,
> 
> I apologize for replying to you this late, this has been a busy week
> and now I am traveling.
> 
> On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
>> Hi,
>>
>> I am Hrishikesh Kulkarni currently studying as an undergrad student in
>> Computer Engineering at Pune University, India. I find compilers quite
>> interesting as a subject,  and would like to apply to GSoC to gain some
>> understanding of how real-world compilers work. So far, I have managed to
>> build gcc and perform some simple tweaks to the codebase. In particular, I
>> would like to apply to the Textual LTO dump tool project.
>>
> 
> I must say I am impressed by the research you have already done.
> Nevertheless, please note that Ray Kim has also expressed interest in
> the project.  Martin Liska will be the mentor, so I will let him drive
> the selection process.  On the other hand, Ray also liked another
> project, so maybe he will pick that and everyone will be happy.


Hello.

I'm really happy that there are multiple volunteers who want to work on the LTO
dump tool project. From what I have seen so far, I would like to have Hrishikesh
working on the project. He has experience with C, C++ and also with Python,
which can be put to good use for prototyping. Apart from that, he has spent
quite some time investigating LTO internals in GCC.

That said, may I please ask the other candidates to look for a different GSoC
project among those we offer? I believe the other topics are also interesting
and important for GCC.

> 
>> As far as I understand, the motivation for LTO framework was to enable
>> cross file interprocedural optimizations, and for this purpose an ipa pass
>> is divided into following three stages:
>>
>>    1.
>>
>>    LGEN: The pass does a local analysis of the function and generates a
>>    “summary”, ie, the information relevant to the pass and writes it to LTO
>>    object file.
> 
> A pass might do that, but the output of the whole stage is not just the
> pass summaries, it also writes the function IL (the function gimple
> statements, above all) to the object file.
> 
>>    2.
>>
>>    WPA: The LTO object files are given as input to the linker, which then
>>    invokes the lto1 frontend to perform global ipa analysis over the
>>    call-graph and write optimized summaries to LTO object files
>>    (partitioning). The global ipa analysis is done over summary and not the
>>    actual function bodies.
> 
> Well... note that partitioning actually means dividing the whole
> compiled program/library into chunks that are then compiled
> independently in the LTRANS stage.  But you are basically right that WPA
> does also do whole-program analysis based on summaries and then writes
> its decisions to optimization summaries, yes.
> 
>>    3.
> 
>>
>>    LTRANS: The partitions are read back, and the function bodies are
>>    reconstructed from summary and are then compiled to produce real object
>>    files.
> 
> Function bodies and the summaries are distinct things.  The body
> consists of gimple statements and all the associated stuff (such as
> types, so a lot of stuff), whereas when we refer to summaries, we mean
> small chunks of data that interprocedural optimizations such as inlining
> or IPA-CP scurry away because they cannot feasibly work on bodies of the
> entire program.
> 
> But apart from this terminology issue, you are basically correct, at the
> LTRANS stage, IPA passes apply transformations to the bodies according
> to the optimization summary generated by the WPA phase.  And then, all
> normal, intra-procedural passes and code generation runs.
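
To make the split between bodies and summaries concrete, here is a toy Python
model of the three stages. It is only a sketch and not GCC code: the names
Summary, lgen, wpa and ltrans, the inline_limit threshold and the round-robin
partitioning are all hypothetical, and "inlining" is reduced to appending the
callee's statements. Unlike real LTRANS, it also looks up callee bodies outside
the current partition.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    name: str
    size: int       # statement count, a stand-in for an inliner summary
    callees: list   # names of directly called functions


def lgen(functions):
    """LGEN: emit the body (IL) and a small summary for every function."""
    bodies = {name: stmts for name, (stmts, _) in functions.items()}
    summaries = [Summary(name, len(stmts), callees)
                 for name, (stmts, callees) in functions.items()]
    return bodies, summaries


def wpa(summaries, inline_limit=2, n_partitions=2):
    """WPA: decide from summaries only, then split functions into partitions."""
    small = {s.name for s in summaries if s.size <= inline_limit}
    decisions = {s.name: [c for c in s.callees if c in small]
                 for s in summaries}
    names = sorted(s.name for s in summaries)
    partitions = [names[i::n_partitions] for i in range(n_partitions)]
    return decisions, partitions


def ltrans(partition, bodies, decisions):
    """LTRANS: apply the WPA decisions to the real bodies of one partition."""
    out = {}
    for name in partition:
        body = list(bodies[name])
        for callee in decisions[name]:
            body += bodies[callee]  # crude stand-in for inlining
        out[name] = body
    return out


# Each entry maps a function name to (statements, callees).
functions = {
    "main": (["call helper", "call big", "ret"], ["helper", "big"]),
    "helper": (["ret"], []),
    "big": (["a", "b", "c", "ret"], []),
}
bodies, summaries = lgen(functions)
decisions, partitions = wpa(summaries)
result = ltrans(partitions[0], bodies, decisions)
```

Note that wpa never touches bodies; it only sees the Summary objects, which is
the point of the distinction made above.
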
> 
>>
>>
>> If I understand correctly, the motivation for textual LTO dump tool is to
>> easily analyze contents of LTO object file, similar to readelf or objdump ?

Yes. Richi described in a previous email how that could be done.

> 
> That is how I understand it too, but Martin may have some further uses
> in mind.
> 
>>
>> Assume that LTO object file contains in pureconst section: 0b0110 (0b for
>> binary prefix) corresponding to values of fs->pure_const_state and
>> fs->state_previously_known.
>>
>> If I understand correctly, the output of dump tool should then be:
>>
>> pure_const pass:
>>
>> pure_const_state = IPA_PURE (enum value of pure_const_state_e corresponding
>> to 0b01)
>>
>> state_previously_known = IPA_NEITHER (enum value of pure_const_state_e
>> corresponding to 0b10)
>>
>> Is this the expected output of the dump tool ?
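
The decoding described in the example above can be prototyped in a few lines of
Python. This is only a sketch under assumptions: it takes the two 2-bit fields
in the order the example reads them (high bits first), which may not match the
actual bitpack layout in the LTO stream, and it assumes IPA_CONST is the value
0 that precedes IPA_PURE. The names PureConstState, decode_pure_const and dump
are hypothetical.

```python
from enum import IntEnum


class PureConstState(IntEnum):
    # Values for IPA_PURE and IPA_NEITHER follow the example above;
    # IPA_CONST = 0 is an assumption.
    IPA_CONST = 0
    IPA_PURE = 1
    IPA_NEITHER = 2


def decode_pure_const(bits):
    """Split a 4-bit value into two 2-bit fields, high field first (assumed)."""
    pure_const_state = PureConstState((bits >> 2) & 0b11)
    state_previously_known = PureConstState(bits & 0b11)
    return pure_const_state, state_previously_known


def dump(bits):
    """Render the decoded fields as human-readable text."""
    state, known = decode_pure_const(bits)
    return ("pure_const pass:\n"
            f"  pure_const_state = {state.name}\n"
            f"  state_previously_known = {known.name}")


print(dump(0b0110))
```

For 0b0110 this prints IPA_PURE for pure_const_state and IPA_NEITHER for
state_previously_known, matching the expected output in the question.
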
> 
> I think the tool would have to do a bit more than just dump summaries of
> IPA passes.  I tend to think that the task should also include dumping
> gimple bodies (but we already do that in GCC and so it should be mostly
> easy) and also types (that are merged as one of the first steps of
> WPA, and interesting things happen when merging does something
> "interesting").  And perhaps quite a bit more.  Martin?

Yes. Since we have transitioned to early debug info in LTO mode, printing the
tree types that reside in the LTO stream would help us reduce the stream in the
future.

> 
>>
>> I am reasonably familiar working with C, C++ and python. My prior
>> experience includes opportunities to work in areas of NLP. Some of my
>> accomplishments in the area include presenting project VicharDhara- A
>> thought Mapper that was selected among top five ideas in Accenture
>> Innovation Challenge among 7000 nationwide entries. My paper on this topic
>> won the best paper award in IEEE Conference ICCUBEA-2017. My previous work
>> was focused on simple parsers, student psychology, thought process
>> detection for team selection.
> 
> Interesting, congratulations.
> 
>>
>> In the interim, I have been through a few docs on GCC and LTO [1][2][3] and
>> am trying to write a toy ipa pass to better understand LTO/IPA
>> infrastructure. 
> 
> Great, I believe that's exactly what my advice would be.
> 
>> I would be grateful for feedback on the textual LTO dump
>> tool.
> 
> I hope that Martin will shed a bit more light on what output he
> envisions the tool to have.  I will talk to him about it too when I get
> back to the office (so maybe on Tuesday but probably on Wednesday).

As mentioned above, Richard has already described this. The first step would be
to provide a write-only mode, where lto-dump only prints verbose information
usable for debugging.

Another topic is the current LTO dumping infrastructure. I know Honza does not
like the interface. Maybe it can be improved with respect to bitpack_d, and
maybe some generalization can be done. Honza?

Thanks,
Martin

> 
> Thanks,
> 
> Martin
> 
> 
> 
>>
>> [1] http://www.ucw.cz/~hubicka/slides/labs2013.pdf
>>
>> [2] https://gcc.gnu.org/wiki/LinkTimeOptimization
>>
>> [3] https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
>>
>> My two recent publications are listed below:
>>
>> [A] Hrishikesh Kulkarni, "Contextual Data Representation Using Prime Number
>> Route Mapping Method and Ontology" IEEE Conference, ICCUBEA, 2017
>>
>> [B] Hrishikesh Kulkarni, “Multi-Graph based Intent Hierarchy Generation to
>> Determine Action Sequence”, Springer Conference, ICDECT, December 2017, Pune
>>
>> Thanks,
>>
>> Hrishikesh Kulkarni
