ChuanqiXu added a comment.

In D119409#3332313 <https://reviews.llvm.org/D119409#3332313>, @dblaikie wrote:

> (maybe relevant: For what it's worth: I originally implemented inline 
> function homing in modules codegen for Clang Header Modules - the results I 
> got for object file size in an -O0 build were marginal - a /slight/ win in 
> object file size, but not as much as we might've expected. Part of the reason 
> might be that there can be inline functions that are never called, or at 
> higher optimization levels, inline functions that always get inlined (via 
> "available externally" definitions) - in that case, providing "homed" 
> definitions creates inline function definitions that are unused during 
> linking/a waste of space. It's possible the workload I was dealing with 
> (common Google internal programs) skewed compared to broader C++ code - for 
> instance heavy use of protobufs could be leading to a lot of generated 
> code/inline functions that are mostly unused. I didn't iterate further to 
> tweak/implement heuristics about which inline functions should be homed. I'm 
> not sure if Richard Smith made a choice about not homing inline functions in 
> C++20 modules because of these results, or for other reasons, or just as a 
> consequence of the implementation - but given we had the logic in Clang to do 
> inline function homing for Clang Header Modules, I'm guessing it was an 
> intentional choice to not use that functionality in C++20 modules when they 
> have to have an object file anyway)

Thanks for sharing this. I hadn't considered code size before. I agree the result 
should depend on the pattern of the program; I'd guess code size may increase in 
some projects and decrease in others.

> Richard and I discussed taking advantage of this kind of new home location, 
> certainly for key-less polymorphic classes. I was against it as it was more 
> work :) Blame me.

From my experience, it depends on how good an implementation we want. A plain 
implementation is relatively easy, and it would speed up compilation 
significantly at **O0**. But it doesn't work well once optimization is turned 
on: to enable complete optimization we must import all the functions as 
`available_externally` definitions, so in that case we only save the time spent 
in the preprocessor, parser, semantic analysis, and the backend. The big part of 
compilation happens in the middle end, and we also have to pay for serialization 
and deserialization. My experiment shows that we only get about a 5% improvement 
in compilation time with named modules when optimization is turned on. (We could 
offer an option to make compilation fast at On by not emitting functions from 
other module units, but that would obviously hurt runtime performance.)

A good implementation might attach optimized IR to the PCM files. Since the 
standard says nothing about the CMI/BMI (PCM), we are free to run the middle end 
while building it and attach the optimized IR to the PCM file. Then, when we 
import an optimized PCM, we could extract the optimized functions on demand. We 
could mark such functions with a special attribute (something like 
'optimized_available_externally'?) to tell the compiler not to optimize them 
again and to delete them once the middle-end optimizations are done. Such 
functions would only be available for inlining (or any other IPO). I think this 
wouldn't hurt performance, and we could get a win in compilation speed. But I 
agree this is not easy to implement.
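As a sketch of the consumer-side cleanup (purely hypothetical: neither the 
attribute nor this pass exists today, and the spelling 
"optimized-available-externally" is made up for illustration), the step that 
deletes such bodies after the middle end could parallel LLVM's existing 
EliminateAvailableExternally pass, just keyed on the new attribute:

```
// Hypothetical post-optimization cleanup pass (sketch only). Assumes a
// string attribute "optimized-available-externally" that the module
// importer would place on bodies extracted from an optimized PCM.
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

struct DropImportedOptimizedBodiesPass
    : PassInfoMixin<DropImportedOptimizedBodiesPass> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    bool Changed = false;
    for (Function &F : M) {
      // Bodies imported only to feed the inliner are deleted after the
      // middle end; the real definition lives in the owning module
      // unit's object file.
      if (!F.isDeclaration() &&
          F.hasFnAttribute("optimized-available-externally")) {
        F.deleteBody(); // turn the definition back into a declaration
        Changed = true;
      }
    }
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};
```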


