On Thu, Oct 3, 2019 at 12:46 AM Fangrui Song <i...@maskray.me> wrote:
>
>
> On 2019-10-03, Andrew Pinski wrote:
> >On Wed, Oct 2, 2019 at 9:52 PM Fangrui Song <i...@maskray.me> wrote:
> >>
> >> On 2019-09-24, Martin Liška wrote:
> >> >On 9/19/19 10:33 AM, Martin Liška wrote:
> >> >> - One needs modified binutils and I that would probably require a
> >> >> configure detection. The only way
> >> >> which I see is based on ld --version. I'm planning to make the
> >> >> binutils submission soon.
> >> >
> >> >The patch submission link:
> >> >https://sourceware.org/ml/binutils/2019-09/msg00219.html
> >>
> >> Hi Martin,
> >>
> >> I have a question about why .text.sorted.* are needed.
> >>
> >> The Sony presentation (your [2] link) embedded a new section
> >> .llvm.call-graph-profile[3] to represent edges in the object files. The
> >> linker (lld) collects all .llvm.call-graph-profile sections and does a
> >> C3 layout. There is no need for new section type .text.sorted.*
> >>
> >> [3]:
> >> https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/cgprofile-obj.s
> >>
> >> (Please CC me. I am not subscribed.)
> >
> >The idea of GCC's version is so that the modification needed to the
> >linker is very little. And even without patching the linker
> >script/linker, you don't need too much special and you don't need LD to
> >do much work at all.
>
> I am afraid this can be a large limitation. Then .text.sorted.* can
> only a) reorder functions within a translation unit, b) or reorder all
> functions when LTO is enabled.
I don't think it is that limited. In fact, I think the other way around is
more limited, as you are then constrained in changing the call graph data
format. This approach really reduces the complexity of the linker. Linkers
are already slow; why make them even slower? With this patch, you only need
to change the linker once, and there are no dependencies between the linker
and the compiler.

Thanks,
Andrew

>
> b) is possible only if all .text.sorted.* sections can be numbered.
>
> For the LLVM case, call graph profile can be used without LTO. When both
> ThinLTO+PGO are enabled, however, the additional performance improvement
> offered by the global call graph profile reordering is insignificant,
> smaller than 1%.

See, you said it was less than 1%, so why do it then?

Thanks,
Andrew Pinski
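For readers following the disagreement, here is a minimal Python sketch contrasting where the ordering work happens in each approach. `sort_by_name` models the GCC side, assuming the linker places `.text.sorted.*` sections with a plain lexicographic name sort (for example a linker-script SORT() pattern) and that the compiler zero-pads the numeric suffix; the padding width used here is an assumption. `c3_order` is a heavily simplified, hypothetical take on the C3 layout that lld computes from `.llvm.call-graph-profile` edges: it omits lld's edge-probability and density-degradation checks, and the function names and the 4096-byte cluster limit are illustrative, not lld's.

```python
from collections import defaultdict


def sort_by_name(section_names):
    # Compiler-side ordering (assumed model): the order is already encoded in
    # each section name, so the linker only performs a lexicographic sort.
    return sorted(section_names)


def c3_order(sizes, edges, max_cluster_size=4096):
    """Hypothetical, simplified C3-style clustering done by the linker.

    sizes: {function: size in bytes}
    edges: {(caller, callee): call count}
    """
    # A function's weight is the total count of calls into it.
    weight = defaultdict(int)
    for (_, callee), count in edges.items():
        weight[callee] += count

    cluster_of = {f: f for f in sizes}   # function -> id of its cluster
    members = {f: [f] for f in sizes}    # cluster id -> functions in layout order
    csize = dict(sizes)                  # cluster id -> total size in bytes

    # Visit functions from highest to lowest density (weight per byte).
    for f in sorted(sizes, key=lambda g: weight[g] / sizes[g], reverse=True):
        callers = [(count, caller) for (caller, callee), count in edges.items()
                   if callee == f and caller != f]
        if not callers:
            continue
        _, best_caller = max(callers)    # most frequent caller of f
        src, dst = cluster_of[best_caller], cluster_of[f]
        if src == dst or csize[src] + csize[dst] > max_cluster_size:
            continue
        # Place f's cluster immediately after its best caller's cluster.
        for g in members[dst]:
            cluster_of[g] = src
        members[src].extend(members.pop(dst))
        csize[src] += csize.pop(dst)

    # Emit the densest clusters first.
    def density(c):
        return sum(weight[g] for g in members[c]) / csize[c]

    order = []
    for c in sorted(members, key=density, reverse=True):
        order.extend(members[c])
    return order


if __name__ == "__main__":
    sizes = {"cold": 200, "main": 100, "helper": 30, "hot": 50}
    edges = {("main", "hot"): 1000, ("hot", "helper"): 800, ("main", "cold"): 1}
    print(c3_order(sizes, edges))  # ['main', 'hot', 'helper', 'cold']
    # Without zero padding a name sort puts .text.sorted.10 before .text.sorted.2.
    print(sort_by_name([".text.sorted.10", ".text.sorted.2"]))
```

The trade-off Andrew describes shows up directly: `sort_by_name` needs nothing beyond names, while `c3_order` requires the linker to parse edge data and run a clustering pass, which ties the linker to the call graph format. Fangrui's point b) also shows up: a name sort only yields a global order if every section's number comes from one consistent numbering.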