https://sourceware.org/bugzilla/show_bug.cgi?id=33530
Bug ID: 33530
Summary: ld --gc-sections is quadratic-slow on input with huge
list of sections (20K)
Product: binutils
Version: 2.45
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: ld
Assignee: unassigned at sourceware dot org
Reporter: slyich at gmail dot com
Target Milestone: ---
It's reproducible on x86_64-linux target, but I expect the problem to be more
generic.
I'll start from the reproducer:
$ printf "int f_0() __attribute__ ((section (\".text.0\"))); int f_0() { return
0; };\n" > main.c; for (( i=1; i<20000; i++ )); do printf "int f_$i()
__attribute__ ((section (\".text.$i\"))); int f_$i() { return f_$((i-1))();
};\n"; done >> main.c; printf "int main() { return f_19999(); }" >> main.c; gcc
-O0 -c main.c -o main.o; echo "bfd:"; time gcc main.o -o main -fuse-ld=bfd
-Wl,--gc-sections; echo "gold:"; time gcc main.o -o main -fuse-ld=gold
-Wl,--gc-sections
bfd:
real 0m5,175s
user 0m2,586s
sys 0m2,575s
gold:
real 0m0,049s
user 0m0,036s
sys 0m0,014s
Note how faster `gold` is in this case. I hope `bfd` can be fixed to be on par
with it.
IN the example we generate a function per section and build a chain references:
$ head -n 5 main.c
int f_0() __attribute__ ((section (".text.0"))); int f_0() { return 0; };
int f_1() __attribute__ ((section (".text.1"))); int f_1() { return f_0(); };
int f_2() __attribute__ ((section (".text.2"))); int f_2() { return f_1(); };
int f_3() __attribute__ ((section (".text.3"))); int f_3() { return f_2(); };
int f_4() __attribute__ ((section (".text.4"))); int f_4() { return f_3(); };
$ tail -n 5 main.c
int f_19996() __attribute__ ((section (".text.19996"))); int f_19996() { return
f_19995(); };
int f_19997() __attribute__ ((section (".text.19997"))); int f_19997() { return
f_19996(); };
int f_19998() __attribute__ ((section (".text.19998"))); int f_19998() { return
f_19997(); };
int f_19999() __attribute__ ((section (".text.19999"))); int f_19999() { return
f_19998(); };
int main() { return f_19999(); }
>From what I understand staring at the profiles the trigger here is
`-Wl,--gc-sections`. Otherwise `bfd` is as fast.
Where the time I think is spent: `_bfd_elf_gc_mark / _bfd_elf_gc_mark_reloc`
finds out unused sections for removal:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=91c77c211ef065a77883004eb696adacd92a00be;hb=815d9a14cbbb3b81843f7566222c87fb22e7255d#l14063
```c
bool
_bfd_elf_gc_mark_reloc (struct bfd_link_info *info,
asection *sec,
elf_gc_mark_hook_fn gc_mark_hook,
struct elf_reloc_cookie *cookie)
{
asection *rsec;
bool start_stop = false;
rsec = _bfd_elf_gc_mark_rsec (info, sec, gc_mark_hook, cookie, &start_stop);
while (rsec != NULL)
{
if (!rsec->gc_mark)
{
if (bfd_get_flavour (rsec->owner) != bfd_target_elf_flavour
|| (rsec->owner->flags & DYNAMIC) != 0)
rsec->gc_mark = 1;
else if (!_bfd_elf_gc_mark (info, rsec, gc_mark_hook))
return false;
}
if (!start_stop)
break;
rsec = bfd_get_next_section_by_name (rsec->owner, rsec);
}
return true;
}
```
This `while()` loop looks suspicious. If we do such marking for each section it
probably has quadratic complexity.
The performance drop is picked from a real example: such huge section count is
typical for binaries built by haskell GHC compiler in `-split-sections` mode:
https://downloads.haskell.org/ghc/9.12.2/docs/users_guide/phases.html#ghc-flag-fsplit-sections
I think it's similar to `-ffunction-sections` `gcc` option at least in spirit.
`gold` seems to work quite a bit faster on such inputs. While `bfd` uses a lot
of RAM and gets slower and slower with the increase of functions I put to the
file.
This was noticed in
https://discourse.nixos.org/t/removing-gold-from-nixpkgs/70496 where `nixpkgs`
considers switching from `gold` linker as `gold` is deprecated for removal.
Switching to `bfd` so far looks like a big link-time regression: on larger
inputs it's extra minutes of linking on individual binaries.
I hope that `bfd` could be fixed not to degrade as much on such pathological
inputs.
Thanks!
--
You are receiving this mail because:
You are on the CC list for the bug.