[Bug binutils/11843] New: ld long link times due to compute_bucket_count() choosing hash table size
In bfd/elflink.c, in compute_bucket_count(), there is a $\Theta(n^2)$ loop (with n=nsyms) to find a good hash table size. This is okay for n=50, but we have links taking 15 minutes for a library with n=162099 symbols. The code in compute_bucket_count() is a bit pointless for large values of n. Perhaps a different strategy for n > 1024 is needed, e.g., just return size nsyms/2, or choose the best of (say) sizes n0...n0+k where n0 is the target size (with k=4 for example, just to eliminate the possibility that the first choice is astronomically unlucky). -- Summary: ld long link times due to compute_bucket_count() choosing hash table size Product: binutils Version: 2.21 (HEAD) Status: NEW Severity: normal Priority: P2 Component: binutils AssignedTo: unassigned at sources dot redhat dot com ReportedBy: todd dot veldhuizen at logicblox dot com CC: bug-binutils at gnu dot org http://sourceware.org/bugzilla/show_bug.cgi?id=11843 --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is. ___ bug-binutils mailing list bug-binutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-binutils
[Bug binutils/11843] ld long link times due to compute_bucket_count() choosing hash table size
--- Additional Comments From todd dot veldhuizen at logicblox dot com 2010-07-28 22:05 --- Here is a C++ program that will create a bunch of .c files, compile and then link them, hitting the bug. It links very quickly without the -O1 option, and with the -O1 option it takes a very long time. --- cut-here #include #include #include #include int main() { const int num_symbols = 199933; const int ref_symbol_delta = 37199; const int num_files = 25; char tbuf[128]; srand(1); std::string objfiles; int ref_symbol = 0; int i = 0; for (int j=0; j < num_files; ++j) { int symbols_ceil = (num_symbols*(j+1))/num_files; sprintf(tbuf, "many_symbols%03d.c", j+1); FILE* dst = fopen(tbuf, "w"); std::map sites; for (; i < symbols_ceil; ++i) { ref_symbol = (ref_symbol + ref_symbol_delta) % num_symbols; sites[i] = ref_symbol; fprintf(dst, "void foo%d();\n", ref_symbol); } for (std::map::iterator iter = sites.begin(); iter != sites.end(); ++iter) { fprintf(dst, "void foo%d() { foo%d(); }\n", iter->first, iter->second); } fclose(dst); sprintf(tbuf, "gcc -c -fPIC many_symbols%03d.c -o many_symbols%03d.o", j+1, j+1); printf("%s\n", tbuf); system(tbuf); sprintf(tbuf, "many_symbols%03d.o", j+1); objfiles += ((j > 0) ? " " : "") + std::string(tbuf); } system("time ld -O1 many_symbols???.o -shared -o libxxx.so"); } --- cut-here -- http://sourceware.org/bugzilla/show_bug.cgi?id=11843 --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is. ___ bug-binutils mailing list bug-binutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-binutils
[Bug binutils/11843] ld long link times due to compute_bucket_count() choosing hash table size
--- Additional Comments From todd dot veldhuizen at logicblox dot com 2010-07-28 22:09 --- Here some tracing for compute_bucket_count() that illustrates the futility of searching large numbers of hash table sizes. Index: bfd/elflink.c === RCS file: /cvs/src/src/bfd/elflink.c,v retrieving revision 1.372 diff -r1.372 elflink.c 31a32 > #include 5345a5347,5354 > > static double wallTime(void) > { >struct timeval t; >gettimeofday(&t, 0); >return t.tv_sec + t.tv_usec/100.0; > } > 5361a5371,5373 > double startTime; > > startTime = wallTime(); 5368a5381 > long insert_count; 5377a5391,5392 > insert_count=0; > 5419a5435,5436 > insert_count += nsyms; > 5458a5476,5481 > printf("elapsed=%7.4lfs size=%ld max=%lf (%lf%%) after %ld inserts\n", > wallTime() - startTime, > (long)i, > (double)max, ((double)max-best_chlen)/best_chlen*100.0, > insert_count); > insert_count=0; 5461a5485,5491 > else if (insert_count > 10L) > { > printf("elapsed=%7.4lfs No improvement after %ld inserts\n", >wallTime() - startTime, >insert_count); > insert_count=0; > } Its output for the test case, on my machine, shows that after trying the first three hash table sizes, no improvements of more than 1% occur, and after the first 200 hash table sizes, no improvements whatsoever occur. -bash-3.2$ ld -O1 many_symbols*.o -shared -o libxxx.so elapsed= 0.0045s size=99984 max=43186095512.00 (-100.00%) after 399936 inserts elapsed= 0.0087s size=99985 max=34095909512.00 (-21.048872%) after 399936 inserts elapsed= 0.0129s size=99986 max=33926687032.00 (-0.496313%) after 399936 inserts elapsed= 0.0170s size=99987 max=33836351808.00 (-0.266266%) after 399936 inserts elapsed= 0.0586s size=7 max=33830992776.00 (-0.015838%) after 3999360 inserts elapsed= 0.0628s size=8 max=33828995144.00 (-0.005905%) after 399936 inserts elapsed= 0.0836s size=13 max=33823059872.00 (-0.017545%) after 1999680 inserts elapsed= 0.1293s size=100014 max=33802046320.00 (-0.062128%) after 4399296 inserts elapsed= 0.1625s size=100022 max=33791424296.00 (-0.031424%) after 3199488 inserts elapsed= 0.1833s size=100027 max=33740062104.00 (-0.151998%) after 1999680 inserts elapsed= 0.2167s size=100035 max=33725848184.00 (-0.042128%) after 3199488 inserts elapsed= 0.4254s size=100085 max=33716109728.00 (-0.028875%) after 19996800 inserts elapsed= 0.5258s size=100109 max=33682995136.00 (-0.098216%) after 9598464 inserts elapsed= 0.6515s size=100139 max=33669972112.00 (-0.038663%) after 11998080 inserts elapsed= 0.7270s size=100157 max=33644252600.00 (-0.076387%) after 7198848 inserts elapsed=11.1804s No improvement after 1000239936 inserts elapsed=21.6568s No improvement after 1000239936 inserts elapsed=32.1744s No improvement after 1000239936 inserts elapsed=42.7346s No improvement after 1000239936 inserts elapsed=53.3344s No improvement after 1000239936 inserts elapsed=63.9556s No improvement after 1000239936 inserts elapsed=74.6303s No improvement after 1000239936 inserts elapsed=85.3426s No improvement after 1000239936 inserts elapsed=96.0896s No improvement after 1000239936 inserts elapsed=106.8647s No improvement after 1000239936 inserts elapsed=117.6750s No improvement after 1000239936 inserts elapsed=128.5248s No improvement after 1000239936 inserts elapsed=139.4285s No improvement after 1000239936 inserts elapsed=150.3337s No improvement after 1000239936 inserts elapsed=161.2662s No improvement after 1000239936 inserts elapsed=172.2233s No improvement after 1000239936 inserts elapsed=183.2224s No improvement after 1000239936 inserts elapsed=194.2606s No improvement after 1000239936 inserts elapsed=205.3407s No improvement after 1000239936 inserts elapsed=216.4314s No improvement after 1000239936 inserts elapsed=227.5811s No improvement after 1000239936 inserts elapsed=238.7598s No improvement after 1000239936 inserts elapsed=249.9598s No improvement after 1000239936 inserts etc. -- http://sourceware.org/bugzilla/show_bug.cgi?id=11843 --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is. ___ bug-binutils mailing list bug-binutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-binutils
[Bug binutils/11843] ld long link times due to compute_bucket_count() choosing hash table size
--- Additional Comments From todd dot veldhuizen at logicblox dot com 2010-07-28 22:23 --- Here is a conservative patch. Index: bfd/elflink.c === RCS file: /cvs/src/src/bfd/elflink.c,v retrieving revision 1.372 diff -u -r1.372 elflink.c --- bfd/elflink.c 14 Apr 2010 08:29:33 - 1.372 +++ bfd/elflink.c 28 Jul 2010 22:21:53 - @@ -5374,6 +5374,7 @@ const struct elf_backend_data *bed = get_elf_backend_data (dynobj); unsigned long int *counts; bfd_size_type amt; + unsigned int no_improvement_count = 0; /* Possible optimization parameters: if we have NSYMS symbols we say that the hashing table must at least have NSYMS/4 and at most @@ -5458,7 +5459,10 @@ { best_chlen = max; best_size = i; + no_improvement_count = 0; } + else if (++no_improvement_count == 100) +break; } free (counts); -- http://sourceware.org/bugzilla/show_bug.cgi?id=11843 --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is. ___ bug-binutils mailing list bug-binutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-binutils