Thanks for the suggestions. I have captured a bunch more data and hence the delay. First I have started by reproducing the behavior with 4.2.1:
gmake realclean >& cleanlog.txt /usr/bin/gmake-4.2.1 -j32 -Otarget dirsparallel=all release >& gmake421dpa32.log; date Tue Jul 19 11:24:47 EDT 2016 Tue Jul 19 20:43:17 EDT 2016 gmake realclean >& cleanlog.txt /usr/bin/gmake-4.2.1 -j32 -Onone dirsparallel=all release > & gmake421dpn32.log ; date Tue Jul 19 21:01:48 EDT 2016 Tue Jul 19 22:05:40 EDT 2016 Essentially the same as 4.1. This –Otarget run is not just “a little slower,” it is pathologically slow. The 9 hour build time is slower than if I had run the build with no parallelization whatsoever. (The –Onone time is a bit faster than what I see doing parallelization at just the leaf level) Our build is a relatively straight-forward “choreographed” build as followed: Top-Level Makefile Makefile.libs – This is a list of directories containing C++ code getting built into different relocatable libraries (one per dir) (sub-directory Makefiles) Mikefile.bins – This is a list of directories that link executables (sub-directory Makefiles) There are O(120) libraries to build, each of which ranges from 10-50 objects. There are about 30 executables to build. What “dirsparallel=all” does is to turn on parallel make for Makefile.libs, Makefile.dirs, etc. The actual build is more complicated, having codegen, unit test, lint, and packaging activities, but it follows this simple model. The defining characteristic here is a wide fan-out. There are a large number of relatively simple and brief makefiles. There will therefore be a few “middle management” makefiles that are waiting on children to complete (I imagine each of these occupies a make job??). I have confirmed that my count of “g++” invocations is exactly the same for the two runs (4403) The TMPDIR variable was not set. I set it to /tmp to insure that it is on a physical device. (The build and our home directories are NFS mounted) The total elapsed time for the “-Otarget” run was then 6 hours. This is enough of a reduction to be a little suspicious, but not much more. I ran with strace for both the “none” case and the “target” case. The summaries are attached. Unfortunately, I don’t think there is a smoking gun here. I have also run the “-Otrace” case at a few different job-levels: j8 – 6:34 j32 – 6:00 j256 – 9:49 From: David Boyce [mailto:david.s.bo...@gmail.com] Sent: Sunday, July 17, 2016 5:03 PM To: Gardell, Steven <sgard...@sonusnet.com> Cc: psm...@gnu.org; Bug-make@gnu.org Subject: Re: Parallel builds across makefiles A couple of suggestions: 1. Check your value of TMPDIR if any. All -O is doing is redirecting output into a temp file and dumping it later. Effectively it turns command "foo" into "foo > $TMPDIR/blah; cat $TMPDIR/blah; unlink $TMPDIR/blah". This is why it seems almost impossible for it to slow things down appreciably, and a slow temp dir device might be one explanation. Along similar lines you could try "TMPDIR=/tmp make ..." 2. Try strace. It has a mode which will timestamp each system call and another which prints a table of system calls used and how much time it took. One of these will probably be instructive, especially when compared with the same without -O. David On Sun, Jul 17, 2016 at 12:22 PM, Gardell, Steven <sgard...@sonusnet.com<mailto:sgard...@sonusnet.com>> wrote: No I have not gotten to 4.2.1 yet. I have experimented using j values ranging from 16 to 64. The majority of the testing is with -Otarget. But I have tried all of them. -----Original Message----- From: Paul Smith [mailto:psm...@gnu.org<mailto:psm...@gnu.org>] Sent: Sunday, July 17, 2016 3:14 PM To: Gardell, Steven <sgard...@sonusnet.com<mailto:sgard...@sonusnet.com>>; Bug-make@gnu.org<mailto:Bug-make@gnu.org> Subject: Re: Parallel builds across makefiles On Sun, 2016-07-17 at 15:48 +0000, Gardell, Steven wrote: > OK. Thanks! I will try the latest version. > > FWIW, I am measuring is total wall clock time to complete the build. > (date; gmake... ; date) This goes up substantially with all sync modes > other than "none" if gmake has parallel invocations of sub-makes going > (e.g. "gmake -f SomeOtherMakefile"). When I am only parallelizing > individual makefiles (with ACE, these tend to be simple lists of files > to compile), then the behavior is as promised by the documentation. Assuming you determine the behavior exists even in version 4.2.1, please provide more details. What level of -j are you using? Which sync mode are you using? Do you get the same behavior even with lower levels of -j, if yours is very large? _______________________________________________ Bug-make mailing list Bug-make@gnu.org<mailto:Bug-make@gnu.org> https://lists.gnu.org/mailman/listinfo/bug-make
gmake-4.2.1: *** [Makefile:333: release] Error 1 % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 93.92 48988.895003 281029 174320 42047 wait4 3.56 1857.549713 8133 228409 47431 futex 1.12 582.114172 10 59230554 48886748 stat 0.81 421.206442 11 37319045 29591972 open 0.40 209.945060 3 71120290 3458259 lstat 0.06 33.636522 2 13616492 4796 read 0.02 11.818771 1 8065605 14388 close 0.02 9.514375 1 7618567 fstat 0.01 7.582567 2 3842736 munmap 0.01 7.392948 1 5568114 mmap 0.01 7.124394 92 77606 sched_yield 0.01 2.657384 12 213754 getdents 0.00 2.140097 3 847396 371561 access 0.00 2.071326 1 3078031 write 0.00 1.887237 4 538715 3218 fcntl 0.00 1.749332 1 1578161 brk 0.00 1.286288 0 8765372 5684 lseek 0.00 1.036779 93 11125 pwrite 0.00 0.898698 2 593430 mprotect 0.00 0.840105 0 1990271 rt_sigprocmask 0.00 0.712259 3 243835 137443 execve 0.00 0.699540 11 65687 14 vfork 0.00 0.682775 1 964050 28 rt_sigaction 0.00 0.611575 273 2237 symlink 0.00 0.428026 67 6358 3569 mkdir 0.00 0.413624 2 189901 584 readlink 0.00 0.364815 10 37611 35308 ioctl 0.00 0.286657 1 192127 dup2 0.00 0.225329 112 2005 627 unlinkat 0.00 0.175723 16 10961 2 unlink 0.00 0.175075 1 161868 getuid 0.00 0.152934 1 160100 geteuid 0.00 0.144751 1 161788 getgid 0.00 0.141394 1 158100 getegid 0.00 0.140271 15 9374 3420 pselect6 0.00 0.110696 42 2650 627 newfstatat 0.00 0.105560 12 9017 ftruncate 0.00 0.095245 1 106117 arch_prctl 0.00 0.092465 1 70043 pipe 0.00 0.086211 1 80684 getrlimit 0.00 0.083629 69 1218 rename 0.00 0.077480 2 49361 pread 0.00 0.074814 1 50945 clone 0.00 0.066864 9 7545 poll 0.00 0.058354 1 85242 3614 rt_sigreturn 0.00 0.045559 1 65173 uname 0.00 0.036858 0 75027 setrlimit 0.00 0.036661 2 16581 madvise 0.00 0.030539 46 669 fchmodat 0.00 0.023980 45 538 chmod 0.00 0.022452 1 37443 getcwd 0.00 0.021574 2 8659 statfs 0.00 0.019853 5 3970 chdir 0.00 0.019411 1 33296 getpgrp 0.00 0.018386 2 8888 socket 0.00 0.015994 49 326 fchmod 0.00 0.015333 3 5238 sched_getaffinity 0.00 0.013102 0 34870 getpid 0.00 0.012723 0 33299 getppid 0.00 0.009763 11 907 906 connect 0.00 0.009151 29 321 fchown 0.00 0.008814 45 197 chown 0.00 0.008683 5 1706 1227 utimensat 0.00 0.008137 1 7948 7512 bind 0.00 0.007875 0 22244 mremap 0.00 0.007423 2 4418 getrusage 0.00 0.006924 1 9939 set_robust_list 0.00 0.005535 1 7513 sendto 0.00 0.005124 1 7977 set_tid_address 0.00 0.003982 0 9345 dup 0.00 0.003629 1 4886 getgroups 0.00 0.002308 8 279 utimes 0.00 0.001354 0 9256 setsockopt 0.00 0.001206 0 7513 recvfrom 0.00 0.001084 1 2142 umask 0.00 0.001003 0 5041 gettid 0.00 0.000981 36 27 openat 0.00 0.000950 3 321 getxattr 0.00 0.000643 18 36 faccessat 0.00 0.000183 0 2107 2107 getpeername 0.00 0.000091 0 493 57 getsockname 0.00 0.000075 0 478 fchdir 0.00 0.000020 0 71 clock_getres 0.00 0.000000 0 202 writev 0.00 0.000000 0 12 1 select 0.00 0.000000 0 3 alarm 0.00 0.000000 0 22 socketpair 0.00 0.000000 0 872 getsockopt 0.00 0.000000 0 6 kill 0.00 0.000000 0 3 creat 0.00 0.000000 0 6 sysinfo 0.00 0.000000 0 177 times 0.00 0.000000 0 4 sigaltstack 0.00 0.000000 0 5 5 fsetxattr 0.00 0.000000 0 5 5 fgetxattr 0.00 0.000000 0 1 linkat ------ ----------- ----------- --------- --------- ---------------- 100.00 52157.980642 227737277 82623160 total System call usage summary for 32 bit mode: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 66.72 49.342526 54764 901 wait4 29.96 22.157386 332 66751 7059 futex 1.23 0.911634 9 96806 65138 open 0.91 0.673931 0 2312182 write 0.46 0.340610 1 556057 read 0.19 0.139320 0 997680 _llseek 0.16 0.120858 4 31291 157 close 0.10 0.073414 2 30895 1751 stat64 0.07 0.052509 21 2538 856 unlink 0.05 0.037090 0 320275 gettimeofday 0.03 0.022479 1 28604 munmap 0.02 0.015609 1 12976 rt_sigaction 0.02 0.013604 27 508 rename 0.02 0.012859 0 60486 brk 0.01 0.009089 0 27988 old_mmap 0.01 0.008663 0 19261 fstat64 0.01 0.005501 1 5980 getpid 0.01 0.005287 1 4598 3014 lstat64 0.01 0.004514 5 925 execve 0.00 0.002443 0 5415 mmap2 0.00 0.001559 1 2299 times 0.00 0.000977 1 1283 time 0.00 0.000689 14 48 poll 0.00 0.000649 1 530 getcwd 0.00 0.000593 0 3954 clock_gettime 0.00 0.000563 0 2756 mprotect 0.00 0.000401 17 24 ftruncate 0.00 0.000327 0 901 fork 0.00 0.000322 1 332 lseek 0.00 0.000296 1 333 285 access 0.00 0.000197 0 1209 uname 0.00 0.000146 1 168 clone 0.00 0.000144 3 48 getrlimit 0.00 0.000115 0 456 sched_getaffinity 0.00 0.000096 0 456 rt_sigprocmask 0.00 0.000077 2 48 setsockopt 0.00 0.000073 2 48 sendto 0.00 0.000071 1 96 getdents64 0.00 0.000060 1 48 old_getrlimit 0.00 0.000048 0 312 sigaltstack 0.00 0.000043 2 24 24 mkdir 0.00 0.000038 0 96 madvise 0.00 0.000027 0 192 192 sched_setscheduler 0.00 0.000024 1 48 pread64 0.00 0.000018 0 48 set_tid_address 0.00 0.000000 0 24 setrlimit 0.00 0.000000 0 48 readlink 0.00 0.000000 0 48 getdents 0.00 0.000000 0 24 sched_getparam 0.00 0.000000 0 24 sched_getscheduler 0.00 0.000000 0 24 sched_yield 0.00 0.000000 0 72 getuid32 0.00 0.000000 0 48 getgid32 0.00 0.000000 0 72 geteuid32 0.00 0.000000 0 48 getegid32 0.00 0.000000 0 48 fcntl64 0.00 0.000000 0 216 gettid 0.00 0.000000 0 87 set_thread_area 0.00 0.000000 0 216 set_robust_list 0.00 0.000000 0 96 socket 0.00 0.000000 0 48 48 bind 0.00 0.000000 0 48 48 connect 0.00 0.000000 0 48 recvfrom ------ ----------- ----------- --------- --------- ---------------- 100.00 73.956879 4599113 78572 total
% time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 93.82 53268.050950 302081 176337 41624 wait4 3.18 1805.057765 7853 229844 48664 futex 1.06 599.730842 8 71119946 3458260 lstat 0.94 532.243350 14 37295656 29590966 open 0.73 416.897838 7 59083695 48774860 stat 0.09 50.603194 4 13780850 4946 read 0.04 21.309529 3 8034030 14388 close 0.03 17.913204 2 7594725 fstat 0.02 14.098519 3 5545626 mmap 0.02 10.953696 3 3820511 munmap 0.01 5.575016 75 74393 sched_yield 0.01 3.563169 4 847404 371559 access 0.01 3.497227 1 3051847 write 0.01 3.432796 0 8694798 5684 lseek 0.01 3.343298 2 1576613 brk 0.01 2.970389 14 213808 getdents 0.00 2.511414 1 1989877 rt_sigprocmask 0.00 2.191222 4 598520 mprotect 0.00 2.034284 2 963896 28 rt_sigaction 0.00 1.013485 4 243818 137443 execve 0.00 0.952604 86 11125 pwrite 0.00 0.642269 3 189873 584 readlink 0.00 0.574819 9 65681 32 vfork 0.00 0.536335 14 37671 35305 ioctl 0.00 0.532600 238 2237 symlink 0.00 0.479347 3 161899 getuid 0.00 0.448964 3 158131 getegid 0.00 0.432548 1 462935 1835 fcntl 0.00 0.425225 3 160131 geteuid 0.00 0.418771 65 6407 3618 mkdir 0.00 0.407211 3 161819 getgid 0.00 0.395323 3 121333 dup2 0.00 0.354416 3 106100 arch_prctl 0.00 0.265068 3 80654 getrlimit 0.00 0.254673 23 11070 4965 pselect6 0.00 0.222961 3 70009 pipe 0.00 0.206641 103 2005 627 unlinkat 0.00 0.171408 106 1622 2 unlink 0.00 0.161237 2 85127 5053 rt_sigreturn 0.00 0.151686 2 65149 uname 0.00 0.124745 102 1218 rename 0.00 0.113329 2 50950 clone 0.00 0.095758 36 2650 627 newfstatat 0.00 0.087777 2 49361 pread 0.00 0.068028 1 74988 setrlimit 0.00 0.051289 2 33287 getpgrp 0.00 0.049880 7 7545 poll 0.00 0.043773 3 16604 madvise 0.00 0.043232 1 33290 getppid 0.00 0.038585 4 8672 statfs 0.00 0.037752 1 34861 getpid 0.00 0.037340 8 4418 getrusage 0.00 0.034400 20 1703 1025 utimensat 0.00 0.032469 60 538 chmod 0.00 0.030341 45 669 fchmodat 0.00 0.026653 1 37441 getcwd 0.00 0.024115 6 4006 chdir 0.00 0.021799 67 326 fchmod 0.00 0.016171 2 9931 set_robust_list 0.00 0.013941 3 5234 sched_getaffinity 0.00 0.012925 3 4886 getgroups 0.00 0.011171 1 8888 socket 0.00 0.009249 1 7971 set_tid_address 0.00 0.008972 0 22551 mremap 0.00 0.008657 1 7948 7512 bind 0.00 0.008099 9 907 906 connect 0.00 0.007414 23 321 fchown 0.00 0.005842 3 2155 umask 0.00 0.004866 25 197 chown 0.00 0.004684 1 7513 sendto 0.00 0.003978 14 279 utimes 0.00 0.002132 79 27 openat 0.00 0.001911 6 321 getxattr 0.00 0.001591 3 478 fchdir 0.00 0.001353 38 36 faccessat 0.00 0.001293 0 9256 setsockopt 0.00 0.001183 0 7513 recvfrom 0.00 0.001000 333 3 creat 0.00 0.001000 250 4 sigaltstack 0.00 0.000967 0 5039 gettid 0.00 0.000439 4 107 ftruncate 0.00 0.000173 1 202 writev 0.00 0.000110 0 493 57 getsockname 0.00 0.000090 1 177 times 0.00 0.000079 0 2107 2107 getpeername 0.00 0.000016 0 872 getsockopt 0.00 0.000016 0 71 clock_getres 0.00 0.000000 0 12 1 select 0.00 0.000000 0 6 dup 0.00 0.000000 0 3 alarm 0.00 0.000000 0 22 socketpair 0.00 0.000000 0 6 kill 0.00 0.000000 0 6 sysinfo 0.00 0.000000 0 5 5 fsetxattr 0.00 0.000000 0 5 5 fgetxattr 0.00 0.000000 0 1 linkat ------ ----------- ----------- --------- --------- ---------------- 100.00 56776.115880 227365252 82512688 total System call usage summary for 32 bit mode: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 67.23 48.972573 54354 901 wait4 29.39 21.408221 314 68089 7725 futex 1.19 0.863531 9 96806 65138 open 0.94 0.682799 0 2312048 write 0.46 0.334636 1 556033 read 0.18 0.130719 0 997635 _llseek 0.17 0.122515 4 31291 157 close 0.11 0.077986 31 2538 856 unlink 0.10 0.073506 2 30895 1751 stat64 0.07 0.047623 0 320530 gettimeofday 0.03 0.022181 0 60615 brk 0.03 0.021145 1 28624 munmap 0.02 0.017376 34 508 rename 0.02 0.014333 1 12976 rt_sigaction 0.02 0.012528 0 27981 old_mmap 0.01 0.009627 0 19261 fstat64 0.01 0.008195 2 4598 3014 lstat64 0.01 0.005199 1 5416 mmap2 0.01 0.004555 1 5980 getpid 0.00 0.003601 1 4058 clock_gettime 0.00 0.002974 2 1209 uname 0.00 0.001965 1 2299 times 0.00 0.001364 1 1283 time 0.00 0.001354 1 925 execve 0.00 0.000944 0 2705 mprotect 0.00 0.000421 1 530 getcwd 0.00 0.000228 1 456 rt_sigprocmask 0.00 0.000209 3 72 getuid32 0.00 0.000181 0 456 sched_getaffinity 0.00 0.000163 1 168 clone 0.00 0.000125 0 333 285 access 0.00 0.000102 2 48 48 bind 0.00 0.000079 0 216 set_robust_list 0.00 0.000064 0 312 sigaltstack 0.00 0.000036 0 192 192 sched_setscheduler 0.00 0.000031 1 48 getdents 0.00 0.000031 0 96 socket 0.00 0.000030 0 72 geteuid32 0.00 0.000025 1 48 pread64 0.00 0.000023 0 48 set_tid_address 0.00 0.000021 0 96 madvise 0.00 0.000000 0 901 fork 0.00 0.000000 0 332 lseek 0.00 0.000000 0 24 24 mkdir 0.00 0.000000 0 24 setrlimit 0.00 0.000000 0 48 old_getrlimit 0.00 0.000000 0 48 readlink 0.00 0.000000 0 24 ftruncate 0.00 0.000000 0 24 sched_getparam 0.00 0.000000 0 24 sched_getscheduler 0.00 0.000000 0 24 sched_yield 0.00 0.000000 0 48 poll 0.00 0.000000 0 48 getrlimit 0.00 0.000000 0 48 getgid32 0.00 0.000000 0 48 getegid32 0.00 0.000000 0 96 getdents64 0.00 0.000000 0 48 fcntl64 0.00 0.000000 0 216 gettid 0.00 0.000000 0 87 set_thread_area 0.00 0.000000 0 48 48 connect 0.00 0.000000 0 48 sendto 0.00 0.000000 0 48 recvfrom 0.00 0.000000 0 48 setsockopt ------ ----------- ----------- --------- --------- ---------------- 100.00 72.843219 4600699 79238 total
_______________________________________________ Bug-make mailing list Bug-make@gnu.org https://lists.gnu.org/mailman/listinfo/bug-make