feature request: parallelize make builds.
current problem: make runs its recipes serially by default. there is room for
making it series-parallel.
I have been toying with this idea of parallel builds for quite a while, to gain
project compile speed (cutting build time to a fraction).
compiles seem to spend more time on CPU and memory than they do on disk for
large monolithic files. but for the typical small files on most projects, I
should think they would be more disk-bound.
I have 2 machines I could test with (one 32-bit with 2 threads, 3GB of RAM, and
38GB of virtual memory; one 64-bit with 12 threads, 64GB of RAM, and 64GB of
virtual memory), both with 50+ projects whose builds I could parallelize. but
it would take some manual typing work to do this kind of testing if you want it
done (I may do it anyway; I want to parallelize my builds somehow to save time
and make use of this nice fast processor). right now I am transferring the
files to my new machine, so it could be a month before I get anywhere, or less.
the problem with my current build system is that it is all batch files and I
don't use make. I would have to convert the build systems for all of my 50+
projects, and also redo my build system somehow by making some sort of MinGW
make template, and I am no make expert. I am just putting this idea out there
for someone to grab onto and implement. should I just feed it to the GNU folks
via a bug report?
on to the idea...
if you want to compile anything big without losing hair, you should start
compiling individual items in parallel where possible. in fact, within that,
the compilers themselves should be multithreaded where possible, letting you
either set a limit on the number of threads (as long as it doesn't go over what
the system has available) or auto-detect the number of threads by default.
although compilers do seem to be a very serial thing, from what little I
remember of my compiler class from 20 years ago...
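from what I can tell, GNU make already exposes this kind of thread-count knob
at the level of whole compile jobs: its -j option caps how many recipes run at
once, and feeding it the machine's hardware thread count gives the auto-detect
behavior. a minimal sketch, using the nproc utility from GNU coreutils:

    # run up to 12 compile jobs at once, for a 12-thread CPU
    make -j12

    # auto-detect: one job per available hardware thread
    make -j$(nproc)

that covers parallelism between compiler invocations; making a single
invocation internally multithreaded, as suggested above, would be a separate
feature on the compiler side.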
one large-scale example: CPU-based HPC machines with EMC disk arrays would
benefit from this kind of feature, since it would make provision for parallel
builds within a given project. which brings me to my next point.
I am starting to parallelize my compiles BECAUSE IT MAKES THINGS FASTER, up to
a limit that is probably some combination of disk speed and CPU threads.
this will afford more speed, since most processors are multithreaded/multicore
and windows treats each hardware thread like a CPU. of course, this is not
limited to windows; you can bring this to the mac and to linux also: any
platform with a C++ compiler that compiles lots of individual files.
this will make things disk-bound very quickly, however, since these files
usually reside on a single disk. this is where RAID comes in very handy and
provides the extra jump in speed, whether from an EMC array, a small-scale RAID
box for personal use, or a 19" RAID rack for work use. you can make build
servers work even better with this by using the processors more efficiently.
this is also where cloud build services could sell a certain number of threads
for faster build times, and so on.
an idea I have: you could have a fixed pool of worker threads assigned as
compile job engines, each with its own spooler. the jobs then need to be synced
up internally when a SERIALCOMPILE command finishes: given a SERIALCOMPILE job
list, such as a list of .o/.obj files you want made from .c files, the
SERIALCOMPILE command should only complete once they are all done.
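as a point of comparison, ordinary make dependencies can already express that
sync point: a target that lists all the objects as prerequisites cannot run
until every one of them exists, even when the objects are built in parallel. a
minimal sketch with hypothetical file names (recipe lines must start with a
tab):

    # hypothetical object list; each .o can be built independently
    OBJS = main.o parser.o codegen.o

    # the link step is the barrier: a parallel make may build the three
    # objects simultaneously, but it will not link until all are done
    prog: $(OBJS)
    	$(CC) -o $@ $(OBJS)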
instead of the usual compile command (the .c.o: suffix rule with a $(CC)
recipe, I think it was), you would write SERIALCOMPILE 12 THREADS for a
12-threaded CPU, or SERIALCOMPILE AUTO THREADS, and then your list of .cpp
files, the file extension you want them turned into (such as .obj), and the
command you want used to do it. maybe this would have to take up several lines
of make. or something like that.
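to make that concrete, here is one way the directive might look inside a
makefile. this is purely hypothetical syntax for the feature being requested
(the FILES/PRODUCES/COMMAND keywords and the file names are made up for
illustration), not anything make currently accepts:

    # hypothetical: compile every listed .cpp into a .obj with the given
    # command, across up to 12 worker threads (or AUTO to match the
    # machine), returning only once every output file has been produced
    SERIALCOMPILE 12 THREADS
        FILES    a.cpp b.cpp c.cpp
        PRODUCES .obj
        COMMAND  $(CXX) -c -o $@ $<
    ENDSERIALCOMPILE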
think of it like a glorified make.
at some point I would hope we might even see a parallel build system in place,
once software developers begin to think in terms of parallel builds instead of
serial builds. it would cut build time to a fraction, but you have to be
careful HOW you do it.
some things just have to be done serially; those would just be regular make
commands.
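in make terms, that kind of ordering is what prerequisites already provide, and
GNU make also has a .NOTPARALLEL special target that forces a whole makefile to
run serially even when parallel jobs are requested. a minimal sketch with
hypothetical file and script names:

    # prerequisites force ordering: generate completes before compile runs
    compile: generate
    	$(CC) -c generated.c

    generate:
    	./gen-sources.sh

    # alternatively, mentioning this special target forces the entire
    # makefile to run serially even under a parallel make
    .NOTPARALLEL: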
I am not sure if I am providing this idea to the right people or not. maybe I
should be going to Intel, or to GNU, or to Microsoft, or Apple, or all of the
above. but I didn't really want one vendor hogging all of the benefits, so I
thought I would bring the idea to you. should you want to bring the
specification of a parallel build system into the language, I would appreciate
it (because I could certainly use it!). and for us software developers, it
would reduce our compile times.
note that on windows machines, the Win32 API has WaitForMultipleObjects() for
waiting on a set of HANDLEs (process, thread, event, and so on) without
resorting to while/for loops and some sort of conditional.
personally, I would like to see this in every make and build system, made
generally available. everyone has multicore processors now; let's make use of
them!
-------------
Jim Michaels
jmich...@yahoo.com
j...@renewalcomputerservices.com
http://RenewalComputerServices.com
http://JesusnJim.com (my personal site, has software)