> On Apr 5, 2025, at 11:43 AM, Joshua DeWeese <josh.dewe...@gmail.com> wrote:
> 
> Hi,
> I was wondering about the possibility of adding, as a feature to make,
> the addition of a standard makefile fragments library.

On a related note, I released a library called "make-booster":
https://github.com/david-a-wheeler/make-booster
It provides additional useful facilities, e.g., added support for large
data pipelines & Python. Below is a summary.

It might be useful if make pointed to some of these support systems, or
made it easier to download/install them.

--- David A. Wheeler

==== INFO ABOUT MAKE-BOOSTER ====

Make-booster
This project (contained in this directory and below) provides utility routines 
intended to greatly simplify data processing (particularly a data pipeline) 
using GNU make. It includes some mechanisms specifically to help Python, as 
well as general-purpose mechanisms that can be useful in any system. In 
particular, it helps reliably reproduce results, and it automatically 
determines what needs to run and runs only that (producing a significant 
speedup in most cases).

Specific capabilities

In particular:
    • It provides mechanisms to ensure that if a Python script is modified 
(including one that is transitively included by other Python scripts), or its 
internal inputs are modified, all the processes that depend on that script (or 
internal inputs) are rerun. This dependency calculation for Python scripts is 
done automatically by a tool included in this package.
    • It provides general-purpose mechanisms to help do the same for other 
programming languages.
    • By default it enables "Delete on Error" to avoid accidentally including 
corrupted data in final results.
    • It supports "grouped targets" to correctly handle processes that generate 
multiple files, without requiring GNU make version 4.3 or later.
    • It automatically runs tests as appropriate if some file is changed, but 
only if the test could change its results (by examining transitive 
dependencies). We include default mechanisms for doing that in Python, and 
hooks to support other languages.
    • It will run source code scans as appropriate if a file is changed. It 
includes defaults to do that in Python and shell, and hooks to do that with 
other languages.
For example, imagine that Python file BBB.py says import CC, and file CC.py 
reads from file F.txt (and CC.py declares its INPUTS= as described below). Now 
if you modify file F.txt or CC.py, any rule that runs BBB.py will automatically 
be re-run in the correct order when you use make, even if you didn't directly 
edit BBB.py.
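
To make the BBB.py / CC.py example concrete, here is a minimal sketch of how transitive Python imports could be discovered. This is not make-booster's actual tool; the function names local_imports and transitive_imports are hypothetical, and the sketch assumes every script lives in one directory and uses plain absolute imports.

```python
import ast
from pathlib import Path


def local_imports(script: Path) -> set[Path]:
    """Return the .py files next to `script` that it imports directly."""
    tree = ast.parse(script.read_text(encoding="utf-8"))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import CC" or "import CC.sub" -> top-level name "CC"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from CC import thing" -> "CC" (relative imports are skipped)
            names.add(node.module.split(".")[0])
    # Keep only names that resolve to a file in the same directory;
    # standard-library and third-party modules drop out here.
    candidates = (script.parent / f"{name}.py" for name in names)
    return {path for path in candidates if path.is_file()}


def transitive_imports(script: Path) -> set[Path]:
    """Return every local .py file reachable from `script` through imports."""
    seen: set[Path] = set()
    stack = [script]
    while stack:
        for dep in local_imports(stack.pop()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(script)  # an import cycle can lead back to the start
    return seen
```

With BBB.py and CC.py as above, transitive_imports(Path("BBB.py")) would contain CC.py. Data files such as F.txt are not found this way; the INPUTS declaration mentioned above is not modeled in this sketch.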
In tests with over 1000 files the overhead for GNU make to figure out "what to 
do" was only 0.07 seconds when there was nothing to do. The first time you ever 
use it on a project there's some work for it to do to record information, but 
that is a one-time cost and even that doesn't take too long (depending on your 
project's size).
The approaches used here are not new to software development; people who use 
compiled programming languages have used them for decades. However, many people 
who use dynamic languages (like Python) to implement data pipelines are unaware 
that these mechanisms exist, and we didn't find ready-made mechanisms to do 
this for data processing pipelines. So this is a small set of tools on top of GNU 
make to do the same thing for data pipelines as is already done for some 
projects that use compiled languages.
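
As a rough illustration of the compiled-language analogy: C compilers can write small makefile fragments listing each object's prerequisites (for example, gcc -MMD), which the makefile then includes. A minimal sketch of producing the same kind of fragment for a pipeline step follows. The function name dependency_fragment and the file name out.csv are hypothetical, and this is not necessarily the format make-booster records.

```python
def dependency_fragment(target: str, prerequisites: list[str]) -> str:
    """Format a recipe-less make rule: 'target: prereq1 prereq2 ...'.

    A rule with no recipe only adds prerequisites to the target, so it can
    sit in a generated file that the main makefile pulls in with `include`.
    """
    # Sort and de-duplicate so the generated file is stable between runs.
    prereqs = " ".join(sorted(set(prerequisites)))
    return f"{target}: {prereqs}\n"


if __name__ == "__main__":
    # out.csv is a hypothetical output produced by running BBB.py.
    print(dependency_fragment("out.csv", ["CC.py", "BBB.py", "F.txt"]), end="")
```

Run as a script, this prints "out.csv: BBB.py CC.py F.txt", which tells make to rebuild out.csv whenever any of those three files is newer than it.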
