[rfc] Trying to make sense of opt-in Node integration in mozilla-central

Nicholas Alexander Fri, 20 Apr 2018 10:24:42 -0700

Colleagues,

I have a patch series that adds an opt-in --enable-node-environment
configure flag and, when that flag is set, uses Node (via Webpack) to
generate the Activity Stream content bundle.  This patch series does not
try to solve a few hard problems:

1) vendoring Node modules into the tree
2) installing $topobjdir/node_modules at build time efficiently.

There's a green artifact build of my prototype at

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d138f854139f2e389867b01d2f2afe59f2975783

I owe some folks (dmose, jlaster) details on what is needed in order to
land this opt-in prototype and, more importantly, how to make the
prototype not opt-in. To that end, I talked to most of the build peers
(chmanchester, gps, mshal, ted) yesterday in YVR.

The results of that discussion were not what I expected.

First, some context. The build system is investing heavily into
capturing the full dependency DAG (Directed Acyclic Graph) in order to
produce correct builds. The current build backends, and in particular
the dominant RecursiveMake build backend, do not capture the full DAG.
Capturing the full DAG is required to use "modern" build systems like
Tup, Buck, or Bazel. Any sub-component of the build must therefore
either correspond to edges in the DAG (these inputs, these outputs) or,
if it does its own caching and invalidation, expose its internal DAG.
In the current build system, C compiler invocations are the prototype of
the first situation and cargo is the prototype of the second situation.
What I did not know is that the build peers are contributing code to
cargo to have it expose its internal DAG, and that all of the "modern"
build systems (in particular Buck) need this functionality to integrate
against cargo.
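To make the two situations concrete, here is a toy sketch of a build DAG. The `Rule` type and `build_order` helper are hypothetical and are not how Tup, Buck, or the mozbuild backends represent rules. A C compiler invocation is one rule with declared inputs and outputs. Cargo, without an exposed internal DAG, can only appear as one opaque rule whose declared inputs (here, just a hypothetical `Cargo.toml`) miss most of what it actually reads.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Rule:
    """One node in the build DAG: these inputs produce these outputs."""
    name: str
    inputs: tuple
    outputs: tuple


def build_order(rules):
    """Return rule names so each rule runs after the rules producing its inputs.

    Raises ValueError if two rules claim the same output; a dependency cycle
    raises graphlib.CycleError (itself a ValueError subclass).
    """
    producer = {}
    for rule in rules:
        for output in rule.outputs:
            if output in producer:
                raise ValueError(
                    f"{output} is produced by both {producer[output]} and {rule.name}"
                )
            producer[output] = rule.name
    # Inputs that no rule produces are source files; they add no edges.
    graph = {
        rule.name: {producer[i] for i in rule.inputs if i in producer}
        for rule in rules
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    rules = [
        Rule("link", ("a.o", "libgkrust.a"), ("libxul.so",)),
        Rule("cc_a", ("a.c",), ("a.o",)),
        # cargo as one opaque rule: the .rs files it reads are not edges here.
        Rule("cargo", ("Cargo.toml",), ("libgkrust.a",)),
    ]
    print(build_order(rules))
```

If a `.rs` file changes, nothing in this graph says `cargo` must rerun; that is exactly the gap that exposing cargo's internal DAG is meant to close.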

Second, my appraisal of the situation.

Integrating Node will be very challenging. On the one hand, |yarn install|
(or |npm install|) is, like cargo, in the second situation -- it is its own
build system that does its own caching and invalidation. That means
that to integrate into the build system it must expose its internal DAG.
It's possible that yarn could expose its own DAG, but Node modules can
define arbitrary pre- and post-install scripts, which are essential to
the module ecosystem. I can't imagine us being able to capture the
"leaf DAG" of every installed module -- there are no rules out at the
leaves.
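To get a feel for how many such leaves an install contains, here is a small sketch that walks a `node_modules` directory and reports packages declaring the `preinstall`, `install`, or `postinstall` lifecycle scripts, which npm runs during installation. The function name is hypothetical, and the sketch undercounts: npm also defaults to running `node-gyp rebuild` for packages that ship a `binding.gyp` without defining their own install or preinstall script, which this does not detect.

```python
import json
from pathlib import Path

# npm runs these lifecycle scripts, when present, as part of installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def opaque_install_steps(node_modules):
    """Map each package directory (relative to node_modules) to its install hooks.

    Every hook is an arbitrary shell command: the build system sees neither
    what it reads nor what it writes, so it is a "leaf" with no rules.
    """
    root = Path(node_modules)
    found = {}
    for manifest in sorted(root.glob("**/package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skipped in this sketch
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if not isinstance(scripts, dict):
            continue
        hooks = [hook for hook in INSTALL_HOOKS if hook in scripts]
        if hooks:
            found[manifest.parent.relative_to(root).as_posix()] = hooks
    return found


if __name__ == "__main__":
    import sys

    for package, hooks in opaque_install_steps(sys.argv[1]).items():
        print(f"{package}: {', '.join(hooks)}")
```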

On the other hand, the most general form of integration (which I have
been pursuing) is to enable the build system to invoke arbitrary yarn
verbs (like `GENERATED_FILES[...].script = 'yarn.py';
GENERATED_FILES[...].flags = ['run', 'arbitrary_yarn_verb']`). Arbitrary yarn
verbs are, well, arbitrary -- they could be simple, like C compiler
invocations, or they could be build systems in their own right, like
Webpack. For arbitrary yarn verbs, I don't think it's feasible to
extract DAGs from the Node ecosystem tools involved.
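For illustration, a hypothetical `yarn.py` generator script might look like the sketch below. The calling convention (open output file first, then the flags from moz.build) is my assumption about how GENERATED_FILES scripts are invoked and should be checked against mozbuild before relying on it. The point it shows is structural: the only edge the build system knows about is `output`, while the yarn verb may read and write anything.

```python
import shutil
import subprocess


def yarn_argv(flags, yarn=None):
    """Build the command line for a hypothetical `yarn.py` generator script."""
    return [yarn or shutil.which("yarn") or "yarn", *flags]


def main(output, *flags):
    # Assumed calling convention: the open output file first, then the flags
    # from moz.build (e.g. "run", "some_verb"). Only `output` is a declared
    # edge; whatever files the yarn verb reads or writes are invisible here.
    result = subprocess.run(
        yarn_argv(flags), capture_output=True, text=True, check=True
    )
    output.write(result.stdout)
```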

Third, what is to be done.

The build peers most invested in the transition to a "modern" build
system (here, Tup) are chmanchester and mshal. They conclude that it is
not possible to integrate build systems into each other without
significant work exposing internal DAGs (which we are willing to do for
cargo). They instead propose that build systems not integrate but
instead run in serial. That is, the "Node bits" run either first (and
provide inputs to the rest of the build system) or run second (and
consume outputs from the rest of the build system). Of course, that
arrangement sacrifices parallelism and throughput, but at least the
final output will be correct.
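A minimal model of the serial proposal, assuming a hypothetical `Phase` type where each whole build system is one step declaring only what it exchanges with the other phases. Because phases never overlap, correctness is easy to check, and the cost (no parallelism across phases, and no dependency from an earlier phase on a later one) is visible in the code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Phase:
    """A whole build system treated as one step (hypothetical model)."""
    name: str
    run: Callable[[], None]
    consumes: frozenset = frozenset()  # outputs of *other* phases this one reads
    produces: frozenset = frozenset()  # outputs later phases may read


def run_serially(phases):
    """Run each phase to completion, in order, before starting the next.

    A phase may only consume what an earlier phase produced, so whichever
    system runs first can never depend on the one that runs after it.
    """
    available = set()
    completed = []
    for phase in phases:
        missing = phase.consumes - available
        if missing:
            raise RuntimeError(
                f"{phase.name} needs {sorted(missing)}, not produced by any earlier phase"
            )
        phase.run()  # an exception here aborts the remaining phases
        available |= phase.produces
        completed.append(phase.name)
    return completed
```

For example, a "yarn install" phase producing `node_modules` followed by the main build consuming it runs fine; a Webpack phase that wants a file the main build produces fails as soon as it is placed first.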

This leads me to propose that we treat |yarn install| as a separate
build system that runs before the main build system. It manages its own
caching and invalidation, and produces $topobjdir/node_modules. |yarn
install| is intended to efficiently determine that its output is
up-to-date, so perhaps the overhead of running it every time we build
will be acceptable. (Otherwise, we try to find ways to invoke it less
frequently.)
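One way to invoke it less frequently is a stamp file: hash `package.json` and `yarn.lock`, and skip |yarn install| when the hash matches the previous run. This is a sketch under my own assumptions (the stamp name and location are hypothetical; the flags are yarn 1.x's `--frozen-lockfile` and `--modules-folder`), not something the build system does today.

```python
import hashlib
import subprocess
from pathlib import Path

# Hypothetical stamp file recording which inputs the last install used.
STAMP_NAME = ".install-stamp"


def stamp_path(objdir):
    return Path(objdir) / "node_modules" / STAMP_NAME


def lockfile_digest(srcdir):
    """Hash the files that decide what `yarn install` would produce."""
    digest = hashlib.sha256()
    for name in ("package.json", "yarn.lock"):
        path = Path(srcdir) / name
        content = path.read_bytes() if path.exists() else b""
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(content).digest())
    return digest.hexdigest()


def needs_install(srcdir, objdir):
    stamp = stamp_path(objdir)
    return not stamp.exists() or stamp.read_text() != lockfile_digest(srcdir)


def ensure_node_modules(srcdir, objdir):
    """Run `yarn install` only if its inputs changed; return whether it ran."""
    if not needs_install(srcdir, objdir):
        return False
    digest = lockfile_digest(srcdir)
    modules = Path(objdir) / "node_modules"
    subprocess.run(
        ["yarn", "install", "--frozen-lockfile", "--modules-folder", str(modules)],
        cwd=srcdir,
        check=True,
    )
    stamp = stamp_path(objdir)
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(digest)
    return True
```

The trade-off is that the stamp ignores anything outside those two files (the Node or yarn version, hand edits inside node_modules), so it is faster but less correct than always letting yarn decide.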

We then have a choice. We can either push _all_ Node invocations into
the first build system and accept what I expect to be a big performance
penalty in practice; or we can restrict the Node integration in the main
build system to commands that we are confident are not their own build
systems.

The former is fully general but will require non-trivial effort to
implement in the build system, I expect -- perhaps a new build backend,
specialized to Node, and some glue code in |mach build| to manage
ordering the systems. In addition, such an arrangement could never
allow Node bits to depend on regular build system bits, since the Node
bits would always happen first. That might make some sense right now,
since all of the Node projects we're integrating are stand-alone (usually on
GitHub!) but as more of the core Firefox front-end functionality
leverages Node, that will look worse and worse. Even exposing
AppConstants.jsm to Node could be fraught (if the actual contents are
required, for example to tree shake on the basis of build flags).
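To see why, imagine the main build writing selected build flags to a JSON file that a bundler reads to tree shake. The helper, the file name, and the choice of `MOZ_UPDATE_CHANNEL` below are all hypothetical; the point is only that the JSON is an output of the main build, so a Node step reading it cannot run before the main build.

```python
import json
from pathlib import Path


def export_build_flags(config, keys, path):
    """Write selected build flags as JSON for a Node bundler to read.

    The file is an *output* of the main build system, so any Node step
    that reads it must run after the main build, not before.
    """
    flags = {key: config.get(key) for key in sorted(keys)}
    Path(path).write_text(
        json.dumps(flags, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    return flags


if __name__ == "__main__":
    # Hypothetical config values; a real build would take them from configure.
    export_build_flags({"MOZ_UPDATE_CHANNEL": "nightly"}, ["MOZ_UPDATE_CHANNEL"], "build-flags.json")
```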

The latter is restrictive -- for example, we might support only Rollup
but not Webpack, since Rollup is more clearly inputs-to-outputs and
Webpack is more focused on incremental builds -- and requires labour to
audit and add support for new tools. However, it requires fewer up-front
build system modifications and is easier to transition to gradually.
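The restrictive option could be enforced with an allowlist check before a Node command becomes an edge in the main build. This is a minimal sketch; the allowlist contents and the function name are hypothetical, and Rollup's `-c` (use a config file) flag appears only in the usage example.

```python
import os

# Hypothetical allowlist: tools audited to behave as plain inputs-to-outputs
# transforms. Webpack is deliberately absent.
AUDITED_TOOLS = frozenset({"rollup"})


def check_node_command(argv, inputs, outputs):
    """Validate a Node invocation before it becomes an edge in the main build."""
    tool = os.path.basename(argv[0])
    if tool not in AUDITED_TOOLS:
        raise ValueError(f"{tool!r} has not been audited for the main build system")
    if not inputs or not outputs:
        raise ValueError(f"{tool!r} must declare both its inputs and its outputs")
    return {"command": list(argv), "inputs": sorted(inputs), "outputs": sorted(outputs)}


if __name__ == "__main__":
    print(check_node_command(
        ["rollup", "-c", "rollup.config.js"],
        inputs=["src/index.js", "rollup.config.js"],
        outputs=["dist/bundle.js"],
    ))
```

Adding a new tool then means auditing it and extending `AUDITED_TOOLS`, which is the "labour" cost mentioned above.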

Fourth, my conclusion.

I prefer working within the existing build system and invoking Node
commands rather than arbitrary yarn verbs.

The fast path to landing this as an opt-in therefore looks like:

After that we can tackle vendoring Node modules into the tree, which does
not appear to have anything fundamental blocking it.

Phew! That's a wall of text. Please correct me if I'm misunderstanding
things, or if my explanations need clarification. As I said, the
results of this discussion were not what I expected, so this is mostly
new to me :/

I'll wait to collect some feedback on this summary before trying to
figure out next steps.

Yours,
Nick

_______________________________________________
dev-builds mailing list
dev-builds@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-builds
