Thanks. This is great!

Responses inline.

Yehuda Katz
(ph) 718.877.1325

On Mon, Dec 21, 2015 at 6:19 PM, Gregory Szorc <g...@mozilla.com> wrote:

> On Mon, Dec 21, 2015 at 2:29 PM, Bobby Holley <bobbyhol...@gmail.com>
> wrote:
>
> > On Mon, Dec 21, 2015 at 2:21 PM, Yehuda Katz <wyc...@gmail.com> wrote:
> >
> >> On Mon, Dec 21, 2015 at 2:14 PM, Bobby Holley <bobbyhol...@gmail.com>
> >> wrote:
> >>
> >> > I don't think this is going to fly in Gecko, unfortunately.
> >> >
> >> > Gecko is a monolithic repo, and everything needs to be vendored
> >> > in-tree (a non-negotiable requirement from the build peers). This
> >> > means that we'll already need an automated process for pulling in
> >> > updates to the shared code, committing them, and running them
> >> > through CI.
> >> >
> >>
> >> Can you say a bit more about what the requirements are here? Is the
> >> reason for including the code in-tree that you need to be 100% confident
> >> that everyone is talking about the same code? Or is it more than that?
> >>
> >
> > The reasons I've heard are:
> > (1) Gecko/Firefox has a very complicated releng setup, and the builders
> > are heavily firewalled from the outside and not allowed to hit the
> > network. So adding network dependencies to the build step would require a
> > lot of operations work.
> > (2) Gecko exists on a pretty long timescale, and we want to make sure
> > that we can still build Firefox 50 ten years from now, even if Cargo has
> > long migrated to some other setup.
> > (3) A general unease about depending on any third-party service without a
> > contract and SLA in order to build and ship Firefox.
> >
> > There may be other reasons, or I may be getting some of these wrong. This
> > all comes from gps, ted, etc., so you're probably better off discussing
> > with them directly.
> >
>
> This is the gist of it. There are also implications for downstream
> packagers. The more complicated our build mechanism is, the more work it is
> for them. Having everything vendored makes it self-contained and more
> manageable.
>
> There is also a general trend towards reproducible builds. Those are a bit
> harder to attain when you are trying to cobble together pieces from
> multiple repositories. Related to this are security and integrity concerns.
> Could a malicious actor insert a vulnerability in Firefox by compromising a
> 3rd party repository/project? Would we necessarily have the audit trail in
> place to detect this if things weren't vendored? (Yes, we have exposure to
> this today.)
>

Cargo also has reproducible builds as a hard requirement. For Cargo, this
means that the source code used to build a project on one machine must be
byte-for-byte equivalent to the source code used on another machine. This
is true of direct dependencies as well as indirect (transitive)
dependencies, and is non-negotiable.

Today, we achieve this goal by serializing the "precise version" of each
dependency into Cargo.lock. "Precise versions" have these requirements:

   - For a given source, the precise version is sufficient to uniquely
   identify the source code of the dependency.
   - The code referenced by a precise version is immutable; it should be
   impossible for a precise version to point at different byte-for-byte
   source code over time.

For crates.io, this is achieved by using the name and version of the
dependency as the precise version, and making the registry immutable. It is
impossible to change the code pointed to by a version after it has been
published.

For git dependencies, this is achieved by recording the precise rev of the
code that was used when the lock file was produced.
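
To make this concrete, here is a sketch of what these two kinds of precise
versions look like as Cargo.lock entries (the package names and the git rev
are made up for illustration):

    [[package]]
    name = "libfoo"
    version = "0.2.1"
    source = "registry+https://github.com/rust-lang/crates.io-index"

    [[package]]
    name = "libbar"
    version = "0.1.0"
    source = "git+https://github.com/example/libbar#4c59b707a5d4cf7c1e5b6a8e1f0d3b2a9c8e7f6d"

For the crates.io dependency, the name/version pair is the precise version;
for the git dependency, the full 40-character rev pins the exact commit.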

Because both of these approaches have hypothetical compromises (the
crates.io server could be compromised, and git's use of SHA1 for revisions
is insufficiently secure), we plan to add an additional layer of
protection: we will record in the Cargo.lock a SHA256 digest of the exact
source code used, and use that digest to verify that the source code is
byte-for-byte identical.
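
For illustration, the digest might be recorded alongside the other
precise-version fields, along these lines (the field name and placement
here are hypothetical; the scheme is not finalized):

    [[package]]
    name = "libfoo"
    version = "0.2.1"
    source = "registry+https://github.com/rust-lang/crates.io-index"
    # hypothetical: SHA256 of the exact source bytes for this package
    checksum = "<sha256 hex digest>"

If the bytes Cargo fetches don't hash to the recorded digest, the build
would fail rather than silently use different source code.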

Running `cargo build` against a repository with a `Cargo.lock` will
reliably use exactly the versions of the source code referenced by the
`Cargo.lock`, which makes Cargo builds fairly reliable by design.


> Also, #3 is more important than #1. To add some perspective, we can't have
> parts of automation clone from github.com because we've found GitHub to be
> too unreliable. I'm not talking about the China-based DDoS from a few
> months back - this has been a longstanding problem. In general, we don't
> want to have a Firefox chemspill delayed because some random 3rd party
> server isn't available.
>

In my opinion, #3 is best addressed with support for automated mirroring of
both crates.io and git dependencies. Mega-corps like LinkedIn use mirroring
services for RubyGems, npm, and Maven to ensure high availability. This
maintains the normal developer workflow and means that developers on the
project do not rely on custom infrastructure for updates, while still
ensuring that the deployment process isn't dependent on third-party
infrastructure.
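
To sketch what the Cargo side of this could look like (the mechanism and
the mirror URL here are hypothetical, not a finalized design), a project-
or machine-level .cargo/config might redirect crates.io to an internal
mirror:

    # .cargo/config (sketch; assumes a source-replacement mechanism)
    [source.crates-io]
    replace-with = "internal-mirror"

    [source.internal-mirror]
    registry = "https://crates-mirror.example.com/index"

Because the mirror only changes where the bytes are fetched from, the
precise versions (and any recorded digests) in Cargo.lock still guarantee
that the mirrored code is byte-for-byte identical to upstream.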

Because the Cargo.lock is still checked into the main repository, it is
still possible to enforce strict (automated) policies about updates to
upstream dependencies, and to require special manual review in order to
accept such a change. I agree that the ability to have such policies is
important, and intended for Cargo.lock to enable them.

-- Yehuda
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo
