On Tue, Sep 8, 2015 at 7:38 AM, Dustin Mitchell <dus...@mozilla.com> wrote:
> Thanks! Greg, I agree with a lot of what you have said. Docker has
> some design issues, but it has advantages too. Among those, it caches
> well and starts new containers very quickly. I don't think we could
> match that speed with an approach that manipulated individual files.
>
> If you think of Docker as a way of "freezing" the built images in a
> way that is traceable and can be deployed quickly, I think it makes a
> bit more sense.

I do. I wasn't suggesting we build images at job time: they should
definitely be pre-built and cached. (OK, maybe there is a TaskCluster
Graph-like job that ensures they are built and cached at the very
beginning of overall job execution -- but certainly not during a job
itself: that would likely add too much overhead and redundancy.)

> Regarding layers -- we can use them where they're useful, and not
> where they're problematic. In the implementation I've put together,
> all of the package installation happens in one Docker "layer", so
> there's no redundant caching of files that are later deleted, etc.
> It's worth noting that the overall size of the desktop-build docker
> image is 1.4GB -- about 60% smaller than a mozilla-central checkout.
>
> One place the layering helps is in caching. The current
> implementation comprises three layers: the base `centos6` image from
> RedHat (200MB); `centos6-build`, which installs a long list of build
> dependencies (1.2GB); and `desktop-build`, which installs a few shell
> scripts to make things go (negligible). Rather than re-creating that
> centos6-build image frequently, we could insert a
> `centos6-build-updates` layer after it, which just involves a `yum
> update` run. That layer will be fairly small and grow slowly, since
> centos6 gets so few updates. Then developers on slow connections
> would only need to download `centos6-build-updates` and
> `desktop-build` to have the latest-and-greatest build image.
>
> Axel, you raise a good point about hackfests. Docker has an `export`
> command which could be used to create a tarball to put on a USB
> stick. That has the side-effect of squashing layers, but for this
> particular purpose I don't think that will hurt. The effect is
> similar to the VM image you mentioned. I think there's also some
> means of carrying layers around other than HTTP (so hackers could
> prime their docker image caches from the USB stick), but I can't find
> it at the moment.
>
> Regarding the image build system, I see this running maybe 5-10 times
> a week (once automatically, plus a few try jobs as someone hacks on
> upgrading this or that library and finally lands to inbound), so I
> don't think it's a huge pain in the Internet. Aside from the RPMs,
> the packages we install *are* cached locally, and content is verified
> for everything (acknowledging weaknesses in the yum signature
> system).
>
> When it comes to building packages that aren't easily available
> upstream (and, note, we require CentOS 6 for production builds, and
> CentOS 6 is not, to my knowledge, available from Debian!), I agree
> that skipping the package database poses no issues. I would be happy
> with another solution that offloads the package building, as long as
> it is automated and traceable (no "built it on my laptop and pushed
> it") and can be done in try by non-privileged users. That's tricky
> with yum/apt repositories. Nix might be able to do it? Alternately,
> with some effort we could create other TaskCluster tasks to build
> each dependency as artifacts, then combine those artifacts together
> into the docker image.
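> As a rough sketch of that last idea (the artifact URL, checksum, and
> image names here are all hypothetical -- this is just the shape of
> it):
>
>     # Fetch a dependency artifact produced by an upstream TaskCluster
>     # task, verify it, and bake it into the build image.
>     wget https://tc.example.org/task/GCC_TASK_ID/artifacts/gcc.tar.xz
>     echo "<expected-sha256>  gcc.tar.xz" | sha256sum -c -
>
>     cat > Dockerfile <<'EOF'
>     FROM centos6-build:latest
>     # ADD auto-extracts local tar archives into the image -- no
>     # package manager involved.
>     ADD gcc.tar.xz /tools/
>     EOF
>     docker build -t desktop-build:candidate .
>
> The checksum step is what keeps it traceable: the image contents can
> be tied back to the exact tasks that produced them.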
> In general, lots of great ideas, but of course I can't promise to
> implement all of them. Still, I don't think the system I've outlined
> *precludes* any of those great ideas, nor requires implementing them
> -- in fact it enables a great many of them. Especially deterministic
> builds :)
>
> Dustin
>
> On Fri, Sep 4, 2015 at 7:09 PM, Axel Hecht <l...@mozilla.com> wrote:
> > On 9/5/15 12:06 AM, Gregory Szorc wrote:
> >>
> >> First, thank you for spending the time to compose this. It is an
> >> excellent write-up. Responses inline.
> >>
> >> On Fri, Sep 4, 2015 at 1:24 PM, Dustin Mitchell <dus...@mozilla.com
> >> <mailto:dus...@mozilla.com>> wrote:
> >>
> >> I'd like to get some feedback on changes that we (release
> >> engineering and the taskcluster team) are planning around how we
> >> build Firefox. I apologize that this is a bit long, as I'm trying to
> >> include the necessary background. I have some questions at the end
> >> about which I'd like to have some discussion.
> >>
> >> Before I get into it, a word of apology. We haven't done a great job
> >> of talking about this work -- I've talked to many members of the
> >> build module individually, but in so doing perhaps not shared all of
> >> the required background or established a common understanding. It's
> >> easy to get so deeply into something that you assume everyone knows
> >> about it, and I fear that's what may have happened. So if some of
> >> this comes as a surprise or feels like a fait accompli, I apologize.
> >> It's certainly not finished, and it's all software, so we can always
> >> change it. We're working to be more communicative and inclusive in
> >> the future.
> >>
> >> = Buildbot and TaskCluster =
> >>
> >> As you may know, we currently use Buildbot to schedule build (and
> >> test) jobs across all of our platforms. This has had a number of
> >> issues, including difficulty in reproducing the build or test
> >> environments outside of Buildbot, difficulty in testing or deploying
> >> changes to the build process (especially around scheduling and host
> >> configuration), and difficulty scaling. One of the issues we
> >> struggled with internally was the difficulty of making requested
> >> upgrades to otherwise "frozen" build systems: often the requested
> >> upgrade was not available for the ancient version of the operating
> >> system we were running.
> >>
> >> You may have your own issues to add to this list -- I'd be
> >> interested to hear them, to see if we are addressing them, or can
> >> address them, in this change!
> >>
> >> During the development of Firefox OS, though, another parallel
> >> system called TaskCluster (https://docs.taskcluster.net) was
> >> developed. It's a cloud-first job-scheduling system designed with
> >> simplicity in mind: tasks go in as structured data and run,
> >> producing logs, artifacts, and even other tasks. Here's a list of
> >> the design goals for TaskCluster:
> >>
> >> * Establish a developer-first workflow for builds and tests.
> >> * Speed/flexibility of deployment: minimize server-side config
> >>   requirements; new TaskCluster platform tasks can be deployed in
> >>   days or weeks, where the Buildbot platform takes months or
> >>   quarters.
> >> * Reproducibility: ability to define the complete OS environment /
> >>   image outside of Releng- or IT-controlled infra; increased
> >>   transparency in deployment.
> >> * Self-service: no Releng needed to change any tasks; in-tree
> >>   scheduling.
> >> * Extensibility: develop a general platform for building tools we
> >>   haven't thought of yet.
> >>
> >> The bit of TaskCluster that is probably most salient for this
> >> audience is this: a TaskCluster task essentially boils down to "run
> >> this shell command in this operating system image," and the
> >> scheduling system gets both the shell command and the operating
> >> system image from the gecko tree itself. That means you can change
> >> just about everything about the build process, including the
> >> operating system (host environment), from in-tree -- even by pushing
> >> to try! As an example, here [1] is a recent try job of mine.
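> >> (To make that concrete: conceptually, the worker that runs such a
> >> task does little more than the following. The image name and the
> >> command are illustrative only, not the real task definition:
> >>
> >>     docker pull example-registry/desktop-build:0.1.0  # cached after the first pull
> >>     docker run example-registry/desktop-build:0.1.0 \
> >>         bash -c 'checkout-sources && ./build.sh'      # hypothetical scripts
> >>
> >> Both the image reference and the command come from the task
> >> definition in the gecko tree.)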
> >> We are currently in the process of transitioning from Buildbot to
> >> TaskCluster. This is, of course, an enormous project, and requires a
> >> lot of hard work and attention from Releng, A-Team, the TaskCluster
> >> team, and from other teams impacted by the changes. I also think
> >> it's going to be enormously rewarding, and will free us of a lot of
> >> the constraints and issues I mentioned above. Ideally everyone wins
> >> -- build, Releng, A-Team, TaskCluster, all developers, even IT. So
> >> if you see something here as "losing", aside from the inevitable
> >> friction of change, please speak up.
> >>
> >> = Linux Builds =
> >>
> >> Zooming in a little bit, let's talk about Linux builds of Firefox
> >> Desktop. Mac, Windows, Fennec, B2G, etc. are all in various states
> >> of progress, and we can talk about those too. My focus right now is
> >> on Linux builds, and Glandium has raised some questions about them
> >> that I'd like to address here.
> >>
> >> For tasks that run on Linux, we can use Docker. That means that the
> >> "operating system image" is a Docker image. In fact, we have a
> >> method for building those Docker images using in-tree
> >> specifications, and plans[2] to support automatically rebuilding
> >> them on pushes to try. I've built a working CentOS 6.7 image
> >> specification[3] based on the mock environments used in buildbot,
> >> and I'm working on greening that up and putting it into TreeHerder
> >> as a Tier-2 build.
> >>
> >> Mock is not used at all -- the build runs directly in the docker
> >> image, using a "worker" user account. TaskCluster invokes a script
> >> that's baked into the docker image, which knows enough to check out
> >> the required revisions of the required sources (build/tools and
> >> gecko), then execute an in-tree script
> >> (testing/taskcluster/scripts/builder/build-linux.sh). That
> >> build-linux.sh translates a whole bunch of parameters from
> >> environment variables into Mozharness configuration, then invokes
> >> the proper Mozharness script, which performs the build as usual. All
> >> of that is easily tested in try -- that's what I've been doing for a
> >> few weeks now!
> >>
> >> This approach has lots of advantages, and (we think) solves a few of
> >> the issues I mentioned above:
> >>
> >> * Since everything is in-tree, everyone can self-serve. There's no
> >>   need to wait on resources from another team to modify the build
> >>   environment, e.g., to upgrade valgrind.
> >>
> >> * Since everything is in-tree, it can be handled like any other
> >>   commit: tried, backed out, bisected, put on trains, merged, etc.
> >>
> >> This is all fantastic. We've wanted to move in this direction for
> >> years. It will enable all kinds of people to experiment with new and
> >> crazy ideas without having to bother anybody on the automation side
> >> of things (in theory). This should enable all kinds of changes and
> >> experiments that were otherwise too costly to perform. It is huge
> >> for productivity.
> >>
> >> * Downloading and running a docker image is a well-known process, so
> >>   it's easy for devs to precisely replicate the production build
> >>   environment when necessary.
> >>
> >> * Inputs and outputs are precisely specified, so builds are
> >>   repeatable. And because each gecko revision specifies exactly the
> >>   docker image used to build it, you can even bisect over host
> >>   configuration changes!
> >>
> >> To address the issue of difficult upgrades, and to support the
> >> security team's desire that we not run insecure versions of
> >> packages, I have suggested that we rebuild the docker images weekly,
> >> regardless of whether there are configuration changes to the images.
> >> This would incorporate any upstream package updates, but not upgrade
> >> to a new major distro version (so no unexpected upgrade to CentOS
> >> 7). Mechanically, a bumper script would increment a "VERSION" file
> >> somewhere in-tree, causing an automatic rebuild[2] of the image.
> >> Thus the "bump" would show up in treeherder, in version-control
> >> history, and in perfherder, and could be bisected over and blamed
> >> for build or test failures or performance regressions, just like any
> >> other changeset. Reverting the changeset would revert to the earlier
> >> image. The changeset would ride the trains just like any other
> >> change.
> >>
> >> = Questions =
> >>
> >> Glandium has already raised a few questions about this plan. I'll
> >> list them here, but reserve my suggestions for a later reply. Please
> >> do respond to these questions, and add any other comments or
> >> questions that you might have, so that we can identify, discuss, and
> >> address any other risks or downsides to this approach.
> >>
> >> 1. A weekly bump may mean devs trying to use the latest-and-greatest
> >> are constantly downloading new docker images, which can be large.
> >>
> >> 2. Glandium has also expressed some concern at the way
> >> testing/docker/centos6-build/system-setup.sh is installing software:
> >> downloading source tarballs and building them directly. Other
> >> alternatives are to download hand-built RPMs directly (as is done
> >> for valgrind, yasm, and freetype, since they were already available
> >> in that format) or to host and maintain a custom Yum repository.
> >>
> >> My professional opinion is that the layered-images approach used by
> >> Docker out of the box is extremely sub-optimal and should be avoided
> >> at all costs. One of the primary reasons is that you end up having
> >> to download gigabytes of image layers and/or distro packages over
> >> and over and over again. This will be a very real concern for
> >> developers who don't have super-fast Internet connections. This
> >> includes some Mozilla offices. A one-time cost to obtain the source
> >> code and system dependencies, plus the ongoing cost of keeping these
> >> up to date, is acceptable. But many barely tolerate it today. If we
> >> throw Docker's inefficiency into the mix, I worry about the
> >> consequences.
> >> I think it is a worthwhile investment to build out a Docker image
> >> management infrastructure that doesn't abuse the Internet so much.
> >> This almost certainly entails caching tarballs, packages, etc.
> >> locally and then having the image build process leverage that local
> >> cache. I've heard of a few projects that are basically transparent
> >> Yum/Apt/PyPI caching proxies that do just this. Unfortunately, I
> >> can't find links to them right now. There are also efforts to invent
> >> better Docker image building techniques. I chatted with someone from
> >> RedHat about this a few months ago, and it sounded like we were on
> >> the same page about an approach that composes images from
> >> cached/shared assets. I /think/ Project Atomic
> >> (http://www.projectatomic.io/) was looking into things like building
> >> Docker images using the Yum "database" and cached RPMs from the host
> >> machine. Not sure how far they got.
> >>
> >> As for how exactly packages/binaries should make their way into
> >> images, we have a few options.
> >>
> >> I know this is going to sound crazy, but I think containers remove
> >> most of the need for a system packaging tool. Man pages, services,
> >> config files, etc. are mostly irrelevant in containers -- especially
> >> ones that build Firefox. Even most of the support binaries you'll
> >> find in containers are unused. System packaging is focused on
> >> managing a standalone system with many running services and with a
> >> configuration that changes over time. The type of containers we're
> >> talking about only needs to do one thing (build Firefox). So most of
> >> the benefits of a system packaging tool are overhead and cruft in
> >> the container world. The system packaging tools will need to adapt
> >> to this brave new container world. But until they do, I wouldn't
> >> feel obligated to use a system packaging tool, at least not in the
> >> traditional sense. E.g., I would use the lower-level `dpkg --unpack`
> >> or `dpkg --install` instead of `apt-get install`, because apt-get
> >> provides little to no benefit for containers.
> >>
> >> Continuing this train of thought, I don't think there is anything
> >> wrong with defining images in terms of a manifest of tarballs, RPMs,
> >> or debs that should be manually uncompressed into / (possibly with
> >> filtering involved so you don't install unused files like man
> >> pages). The manifests should have embedded checksums for
> >> *everything* to protect against MitM attacks and undetected changes.
> >> This manifest approach is low-level and fast. Caching the individual
> >> components is trivial. Instead of downloading whole new images every
> >> week, you download only the individual packages that changed. This
> >> should use less bandwidth. This approach also gives you full
> >> control. It doesn't bind you to the system packager's world view
> >> that you are building a full-fledged system. If you squint hard
> >> enough, this approach kinda resembles tooltool, just carried out to
> >> the extreme.
> >>
> >> For the record, I've used this approach at a previous company. We
> >> had plain-text manifests listing archives to install. There was an
> >> additional layer to run scripts after the base image was built, to
> >> perform any additional customization. It worked insanely well. I
> >> wish Docker had used this approach from the beginning. But I
> >> understand why they didn't: it was easier to lean on existing system
> >> packaging tools, even if they aren't (yet) suited for a container
> >> world.
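> >> Concretely, the whole "install" step can collapse to something like
> >> this (the manifest format and the tools assumed to be in the image
> >> are illustrative; assume lines of "sha256 url" pairs):
> >>
> >>     # Verify and unpack each entry of the manifest into /.
> >>     while read -r sha url; do
> >>         f=$(basename "$url")
> >>         curl -sfLO "$url"
> >>         echo "$sha  $f" | sha256sum -c - || exit 1
> >>         case "$f" in
> >>             *.deb) dpkg --unpack "$f" ;;                   # skip apt entirely
> >>             *.rpm) rpm2cpio "$f" | (cd / && cpio -idmu) ;; # skip yum entirely
> >>             *)     tar -xf "$f" -C / ;;
> >>         esac
> >>     done < manifest.txt
> >>
> >> No dependency resolution, no repository metadata -- just "verify and
> >> unpack", which is also trivially cacheable per file.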
> >> As for where to get the packages from, we should favor using bits
> >> produced by reputable vendors, notably RedHat and Debian. I
> >> especially like Debian because many of their packages are now
> >> reproducible and deterministic. This provides a lot of defense
> >> against Trusting Trust attacks. In theory, we could follow in the
> >> footsteps of Tor and enable a bit-identical Firefox build. No other
> >> browser can provide this to the degree we can. It makes the tin-foil-
> >> hat crowd happy. But the real benefit is to developers, who won't
> >> have "drift" between build environments. And you don't need to build
> >> everything from source to achieve that. So as cool as Gitian (Tor's
> >> deterministic build tool) is, we shouldn't lose sleep over not using
> >> it. Come to think of it, if we use Debian packages for building
> >> Firefox, we should be reproducible via the transitive property!
> >>
> >> Whew, that was a lot of words. I hope I gave you something to think
> >> about. I'm sure others will disagree with my opinions on the
> >> futility of system packaging in a container world :)
> >
> > I fully agree with the gist of it here.
> >
> > We want to increase participation, and thus kicking things off needs
> > to be cheap.
> >
> > It also needs a chance to succeed on flaky Internet connections.
> >
> > We also need to support hackfest environments. You've got
> > 20/30/50/100 people in a single office, sharing one Internet
> > connection, and they all want to start hacking at the same time.
> >
> > When we did the last hackathon in Berlin, we had a VM with a
> > precompiled build.
> >
> > A few minutes of USB-stick goodness, and people were launching a
> > build.
> >
> > Axel
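(One note on the USB-stick case: unlike `docker export`, the `docker
save` and `docker load` commands preserve image layers, so they should
cover the "carrying layers around other than HTTP" mechanism Dustin
couldn't find. A sketch, with an illustrative image name:

    # On a machine that already has the image cached:
    docker save desktop-build:latest | gzip > /mnt/usb/desktop-build.tar.gz
    # On each hackfest laptop:
    gunzip -c /mnt/usb/desktop-build.tar.gz | docker load

After priming caches that way, a later weekly bump should only pull
the changed layers over the network.)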
_______________________________________________
dev-builds mailing list
dev-builds@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-builds