Mark Knecht posted on Mon, 04 Aug 2014 15:04:12 -0700 as excerpted:

> As the line in that favorite song goes "Paranoia strikes deep"...
FWIW, while my list sig is the proprietary-master quote from Richard Stallman below, since the (anti-)patriot bill was passed in reaction to 9/11, my private email sig is a famous quote from Benjamin Franklin: "They that can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety." So "I'm with ya..."

> <NOTE>
> I am NOT trying to start ANY political discussion here. I hope no one
> will go too far down that path, at least here on this list. There are
> better places to do that.
>
> I am also NOT suggesting anything like what I ask next has happened,
> either here or elsewhere. It's just a question.
>
> Thanks in advance.
> </NOTE>
>
> I'm currently reading a new book by Glenn Greenwald called "No Place To
> Hide" which is about Greenwald's introduction to Edward Snowden and the
> release of all of the confidential NSA documents Snowden acquired. This
> got me wondering about Gentoo, or even just Linux in general. If the
> underlying issue in all of that Snowden stuff is that the NSA has the
> ability to intercept and hack into whatever they please, then how do I
> know that the source code I build on my Gentoo machines hasn't been
> modified by someone to provide access to my machine, networks, etc.?

These are good questions to ask, and to have some idea of the answers to, as well.

Big picture, at some level you pretty much have to accept that you /don't/ know. However, there's /some/ level of security... tho honestly a bit less on Gentoo than on some other distros (see below). It would still not be /entirely/ easy to subvert the tree widely (targeting an individual downloader is another question), but it could be done.

> Essentially, what is the security model for all this source code and
> how do I verify that it hasn't been tampered with in some manner?
>
> 1) That the code I build is exactly as written and accepted by the OS
> community?

At a basic level, source and ebuild integrity is what ebuild and sources digests are all about: they protect both from accidental corruption (where they're pretty good) and from deliberate tampering (where the protection may or may not be considered "acceptable" -- someone with the resources who wanted to badly enough could subvert it).

The idea is that the gentoo package maintainer creates hash digests of multiple types for both the ebuild and the sources, such that should the copy a gentoo user gets not match the copy the gentoo maintainer created, the package manager (PM, normally portage), if configured to do so (mainly FEATURES=strict; also see stricter and assume-digests, plus the webrsync-gpg feature mentioned below), will error out and refuse to emerge that package.
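To make that concrete, the digests live in a per-package Manifest file in the tree, one line per covered file. A sketch of the shape only -- the package name, sizes and hash values below are invented placeholders, and the hashes are truncated:

    DIST foo-1.2.3.tar.xz 2913489 SHA256 9f86d081884c7d65... SHA512 ee26b0dd4af7e749... WHIRLPOOL 19fa61d75522a466...
    EBUILD foo-1.2.3.ebuild 1374 SHA256 2c26b46b68ffc68f... SHA512 8a3dbd79e9bcf16c... WHIRLPOOL 4d9a1ea4c9e2fb8e...

With FEATURES="strict" set in make.conf, portage refuses to proceed when a fetched file fails to match its Manifest line. Note what's NOT listed in any Manifest, tho, which brings us to the points below.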
But there are serious limits to that protection. Here are a few points to consider:

1) While the ebuilds and sources are digested, those digests do *NOT* extend to the rest of the tree: the various files in the profile directory, the various eclasses, etc. So in theory at least, someone could mess with, say, the package.mask file in profiles, or one of the eclasses, and could potentially get away with it. But see point #3, as there's a (partial) workaround for the paranoid.

2) Meanwhile, a bare hash (unlike a gpg signature) carries no proof of who created it. Digest verification confirms that nothing changed in transit, but not that the digest itself came from the gentoo maintainer. So there's some risk that one or more gentoo rsync mirrors could be compromised, or could be run by a bad actor in the first place. Should that occur, the bad actor could replace BOTH the digested ebuild and/or sources AND the digest files, updating the latter to reflect his compromised version instead of the version originally digested by the gentoo maintainer.

Similarly, someone such as the NSA could, at least in theory, do the same thing in transit, targeting a specific user's downloads while leaving everyone else's downloads from the same mirror alone, so only the target got the compromised version. While there's a reasonable chance someone would catch a bad mirror, a specifically targeted downloader has little chance of detecting the problem, unless they're validating against other mirrors as well and/or comparing digests (over a secure channel) against those someone else downloaded.

So even digest-protected files aren't immune to compromise. But as I said above, there's a (partial) workaround. See point #3.

3) While points #1 and #2 apply to the tree as normally rsynced, gentoo does have a somewhat higher-security sync method, both for the paranoid and to support users behind firewalls which don't pass rsync. Instead of running emerge --sync, this method uses the emerge-webrsync tool, which downloads the entire main gentoo tree as a gpg-signed tarball. If you have FEATURES=webrsync-gpg set (see the make.conf manpage), portage will verify the gpg signature on this tarball.

The two caveats here are (1) that the webrsync tarball is generated only once per day, while the main tree is updated every few minutes, so the rsynced tree is going to be more current, and (2) that each snapshot is the entire tree, not just the changes, so for those updating daily or close to it, fetching the full tarball every day will be more network traffic. Tho I think the tarball is compressed (I've never tried this method personally so can't say for sure) while the rsync tree isn't, so if you're updating monthly, I'd guess it's less traffic to get the tarball.

The tarball is gpg-signed, which is more secure than simple hash digests, but the signature covers the entire thing, not individual files, so the digests still win on granularity. Additionally, the tarball signing is automated, so while a valid signature pretty well ensures that the tarball did indeed come from gentoo, should someone compromise gentoo infrastructure security and somehow get a bad file in place, the daily snapshot process would blindly package up and sign the bad file along with all the rest.

So, sync-method bottom line: if you're paranoid or simply want additional gpg-signed security, use emerge-webrsync along with FEATURES=webrsync-gpg instead of normal rsync-based syncing. That pretty well ensures that you're getting exactly the tree tarball gentoo built and signed, which is certainly far more secure than normal rsync syncing. But because the tarballing and signing is automated and covers the entire tree, there's still the possibility that one or more files in that tarball are compromised and it hasn't been detected yet.
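For anyone wanting to try the webrsync-gpg method, the setup looks something like the following. (A sketch under assumptions: the keyring location is my arbitrary choice, and the key ID is the gentoo release-snapshot signing key as I recall it -- verify the current key and its fingerprint against the gentoo.org documentation before trusting anything here.)

    # /etc/portage/make.conf
    FEATURES="webrsync-gpg"
    PORTAGE_GPG_DIR="/etc/portage/gnupg"

    # one-time keyring setup: import the gentoo release key
    mkdir -p /etc/portage/gnupg && chmod 700 /etc/portage/gnupg
    gpg --homedir /etc/portage/gnupg \
        --keyserver pool.sks-keyservers.net \
        --recv-keys 0xBB572E0E2D182910   # verify this ID first!

    # then sync via the signed daily snapshot instead of plain rsync
    emerge-webrsync

With that in place, portage rejects a snapshot whose signature doesn't verify. What it can't tell you, per the caveats above, is anything about the individual files inside the tarball.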
Meanwhile, I mentioned above that gentoo isn't as secure in this regard as a number of other Linux distros. This is DEFINITELY the case for normal rsync syncers, but even for webrsync-gpg syncers it remains the case to some extent. Unfortunately, in practice that isn't likely to change in the near term, and possibly not in the medium or longer term either, unless some big gentoo compromise is detected and makes the news. THEN we're likely to see changes.

Alternatively, when that big pie-in-the-sky main gentoo tree switch from cvs (yes, still) to git eventually happens, the switch to full signing will be quite a bit easier, tho there will still be policies to enforce, etc. But they've been talking about the switch to git for years as well, and... incrementally... drawing closer, to the point that major portions of gentoo are actually developed in git-based overlays these days. But will the main tree ever actually switch to git? Who knows? As of now it's still pie-in-the-sky, with no nailed-down plans. Perhaps at some point somebody and some gentoo council together will decide it's time and will move whatever mountains or molehills remain to get it done; at this point, I think that's mostly what it'll take. Perhaps not. But unless that somebody steps up and makes that push come hell or high water, then assuming gentoo's still around by then, come 2025 we could still be talking about doing it... someday...

Back to secure-by-policy gpg signing... The problem is that while we've known for years what must be done, and what other distros have already done, and while gentoo has made some progress down the security road, in the absence of that ACTIVE KNOWN COMPROMISE RIGHT NOW immediate threat, other things simply continue to be higher priority, while REAL gentoo security continues to be back-burnered.

Basically, what must be done is to enforce, all the way thru to refusing gentoo developer commits that don't match policy, a requirement that every gentoo dev has a registered gpg key (AFAIK that much is already the case) and that every commit they make is SIGNED by that personal developer key, with gentoo infra verifying those signatures and rejecting any commit that doesn't verify. FWIW, there are GLEPs detailing most of this; they've just never been fully implemented, tho bits and pieces have been, incrementally, over time.

As I said, other distros have done this, generally when they HAD to, when they had that compromise hitting the news. Tho I think a few distros have implemented such a signed-no-exceptions policy when some OTHER distro got hit. Gentoo hasn't had that happen yet, and while the infrastructure is generally there to sign at least individual package commits, and some devs actually do so (you can see the signed digests for some packages, for instance), that hasn't been enforced tree-wide. In fact, there are a few relatively minor but still important policy questions to resolve first, before such enforcement could actually be activated.
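FWIW, the mechanics of such signing are already pretty much a solved problem on the git side; it's the policy around it that's hard. A minimal sketch using stock git (the key ID, category/package and commit message are made-up placeholders):

    # dev side: sign the commit with your registered personal key
    git commit --gpg-sign=0xDEADBEEF -m "app-foo/bar: version bump"

    # infra side: display and check the signature on the latest commit
    git log --show-signature -1

    # scriptable form: %G? prints G (good), B (bad), or N (unsigned)
    git log --pretty='%G? %h %an %s' -1

A server-side hook rejecting any push whose commits don't show "G" is then a few lines of scripting. The tooling isn't the hold-up; policy questions like the one below are.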
Here's one such signing-policy question to consider. Currently, package-maintainer devs make changes to their ebuilds, and later, after a period of testing, arch devs keyword a particular ebuild stable for their arch. Occasionally arch devs may add a bit of conditional code that applies to their arch only, as well.

Now consider this. Suppose a compromised package is detected after the package has been keyworded stable. The last several signed commits to that package were keywording only, while the commit introducing the compromise was sometime earlier. Question: are the arch devs that signed their keywording-only commits responsible too, because they signed off on the package? That would mean they now have to inspect every package they keyword, checking for compromises that might not be entirely obvious to them. Or are they only responsible for the keywording changes they actually committed, with no obligation to inspect the rest of the ebuild they're now signing?

OK, so we say they're only responsible for the keywording. Simple enough. But what about this? Suppose they add an arch-conditional that, combined with earlier code in the package, results in a compromise. The conditional code they added looks straightforward enough on its own, and really does solve a problem on that arch, and without that code, the original code looks innocently functional as well. But together, anyone installing that package on that arch is now open to the world. Both devs signed, the code of both devs is legit and looks innocent enough on its own, but taken together, they result in a bad situation. Now it's not so clear that an arch dev shouldn't have to inspect and sign for the state of the package after his commit, is it? Yet enforcing that as policy would seriously slow down arch stable keywording, and some archs can't keep up as it is, so such a policy would be an effective death sentence for them as a gentoo-stable supported arch.

Certainly there are answers to that sort of question, and various distros have faced them and come up with their own policy answers, often because, in the face of a REAL DISTRO COMPROMISE making the news, they've had no other choice. To some extent, gentoo is lucky in that it hasn't been faced with making those hard choices yet. But the fact is, all gentoo users remain less safe than we could be, because those hard choices haven't been made and enforced... because we've not been forced to make them.

Meanwhile, even were we to have done all that, there's still the possibility that upstream development might be compromised. Every year or two, some upstream project or another makes news due to some compromise or another. Sometimes vulnerable versions have been distributed for a while, and various distros have picked them up. In an upstream-compromise situation like that, there's little a distro can do, with the exception of going slow enough that its packages are all effectively outdated. That also happens to be a relatively effective counter to this sort of issue: if a several-years-old version changes, it'll be detected right away, and (one hopes) most compromises of a project server will be detected within months at the longest, so anything a year or more old should be relatively safe, simply by virtue of its age. Obviously the people and enterprise distros willing to run years-outdated code do have that advantage, and that's a risk that people wishing to run reasonably current code simply have to take as a result of that choice, regardless of the distro they chose to get that current code from.

But even if you choose to run an old distro, so you aren't likely to be hit by current upstream compromises, and one that has and enforces a full signing policy, so every commit can be accounted for, and even if none of the developers at either the distro or upstream level deliberately breaks the trust and goes bad, there's still the issue below...
> 2) That the compilers and interpreters don't do anything except build
> the code?

There's a paper, very famous in security circles, that effectively proves that unless you can absolutely trust every single layer in the build line, including the hardware layer (which means its sources) and the compiler and tools used to build your operational tools, and the compiler and tools used to build them, and... all the way back... you simply cannot absolutely trust the results, period. I never kept the link, but the title actually stuck in memory well enough for me to google it: "Reflections on Trusting Trust" (Ken Thompson's 1984 Turing Award lecture). =:^) Here's the google link:

https://www.google.com/search?q=%22reflections+on+trusting+trust%22

That means that in order to absolutely prove the gcc (for example) on our own systems, even if we can read and understand every line of gcc source, we must absolutely prove the tools on the original installation media and in the stage tarballs that we used to build our system. Which means we must not only have the code to them and trust the builders, but we must have the code and trust the builders of the tools they used, and the builders and tools of those tools, and...

Meanwhile, the same rule effectively applies to the hardware as well. And while Richard Stallman may run a computer that is totally open source hardware and firmware (down to the BIOS or equivalent), for which he has all the schematics, etc, most of us run at least some semi-proprietary hardware of /some/ sort. Which means that even if we /could/ fully understand the sources ourselves, without them and without that full understanding, at that level we simply have to trust... someone... basically, the people who design and manufacture that hardware.

Thus, in practice, (nearly) everyone ends up drawing the line /somewhere/. The Stallmans of the world draw it pretty strictly, refusing to run anything with replaceable firmware which doesn't itself have sources available. (As Stallman defines it, if the firmware is effectively burned in such that the manufacturer themselves can't update it, that's good enough for the line he draws. Tho that leads to absurdities such as an OpenMOKO phone that, at extra expense, has the firmware burned onto a separate chip such that it can't be replaced by anyone, in order to be able to use hardware that would otherwise be running firmware that the supplier refuses to open-source -- because the extra expense to do it that way means the manufacturer can't replace the firmware either, so it's on the OK side of Stallman's line.)

Meanwhile, I personally draw the line at what runs at the OS level on my computer. That means I won't run proprietary graphics drivers or flash, but I will and do load source-less firmware onto the Radeon-based graphics hardware I run, in order to use the freedomware kernel drivers on the same hardware that I refuse to run the proprietary fglrx drivers on. Other people are fine running flash and/or proprietary graphics drivers, but won't run a mostly-proprietary full OS such as MS Windows or Apple OSX. Still others prefer to run open source where it fits their needs, but won't go out of their way to do so if proprietary works better for them, and still others simply don't care either way, running whatever works best regardless of the freedom or lack thereof of its sources.
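Coming back to the Thompson paper for a moment, since the attack is easier to grasp with a sketch: the trick is a compiler that recognizes what it's compiling. The toy shell stand-in below is purely illustrative (the two inject_* helpers are imaginary), but it shows the shape of it:

    #!/bin/sh
    # toy stand-in for a trojaned cc, per "Reflections on Trusting Trust"
    case "$*" in
      *login.c*)
        # compiling the login program: quietly splice in a backdoor
        inject_backdoor "$@" ;;    # imaginary helper
      *cc.c*)
        # compiling the compiler itself: splice this recognition logic
        # back in, so even a rebuild from clean sources stays trojaned
        inject_self "$@" ;;        # imaginary helper
      *)
        exec /usr/bin/cc.real "$@" ;;   # everything else builds normally
    esac

The kicker is the second case: once the binary compiler is trojaned, rebuilding it from fully clean, fully audited sources changes nothing, which is why the trust chain has to extend all the way back.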
Anyway, when it comes to hardware and compiler, in practice the best you can do is run a FLOSS compiler such as gcc, while trusting the tools you used to build the first ancestor: basically, the gcc and tools in the stage tarballs, as well as whatever you booted (probably either a gentoo installer or another distro) in order to chroot into that unpacked stage and build from there. Beyond that, well... good luck, but you're still going to end up drawing the line /somewhere/.

> There's certainly lots of other issues about security, like protecting
> passwords, protecting physical access to the network and machines, root
> kits and the like, etc., but assuming none of that is in question (I
> don't have any reason to think the NSA has been in my home!) ;-) I'm
> looking for info on how the code is protected from the time it's signed
> off until it's built and running here.
>
> If someone knows of a good web site to read on this subject let me know.
> I've gone through my Linux life more or less like most everyone went
> through life 20 years ago, but paranoia strikes deep.

Indeed. Hope the above was helpful. I think it's a pretty accurate picture, at least from my own perspective, as someone who cares enough about this to spend a not-insignificant amount of time keeping up on the current situation in this area, both for linux in general and for gentoo in particular.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman