On Thu, Nov 16, 2017 at 11:19:54AM +0100, Michał Górny wrote:
> [snip]
> Abstract
> ========
> 
> This GLEP extends the Manifest file format to cover full-tree file
> integrity and authenticity checks.The format aims to be future-proof,

Missing a space after the first sentence, between "checks." and "The".

> efficient and provide means of backwards compatibility.

Could use an Oxford comma after "efficient", but it's a style choice. Up
to you.

> 
> 
> Motivation
> ==========
> 
> The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current
> means of verifying the integrity of distfiles and package files
> in Gentoo. Combined with OpenPGP signatures, they provide means to
> ensure the authenticity of the covered files. However, as noted
> in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree
> authenticity verification as they do not cover any files outside
> the package directory. In particular, they provide multiple ways
> for a third party to inject malicious code into the ebuild environment.
> 
> Historically, the topic of providing authenticity coverage for the whole
> repository has been mentioned multiple times. The most noteworthy effort
> are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008.
> They were accepted by the Council in 2010 but have never been
> implemented. When potential implementation work started in 2017, a new
> discussion about the specification arose. It prompted the creation
> of a competing GLEP that would provide a redesigned alternative to
> the old GLEPs.

No correction, but I really like the inclusion of history here. It gives
the reader more context, should they have questions about prior
discussions.

> [snip]
> 1. It is more future-proof. If an incompatible change to the repository
>    format is introduced, only developers need to be upgrade the tools
>    they use to generate the Manifests. The tools used to verify
>    the updated Manifests will continue to work.

"be upgrade" -> "upgrade"

> 
> [snip]
> While both models have their advantages, the hierarchical model was
> selected because it reduces the number of OpenPGP operations
> which are comparatively costly to the minimum.

It seems like "which are comparatively costly" should be in parentheses,
or separated by some other punctuation like en- or em-dashes. e.g.

"... because it reduces the number of OpenPGP operations (which are
comparatively costly) to the minimum."

or

"... because it reduces the number of OpenPGP operations – which are
comparatively costly – to the minimum."

(Note en-dash was used (0x2013), not a regular hyphen (0x2D).)

Or something like that.

> 
> [snip]
> Non-strict Manifest verification
> --------------------------------
> 
> Originally the Manifest2 format provided a special ``MISC`` tag that
> was used for ``metadata.xml`` and ``ChangeLog`` files. This tag
> indicated that the Manifest verification failures could be ignored for
> those files unless the package manager was working in strict mode.
> 
> The first versions of this specification continued the use of this tag.
> However, after a long debate it was decided to deprecate it along with
> the non-strict behavior, and require all files to strictly match.

It may be outside the scope of the GLEP, but a link to said long debate
might be relevant to the reader, especially if they have suggestions or
points that have already been discussed in the debate.

> [snip]
> Finally, the non-strict mode could be used as means to an attack.
> The allowance of missing or modified documentation file could be used
> to spread misinformation, resulting in bad decisions made by the user.
> A modified file could also be used e.g. to exploit vulnerabilities
> of an XML parser.

"used e.g." -> "used, e.g."

Helps it reflect the way it would be spoken.

> 
> 
> Timestamp field
> ---------------
> 
> The top-level Manifests optionally allows using a ``TIMESTAMP`` tag
> to include a generation timestamp in the Manifest. A similar feature
> was originally proposed in GLEP 58 [#GLEP58]_.

"Manifests" and "allows" disagree grammatically -- one of them needs to
drop the "s". Context clues indicate a singular top-level Manifest.

> 
> A malicious third-party may use the principles of exclusion or replay
> [#C08]_ to deny an update to clients, while at the same time recording
> the identity of clients to attack. The timestamp field can be used to
> detect that.
> 
> In order to provide a more complete protection, the Gentoo
> Infrastructure should provide an ability to obtain the timestamps
> of all Manifests from a recent timeframe over a secure channel
> from a trusted source for comparison.

"a more complete protection"; should probably drop the "a".

> 
> Strictly speaking, this information is already provided by the various
> ``metadata/timestamp*`` files that are already present. However,
> including the value in the Manifest itself has a little cost
> and provides the ability to perform the verification stand-alone.
> 
> Furthermore, some of the timestamp files are added very late
> in the distribution process, past the Manifest generation phase. Those
> files will most likely receive ``IGNORE`` entries and therefore
> be not suitable to safe use.

Looks like a few extra words in the last sentence. Here's my attempt:

"These files will likely receive ``IGNORE`` entries and therefore be
unsafe to use."

("unsuitable" may replace "unsafe", up to you)

> 
> The specification permits additional timestamps in sub-Manifest files
> for local use. A generic testing tool should ignore them.
> 
> 
> New vs deprecated tags
> ----------------------
> 
> Out of the four types defined by Manifest2, only one is reused
> and the remaining three is replaced by a single, universal ``DATA``
> type.

"the remaining three is" -> "the remaining three are"

> [snip]
> Injecting ChangeLogs into the checkout
> --------------------------------------
> 
> One of the problems considered in the new Manifest format was that
> of injecting historical and autogenerated ChangeLog into the repository.
> Normally we are not including those files to reduce the checkout size.
> However, some users have shown interest in them and Infra is working
> on providing them via an additional rsync module.

"that of" is extraneous here.

The second sentence should read something like "We normally don't
include these files, to reduce checkout size."

> 
> [snip]
> Hash algorithms
> ---------------
> 
> While maintaining a consistent supported hash set is important
> for interoperability, it is no good fit for the generic layout of this
> GLEP. Furthermore, it would require updating the GLEP in the future
> every time the used algorithms change.

"it is no good fit" -> "it is not a good fit"

> 
> [snip]
> The compression of top-level Manifest file has been prohibited
> as the specification currently does not provide any means of verifying
> the file prior to decompression. This would make it possibly for
> a malicious third party to provide a compressed Manifest exposing
> decompressor vulnerabilities, or being a zip bomb, and the tooling
> would have to unpack it before being able to verify the contents.

The latter half of the paragraph is a little scattered. Here's my
attempt, after the first sentence:

"If the top-level Manifest is compressed, tooling will have to unpack
the file before being able to verify the contents. This makes it
possible for a malicious third party to attack a system by providing a
compressed Manifest that exposes decompressor vulnerabilities, or a zip
bomb."

(Maybe 'zip bomb' should be a link or a footnote, describing what it
is.)

> [snip]
> 
> The existence of additional entries for uncompressed Manifest checksums
> was debated. However, plain entries for the uncompressed file would
> be confusing if only compressed file existed, and conflicting if both

"only compressed" -> "only the compressed"

> uncompressed and compressed variants existed. Furthermore, it has been
> pointed out that ``DIST`` entries do not have uncompressed variant

"uncompressed variant" -> "an uncompressed variant"

> either.
> 
> 
> Performance considerations
> --------------------------
> 
> Performing a full-tree verification on every sync raises some
> performance concerns for end-user systems. The initial testing has shown
> that a cold-cache verification on a btrfs file system can take up around
> 4 minutes, with the process being mostly I/O bound. On the other hand,
> it can be expected that the verification will be performed directly
> after syncing, taking advantage of warm filesystem cache.

"warm" -> "a warm"

> 
> [snip]
> Thanks to all the people whose contributions were invaluable
> to the creation of this GLEP. This includes but is not limited to:
> 
> - Robin Hugh Johnson,
> - Ulrich Müller.
> 
> Additionally, thanks to Robin Hugh Johnson for the original
> MataManifest GLEP series which served both as inspiration and source

"MataManifest" -> "MetaManifest"

>
> [snip]
> 

Aside from the few nitpicks this looks good. Hope this helps.

-- 
Daniel Campbell - Gentoo Developer, Trustee, Treasurer
OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net
fpr: AE03 9064 AE00 053C 270C  1DE4 6F7A 9091 1EA0 55D6

Attachment: signature.asc
Description: Digital signature

Reply via email to