On Thu, Nov 16, 2017 at 11:19:54AM +0100, Michał Górny wrote: > [snip] > Abstract > ======== > > This GLEP extends the Manifest file format to cover full-tree file > integrity and authenticity checks.The format aims to be future-proof,
Missing a space after the first sentence, between "checks." and "The". > efficient and provide means of backwards compatibility. Could use an Oxford comma after "efficient", but it's a style choice. Up to you. > > > Motivation > ========== > > The Manifest files as defined by GLEP 44 [#GLEP44]_ provide the current > means of verifying the integrity of distfiles and package files > in Gentoo. Combined with OpenPGP signatures, they provide means to > ensure the authenticity of the covered files. However, as noted > in GLEP 57 [#GLEP57]_ they lack the ability to provide full-tree > authenticity verification as they do not cover any files outside > the package directory. In particular, they provide multiple ways > for a third party to inject malicious code into the ebuild environment. > > Historically, the topic of providing authenticity coverage for the whole > repository has been mentioned multiple times. The most noteworthy effort > are GLEPs 58 [#GLEP58]_ and 60 [#GLEP60]_ by Robin H. Johnson from 2008. > They were accepted by the Council in 2010 but have never been > implemented. When potential implementation work started in 2017, a new > discussion about the specification arose. It prompted the creation > of a competing GLEP that would provide a redesigned alternative to > the old GLEPs. No correction, but I really like the inclusion of history here. It gives the reader more context, should they have questions about prior discussions. > [snip] > 1. It is more future-proof. If an incompatible change to the repository > format is introduced, only developers need to be upgrade the tools > they use to generate the Manifests. The tools used to verify > the updated Manifests will continue to work. "be upgrade" -> "upgrade" > > [snip] > While both models have their advantages, the hierarchical model was > selected because it reduces the number of OpenPGP operations > which are comparatively costly to the minimum. It seems like "which are comparatively costly" should be in parentheses, or separated by some other punctuation like en- or em-dashes. e.g. "... because it reduces the number of OpenPGP operations (which are comparatively costly) to the minimum." or "... because it reduces the number of OpenPGP operations – which are comparatively costly – to the minimum." (Note en-dash was used (0x2013), not a regular hyphen (0x2D).) Or something like that. > > [snip] > Non-strict Manifest verification > -------------------------------- > > Originally the Manifest2 format provided a special ``MISC`` tag that > was used for ``metadata.xml`` and ``ChangeLog`` files. This tag > indicated that the Manifest verification failures could be ignored for > those files unless the package manager was working in strict mode. > > The first versions of this specification continued the use of this tag. > However, after a long debate it was decided to deprecate it along with > the non-strict behavior, and require all files to strictly match. It may be outside the scope of the GLEP, but a link to said long debate might be relevant to the reader, especially if they have suggestions or points that have already been discussed in the debate. > [snip] > Finally, the non-strict mode could be used as means to an attack. > The allowance of missing or modified documentation file could be used > to spread misinformation, resulting in bad decisions made by the user. > A modified file could also be used e.g. to exploit vulnerabilities > of an XML parser. "used e.g." -> "used, e.g." Helps it reflect the way it would be spoken. > > > Timestamp field > --------------- > > The top-level Manifests optionally allows using a ``TIMESTAMP`` tag > to include a generation timestamp in the Manifest. A similar feature > was originally proposed in GLEP 58 [#GLEP58]_. "Manifests" and "allows" disagree grammatically -- one of them needs to drop the "s". Context clues indicate a singular top-level Manifest. > > A malicious third-party may use the principles of exclusion or replay > [#C08]_ to deny an update to clients, while at the same time recording > the identity of clients to attack. The timestamp field can be used to > detect that. > > In order to provide a more complete protection, the Gentoo > Infrastructure should provide an ability to obtain the timestamps > of all Manifests from a recent timeframe over a secure channel > from a trusted source for comparison. "a more complete protection"; should probably drop the "a". > > Strictly speaking, this information is already provided by the various > ``metadata/timestamp*`` files that are already present. However, > including the value in the Manifest itself has a little cost > and provides the ability to perform the verification stand-alone. > > Furthermore, some of the timestamp files are added very late > in the distribution process, past the Manifest generation phase. Those > files will most likely receive ``IGNORE`` entries and therefore > be not suitable to safe use. Looks like a few extra words in the last sentence. Here's my attempt: "These files will likely receive ``IGNORE`` entries and therefore be unsafe to use." ("unsuitable" may replace "unsafe", up to you) > > The specification permits additional timestamps in sub-Manifest files > for local use. A generic testing tool should ignore them. > > > New vs deprecated tags > ---------------------- > > Out of the four types defined by Manifest2, only one is reused > and the remaining three is replaced by a single, universal ``DATA`` > type. "the remaining three is" -> "the remaining three are" > [snip] > Injecting ChangeLogs into the checkout > -------------------------------------- > > One of the problems considered in the new Manifest format was that > of injecting historical and autogenerated ChangeLog into the repository. > Normally we are not including those files to reduce the checkout size. > However, some users have shown interest in them and Infra is working > on providing them via an additional rsync module. "that of" is extraneous here. The second sentence should read something like "We normally don't include these files, to reduce checkout size." > > [snip] > Hash algorithms > --------------- > > While maintaining a consistent supported hash set is important > for interoperability, it is no good fit for the generic layout of this > GLEP. Furthermore, it would require updating the GLEP in the future > every time the used algorithms change. "it is no good fit" -> "it is not a good fit" > > [snip] > The compression of top-level Manifest file has been prohibited > as the specification currently does not provide any means of verifying > the file prior to decompression. This would make it possibly for > a malicious third party to provide a compressed Manifest exposing > decompressor vulnerabilities, or being a zip bomb, and the tooling > would have to unpack it before being able to verify the contents. The latter half of the paragraph is a little scattered. Here's my attempt, after the first sentence: "If the top-level Manifest is compressed, tooling will have to unpack the file before being able to verify the contents. This makes it possible for a malicious third party to attack a system by providing a compressed Manifest that exposes decompressor vulnerabilities, or a zip bomb." (Maybe 'zip bomb' should be a link or a footnote, describing what it is.) > [snip] > > The existence of additional entries for uncompressed Manifest checksums > was debated. However, plain entries for the uncompressed file would > be confusing if only compressed file existed, and conflicting if both "only compressed" -> "only the compressed" > uncompressed and compressed variants existed. Furthermore, it has been > pointed out that ``DIST`` entries do not have uncompressed variant "uncompressed variant" -> "an uncompressed variant" > either. > > > Performance considerations > -------------------------- > > Performing a full-tree verification on every sync raises some > performance concerns for end-user systems. The initial testing has shown > that a cold-cache verification on a btrfs file system can take up around > 4 minutes, with the process being mostly I/O bound. On the other hand, > it can be expected that the verification will be performed directly > after syncing, taking advantage of warm filesystem cache. "warm" -> "a warm" > > [snip] > Thanks to all the people whose contributions were invaluable > to the creation of this GLEP. This includes but is not limited to: > > - Robin Hugh Johnson, > - Ulrich Müller. > > Additionally, thanks to Robin Hugh Johnson for the original > MataManifest GLEP series which served both as inspiration and source "MataManifest" -> "MetaManifest" > > [snip] > Aside from the few nitpicks this looks good. Hope this helps. -- Daniel Campbell - Gentoo Developer, Trustee, Treasurer OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net fpr: AE03 9064 AE00 053C 270C 1DE4 6F7A 9091 1EA0 55D6
signature.asc
Description: Digital signature