commit:     dcd6873798fbd0d88d2d9872015e5093e2dfe87d
Author:     Fabian Groffen <grobian <AT> gentoo <DOT> org>
AuthorDate: Thu Aug 28 15:37:37 2025 +0000
Commit:     Fabian Groffen <grobian <AT> gentoo <DOT> org>
CommitDate: Thu Aug 28 15:38:25 2025 +0000
URL:        https://gitweb.gentoo.org/proj/portage-utils.git/commit/?id=dcd68737

gtree-format.md: add writeup on the format and its benefits

Signed-off-by: Fabian Groffen <grobian <AT> gentoo.org>

 gtree-format.md | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/gtree-format.md b/gtree-format.md
new file mode 100644
index 0000000..0c9e555
--- /dev/null
+++ b/gtree-format.md
@@ -0,0 +1,137 @@
+gtree-1
+=======
+The gtree-1 format, is a container format for Gentoo repositories aiming
+at better performance reading and processing on the repository.  The
+format structure is inspired by GLEP-78, in that it is based on nested
+POSIX tar archives.  The uncompressed outer archive has a name ending
+with `gtree.tar`.
+
+Rationale
+---------
+In portage-utils, operations are often performed on trees.  Trees can be
+VDB (installed packages database) or repositories, with or without
+metadata.  Traversal is done each time by iterating over directory
+structures that make the tree, and reading files as necessary.  These
+I/O operations prove to be costly, inferring a lot of latency, which in
+the synchronous model of portage-utils simply means a lot of time.
+
+To reduce this time, a sequential, large I/O instead of many scattered
+I/O operations is necessary.  This can be achieved using a container
+format, such as a tar archive.  The gtree-1 format is exactly that, a
+single file that can (only) be read sequentually to extract values
+necessary.
+
+It must be noted that this I/O issue also inhibits itself in for
+instance syncing a repository over rsync.  A lot of files must be
+checked and copied, which all is much less intense for remote and
+local disk subsystems if a single sequential I/O is used.  This is
+analogous to how Object Storage systems operate successfully at the time
+of this writing.  Thus, offering a gtree-1 archive as alternative for an
+rsync tree, would in situations where bandwidth is not of concern offer
+serious performance improvements.
+
+A second issue that comes up with trees, is their consistency.  Given
+that they are comprised of many files and directories, it is important
+to have checks and measures to ensure all files are indeed as they are
+supposed to be when looking at derived information, such as metadata
+cache.  Of course integrity of the data, as in, that it is not tampered
+with is an additional issue on top of this.  These checks and measures
+require additional I/Os which further slow down any access to the data.
+With a single gtree-1 archive, none of these problems exist, as the
+archive itself is considered consitent, and cryptographic signatures
+inside the files can be used to verify that in a single read.
+
+Format
+------
+A gtree-1 archive is an uncompressed tar archive whose filename should
+end with `.gtree.tar`.  The format used is the POSIX ustar format as
+defined by POSIX.1-2017.  The archive contains the following files, in
+order:
+1. The archive identifier file `gtree-1`, required
+2. The repository data file `repo.tar{compr}`, required but compr is optional
+3. The signature for the repository data file `repo.tar{compr}.sig`, optional
+
+### The archive identifier
+The archive identifier file serves the purpose of identifying the
+repository container format and its version.
+
+The current identifier is `gtree-1`.  The file can have any contents,
+and may be empty.  It is encouraged to use the program and its version
+that created the archive as contents for this file.
+
+Note that it is important that this member is the first in the archive
+for implementations to efficiently establish compatibility.  It also
+allows tools like file(1) to identify such file accordingly.
+
+### The repository data
+The actual repository data is stored in this entry.  It is another,
+nested, POSIX ustar tar archive and it is highly recommended to use
+Zstandard compression on this archive to reduce the overall size of the
+gtree-1 container with the best read performance.  Consider the
+following table:
+
+| compressor | q -c         | qlist -IStv |
+| ---------- | ------------ | ----------- |
+| Zstd (19)  | 25.4MiB, 22s | 0.12s       |
+| XZ         | 25.2MiB, 34s | 0.24s       |
+| Bzip2      | 29.7MiB, 17s | 0.55s       |
+| gzip       | 38.2MiB, 10s | 0.16s       |
+| LZ4        | 57.5MiB,  9s | 0.13s       |
+
+Zstandard compression levels 3 and 9 would take 8 and 12 seconds
+respectively, but since the creation is done only once, the cost here
+does not matter much, the resulting size though does.  XZ beats by a
+fraction, but takes twice the time to decompress.  Since we do this
+multiple times, it is really beneficial to use Zstandard in this case,
+which only LZ4 comes close, but for more than twice the data size.
+
+This archive contains the following members (all optional), in the order
+mentioned below to allow a reader to stop reading/processing in most
+cases:
+1. repository, contains the repository name
+2. caches/{CAT}/{PF}, metadata cache entries as single key-val data
+3. ebuilds/{CAT}/{PN}/metadata.xml, PN metadata.xml file
+4. ebuilds/{CAT}/{PN}/Manifest, Manifest entries for DIST files
+5. ebuilds/{CAT}/{PN}/files/..., files for the ebuilds
+6. ebuilds/{CAT}/{PN}/{PF}.ebuild, ebuild file
+7. eclasses/{ECLASS}.eclass, eclass file
+
+Performance
+-----------
+While a single container archive has benefits in compression and
+transferability, the main reason for was for performance.  Below follow
+a few tests conducted on the same repository tree with full md5-cache,
+or its equivalent gtree-1.
+
+For traversing the tree, the invocation of `qlist -IStv` is used.  This
+causes `qlist` to look for all ebuilds (t), and list them (I) with their
+SLOT (S) and package version (v).  The SLOT retrieval requires a lookup
+in the metadata cache for the normal repository tree, as its value
+cannot be derived from the directory structure.  In this experiment,
+qlist returned 30321 ebuilds in all cases.  Observe the following
+numbers on a MacBook Air M4 with SSD storage:
+
+|           | md5-cache | gtree |
+| --------- | --------- | ----- |
+| cold run  | 7.20s     | 0.17s |
+| run 1     | 5.30s     | 0.17s |
+| run 2     | 4.81s     | 0.17s |
+| run 3     | 5.05s     | 0.17s |
+
+As can be observed, the gtree results are far more stable, and
+outperform the md5-cache runs on this system.
+
+On disk and networked systems this difference is only worse.  Consider
+the same on a NFS-mounted volume where the repository and/or gtree
+resides:
+
+|           | md5-cache | gtree |
+| --------- | --------- | ----- |
+| run 1     | 2:32m     | 0.64s |
+
+While the gtree case is still under a second, the md5-cache case takes
+minutes, because on NFS the I/O latency problem is magnified, while the
+throughput for a single large file is just fine.
+
+Conclusion is that even on the fastest disk subsystems, the gtree-1 format
+appears very beneficial for overall performance.

Reply via email to