Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format
On Sun, 2018-11-18 at 12:00 +0100, Fabian Groffen wrote:
> On 18-11-2018 10:38:51 +0100, Michał Górny wrote:
> > On Sun, 2018-11-18 at 10:16 +0100, Fabian Groffen wrote:
> > > On 17-11-2018 12:21:40 +0100, Michał Górny wrote:
> > > > Problems with the current binary package format
> > > > -----------------------------------------------
> > > >
> > > > The following problems were identified with the package format
> > > > currently in use:
> > > >
> > > > 1. **The packages rely on custom binary archive format to store
> > > >    metadata.**  It is entirely Gentoo invented, and requires
> > > >    dedicated tooling to work with it.  In fact, the reference
> > > >    implementation in Portage does not even include a CLI tool to
> > > >    work with tbz2 packages; an unofficial implementation is
> > > >    provided as part of portage-utils toolkit [#PORTAGE-UTILS]_.
> > >
> > > I think you should rewrite this section to the argument that the
> > > metadata is hard to edit, and that there is only one tool to do so
> > > (except a python interface from Portage?).
> > > On a separate note, I don't think portage-utils can be considered
> > > "unofficial", it is a Gentoo official project as far as I am aware.
> >
> > In this context, Portage is 'official'.  Portage-utils is a project
> > that's developed entirely separately from Portage and doesn't use
> > Portage APIs but instead reinvents everything.  As such, it is easy
> > for the two to go out of sync.  Or for one of them to have bugs that
> > the other one doesn't have (say, with endianness).
>
> I'm not sure if it's actually true, I was under the impression the same
> author(s) worked on the Portage as well as portage-utils code.  Anyway,
> aren't quickpkg and emerge enough from a user's perspective?

Gentoo users have a wide perspective.  Assuming that you can think of
all things the users need and you don't need to care beyond that is
plain wrong and results in Windows.

> > > > 2. **The format relies on obscure compressor feature of ignoring
> > > >    trailing garbage**.  While this behavior is traditionally
> > > >    implemented by many compressors, the original reasons for it
> > > >    have become long irrelevant and it is not surprising that new
> > > >    compressors do not support it.  In particular, Portage already
> > > >    hit this problem twice: once when users replaced bzip2 with
> > > >    parallel-capable pbzip2 implementation [#PBZIP2]_, and the
> > > >    second time when support for zstd compressor was added
> > > >    [#ZSTD]_.
> > >
> > > I think this is actually the result of a rather opportunistic
> > > implementation.  The fault is that we chose to use an extension
> > > that suggests the file is a regular compressed tarball.
> > > When one detects that a file is xpak padded, it is trivial to feed
> > > the decompressor just the relevant part of the datastream.  The
> > > format itself isn't bad, and doesn't rely on obscure behaviour.
> >
> > Except if you don't have the proper tools installed.  In which case
> > the 'opportunistic' behavior made it possible to extract the
> > contents without special tools... except when it actually happens
> > not to work anymore.  Roy's reply indicates that there is actually
> > interest in this design feature.
>
> Your point is that the format is broken (== relies on obscure
> compressor feature).  My point is that the format simply requires a
> special tool.  The fact that we prefer to use existing tools doesn't
> imply in any way that the format is broken to me.
> I think you should rewrite your point to mention that you don't want
> to use a tool that doesn't exist in @system (?) to unpack a binpkg.
> My guess is that you could use some head/tail magic in a script if the
> trailing block is upsetting the decompressor.
>
> I'm not saying this may look ugly, I'm just saying that your point
> seems biased.

I've spent a significant effort rewriting those points to make it clear
what the problem is, and separating them from other changes 'worth
doing while we're changing stuff'.  Hope that satisfies your
nitpicking.
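The "head/tail magic" can actually be avoided entirely if the decompressor is driven as a stream: it knows where the bzip2 stream ends, and everything after that is the xpak. A minimal sketch using Python's stdlib bz2 module against a made-up xpak-style blob (the payload and metadata bytes here are invented for illustration, not real Portage data):

```python
import bz2

# Build a tbz2-style file in memory: a bzip2 stream with an xpak-like
# blob appended as "trailing garbage" (layout simplified for illustration).
payload = b"fake tar archive contents"
blob = bz2.compress(payload) + b"XPAKPACKfake-metadataXPAKSTOP"

# A one-shot decompression may reject the trailing data, depending on
# the implementation.  Streaming decompression makes the split explicit:
dec = bz2.BZ2Decompressor()
data = dec.decompress(blob)

assert dec.eof                                   # bzip2 stream ended cleanly
assert data == payload                           # compressed part recovered
assert dec.unused_data.startswith(b"XPAKPACK")   # the metadata "garbage"
```

This is exactly the "feed the decompressor just the relevant part of the datastream" approach — no magic-string scanning needed, but it does require a tool aware of the trick.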
> > > > 3. **Placing metadata at the end of file makes partial fetches
> > > >    complex.**  While it is technically possible to obtain
> > > >    package metadata remotely without fetching the whole package,
> > > >    it usually requires e.g. 2-3 HTTP requests with rather complex
> > > >    driver.  For comparison, if metadata was placed at the
> > > >    beginning of the file, early-terminated pipeline with a single
> > > >    fetch request would suffice.
> > >
> > > I think this point needs to be quantified somewhat why it is so
> > > important.
> > > I may be wrong, but the average binpkg is small, <1MiB, bigger
> > > packages are <50MiB.
> > > So what is the gain to be saved here?  A "few" MiBs for what
> > > operation exactly?  I say "few" because I know for some users this
> > > is actually not just a blip before it's downloaded.  So if this is
> > > possible to achieve, in what scenarios
Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format
On 21-11-2018 10:33:18 +0100, Michał Górny wrote:
> > > > > 2. **The format relies on obscure compressor feature of
> > > > >    ignoring trailing garbage**.  While this behavior is
> > > > >    traditionally implemented by many compressors, the original
> > > > >    reasons for it have become long irrelevant and it is not
> > > > >    surprising that new compressors do not support it.  In
> > > > >    particular, Portage already hit this problem twice: once
> > > > >    when users replaced bzip2 with parallel-capable pbzip2
> > > > >    implementation [#PBZIP2]_, and the second time when support
> > > > >    for zstd compressor was added [#ZSTD]_.
> > > >
> > > > I think this is actually the result of a rather opportunistic
> > > > implementation.  The fault is that we chose to use an extension
> > > > that suggests the file is a regular compressed tarball.
> > > > When one detects that a file is xpak padded, it is trivial to
> > > > feed the decompressor just the relevant part of the datastream.
> > > > The format itself isn't bad, and doesn't rely on obscure
> > > > behaviour.
> > >
> > > Except if you don't have the proper tools installed.  In which case
> > > the 'opportunistic' behavior made it possible to extract the
> > > contents without special tools... except when it actually happens
> > > not to work anymore.  Roy's reply indicates that there is actually
> > > interest in this design feature.
> >
> > Your point is that the format is broken (== relies on obscure
> > compressor feature).  My point is that the format simply requires a
> > special tool.  The fact that we prefer to use existing tools doesn't
> > imply in any way that the format is broken to me.
> > I think you should rewrite your point to mention that you don't want
> > to use a tool that doesn't exist in @system (?) to unpack a binpkg.
> > My guess is that you could use some head/tail magic in a script if
> > the trailing block is upsetting the decompressor.
> >
> > I'm not saying this may look ugly, I'm just saying that your point
> > seems biased.
>
> I've spent a significant effort rewriting those points to make it
> clear what the problem is, and separating them from other changes
> 'worth doing while we're changing stuff'.  Hope that satisfies your
> nitpicking.

Yes it does, thank you.

> > > > > 3. **Placing metadata at the end of file makes partial fetches
> > > > >    complex.**  While it is technically possible to obtain
> > > > >    package metadata remotely without fetching the whole
> > > > >    package, it usually requires e.g. 2-3 HTTP requests with
> > > > >    rather complex driver.  For comparison, if metadata was
> > > > >    placed at the beginning of the file, early-terminated
> > > > >    pipeline with a single fetch request would suffice.
> > > >
> > > > I think this point needs to be quantified somewhat why it is so
> > > > important.
> > > > I may be wrong, but the average binpkg is small, <1MiB, bigger
> > > > packages are <50MiB.
> > > > So what is the gain to be saved here?  A "few" MiBs for what
> > > > operation exactly?  I say "few" because I know for some users
> > > > this is actually not just a blip before it's downloaded.  So if
> > > > this is possible to achieve, in what scenarios is this going to
> > > > be used (and is this often?).
> > >
> > > Last I checked, Gentoo aimed to support more users than the
> > > 'majority' of people with high-throughput Internet access.  If
> > > there's no cost in doing things better, why not do them better?
> >
> > You didn't address the critical question, but instead just repeated
> > what I said.
> > So again, why do you need to read just the metadata?
>
> The original idea was to provide the ability of indexing remote
> packages without having a server-side cache available (or up-to-date).
> In order to do that, the package manager would need to fetch the
> metadata of all packages (but there's no necessity in fetching the
> whole packages).  However, that's merely a possible future idea.  It's
> not worth debating today.
>
> Today I really understood the point of avoiding premature
> optimization.  Even if the change is practically zero-cost and
> harmless (as it's simply reordering files), it's going to cost you a
> lot of time because someone will keep nitpicking on it, even though
> any other order will not change anything.

Perhaps next time don't put as much emphasis on it.  I can see now what
you aim for, but it simply raises more questions and concerns to me
than it resolves.  There is nothing wrong with putting in such a future
possibility though, if easily possible and not colliding with anything
else.

> > > > > 4. **Extending the format with OpenPGP signatures is
> > > > >    non-trivial.**  Depending on the implementation details, it
> > > > >    either requires fetching additional detached signature,
> > > > >    breaking backwards compatibility or introducing more custom
> > > > >    logic to reassemble OpenPGP packets.
> > > >
> > > > I think one
Re: [gentoo-dev] [pre-GLEP] Gentoo binary package container format
On Wed, 2018-11-21 at 11:45 +0100, Fabian Groffen wrote:
> > > > > > 5. **Metadata is not compressed.**  This is not a significant
> > > > > >    problem, it is just listed for completeness.
> > > > > >
> > > > > >
> > > > > > Goals for a new container format
> > > > > > --------------------------------
> > > > > >
> > > > > > The following goals have been set for a replacement format:
> > > > > >
> > > > > > 1. **The packages must remain contained in a single file.**
> > > > > >    As a matter of user convenience, it should be possible to
> > > > > >    transfer binary packages without having to use multiple
> > > > > >    files, and to install them from any location.
> > > > > >
> > > > > > 2. **The file format must be entirely based on common file
> > > > > >    formats, respecting best practices, with as little
> > > > > >    customization as necessary to satisfy the requirements.**
> > > > > >    In particular, it is unacceptable to create new binary
> > > > > >    formats.
> > > > >
> > > > > I take this as your personal opinion.  I don't quite get why it
> > > > > is unacceptable to create a new binary format though.  In
> > > > > particular when you're looking for efficiency, such format
> > > > > could serve your purposes.  As long as it's clearly defined, I
> > > > > don't see the problem with a binary format either.
> > > > > Could you add why it is you think binary formats are
> > > > > unacceptable here?
> > > >
> > > > Because custom binary formats require specialized tooling, and
> > > > are a royal PITA when the user wants to do something that the
> > > > author of specialized tooling just happened not to think
> > > > worthwhile, or when the tooling is not available for some reason.
> > > > And before you ask really silly questions, yes, I did fight
> > > > binary packages over hex editor at some point.
> > >
> > > Which I still don't understand, to be frank.  I think even Portage
> > > exposes python APIs to get to the data.
> >
> > Compare the time needed to make a trivial (but unforeseen) change
> > on a format that's transparent vs a format that requires you to
> > learn its spec and/or API, write a program and debug it.
>
> I was under the impression you could unpack a tbz2 into data and xpak,
> then unpack both, modify the contents with an editor or whatever, and
> then pack the whole stuff back into a tbz2 again.  This can be done
> worst case scenario by emerge -k, modifying the vdb and quickpkg
> afterwards.

In the described example, the whole necessity of modifying the binary
package arises from it being broken, therefore unsuitable for
'emerge -k'.

> I know that with portage-utils you can do this easily with the qtbz2
> and qxpak commands.  No need to do anything with a hex editor, or know
> anything about how it's done.

Actually, you need to:

a. know that portage-utils has the appropriate tools (it's
   non-obvious),
b. know how to use portage-utils.

This is non-obvious.  It took me a while to figure out that I need to
use qtbz2 before using qxpak (why would it work only on split data when
the format is explicitly written to be used on top of compressed
archive?!).

> Obvious advantage of your approach is that you don't need q* tools,
> but can use tar instead.  The editing is as trivial though.  In your
> case you need a special procedure to reconstruct the binpkg should you
> want to keep your special properties (label, order) which equates to
> q* tools somewhat.

Except you don't need to keep them.  The spec is quite explicit that
they're optimizations and that the package must work even if they're
lost as a part of editing exercise.

> > > > The most trivial case is an attempted recovery of a broken
> > > > system.  If you don't have Portage working and don't have
> > > > portage-utils installed, do you really prefer a custom format
> > > > which will require you to fetch and compile special tools?  Or is
> > > > one that can be processed with tools you're quite likely to have
> > > > on every system, like tar?
> > >
> > > Well, I think the idea behind the original binpkg format was to use
> > > tar directly on the files in emergency scenarios like these...
> > > The assumption was bzip2 decompressor and tar being available.
> > > I think it is an example of how you add something, while still
> > > allowing to fallback on existing tools.
> >
> > Except progress in compressors has made it work less and less
> > reliably.  It's mostly an example how to be *clever*.  However,
> > being clever usually doesn't pay off in the long term, compared to
> > doing things *in a simple way*.
>
> We agree it is hackish, and we agree we can do without.  You simply
> exaggerate the problem, IMO, which mostly isn't there, because it
> works fine today.  It can also be solved today using shell tools.
>
> % head -c `grep -abo 'XPAKPACK'
>     $EPREFIX/usr/portage/packages/sys-apps/sed-4.5.tbz2 | sed 's/:.*$//'`
>     $EPREFIX/usr/portage/package
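Fabian's grep-for-the-magic trick works, but it scans the whole file and could misfire if the compressed stream happened to contain the bytes `XPAKPACK`. A more robust split reads the xpak trailer instead. A rough Python sketch against a synthetic blob, laid out per my reading of the xpak format description (a 4-byte big-endian xpak length followed by the literal `STOP` at the very end of the file) — the contents here are fabricated, and this has not been tested against real Portage binpkgs:

```python
import struct

def split_tbz2(blob: bytes):
    """Split a tbz2-style blob into (compressed-data, xpak) parts.

    Layout assumed (per the xpak format description):
      |compressed tar|xpak blob|xpak_len (4-byte big-endian)|b"STOP"|
    """
    if blob[-4:] != b"STOP":
        raise ValueError("not a tbz2/xpak file")
    (xpak_len,) = struct.unpack(">I", blob[-8:-4])
    xpak = blob[-8 - xpak_len:-8]
    if not (xpak.startswith(b"XPAKPACK") and xpak.endswith(b"XPAKSTOP")):
        raise ValueError("corrupt xpak blob")
    return blob[:-8 - xpak_len], xpak

# Round-trip with a synthetic file:
data = b"pretend-bzip2-compressed-tar"
xpak = b"XPAKPACK" + b"fake-metadata" + b"XPAKSTOP"
tbz2 = data + xpak + struct.pack(">I", len(xpak)) + b"STOP"

d, x = split_tbz2(tbz2)
assert d == data and x == xpak
```

Reading the trailer is also what qtbz2 effectively does; the point of the debate is that one has to know this layout at all.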
Re: [gentoo-dev] [pre-GLEP r2] Gentoo binary package container format
On 20-11-2018 21:33:17 +0100, Michał Górny wrote:
> The volume label
> ----------------
>
> The volume label provides an easy way for users to identify the binary
> package without dedicated tooling or specific format knowledge.
>
> The implementations should include a volume label consisting of fixed
> string ``gpkg:``, followed by a single space, followed by full package
> identifier.  However, the implementations must not rely on the volume
> label being present or attempt to parse its value when it is.
>
> Furthermore, since the volume label is included in the .tar archive
> as the first member, it provides a magic string at a fixed location
> that can be used by tools such as file(1) to easily distinguish Gentoo
> binary packages from regular .tar archives.

Just for clarity on this point.
Are you proposing that we patch file(1) to print the Volume Header
here?  file-5.35 seems to not say much but "tar archive" or "POSIX tar
archive" for tar-files containing a Volume Header as shown by tar -tv.

> Container and archive formats
> -----------------------------
>
> During the debate, the actual archive formats to use were considered.
> The .tar format seemed an obvious choice for the image archive since
> it is the only widely deployed archive format that stores all kinds
> of file metadata on POSIX systems.  However, multiple options for
> the outer format have been debated.

You mention POSIX, which triggered me.  I think it would be good to
specify which tar format to use.

POSIX.1-2001/pax format doesn't have a 100/256 char filename length
restriction, which is good but it is not (yet) used by default by GNU
tar.  busybox tar can read pax tars, it seems.

Thanks,
Fabian

-- 
Fabian Groffen
Gentoo on a different level
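The 100/256-char restriction Fabian mentions is easy to demonstrate with Python's tarfile module, which can emit all three dialects under discussion. A short sketch (the member name is invented): plain ustar refuses a path whose final component exceeds 100 characters, while both the GNU and pax extensions store it fine:

```python
import io
import tarfile

# 152 chars total, last path component 120 chars (> 100, and too long
# to rescue via the ustar name/prefix split).
long_name = "category/some-package-1.0/image/" + "x" * 120

def make_archive(fmt):
    """Write a one-member archive in the given tar dialect."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(tarfile.TarInfo(long_name), io.BytesIO(b""))
    return buf.getvalue()

# Plain ustar cannot store the name...
try:
    make_archive(tarfile.USTAR_FORMAT)
    ustar_ok = True
except ValueError:
    ustar_ok = False

# ...while GNU and pax formats both can.
gnu = make_archive(tarfile.GNU_FORMAT)
pax = make_archive(tarfile.PAX_FORMAT)

assert not ustar_ok
assert tarfile.open(fileobj=io.BytesIO(gnu)).getnames() == [long_name]
assert tarfile.open(fileobj=io.BytesIO(pax)).getnames() == [long_name]
```

In other words, the choice is really between the GNU and pax long-name extensions; strict POSIX.1-1988 ustar is out either way for deep image trees.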
Re: [gentoo-dev] [pre-GLEP r2] Gentoo binary package container format
On Wed, 2018-11-21 at 14:10 +0100, Fabian Groffen wrote:
> On 20-11-2018 21:33:17 +0100, Michał Górny wrote:
> > The volume label
> > ----------------
> >
> > The volume label provides an easy way for users to identify the
> > binary package without dedicated tooling or specific format
> > knowledge.
> >
> > The implementations should include a volume label consisting of
> > fixed string ``gpkg:``, followed by a single space, followed by full
> > package identifier.  However, the implementations must not rely on
> > the volume label being present or attempt to parse its value when it
> > is.
> >
> > Furthermore, since the volume label is included in the .tar archive
> > as the first member, it provides a magic string at a fixed location
> > that can be used by tools such as file(1) to easily distinguish
> > Gentoo binary packages from regular .tar archives.
>
> Just for clarity on this point.
> Are you proposing that we patch file(1) to print the Volume Header
> here?  file-5.35 seems to not say much but "tar archive" or "POSIX
> tar archive" for tar-files containing a Volume Header as shown by
> tar -tv.

I'm wondering about that as well, yes.  However, my main idea is to
specifically detect 'gpkg:' there and use it to explicitly identify the
file as Gentoo binary package (and print package name).

> > Container and archive formats
> > -----------------------------
> >
> > During the debate, the actual archive formats to use were
> > considered.  The .tar format seemed an obvious choice for the image
> > archive since it is the only widely deployed archive format that
> > stores all kinds of file metadata on POSIX systems.  However,
> > multiple options for the outer format have been debated.
>
> You mention POSIX, which triggered me.  I think it would be good to
> specify which tar format to use.
>
> POSIX.1-2001/pax format doesn't have a 100/256 char filename length
> restriction, which is good but it is not (yet) used by default by GNU
> tar.  busybox tar can read pax tars, it seems.

I think the modern GNU tar format is the obvious choice here.  It
doesn't suffer any portability problems these days, and is more compact
than the PAX format.

-- 
Best regards,
Michał Górny
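The fixed-location magic the pre-GLEP relies on falls out of the tar header layout: the member-name field occupies bytes 0-99 of the archive, so whatever name the first member carries starts at offset 0. A small sketch (package name invented; Python's tarfile has no public API for writing a real GNU volume header, so an ordinary empty first member stands in for the label here purely to illustrate the point):

```python
import io
import tarfile

def make_gpkg_like(pkg_id: str) -> bytes:
    """Create a .tar whose first member name carries the 'gpkg: ' marker.

    The pre-GLEP uses a GNU volume label (tar --label) as the first
    member; a regular zero-length member is used as a stand-in here.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.GNU_FORMAT) as tar:
        # Marker member first, then a payload member.
        tar.addfile(tarfile.TarInfo("gpkg: " + pkg_id), io.BytesIO(b""))
        tar.addfile(tarfile.TarInfo(pkg_id + "/image.tar.xz"),
                    io.BytesIO(b""))
    return buf.getvalue()

archive = make_gpkg_like("app-shells/bash-4.4_p23")

# The first member's name sits at bytes 0-99 of the archive, so the
# marker is a magic string at a fixed offset -- exactly the kind of
# pattern a file(1) rule could match.
assert archive[:6] == b"gpkg: "
```

With an actual volume header the same bytes land at offset 0 as well; the difference is only the member's type flag, which a file(1) rule could additionally check.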
[gentoo-dev] Last rites: net-nds/gosa-* packages
# Tiziano Müller (21 Nov 2018)
# Project is in maintenance-only mode with the last big release in 2012.
# Needs a dedicated maintainer with a matching LDAP setup (extra
# schemas required).
# Several open issues (#370985, #356827, #399845, #544562, #651092)
# and one security bug (bug #66912).
# Therefore removal in 30 days.
net-nds/gosa-core
net-nds/gosa-plugin-mail
net-nds/gosa-plugin-samba
net-nds/gosa-plugin-systems