Re: Let's shrink Packages.xz

2014-07-28 Thread Ian Jackson
Russ Allbery writes ("Re: Let's shrink Packages.xz"): > Ian Jackson writes: > > But the problem with lots of small packages is not that the Packages.xz > > has too many bytes. > > > > It's that the packaging tools, UIs (for users and developers

Re: Let's shrink Packages.xz

2014-07-25 Thread Gerrit Pape
On Fri, Jul 25, 2014 at 10:07:25AM -0700, Russ Allbery wrote: > Ian Jackson writes: > > But the problem with lots of small packages is not that the Packages.xz > > has too many bytes. > > It's that the packaging tools, UIs (for users and developers), and > > humans, need to think about too many pa

Re: Let's shrink Packages.xz

2014-07-25 Thread Russ Allbery
Ian Jackson writes: > But the problem with lots of small packages is not that the Packages.xz > has too many bytes. > It's that the packaging tools, UIs (for users and developers), and > humans, need to think about too many packages. > This makes packaging tools slow, UIs cluttered, and humans

Re: Let's shrink Packages.xz

2014-07-25 Thread Matt Zagrabelny
On Fri, Jul 25, 2014 at 6:50 AM, Ian Jackson wrote: >> Reducing the size of Packages.xz by 11% or 22% would leave room for quite >> a lot of small packages while not making the problem any worse than it is >> today. > > But the problem with lots of small packages is not that the > Packages.xz has

Re: Let's shrink Packages.xz

2014-07-25 Thread Ian Jackson
Russ Allbery writes ("Re: Let's shrink Packages.xz"): > I'm fairly sure Jakub's message was in response to the recent discussion > about small Node.js packages and the frequent complaints that we should > not introduce small packages into the archive because it bl

Re: Let's shrink Packages.xz

2014-07-18 Thread Chris Bannister
On Wed, Jul 16, 2014 at 08:40:29PM +0200, Ondřej Surý wrote: > On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: > > Ondřej Surý writes: > > > On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: > > > > >> Food for thought: > > >> Which fields take up most space in Packages.xz[0]? > > > > > I am

Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
On Wed, Jul 16, 2014, at 19:28, Russ Allbery wrote: > Ondřej Surý writes: > > On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: > > >> Food for thought: > >> Which fields take up most space in Packages.xz[0]? > > > I am still lost - what problem are we trying to solve here? > > Could we at least

Re: Let's shrink Packages.xz

2014-07-16 Thread Russ Allbery
Ondřej Surý writes: > On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: >> Food for thought: >> Which fields take up most space in Packages.xz[0]? > I am still lost - what problem are we trying to solve here? > Could we at least define it to see if the problem exists? I'm fairly sure Jakub's me

Re: Let's shrink Packages.xz

2014-07-16 Thread Ondřej Surý
Hi Jakub, On Mon, Jul 14, 2014, at 18:25, Jakub Wilk wrote: > Food for thought: > Which fields take up most space in Packages.xz[0]? I am still lost - what problem are we trying to solve here? Could we at least define it to see if the problem exists? Ondrej -- Ondřej Surý Knot DNS (https://www

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Wed, Jul 16, 2014 at 02:23:34PM +0200, David Kalnischkies wrote: > With a slight change in semantic we could drop the field from the > Packages file again anyhow: At the moment it is the MD5sum of the long > description. If it isn't present the clients are expected to calculate > it for themselv

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote: > Description-md5 794.3 KiB 11.9% Needed to provide a mapping as versions change a lot more often than descriptions do; also, historically, Translation-* were outside of the control of ftpmasters (at least, that is what history digg

Re: Let's shrink Packages.xz

2014-07-16 Thread David Kalnischkies
On Mon, Jul 14, 2014 at 12:26:30PM -0500, Jeff Epler wrote: > actually used by current versions of apt. (ideally you'd just go sha256, > but iirc it's the md5sum that is used in practice, even today. but > please find that thread, don't trust my summary) - apt-get --print-uris defaults to MD5 by

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Dimitri John Ledkov writes: > Huh, I'm not quite sure that multiple hashes actually gain us anything > at all in terms of compromisation, since ultimately all our archive > metadata is protected by a single hash only. > Whilst replacing individual files & simultaneously matching multiple > hash

Re: Let's shrink Packages.xz

2014-07-14 Thread Dimitri John Ledkov
On 14 July 2014 20:57, Henrique de Moraes Holschuh wrote: > On Mon, 14 Jul 2014, Jakub Wilk wrote: >> * Peter Palfrader, 2014-07-14, 20:25: >> >>The basic idea is that it's much harder to come up with a >> >>simultaneous hash collision with both SHA-1 and SHA-2 than >> >>breaking either of them

Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Jakub Wilk wrote: > * Peter Palfrader, 2014-07-14, 20:25: > >>The basic idea is that it's much harder to come up with a > >>simultaneous hash collision with both SHA-1 and SHA-2 than > >>breaking either of them independently. > > > >ISTR reading papers that put this "much har

Re: Let's shrink Packages.xz

2014-07-14 Thread Henrique de Moraes Holschuh
On Mon, 14 Jul 2014, Nathan Schulte wrote: > "ASCII hex" encodes 4 bits as 8 (or 7. but really 8.), as each ASCII > character is a nibble of the digest; that's a 100% increase (factor > of 2) over the bare digest (or a "raw mapping" of 8 bits of digest > to an 8 bit character set). The figures giv

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Jakub Wilk writes: > You might have had this paper in mind: > https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf > Quoting §4: “If F and G are good iterated hash functions with no attack > better than the generic birthday paradox attack, we claim that the hash > function F||G ob

Re: Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk
* Peter Palfrader, 2014-07-14, 20:25: >> The basic idea is that it's much harder to come up with a simultaneous hash collision with both SHA-1 and SHA-2 than breaking either of them independently. > ISTR reading papers that put this "much harder" into doubt. But I can't find those references, a

Re: Let's shrink Packages.xz

2014-07-14 Thread Nathan Schulte
Jeff Epler wrote: > First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst. Are you sure you performed this calculation correctly? "ASCII hex" encodes 4 bits as 8 (or 7. but really 8.), as each ASCII character
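
The size argument in this message can be checked with a short sketch. It assumes Python's standard hashlib and base64 modules and an arbitrary sample input (not data from the thread), and compares the raw, hex, and base64 lengths of a single SHA-256 digest:

```python
import base64
import hashlib

# Arbitrary sample input (an assumption); any bytes give the same lengths.
digest = hashlib.sha256(b"example package contents").digest()

raw_len = len(digest)                     # raw digest, 8 bits per byte
hex_len = len(digest.hex())               # one ASCII character per nibble
b64_len = len(base64.b64encode(digest))   # 6 bits per character, plus padding

print(f"raw={raw_len} hex={hex_len} base64={b64_len}")
```

These are uncompressed lengths only. The measurements discussed in the thread concern the file after xz compression, which this sketch does not model.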

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
Peter Palfrader writes: > On Mon, 14 Jul 2014, Russ Allbery wrote: >> Using multiple hashes gives us some theoretical robustness against a >> break in one of the hash functions provided that all clients check all >> the hashes and the hashes would fail independently (which is likely). > I would

Re: Let's shrink Packages.xz

2014-07-14 Thread Peter Palfrader
On Mon, 14 Jul 2014, Russ Allbery wrote: > ابراهیم محمدی writes: > > > Isn't a single (rather small) hash value enough for almost all users? > > Using multiple hashes gives us some theoretical robustness against a break > in one of the hash functions provided that all clients check all the > ha

Re: Let's shrink Packages.xz

2014-07-14 Thread Russ Allbery
ابراهیم محمدی writes: > Isn't a single (rather small) hash value enough for almost all users? Using multiple hashes gives us some theoretical robustness against a break in one of the hash functions provided that all clients check all the hashes and the hashes would fail independently (which is l
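
The condition that clients check all the hashes can be illustrated with a minimal sketch. The helper name verify_all and the sample data are hypothetical and are not taken from apt or any other tool mentioned in the thread; only Python's standard hashlib is assumed:

```python
import hashlib
from typing import Mapping

def verify_all(data: bytes, expected: Mapping[str, str]) -> bool:
    """Return True only if every listed digest matches (hypothetical helper)."""
    if not expected:
        return False  # refuse to accept data when no hash is listed
    for name, want in expected.items():
        got = hashlib.new(name, data).hexdigest()
        if got != want.lower():
            return False  # a single mismatch rejects the data
    return True

# Illustrative data only, not a real archive entry.
payload = b"illustrative package bytes"
listed = {
    "sha1": hashlib.sha1(payload).hexdigest(),
    "sha256": hashlib.sha256(payload).hexdigest(),
}
print(verify_all(payload, listed))
```

A client that stops after the first matching hash would get no benefit from the additional ones, which is why the check above rejects on any mismatch.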

Re: Let's shrink Packages.xz

2014-07-14 Thread ابراهیم محمدی
Isn't a single (rather small) hash value enough for almost all users? On Mon, Jul 14, 2014 at 8:55 PM, Jakub Wilk wrote: > Food for thought: > Which fields take up most space in Packages.xz[0]? > > (whole file) 6662.0 KiB 100.0% > SHA256 1463.8 KiB 22.0% > SHA1

Re: Let's shrink Packages.xz

2014-07-14 Thread Jeff Epler
I performed a few little experiments, too. First, I tried encoding the various digests as base64 or base93, rather than hex. In each case, the file grew in size; base93 was the worst. Eliminating all the headers (e.g., replacing Package: foo with simply foo) saved 3.2%. Replacing each one with
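
An experiment of this kind can be approximated with a small sketch. It is not Jeff Epler's script and does not use the real Packages file: it assumes 2000 synthetic SHA-256 digests, one per line, and Python's standard lzma module as a stand-in for xz. base93 is left out because the standard library has no such encoder.

```python
import base64
import hashlib
import lzma

N = 2000  # number of synthetic digests (an assumption)
digests = [hashlib.sha256(str(i).encode()).digest() for i in range(N)]

# One digest per line, once as hex and once as base64.
hex_text = "\n".join(d.hex() for d in digests).encode()
b64_text = "\n".join(base64.b64encode(d).decode() for d in digests).encode()

hex_xz = len(lzma.compress(hex_text))
b64_xz = len(lzma.compress(b64_text))

print(f"hex:    {len(hex_text)} bytes plain, {hex_xz} bytes compressed")
print(f"base64: {len(b64_text)} bytes plain, {b64_xz} bytes compressed")
```

Digests are effectively random, so neither encoding can compress below roughly 32 bytes per digest. The denser encoding starts out smaller but leaves the compressor less redundancy to remove.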

Let's shrink Packages.xz

2014-07-14 Thread Jakub Wilk
Food for thought: Which fields take up most space in Packages.xz[0]?

(whole file)     6662.0 KiB  100.0%
SHA256           1463.8 KiB   22.0%
SHA1              938.9 KiB   14.1%
Description-md5   794.3 KiB   11.9%
MD5sum            752.4 KiB   11.3%
Depends           473.0 KiB    7.1%
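
The percentages follow directly from the sizes. A minimal sketch, using only the figures visible in this excerpt (the original message may list further fields), recomputes each share and the combined share of the four checksum fields:

```python
# Sizes in KiB as quoted in the excerpt above.
total_kib = 6662.0
sizes_kib = {
    "SHA256": 1463.8,
    "SHA1": 938.9,
    "Description-md5": 794.3,
    "MD5sum": 752.4,
    "Depends": 473.0,
}

for field, size in sizes_kib.items():
    print(f"{field:<16}{size:>8.1f} KiB {size / total_kib:6.1%}")

checksum_fields = ("SHA256", "SHA1", "Description-md5", "MD5sum")
checksum_share = sum(sizes_kib[f] for f in checksum_fields) / total_kib
print(f"checksum fields combined: {checksum_share:.1%}")
```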