Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
On Sun, Jul 23, 2017 at 3:56 PM, Shawn Pearce wrote: > On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty > wrote: >> On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >>> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty >>> wrote: > >> * What would you think about being extravagant and

Re: reftable: new ref storage format

2017-07-23 Thread Shawn Pearce
+git@vger.kernel.org. I originally sent the below reply privately by mistake. On Mon, Jul 17, 2017 at 6:43 PM, Michael Haggerty wrote: > On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: >> On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty >> wrote: > > On second thought, the idea of havi

Re: reftable: new ref storage format

2017-07-18 Thread Junio C Hamano
Michael Haggerty writes: > On second thought, the idea of having HEAD (or maybe all pseudorefs) > in the same system would open a few interesting possibilities that > derive from having a global, atomic view of all references: > > 1. We could store backlinks from references to the symbolic refere

Re: reftable: new ref storage format

2017-07-17 Thread Michael Haggerty
On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: > On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty > wrote: >> * Do you propose to store *all* references (i.e., including the >> references that we call pseudorefs, like `HEAD`, `FETCH_HEAD`, etc) in >> reftables, or only the references un

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 2:13 PM, Dave Borowitz wrote: > On Sun, Jul 16, 2017 at 3:43 PM, Shawn Pearce wrote: >> True... but... in my "android" example repository we have 866,456 live >> refs. A block size of 64k needs only 443 blocks, and a 12k index, to >> get the file to compress to 28M (vs. 62

Re: reftable: new ref storage format

2017-07-16 Thread Dave Borowitz
On Sun, Jul 16, 2017 at 3:43 PM, Shawn Pearce wrote: > True... but... in my "android" example repository we have 866,456 live > refs. A block size of 64k needs only 443 blocks, and a 12k index, to > get the file to compress to 28M (vs. 62M packed-refs). > > Index records are averaging 28 bytes per

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 12:43 PM, Shawn Pearce wrote: > On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty > wrote: > >> * The tuning parameter number_of_restarts currently trades off space >> (for the full refnames and the restart_offsets) against the need to >> read and parse more ref_records

Re: reftable: new ref storage format

2017-07-16 Thread Shawn Pearce
On Sun, Jul 16, 2017 at 10:33 AM, Michael Haggerty wrote: > Thanks for your reftable proposal. Thanks for your time reading the proposal. I really was looking forward to your insights on this project. > It would solve a lot of scalability > problems that we currently have, and do it in a way tha

Re: reftable: new ref storage format

2017-07-16 Thread Michael Haggerty
Thanks for your reftable proposal. It would solve a lot of scalability problems that we currently have, and do it in a way that is implementable in both C and Java, which is very nice. There are two mostly orthogonal components to your proposal: 1. What does a single reftable file look like? 2. H

Re: reftable: new ref storage format

2017-07-16 Thread Johannes Sixt
Am 16.07.2017 um 12:03 schrieb Jeff King: On Sun, Jul 16, 2017 at 10:07:57AM +0200, Johannes Sixt wrote: Am 14.07.2017 um 22:08 schrieb Jeff King: The implementation on this doesn't seem overly complex. My main concerns are what we're asking from the filesystem in terms of atomicity, and what

Re: reftable: new ref storage format

2017-07-16 Thread Jeff King
On Sun, Jul 16, 2017 at 10:07:57AM +0200, Johannes Sixt wrote: > Am 14.07.2017 um 22:08 schrieb Jeff King: > > The implementation on this doesn't seem overly complex. My main concerns > > are what we're asking from the filesystem in terms of atomicity, and > > what possible races there are. > > O

Re: reftable: new ref storage format

2017-07-16 Thread Jeff King
On Sat, Jul 15, 2017 at 11:01:47PM -0700, Shawn Pearce wrote: > > How deep would you anticipate stacks getting? Would it be feasible for > > the tip to contain the names of the tables in the entire chain? If we're > > talking about 20 (or even 32) bytes per name, you could still fit over a > > hun

Re: reftable: new ref storage format

2017-07-16 Thread Johannes Sixt
Am 14.07.2017 um 22:08 schrieb Jeff King: The implementation on this doesn't seem overly complex. My main concerns are what we're asking from the filesystem in terms of atomicity, and what possible races there are. One of the failure modes is that on Windows a file cannot be deleted while it i

Re: reftable: new ref storage format

2017-07-15 Thread Shawn Pearce
On Fri, Jul 14, 2017 at 1:08 PM, Jeff King wrote: > On Thu, Jul 13, 2017 at 05:11:52PM -0700, Shawn Pearce wrote: > > I think the "stack" implementation is what makes me most uncomfortable > with this proposal. Atomicity with filesystem operations and especially > readdir() is one of the things I

Re: reftable: new ref storage format

2017-07-14 Thread Jeff King
On Thu, Jul 13, 2017 at 05:27:44PM -0700, Shawn Pearce wrote: > > We _could_ consider gzipping individual blocks of > > a reftable (or any structure that allows you to search to a > > constant-sized block and do a linear search from there). But given that > > they're in the same ballpark, I'm happ

Re: reftable: new ref storage format

2017-07-14 Thread Jeff King
On Thu, Jul 13, 2017 at 05:11:52PM -0700, Shawn Pearce wrote: > Yes, I was hoping for reader atomicity. But I may OK foregoing that if > the transaction either all goes through, or all fails. A partially > stuck transaction because the process died in the middle of the commit > step creates a mess

Re: reftable: new ref storage format

2017-07-14 Thread Shawn Pearce
On Fri, Jul 14, 2017 at 7:27 AM, Dave Borowitz wrote: > On Thu, Jul 13, 2017 at 8:11 PM, Shawn Pearce wrote: >> In another (Gerrit Code Review), we disable reflogs for >> the insane refs/changes/ namespace, as nearly every reference is >> created once, and never modified. > > Apologies for the ta

Re: reftable: new ref storage format

2017-07-14 Thread Dave Borowitz
On Thu, Jul 13, 2017 at 8:11 PM, Shawn Pearce wrote: > In another (Gerrit Code Review), we disable reflogs for > the insane refs/changes/ namespace, as nearly every reference is > created once, and never modified. Apologies for the tangent, but this is not true in the most recent Gerrit implement

Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 1:35 PM, Jeff King wrote: > On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote: > > I agree that a full binary search of a reftable is harder because of the > prefix compression (it may still be possible by scanning backwards, but > I think there are ambiguities

Re: reftable: new ref storage format

2017-07-13 Thread Shawn Pearce
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote: > On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > >> ### Problem statement >> >> Some repositories contain a lot of references (e.g. android at 866k, >> rails at 31k). The existing packed-refs format takes up a lot of >> space (e

Re: reftable: new ref storage format

2017-07-13 Thread Eric Wong
Jeff King wrote: > I agree that a full binary search of a reftable is harder because of the > prefix compression (it may still be possible by scanning backwards, but > I think there are ambiguities when you land in the middle of a record, > since there's no unambiguous end-of-record character). Bu

Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Thu, Jul 13, 2017 at 12:56:54PM -0700, Stefan Beller wrote: > >> ### Problem statement > >> > >> Some repositories contain a lot of references (e.g. android at 866k, > >> rails at 31k). The existing packed-refs format takes up a lot of > >> space (e.g. 62M), and does not scale with additiona

Re: reftable: new ref storage format

2017-07-13 Thread Stefan Beller
On Thu, Jul 13, 2017 at 12:32 PM, Jeff King wrote: > On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > >> We've been having scaling problems with insane number of references >> (>866k), so I started thinking a lot about improving ref storage. >> >> I've written a simple approach, and

Re: reftable: new ref storage format

2017-07-13 Thread Jeff King
On Wed, Jul 12, 2017 at 05:17:58PM -0700, Shawn Pearce wrote: > We've been having scaling problems with insane number of references > (>866k), so I started thinking a lot about improving ref storage. > > I've written a simple approach, and implemented it in JGit. > Performance is promising: > >

reftable: new ref storage format

2017-07-12 Thread Shawn Pearce
We've been having scaling problems with insane number of references (>866k), so I started thinking a lot about improving ref storage. I've written a simple approach, and implemented it in JGit. Performance is promising: - 62M packed-refs compresses to 27M - 42.3 usec lookup You can read a re