See https://github.com/bup/bup/blob/master/DESIGN#L92 for the answer to most of these questions.
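To make the questions below concrete, here is a minimal sketch of the split test and the "bits" score as the quoted message describes them. It is not the go4 rollsum code: the names `onSplit`, `trailingOnes`, and `meanAndMedianGap` are hypothetical, the checksum is modeled as a plain `uint32`, and the gap statistics assume each byte position hits a boundary independently with probability 1/(1<<13).

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// splitBits is the number of trailing checksum bits tested,
// taken from the quoted message (13).
const splitBits = 13

// onSplit reports whether the low splitBits bits of sum are all ones.
// Hypothetical helper; it only mirrors the condition described in the thread.
func onSplit(sum uint32) bool {
	const mask = 1<<splitBits - 1
	return sum&mask == mask
}

// trailingOnes counts the consecutive one bits at the low end of sum,
// which is how the message reads the "bits" score.
func trailingOnes(sum uint32) int {
	return bits.TrailingZeros32(^sum)
}

// meanAndMedianGap returns the mean and median number of bytes until the
// next boundary, ASSUMING each position is a boundary independently with
// probability 2^-splitBits (a geometric distribution).
func meanAndMedianGap() (mean float64, median int) {
	p := 1.0 / float64(1<<splitBits)
	mean = 1 / p
	median = int(math.Ceil(math.Log(0.5) / math.Log(1-p)))
	return mean, median
}

func main() {
	fmt.Println(onSplit(0x1FFF), onSplit(0x0000), onSplit(0x0FFF))
	fmt.Println(trailingOnes(0x7FFF))
	mean, median := meanAndMedianGap()
	fmt.Println(mean, median)
}
```

Under that independence assumption the mean gap is 8192 bytes and the median is 5678 bytes, so the 5,678 figure in the question may be a median rather than a mean.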

On Sat, Dec 14, 2019 at 7:13 AM Bob Glickstein <[email protected]> wrote:
> Hello! I have a few questions about the file schema.
>
> In schema/filewriter.go, files are split into trees of chunks at "interesting
> rollsum boundaries
> <https://github.com/perkeep/perkeep/blob/d9e34b748ca155eb606d1026e3f861604eb01442/pkg/schema/filewriter.go#L294>
> ."
>
> The comment describing an "interesting" rollsum boundary says it's when
> the trailing 13 bits of the rolling checksum are "set the same way
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L61-L62>,"
> which sounds like it means all-zeroes or all-ones. But the implementation
> says they have to be all ones
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L64>
> .
>
> - Question 1: Which is right, the comment or the implementation?
> - Question 1a: If the comment is right, then the intention is for 2
>   out of every 1<<13 checksum values to satisfy OnSplit. Why not 1 out
>   of every 1<<12?
>
> This chunk splitting happens on the second and subsequent chunks of a file
> after the size of the chunk surpasses 64kb, which by my calculation
> <https://play.golang.org/p/cCmTYDAhpo9> happens, on average, within the
> following 5,678 bytes.
>
> - Question 2: Why make irregularly sized chunks at all based on this
>   obscure property? Why not split at 64kb boundaries?
>
> Each chunk created gets a "bits" score which seems to be
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L74-L77>
> the number of trailing ones in its rolling checksum (though I'm not quite
> sure about that). If this is larger than the bits score of the last 1 or
> more chunks, those are made "children" of this new chunk.
>
> - Question 3: Why?
>
> Thanks,
> - Bob
