See https://github.com/bup/bup/blob/master/DESIGN#L92 for the answer to
most of these questions.
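
On the arithmetic behind Questions 1a and 2, a small sketch may help. This
is not the go4 rollsum code; splitBits, allOnes, allSame, countMatches,
meanGap and medianGap are names made up for illustration. It assumes the
split test looks only at the low 13 bits of a 32-bit sum, and that those
bits behave as if uniformly random and independent from byte to byte, which
is only an approximation for a rolling checksum.

```go
package main

import (
	"fmt"
	"math"
)

// splitBits is the number of low bits examined (13 in the question above).
const splitBits = 13
const mask = 1<<splitBits - 1

// allOnes reports whether the low splitBits bits of sum are all set.
func allOnes(sum uint32) bool { return sum&mask == mask }

// allSame reports whether the low splitBits bits are all ones or all zeroes.
func allSame(sum uint32) bool { return sum&mask == mask || sum&mask == 0 }

// countMatches counts the values in [0, n) that satisfy pred.
func countMatches(n uint32, pred func(uint32) bool) int {
	count := 0
	for v := uint32(0); v < n; v++ {
		if pred(v) {
			count++
		}
	}
	return count
}

// meanGap is the expected number of bytes until the first match when each
// byte matches independently with probability p (geometric distribution).
func meanGap(p float64) float64 { return 1 / p }

// medianGap is the smallest n such that a match has occurred within n bytes
// with probability at least one half.
func medianGap(p float64) int {
	return int(math.Ceil(math.Log(0.5) / math.Log1p(-p)))
}

func main() {
	const n = 1 << 16
	fmt.Println(countMatches(n, allOnes), countMatches(n, allSame))

	p := 1.0 / (1 << splitBits)
	fmt.Println(meanGap(p), medianGap(p))
}
```

This prints "8 16" and then "8192 5678". So the all-ones test matches 1 value
in 1<<13, and the all-same reading would match 1 in 1<<12. Under the
uniformity assumption the mean distance to the next split is 8192 bytes, and
5,678 is the median, so the figure in the question reads more like a median
than an average.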

On Sat, Dec 14, 2019 at 7:13 AM Bob Glickstein <[email protected]>
wrote:

> Hello! I have a few questions about the file schema.
>
> In schema/filewriter.go, files are split into trees of chunks at "interesting
> rollsum boundaries"
> <https://github.com/perkeep/perkeep/blob/d9e34b748ca155eb606d1026e3f861604eb01442/pkg/schema/filewriter.go#L294>.
>
> The comment describing an "interesting" rollsum boundary says it's when
> the trailing 13 bits of the rolling checksum are "set the same way"
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L61-L62>,
> which sounds like it means all-zeroes or all-ones. But the implementation
> says they have to be all ones
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L64>.
>
>    - Question 1: Which is right, the comment or the implementation?
>    - Question 1a: If the comment is right, then the intention is for 2
>    out of every 1<<13 checksum values to satisfy OnSplit. Why not 1 out
>    of every 1<<12?
>
> This chunk splitting happens on the second and subsequent chunks of a file
> after the size of the chunk surpasses 64 KB, which by my calculation
> <https://play.golang.org/p/cCmTYDAhpo9> happens, on average, within the
> following 5,678 bytes.
>
>    - Question 2: Why make irregularly sized chunks at all based on this
>    obscure property? Why not split at 64 KB boundaries?
>
> Each chunk created gets a "bits" score which seems to be
> <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L74-L77>
> the number of trailing ones in its rolling checksum (though I'm not quite
> sure about that). If this is larger than the bits score of the last 1 or
> more chunks, those are made "children" of this new chunk.
>
>    - Question 3: Why?
>
> Thanks,
> - Bob
>
> --
> You received this message because you are subscribed to the Google Groups
> "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/perkeep/CAEf8c49V3gbxBXX%3DTeFnsi8kP_8tKTyZR9f26%3D_aaL-5%3DTYZFg%40mail.gmail.com
> <https://groups.google.com/d/msgid/perkeep/CAEf8c49V3gbxBXX%3DTeFnsi8kP_8tKTyZR9f26%3D_aaL-5%3DTYZFg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
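
The parent/child rule described before Question 3 can be sketched as
follows. This takes the quoted description at face value and is not the
perkeep or go4 code: span, trailingOnes and addSpan are hypothetical names,
and whether the real "bits" score is exactly a count of trailing ones is
left open, as it is in the question.

```go
package main

import (
	"fmt"
	"math/bits"
)

// span is a hypothetical stand-in for one chunk and its child chunks.
type span struct {
	score    int
	children []span
}

// trailingOnes counts the consecutive 1 bits at the low end of sum.
func trailingOnes(sum uint32) int {
	return bits.TrailingZeros32(^sum)
}

// addSpan appends a chunk with the given score. Any run of chunks at the
// end of spans with a strictly smaller score becomes its children.
func addSpan(spans []span, score int) []span {
	from := len(spans)
	for from > 0 && spans[from-1].score < score {
		from--
	}
	// Copy the tail first so the append below cannot overwrite it.
	children := append([]span(nil), spans[from:]...)
	return append(spans[:from], span{score: score, children: children})
}

func main() {
	fmt.Println(trailingOnes(0x1FFF)) // 13 low bits set

	var spans []span
	for _, s := range []int{13, 13, 15, 14, 13, 16} {
		spans = addSpan(spans, s)
	}
	fmt.Println(len(spans), len(spans[0].children))
}
```

This prints "13" and then "1 3": the chunk scoring 15 adopts the two 13s
before it, and the final chunk scoring 16 adopts everything still at the top
level. Higher scores are rarer, so chunks with high scores end up nearer the
root and the tree stays shallow without any fixed fan-out.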

