Hello! I have a few questions about the file schema. In schema/filewriter.go, files are split into trees of chunks at "interesting rollsum boundaries" <https://github.com/perkeep/perkeep/blob/d9e34b748ca155eb606d1026e3f861604eb01442/pkg/schema/filewriter.go#L294>.
The comment describing an "interesting" rollsum boundary says it's when the trailing 13 bits of the rolling checksum are "set the same way" <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L61-L62>, which sounds like it means all-zeroes or all-ones. But the implementation says they have to be all ones <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L64>.

- Question 1: Which is right, the comment or the implementation?

- Question 1a: If the comment is right, then the intention is for 2 out of every 1<<13 checksum values to satisfy OnSplit. Why not 1 out of every 1<<12?

This chunk splitting happens on the second and subsequent chunks of a file after the size of the chunk surpasses 64 KB, which by my calculation <https://play.golang.org/p/cCmTYDAhpo9> happens, on average, within the following 5,678 bytes.

- Question 2: Why make irregularly sized chunks at all based on this obscure property? Why not split at 64 KB boundaries?

Each chunk created gets a "bits" score which seems to be <https://github.com/go4org/go4/blob/132d2879e1e95dadb805c26cd339344efd1a67c8/rollsum/rollsum.go#L74-L77> the number of trailing ones in its rolling checksum (though I'm not quite sure about that). If this is larger than the bits score of the last 1 or more chunks, those are made "children" of this new chunk.

- Question 3: Why?

Thanks,
- Bob
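
For reference, the arithmetic behind Question 1a and the chunk-size estimate can be sketched in Go. This is a minimal sketch under an idealized assumption: every byte position independently has the same chance of being a split point, as if the low checksum bits were uniformly random. All names here (allOnes, sameWay, countMatches, trailingOnes, meanGap, medianGap) are hypothetical helpers written for illustration. They are not the go4 rollsum API, and trailingOnes is only a literal reading of the "number of trailing ones" description above, not a claim about what the library computes.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// splitBits is the number of trailing checksum bits tested, per the post.
const splitBits = 13

// allOnes reports whether the low n bits of sum are all set
// (the reading the implementation appears to use).
func allOnes(sum uint32, n uint) bool {
	mask := uint32(1)<<n - 1
	return sum&mask == mask
}

// sameWay reports whether the low n bits of sum are all zeros or all ones
// (one possible reading of "set the same way").
func sameWay(sum uint32, n uint) bool {
	mask := uint32(1)<<n - 1
	low := sum & mask
	return low == 0 || low == mask
}

// countMatches counts how many of the 2^n possible low-bit patterns satisfy pred.
func countMatches(pred func(uint32, uint) bool, n uint) int {
	count := 0
	for v := uint32(0); v < uint32(1)<<n; v++ {
		if pred(v, n) {
			count++
		}
	}
	return count
}

// trailingOnes counts consecutive set bits starting at the least significant bit.
func trailingOnes(sum uint32) int {
	return bits.TrailingZeros32(^sum)
}

// meanGap is the expected number of bytes until the first split point
// when each byte is a split point with independent probability p.
func meanGap(p float64) float64 {
	return 1 / p
}

// medianGap is the smallest byte count by which a split point has occurred
// with probability at least one half, under the same assumption.
func medianGap(p float64) float64 {
	return math.Ceil(math.Log(0.5) / math.Log(1-p))
}

func main() {
	total := 1 << splitBits
	fmt.Println("all-ones matches:", countMatches(allOnes, splitBits), "of", total)
	fmt.Println("same-way matches:", countMatches(sameWay, splitBits), "of", total)

	p := 1.0 / float64(total) // all-ones reading: 1 pattern in 1<<13
	fmt.Println("mean gap in bytes:", meanGap(p))
	fmt.Println("median gap in bytes:", medianGap(p))
	fmt.Println("trailing ones of 0b0111:", trailingOnes(0x7))
}
```

Under this model the all-ones reading gives a mean gap of 8192 bytes and a median of 5678 bytes, so the 5,678 figure quoted above may correspond to a median rather than a mean. The all-zeroes-or-all-ones reading matches 2 of the 8192 patterns, which is the same rate as 1 in 1<<12.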
