On Wed, Jan 24, 2018 at 2:03 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
> If you have a bunch of git repositories cloned from the same project on
> the same filesystem, it would be nice if the packs that are produced
> would be friendly to block-level deduplication.
>
> This would save space, and the blocks would be more likely to be in
> cache when you access them, likely speeding up git operations even if
> the packing itself is less efficient.
>
> Here's a hacky one-liner that clones git/git and peff/git (almost the
> same content) and md5sums each 4k packed block, and sort | uniq -c's
> them to see how many are the same:

<snip>

>
> Has anyone here barked up this tree before? Suggestions? Tips on where
> to start hacking the repack code to accomplish this would be most
> welcome.

Does this overlap with the desire to have resumable clones?  I'm
curious what would happen if you did the same experiment with two
separate clones of git/git, cloned one right after the other so that
hopefully the upstream git/git didn't receive any updates between your
two separate clones.  (In other words, how much do packfiles differ in
practice for different packings of the same data?)
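
To make that comparison concrete, here is a rough Python sketch of the
kind of measurement I have in mind. It is not the snipped one-liner, only
my assumption of something equivalent: hash every offset-aligned 4 KiB
block of two pack files with MD5 and count how many blocks they have in
common. The function names and the pack paths passed to compare_packs()
are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Tuple

BLOCK_SIZE = 4096  # 4 KiB, the block size used in the quoted experiment


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> Counter:
    """Count the MD5 digest of each fixed-size, offset-aligned block."""
    counts = Counter()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        counts[hashlib.md5(block).hexdigest()] += 1
    return counts


def shared_block_count(a: bytes, b: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Blocks present in both inputs, counted with multiplicity."""
    common = block_digests(a, block_size) & block_digests(b, block_size)
    return sum(common.values())


def compare_packs(path_a: str, path_b: str) -> Tuple[int, int]:
    """Return (shared blocks, total blocks in the first pack).

    Reads both files fully into memory, which is fine for a quick
    experiment but not for very large packs.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    total_blocks = -(-len(a) // BLOCK_SIZE)  # ceiling division
    return shared_block_count(a, b), total_blocks
```

Because the blocks are aligned to fixed offsets, a single inserted byte
early in one pack shifts every later block, so two packs with nearly
identical contents can still share almost nothing at this level.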
