Re: Git packs friendly to block-level deduplication

Elijah Newren Wed, 24 Jan 2018 14:38:10 -0800

On Wed, Jan 24, 2018 at 2:03 PM, Ævar Arnfjörð Bjarmason
<ava...@gmail.com> wrote:
> If you have a bunch of git repositories cloned from the same project on
> the same filesystem, it would be nice if the packs that are produced
> were friendly to block-level deduplication.
>
> This would save space, and the blocks would be more likely to be in
> cache when you access them, likely speeding up git operations even if
> the packing itself is less efficient.
>
> Here's a hacky one-liner that clones git/git and peff/git (almost the
> same content) and md5sums each 4k packed block, and sort | uniq -c's
> them to see how many are the same:


<snip>

>
> Has anyone here barked up this tree before? Suggestions? Tips on where
> to start hacking the repack code to accomplish this would be most
> welcome.

Does this overlap with the desire to have resumable clones?  I'm
curious what would happen if you did the same experiment with two
separate clones of git/git, cloned one right after the other so that
hopefully the upstream git/git didn't receive any updates between your
two separate clones.  (In other words, how much do packfiles differ in
practice for different packings of the same data?)
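
A rough sketch of how that comparison could be scripted, in case it is
useful. This is not the snipped one-liner, only my own approximation of
the idea: hash every 4k block of two packfiles and count how many blocks
of the first also occur in the second. The function names and the
fixed-offset 4096-byte blocking are assumptions made for illustration.

```python
import hashlib
import sys
from collections import Counter

BLOCK_SIZE = 4096  # assumed: 4k blocks, as in the quoted experiment


def block_digests(path, block_size=BLOCK_SIZE):
    """Return a Counter of MD5 digests, one per fixed-size block of the file."""
    digests = Counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # The final block may be shorter than block_size.
            digests[hashlib.md5(block).hexdigest()] += 1
    return digests


def shared_block_stats(path_a, path_b, block_size=BLOCK_SIZE):
    """Return (shared, total): how many blocks of path_a also occur in path_b."""
    a = block_digests(path_a, block_size)
    b = block_digests(path_b, block_size)
    shared = sum(count for digest, count in a.items() if digest in b)
    total = sum(a.values())
    return shared, total


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python blocks.py first.pack second.pack
    shared, total = shared_block_stats(sys.argv[1], sys.argv[2])
    print(f"{shared} of {total} blocks in {sys.argv[1]} also occur in {sys.argv[2]}")
```

Running it on the .pack files from two back-to-back clones would give a
baseline for how much of the difference comes from packing itself
rather than from differing content. Note that it only compares blocks
at fixed 4k offsets, so identical data shifted by a few bytes counts as
different.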
