On Sun, Jun 12, 2016 at 05:54:36PM -0400, Konstantin Ryabitsev wrote:
> > git gc --prune=now
>
> You are correct, this solves the problem, however I'm curious. The usual
> maintenance for these repositories is a regular run of:
>
> - git fsck --full
> - git repack -Adl -b --pack-kept-objects
> - git pack-refs --all
> - git prune
>
> The reason it's split into repack + prune instead of just gc is because
> we use alternates to save on disk space and try not to prune repos that
> are used as alternates by other repos in order to avoid potential
> corruption.
>
> Am I not doing something that needs to be done in order to avoid the
> same problem?

Your approach makes sense; we do the same thing at GitHub for the same
reasons[1]. The main thing gc does that you are missing is that it
knows the prune time it is going to feed to git-prune[2], and passes
that along to repack. With -A, repack normally ejects unreachable
objects from the pack as loose objects; knowing the prune time is what
enables the "don't bother ejecting these, because I'm about to delete
them" optimization.

That option is not documented, because it was always assumed to be an
internal thing to git-gc, but it is:

  git repack ... --unpack-unreachable=5.minutes.ago

or whatever.
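
As a rough sketch of how the quoted maintenance steps might look with
the same cutoff handed to both repack and prune: the function name
`maintain`, the `grace` variable, and the 5.minutes.ago default are
illustrative assumptions, not anything git provides. The fsck step is
left out here, and so is any logic for skipping repositories that serve
as alternates. As far as I understand it, unreachable objects older
than the cutoff are neither repacked nor loosened, so this is probably
not something to run on a repository you are deliberately not pruning.

```shell
#!/bin/sh
# Sketch only: repack and prune one repository with a shared grace period.
# "maintain" and "grace" are illustrative names, not part of git.
#
# The repack options are the ones from the quoted maintenance run, plus
# --unpack-unreachable so that repack does not loosen unreachable objects
# older than the cutoff. prune then gets the same cutoff via --expire.
maintain() {
    repo=$1
    grace=${2:-5.minutes.ago}   # example value; pick what suits your setup

    git -C "$repo" repack -Adl -b --pack-kept-objects \
        --unpack-unreachable="$grace" &&
    git -C "$repo" pack-refs --all &&
    git -C "$repo" prune --expire="$grace"
}
```

With a hypothetical path, that would be called as
`maintain /path/to/repo.git 2.weeks.ago`.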

-Peff

[1] We don't run the fsck at the front, though, because it's really
expensive. I'm not sure it buys you much, either. The repack
will do a full walk of the graph, so it gets you a connectivity
check, as well as a full content check of the commits and trees. The
blobs are copied as-is from the old pack, but there is a checksum on
the pack data (to catch any bit flips by the disk storage). So the
only thing the fsck is getting you is that it fully reconstructs the
deltas for each blob and checks their sha1. That's more robust than
a checksum, but it's a lot more expensive.

[2] It's unclear to me if you're passing any options to git-prune, but
you may want to pass "--expire" with a short grace period. Without
any options it prunes every unreachable thing, which can lead to
races if the repository is actively being used.

At GitHub we actually have a patch to `repack` that keeps all
objects, reachable or not, in the pack, and use it for all of our
automated maintenance. Since we don't drop objects at all, we can't
ever have such a race. Aside from some pathological cases, it wastes
much less space than you'd expect. We turn the flag off for special
cases (e.g., somebody has rewound history and wants to expunge a
sensitive object).

I'm happy to share the "keep everything" patch if you're interested.
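
For the `--expire` suggestion in [2], one way to get a feel for a grace
period before relying on it is a dry run, which lists what prune would
remove without deleting anything. This is only a sketch: the function
name `preview_prune` and the 5.minutes.ago default are illustrative
assumptions.

```shell
#!/bin/sh
# Sketch only: show what prune would delete with a grace period, without
# deleting anything. "preview_prune" is an illustrative name, not a git
# command.
preview_prune() {
    repo=$1
    grace=${2:-5.minutes.ago}   # example value only

    # --dry-run reports the objects that would be removed and leaves them
    # in place; --expire limits that to objects older than the cutoff.
    git -C "$repo" prune --dry-run --expire="$grace"
}
```

Running it with and without a longer cutoff (say, 2.weeks.ago) should
show how much the grace period is actually protecting.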