On Mon, Feb 22, 2016 at 4:49 PM, James <wirel...@tampabay.rr.com> wrote:
> Rich Freeman <rich0 <at> gentoo.org> writes:
>
>> If I were doing anything too
>> crazy with all this I'd probably use the python git module.
>
> dev-python/git-python ??? Any others or related docs/howtos/examples?
>

I used pygit2, but there are a few different implementations and plenty of docs online in general.

Here is an example program that runs through a history and dumps a list of commits and their metadata in csv format:

https://github.com/rich0/gitvalidate/blob/master/gitdump/parsetrees.py

There are some other scripts in the same directory that retrieve blobs and manipulate them.

This was part of the validation of the git migration, which uses a map-reduce algorithm to diff every single commit in a git history and identify all file revisions. That creates a cvs-like per-file history, which can then be compared with the results obtained from parsing a cvs repository for the same information. The only single-threaded step in the process is walking the list of commits - all the diffs can be highly parallelized.

I doubt you need anything quite so fancy. As you can see from the script, pulling metadata out of commits and walking through parents is pretty easy.

My example doesn't account for merge commits, since there weren't any in the cvs->git migration. Obviously walking commits with merges will get a lot messier.

--
Rich
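
P.S. To give a rough idea of what walking parents and dumping metadata looks like, here is a minimal sketch. It is not the linked parsetrees.py, just an illustration written under some assumptions: the function names and the CSV columns are made up for this example, and it only follows the first parent of each commit, so like the script above it ignores merges. It relies on the pygit2 commit attributes id, author, commit_time, message and parents.

```python
import csv
import sys
from datetime import datetime, timezone


def walk_first_parent(commit):
    """Yield a commit, then its first parent, and so on back to the root.

    Any extra parents of a merge commit are ignored, so this only
    covers a linear history.
    """
    while commit is not None:
        yield commit
        parents = commit.parents
        commit = parents[0] if parents else None


def commit_row(commit):
    """Flatten one commit into a list of CSV fields (columns are illustrative)."""
    when = datetime.fromtimestamp(commit.commit_time, tz=timezone.utc)
    subject = commit.message.splitlines()[0] if commit.message else ""
    return [str(commit.id), commit.author.name, commit.author.email,
            when.isoformat(), subject]


def dump_history(head_commit, out):
    """Write a header plus one CSV row per commit; return the commit count."""
    writer = csv.writer(out)
    writer.writerow(["id", "author", "email", "time_utc", "subject"])
    count = 0
    for commit in walk_first_parent(head_commit):
        writer.writerow(commit_row(commit))
        count += 1
    return count


def main(repo_path="."):
    """Dump the history of HEAD in repo_path to stdout (hypothetical entry point)."""
    import pygit2  # third-party; imported here so the helpers work without it

    repo = pygit2.Repository(repo_path)
    head_commit = repo[repo.head.target]
    return dump_history(head_commit, sys.stdout)
```

Calling main("/path/to/repo") should print the CSV for that repository. For a history that does contain merges, pygit2's Repository.walk() with a topological sort is probably a better starting point than following parents by hand.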