Hi,

On Tue, Oct 07, 2014 at 03:03:13PM -0500, jt.mil...@l-3com.com wrote:
> Is there a way to check out every version of a file in a repository? We
> just had a requirement levied to perform a scan of every file in a
> repository. The scan tool must have each file in a stand-alone format.
> Thus, I need a way to extract every version of every file within a
> repository.
>
> Aside from the brute-force method of checking out the entire repository
> starting at revision 1, performing a scan, updating to the next revision,
> and repeating until I reach the head, I don’t know of a way to do this.
That's certainly a somewhat tough one. I will probably get tarred and feathered here for my way of solving this, and possibly even rightfully so, but... ;)

OK, here goes: you could do a git-svn clone of your repo, then get the list of all files that ever existed via http://stackoverflow.com/a/12090812 , then for each such file do a git log --all --format=%H -- path/to/file to get all of its revisions, then do a file_content=$(git show <revision>:./path/to/file) (or rather git show ... > $TMPDIR/mytmp, since that ought to be more reliable for largish files), then scan that (ideally you'd be able to pipe the git show stream directly into your scan tool). That ought to give you a scan result for *all* revisions of *all* files in *all* branches of your repo (you might want to apply a "uniq" at some point, to make sure you're not wastefully processing duplicate items). OK, possibly scratch the "*all* branches" part, since that may require some extra effort in the case of git-svn...

However, this high-level lookup might be both rather crude and much less precise than a parse-each-object kind of solution at the git plumbing level, if that is possible (and I'd very much guess it is). Hmm, that could be a git rev-list --all to enumerate every commit, followed by listing the files changed in each commit (e.g. via git diff-tree), and AFAICS that works globally (i.e., on the global commit graph, rather than on specific "human-tagged" branch names). So once successfully scripted, that mode of operation ought to be *a lot* better than the "list all files, then git log each file" algo. And you could then sanity-check your algorithm by having it spit out a full list of all commit hash / file combos (which happens to be the same list you would then feed into git show, entry by entry), and then try hard to find a repo-side file version which is NOT contained in that list --> algo error!
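A minimal sketch of that rev-list approach in plain sh, assuming it is run inside the git-svn clone and that the scan tool takes a single file path as argument. The function names list_file_versions and scan_all_versions are made up for illustration, and the scan command is a hypothetical placeholder you pass in. Instead of git show <commit>:<path> it fetches each version by blob id via git cat-file, which sidesteps path quoting and turns the "uniq" step into a simple dedupe on the blob column. Note that git rev-list --all also walks refs/remotes/*, which is where git-svn usually puts imported SVN branches (assuming they were imported at all).

```shell
# Sketch only: list every version of every regular file reachable from any
# ref, using plumbing instead of a working-copy checkout.
# Output: "commit blob path", one line per distinct blob.
list_file_versions() {
    # Every commit reachable from any ref (branches, tags, remotes).
    git rev-list --all |
    while read -r commit; do
        # -r: recurse into subdirectories
        # -m: for merge commits, diff against each parent
        # --root: show the initial commit as a diff against the empty tree
        # --diff-filter=AMT: added/modified/type-changed only, no deletions
        git diff-tree -r -m --root --no-commit-id --no-abbrev \
            --diff-filter=AMT "$commit" |
        # Raw line: ":oldmode newmode oldsha newsha status<TAB>path"
        awk -F '\t' -v c="$commit" '{
            split($1, f, " ")
            if (f[2] ~ /^100/) print c, f[4], $2   # regular files only
        }'
    done |
    # The "uniq" step: keep only the first line seen for each blob id
    # (rev-list lists newest commits first), so identical content is
    # scanned once.
    awk '!seen[$2]++'
}

# $1: scan command (hypothetical placeholder), invoked per version as
#     <scan command> <tmpfile> <commit> <path>
scan_all_versions() {
    scan_cmd=$1
    list_file_versions |
    while read -r commit blob path; do
        tmp=$(mktemp)
        git cat-file blob "$blob" > "$tmp"
        # </dev/null keeps the scanner from eating the list on stdin.
        "$scan_cmd" "$tmp" "$commit" "$path" < /dev/null
        rm -f "$tmp"
    done
}
```

Usage would then be something like scan_all_versions my_scanner, with my_scanner standing in for your real tool. Paths with unusual characters show up C-quoted in the path column, but the content is still fetched correctly since only the blob id is used for that.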
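To cross-check a list of commit hash / file combos as suggested above, a rough sketch could compare the blobs that list resolves to against every blob reachable in the object graph (git rev-list --all --objects, filtered through git cat-file --batch-check). check_coverage is a hypothetical helper name, and the input format ("commit path", one combo per line, separated by a single space) is an assumption. Any blob id it prints is a file version your list missed. Symlink targets are stored as blobs too, so they will show up here if your enumeration skipped them on purpose.

```shell
# Hypothetical helper: reads "commit path" lines on stdin and prints the id
# of every blob reachable from any ref that none of those combos resolves
# to. Empty output means nothing was missed.
check_coverage() {
    listed=$(mktemp)
    all=$(mktemp)
    # Resolve each commit:path combo to the blob id it refers to.
    while read -r commit path; do
        git rev-parse "$commit:$path"
    done | sort -u > "$listed"
    # Every object reachable from any ref; keep only the blobs.
    git rev-list --all --objects |
        cut -d' ' -f1 |
        git cat-file --batch-check='%(objecttype) %(objectname)' |
        awk '$1 == "blob" { print $2 }' |
        sort -u > "$all"
    # Lines unique to the second file: blobs the list never produced.
    comm -13 "$listed" "$all"
    rm -f "$listed" "$all"
}
```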
Oh, and BTW: all this works *without* a single filesystem-based checkout (i.e., working copy modification) of any repo item. (So this actually goes *against* your initially stated "requirement" of "Is there a way to check out every version of a file in a repository?", and rightfully so ;)

HTH,

Andreas Mohr