Hi,

On Tue, Oct 07, 2014 at 03:03:13PM -0500, jt.mil...@l-3com.com wrote:
>    Is there a way to check out every version of a file in a repository? We
>    just had a requirement levied to perform a scan of every file in a
>    repository. The scan tool must have each file in a stand-alone format.
>    Thus, I need a way to extract every version of every file within a
>    repository.
>
>    Aside from the brute-force method of checking out the entire repository
>    starting at revision 1 , performing a scan, updating to the next revision,
>    and repeating until I reach the head, I don’t know of a way to do this.

That's certainly a somewhat tough one.


I will get tarred and feathered here for my way of trying to solve this,
and possibly even rightfully so, but... ;)

OK, here it goes:
you could do a git-svn on your repo,
then get all files ever existing via http://stackoverflow.com/a/12090812,
then for each such file do a
git log --all --something --someveryshortformat
to get all its revisions,
then do a
file_content=$(git show <revision>:./path/to/file)
(alternatively do git show ... > $TMPDIR/mytmp, since that ought to be more
reliable for largish files),
then scan that
(but ideally you'd be able to directly pipe the git show stream into your
scan tool).
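
The steps above could be sketched roughly like this in shell. Take it as a
sketch under assumptions, not a tested recipe: the function names are made up
for illustration, the git log flags are just one possible stand-in for the
"--something --someveryshortformat" placeholder, and the scanner command
passed as arguments is hypothetical and assumed to read file content on stdin.
It is meant to be run inside the git-svn clone.

```shell
#!/bin/sh
# Assumption: paths contain no newlines or other characters that git would
# quote in its output (see core.quotePath).

# Every path that was ever touched on any ref, one per line, deduplicated.
list_all_paths() {
    git log --all --pretty=format: --name-only | sort -u | sed '/^$/d'
}

# Hashes of the commits (on any ref) that touched the given path.
commits_for_path() {
    git log --all --format=%H -- "$1"
}

# Pipe every version of every path into the scanner given as "$@".
scan_all_versions() {
    list_all_paths | while IFS= read -r path; do
        commits_for_path "$path" | while IFS= read -r rev; do
            # A commit that deleted the path has no content to show: skip it.
            git cat-file -e "$rev:$path" 2>/dev/null || continue
            git show "$rev:$path" | "$@"
        done
    done
}
```

Usage would then be something like "scan_all_versions my_scanner --stdin",
where my_scanner and its --stdin flag are placeholders for your actual tool.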

That ought to give you a scan result for *all* revisions of *all* files
in *all* branches of your repo (you might want to decorate things with a
"uniq" applied at some place or another, to ensure that you're indeed
not doing wasteful duplicate processing of certain items).
OK possibly scratch the "*all* branches" part, since this may require
some extra effort in the case of git-svn...


However, this high-level lookup solution
might be both rather crude and much less precise
compared to a parse-each-object kind of solution at git plumbing level, if this
is possible (and I'd very much guess it is).
Hmm, that could be a git rev-list: with --objects it lists the file objects
(blobs) reachable from the commits it walks,
and AFAICS with --all it does so globally (i.e., across all refs, rather than
for one specific "human-tagged" branch name). So that operation mode, once
successfully scripted, ought to be *a lot* better than the
"list all files, then git log each file" algo.
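
A minimal sketch of what such a plumbing-level variant might look like, again
with made-up function names and a hypothetical scanner command that reads from
stdin. It assumes paths without newlines. Note that rev-list --objects reports
each distinct blob only once (with one of its paths), so identical content is
not scanned twice, but you also do not get one line per commit.

```shell
#!/bin/sh
# Print "<blob-id> <path>" for every distinct blob reachable from any ref.
list_all_blobs() {
    git rev-list --all --objects |
        git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
        sed -n 's/^blob //p'
}

# Pipe the content of every distinct blob into the scanner given as "$@".
scan_all_blobs() {
    # "path" soaks up the rest of the line; only the blob ID is needed here.
    list_all_blobs | while read -r blob path; do
        git cat-file blob "$blob" | "$@"
    done
}
```

The %(rest) placeholder makes cat-file echo back whatever followed the object
ID on its input line, which is how the path survives the type filtering.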

And you could then safety-check your algorithm
by having it spit out a full list of all commit hash / file combos
(this happens to be the same list which you would then feed into git show,
entry by entry),
and then try hard to figure out a way
to pick a repo-side file version which accidentally is NOT contained in that
list
--> algo error!
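
One possible way to hunt for such a missing version, sketched under
assumptions: walk the full tree of every commit by brute force and compare at
the level of blob IDs (file contents). Comparing blob IDs instead of
commit/file pairs is my own substitution here, since a full tree walk also
lists files at commits where they did not change. Function names are made up,
and paths are assumed to contain no newlines.

```shell
#!/bin/sh
# Reference set: blob IDs found by listing the full tree of every commit.
blobs_by_tree_walk() {
    git rev-list --all | while IFS= read -r rev; do
        git ls-tree -r "$rev"
    done | awk '$2 == "blob" { print $3 }' | sort -u
}

# Candidate set: blob IDs behind the "<commit> <path>" lines read from stdin
# (i.e., the list your algorithm would feed into git show).
blobs_from_pairs() {
    while read -r rev path; do
        git rev-parse "$rev:$path"
    done | sort -u
}

# Print blob IDs listed in file $1 (reference) but absent from file $2
# (candidate). Both files must be sorted, as produced by the functions above.
missing_blobs() {
    comm -23 "$1" "$2"
}
```

Any line printed by missing_blobs would be a file version your list failed to
cover.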


Oh, and BTW: all this *without* having to do a filesystem-based checkout
(i.e., working copy modification)
of any repo item, even once.
(i.e., this is actually going *against* your initially stated "requirement" of
"Is there a way to check out every version of a file in a repository?",
and rightfully so ;)

HTH,

Andreas Mohr
