On 07.10.2014 22:36, Andreas Mohr wrote:
> Hi,
>
> That's certainly a somewhat tough one.
>
>
> I will get tarred and feathered here for my way of trying to solve this,
> and possibly even rightfully so, but... ;)

Well, I certainly won't skin you alive for suggesting this; but ... I
would imagine that "git svn fetch" has to essentially do just what the
OP doesn't want to do, i.e., successively retrieve each revision of
every file in the Subversion repository to populate the Git repository.
There's not much chance this would be faster than just doing the same
with Subversion, especially since, once you're done, you /still/ have to
scan the files in the resulting Git repo.

Going back to the original question ...

> Aside from the brute-force method of checking out the entire repository
> starting at revision 1, performing a scan, updating to the next revision,
> and repeating until I reach the head, I don't know of a way to do this.

This is, in fact, likely to be (almost) the most efficient way to do
this, since you can just use the existing Subversion client to deal with
the repository contents and version discrepancies.

But there is an alternative that might be more efficient in your case:
Create a dumpstream of the repository using "svnadmin dump",
non-incremental and not using deltas, then pipe the stream to a custom
tool that extracts the file contents from the stream and either writes
them to disk or passes them to your scanning tool in some other way.

The reason why this could be faster than the checkout plus repeated
update is that you do not have the overhead of a working copy, directory
tracking, property handling, etc., and you can probably save on disk
space by keeping the file contents around only as long as they're being
scanned.

It does mean that your custom tool will have to parse the dumpfile
format, but that's really not so hard: the format is quite simple, and
there are a number of example scripts that do that in our repository.

Another alternative is to use our API directly, possibly through one of
the bindings, to get file contents straight from the repository; but I
suspect it's harder than parsing the dump file.

-- Brane
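
The "custom tool" for the dumpstream approach described above could look
roughly like the following Python sketch. It is only an illustration, not
one of the example scripts from the Subversion repository. The function
names read_headers and iter_file_contents are made up for this example. It
assumes a dump produced without --deltas, so each file node that carries
text carries the full text, and it simply skips records that have no
Text-content-length header.

```python
import io


def read_headers(stream: io.BufferedIOBase):
    """Read one 'Key: value' header block; return a dict, or None at EOF."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:                       # end of the dump stream
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:                    # a blank line ends the block
                return headers
            continue                       # skip blank separators between records
        key, _, value = line.partition(b": ")
        headers[key.decode("utf-8")] = value.decode("utf-8")


def iter_file_contents(stream: io.BufferedIOBase):
    """Yield (revision, path, text) for every node record that carries text."""
    revision = None
    while True:
        headers = read_headers(stream)
        if headers is None:
            return
        # Content-length covers the property section plus the text section.
        body = stream.read(int(headers.get("Content-length", 0)))
        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
        elif "Node-path" in headers and "Text-content-length" in headers:
            prop_len = int(headers.get("Prop-content-length", 0))
            text_len = int(headers["Text-content-length"])
            # The text section follows the property section in the body.
            yield revision, headers["Node-path"], body[prop_len:prop_len + text_len]
```

To use it, pass a binary stream such as sys.stdin.buffer to
iter_file_contents while piping the output of "svnadmin dump" into the
script, and hand each yielded text to the scanner. Nothing is kept once
the caller drops it, which matches the disk-space point above. One caveat,
as far as I can tell: a node that is a plain copy is recorded with
copy-from headers and no text, so a real tool would need to decide how to
treat copied files.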
