Generating a dump file using a powershell script

Geoff Worboys Tue, 22 Jun 2010 07:37:25 -0700

Hi All,

I've just joined this group.  I've been using subversion for a
few years now - most of my day to day stuff via TortoiseSvn.
A few days ago I once again came across a requirement where I
said "subversion is what I need here" only to once again hit
the issue that to start a new project in subversion means
losing all the file time-stamps.  I don't want to re-start
arguments on that front (I see from googling and archives that
it is a VERY old discussion).  I simply have some questions in
regard to my own chosen work-around to the problem.


It seemed to me that for most of my requirements I did not need
extra features in subversion, all I really needed was some way
to create the new repository so that it looked like all the
files I imported were committed at the time it says on the file
from the original source.  If I could get that then I could use
the "use-commit-times" option to keep things very close to the
way I wanted them.  [And I could keep using TortoiseSvn and I
would be a happy man.]

That all led me to trying to create my own dump files.  I ended
up choosing powershell scripting because I wanted to learn about
it and this seemed like an interesting project to try with it.
I have a working script now; put simply, it is executed as:

powershell .\Import-from-Source D:\SourceFolder D:\Temp\DumpFile.dat

It takes the entire contents of D:\SourceFolder and creates
a subversion dump file in D:\Temp\DumpFile.dat.  It replicates
the structure inside D:\SourceFolder so if you want a "trunk"
folder etc you have to have created them first.

Objects (the full tree) from D:\SourceFolder are first sorted
by their last-write-time property and I then create a revision
entry for each date that appears (the revision resolution is
adjustable in the script).  This makes it so that each file
ends up appearing to have been committed on the same date that
it had on the original source file, so checking out the files
with the use-commit-times option gives them the same date as the
original file (if not, necessarily, exactly the same time).
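To illustrate the grouping step, here is a minimal sketch in Python rather
than PowerShell.  It is not the script itself: the function name
group_by_write_date is made up for the example, and it assumes a fixed
revision resolution of one calendar day in UTC.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def group_by_write_date(root):
    """Return [(date, [relative paths])], oldest date first.

    Hypothetical helper: one entry per calendar day (UTC) on which
    something under root was last written.
    """
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            day = datetime.fromtimestamp(mtime, tz=timezone.utc).date()
            groups[day].append(os.path.relpath(path, root))
    # Oldest date first, so revision numbers increase with time.
    return [(day, sorted(paths)) for day, paths in sorted(groups.items())]
```

Each tuple would become one revision in the dump file.  The sketch ignores
one thing a real dump has to handle: a directory must be added no later
than the revision that adds the first file inside it, whatever the
directory's own time-stamp says.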

Yippee, it works.

Now to some gritty details, which is why I am here.


Q1:  If, in the dump file, I sometimes give a file a property
svn:eol-style = native, but the file itself has been copied
directly into the dump file (i.e. contains CRLF end-of-lines)
is that going to matter to svnadmin load?

[Will the load process take care of things for me or do I
need to parse such files and make them all LF - which is what
svn says it uses internally for "native" files? ]

My experiments seemed to show that svnadmin dump also produced
the CRLF end-of-lines but it all gets quite confusing so
I thought I would ask here.

Since I mostly work under Windows it's probably not a big deal
for me ... but I'd rather the script was correct in case it
gets used by others that may have other requirements.
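If it turns out that the content does need to be LF-only, the conversion
itself is small.  A minimal sketch in Python (the function name to_lf is
made up for the example); it does not settle whether svnadmin load
actually requires this.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF (hypothetical helper)."""
    # Replace CRLF first so a CRLF pair does not become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Any content length or checksum written to the dump file for that node
would then have to be computed from the converted bytes, not from the
original file.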


Q2:  When writing the code to try and identify text versus
binary files I decided to look at what subversion did ... but
now I am confused.  In libsvn_subr\io.c function
svn_io_detect_mimetype2 a comment says:
     going to examine the first block of data, and make sure that 85%
     of the bytes are such that their value is in the ranges 0x07-0x0D
     or 0x20-0x7F, and that 100% of those bytes is not 0x00.
but my reading of this code
      if (((binary_count * 1000) / amt_read) > 850)
        {
          *mimetype = generic_binary;
          return SVN_NO_ERROR;
        }
suggests that it is actually setting the type to binary only
if it finds that more than 85% are binary bytes (in earlier code
a file is forced to binary if any null byte is found).

Can anyone explain this?  A bug or am I missing something?
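To make that reading concrete, here is a minimal sketch in Python of the
test as I understand it.  It is not a copy of svn_io_detect_mimetype2: the
function name looks_binary is made up, the "text" byte ranges are taken
from the quoted comment, and the null-byte rule is my reading of the
earlier code.

```python
def looks_binary(block: bytes) -> bool:
    """Classify the first block of a file (hypothetical helper)."""
    if not block:
        return False
    # Assumption: any null byte forces the file to binary.
    if b"\x00" in block:
        return True
    # Count bytes outside the ranges 0x07-0x0D and 0x20-0x7F.
    binary_count = sum(
        1 for b in block
        if b < 0x07 or 0x0D < b < 0x20 or b > 0x7F
    )
    # Same integer arithmetic as the quoted C condition.
    return (binary_count * 1000) // len(block) > 850
```

Under this reading a block that is half control characters, with no null
byte, still counts as text, which is the opposite of what the comment
seems to promise.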


Q3:  If there are already other scripts around that do this
then feel free to tell me that I have wasted my time.  I could
not find any similar solutions in my searching.


Q4:  If there are any powershell people here that would like to
review and test the code I am quite happy to share it ... but
would not recommend it to a scripting novice until it has been
checked over and tested by more than me.


Q5:  I found a description of the dump file in the source but
that description says "Properties are stored in the same
human-readable hashdump format used by working copy property
files,"   Any pointers to a description for that?

(Obviously I've gotten by just by visually checking dump files
produced by svnadmin, but it would be good to know what I was
doing. ;-)
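For reference, this is the property layout as it appears in dump files
produced by svnadmin, written out as a minimal Python sketch.  The
function name serialize_props is made up, and the layout comes from
inspecting dumps, not from a format description.

```python
def serialize_props(props: dict) -> bytes:
    """Build a property block: K/V records followed by PROPS-END."""
    out = bytearray()
    for name, value in props.items():
        k = name.encode("utf-8")
        v = value.encode("utf-8")
        # Lengths are byte counts and exclude the trailing newline.
        out += b"K %d\n%s\n" % (len(k), k)
        out += b"V %d\n%s\n" % (len(v), v)
    out += b"PROPS-END\n"
    return bytes(out)
```

For example, serialize_props({"svn:eol-style": "native"}) gives the lines
"K 13", "svn:eol-style", "V 6", "native" and "PROPS-END".  In the dumps I
looked at, the Prop-content-length header equals the length of this whole
block, including the PROPS-END line.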


Hmmm... big post for my first post.  Hope that's okay.

-- 
Geoff Worboys
Telesis Computing
