Geoff Worboys wrote on Tue, 22 Jun 2010 at 17:36 -0000:
> powershell .\Import-from-Source D:\SourceFolder D:\Temp\DumpFile.dat
> 
> It takes the entire contents of D:\SourceFolder and creates
> a subversion dump file in D:\Temp\DumpFile.dat.  It replicates
> the structure inside D:\SourceFolder so if you want a "trunk"
> folder etc you have to have created them first.
> 
> Objects (the full tree) from D:\SourceFolder are first sorted
> by their last-write-time property and I then create a revision
> entry for each date that appears (the revision resolution is
> adjustable in the script).  This makes it so that each file
> ends up appearing to have been committed on the same date that
> it had on the original source file, so checking out the files
> with the use-commit-times option gives them the same date as the
> original file (if not, necessarily, exactly the same time).
> 

i.e., you import the files in order of their timestamps, so that
the svn:date values remain globally sorted?

Nice!
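For anyone reading along, the ordering described above can be sketched in C as shown below. The whole tree is sorted by last-write-time, and a new revision is opened whenever a file falls into a later time bucket. This is not Geoff's PowerShell script. The src_file struct, assign_revisions() and the one-day REV_RESOLUTION are hypothetical stand-ins for the script's adjustable revision resolution.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record for one file found under the source folder. */
struct src_file {
    const char *path;
    time_t mtime;   /* last-write-time, seconds since the epoch */
};

/* Revision resolution in seconds: 86400 puts every file written on the
   same (UTC) day into one revision.  Assumed value, adjustable. */
#define REV_RESOLUTION 86400

/* qsort comparator: oldest last-write-time first. */
static int by_mtime(const void *a, const void *b)
{
    const struct src_file *fa = a;
    const struct src_file *fb = b;
    return (fa->mtime > fb->mtime) - (fa->mtime < fb->mtime);
}

/* Sort FILES by last-write-time, then store in REV_OF[i] the revision
   (numbered from 1) that the i-th sorted file belongs to.  A new
   revision starts whenever a file falls into a later time bucket.
   Returns the number of revisions needed. */
static size_t assign_revisions(struct src_file *files, size_t n, long *rev_of)
{
    size_t revs = 0;
    long long prev_bucket = 0;

    assert((files != NULL && rev_of != NULL) || n == 0);
    if (n > 0)
        qsort(files, n, sizeof *files, by_mtime);
    for (size_t i = 0; i < n; i++) {
        long long bucket = (long long)files[i].mtime / REV_RESOLUTION;
        if (i == 0 || bucket != prev_bucket) {
            revs++;
            prev_bucket = bucket;
        }
        rev_of[i] = (long)revs;
    }
    return revs;
}
```

The whole list is sorted before any revision is numbered. Revision numbers and bucket dates therefore rise together, so svn:date stays globally sorted as long as each revision's svn:date is taken from a file in its own bucket.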

> Q1:  If, in the dump file, I sometimes give a file a property
> svn:eol-style = native, but the file itself has been copied
> directly into the dump file (ie. contains CRLF end-of-lines)
> is that going to matter to svnadmin load?
> 
> [Will the load process take care of things for me or do I
> need to parse such files and make them all LF - which is what
> svn says it uses internally for "native" files? ]
> 
> My experiments seemed to show that svnadmin dump also produced
> the CRLF end-of-lines, but it all gets quite confusing, so I
> thought I would ask here.
> 

i.e., 'svnadmin dump' produces CRLF for svn:eol-style=native files?
That surprises me; I'd expect such files to be output with LF in dump
files.  (My testing agrees with my expectation.)  Can you double-check?

In any case, it probably *should* use LF, since dumpfiles are supposed
to be a portable binary format.
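A generator might not want to depend on how svnadmin load treats CRLF in svn:eol-style=native content. One option is for it to normalize the text to LF itself before writing the node. Below is a minimal sketch, assuming the whole file is already in memory. crlf_to_lf() is a hypothetical helper, not a Subversion API.

```c
#include <assert.h>
#include <stddef.h>

/* Convert CRLF pairs to LF in place and return the new length.
   Lone CR bytes are left untouched (an assumption: old Mac-style
   CR-only line endings would need their own rule).  The buffer must
   hold the whole file, so a CR/LF pair is never split across reads. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;

    assert(buf != NULL || len == 0);
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

The node's Text-content-length and Text-content-md5 headers would then have to be computed over the converted bytes, not over the original file.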

> Since I mostly work under Windows it's probably not a big deal
> for me ... but I'd rather the script was correct in case it
> gets used by others that may have other requirements.
> 
> 
> Q2:  When writing the code to try and identify text versus
> binary files I decided to look at what subversion did ... but
> now I am confused.  In libsvn_subr\io.c function
> svn_io_detect_mimetype2 a comment says:
>      going to examine the first block of data, and make sure that 85%
>      of the bytes are such that their value is in the ranges 0x07-0x0D
>      or 0x20-0x7F, and that 100% of those bytes is not 0x00.
> but my reading of this code
>       if (((binary_count * 1000) / amt_read) > 850)
>         {
>           *mimetype = generic_binary;
>           return SVN_NO_ERROR;
>         }
> suggests that it is actually setting the type to binary only
> if more than 85% of the bytes are binary (earlier in the code,
> a file is forced to binary if any null byte is found).
> 
> Can anyone explain this?  A bug or am I missing something?
> 

What's the question?  Are you saying the code/comment disagree?
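If that is the concern, the two readings can be compared directly. The sketch below reuses the byte classification from the quoted io.c snippet. A NUL byte makes the whole block binary, and otherwise any byte outside 0x07-0x0D and 0x20-0x7F counts as binary-ish. It then applies each threshold in turn. The function names are made up for illustration. The sketch only models the quoted lines and is not the real svn_io_detect_mimetype2().

```c
#include <assert.h>
#include <stddef.h>

/* Count "binary-ish" bytes as the quoted code does: anything outside
   0x07-0x0D and 0x20-0x7F.  A NUL byte makes the whole block count. */
static size_t count_binaryish(const unsigned char *block, size_t amt_read)
{
    size_t binary_count = 0;

    for (size_t i = 0; i < amt_read; i++) {
        if (block[i] == 0x00)
            return amt_read;
        if (block[i] < 0x07
            || (block[i] > 0x0D && block[i] < 0x20)
            || block[i] > 0x7F)
            binary_count++;
    }
    return binary_count;
}

/* Reading 1, the quoted code: binary only if MORE than 85% of the
   bytes are binary-ish. */
static int binary_per_code(const unsigned char *block, size_t amt_read)
{
    assert(amt_read > 0);
    return (count_binaryish(block, amt_read) * 1000) / amt_read > 850;
}

/* Reading 2, the quoted comment: text needs at least 85% of the bytes
   in the allowed ranges, i.e. binary if more than 15% are binary-ish. */
static int binary_per_comment(const unsigned char *block, size_t amt_read)
{
    assert(amt_read > 0);
    return count_binaryish(block, amt_read) * 1000 > 150 * amt_read;
}
```

For a block that is half ASCII letters and half 0x80 bytes, the code reading reports text and the comment reading reports binary. A block containing a NUL is binary under both.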

> Q5:  I found a description of the dump file in the source but
> that description says "Properties are stored in the same
> human-readable hashdump format used by working copy property
> files,"   Any pointers to a description for that?
> 

You're quoting 
<http://svn.apache.org/repos/asf/subversion/trunk/notes/dump-load-format.txt>.

Internally the function it uses is svn_hash_write2(), and there's
a small documentation comment at the top of hash.c.  But, as you say,

> (Obviously I've gotten by just by visually checking dump files
> produced by svnadmin, but it would be good to know what I was
> doing. ;-)
> 

the format isn't hard to reverse-engineer, right?
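For reference, the layout as it shows up in svnadmin dump output is a series of records, each a "K <length>" line, the key, a "V <length>" line and the value. A terminator line ends the block ("PROPS-END" in dump files). Below is a minimal writer sketch under that assumption. write_hashdump() and struct prop are hypothetical, and the sketch does not attempt the incremental Prop-delta form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One property name/value pair (hypothetical type). */
struct prop {
    const char *name;
    const char *value;
    size_t value_len;   /* explicit length: values may hold arbitrary bytes */
};

/* Write PROPS as "K <len>\n<name>\nV <len>\n<value>\n" records followed
   by TERMINATOR and a newline.  Returns the total bytes written (what a
   Prop-content-length header would carry), or -1 on a write error. */
static long write_hashdump(FILE *out, const struct prop *props, size_t n,
                           const char *terminator)
{
    long total = 0;
    int w;

    assert(out != NULL && terminator != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t klen = strlen(props[i].name);
        size_t vlen = props[i].value_len;

        /* Key: "K <length>\n", the raw bytes, then a newline. */
        if ((w = fprintf(out, "K %zu\n", klen)) < 0) return -1;
        total += w;
        if (fwrite(props[i].name, 1, klen, out) != klen) return -1;
        if (fputc('\n', out) == EOF) return -1;
        total += (long)klen + 1;

        /* Value: "V <length>\n", the raw bytes, then a newline. */
        if ((w = fprintf(out, "V %zu\n", vlen)) < 0) return -1;
        total += w;
        if (fwrite(props[i].value, 1, vlen, out) != vlen) return -1;
        if (fputc('\n', out) == EOF) return -1;
        total += (long)vlen + 1;
    }
    if ((w = fprintf(out, "%s\n", terminator)) < 0) return -1;
    return total + w;
}
```

Writing svn:eol-style=native with the "PROPS-END" terminator this way produces 40 bytes, terminator line included.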

> 
> Hmmm... big post for my first post.  Hope that's okay.
> 
> 

Yeah.  For next time, you could consider adding a one-paragraph summary
at the top, and/or make it clear what kind of responses you're looking
for (e.g., "Hey, I'm looking for people to try my script", or "Hey, I'm
looking for answers to questions I ran into developing a script", or ...)
