Re: Issue while loading the SVN Dump SVN version 1.9.7

2018-02-01 Thread Johan Corveleyn
On Wed, Jan 31, 2018 at 4:12 PM, Santosh Kondapuram
 wrote:
> Hi Johan,
>
> As you suggested I have removed the leading spaces from line 39 
> (enable-rep-sharing=false) and this time it worked and was able to 
> successfully load the problematic revision.
> So does this mean I have SHA-1 colliding files in my repo? Also, what
> are the next steps to catch up with the latest revisions from the master node?
> Appreciate all your help and great working with you on this issue !!!
>
> FYI,
>
> By the way, I tried running the command "svnadmin load -M 0
> /u01/svn/repos < incdump724413.txt" with rep-sharing enabled and still saw
> the same issue. I tried this before applying the workaround above, which
> worked.

Okay, thanks for re-testing that.

What to do next? I think that depends on whether this is a real
collision or, if it isn't, why the collision-detection code went wrong.
Normally you can catch up with the original repo by creating another
incremental dump of the remaining revisions and loading that into the new
repository. You can re-enable rep-sharing before doing so, so that the
additional revisions use the rep-sharing functionality.

However, I'm still wondering what went wrong here. If there is a real
sha1 collision, you won't be able to checkout a working copy which
contains both colliding files (though it's certainly possible that
both files would normally not appear together in a working copy --
perhaps the "first file" is already long deleted, so it's only part of
ancient history in your repository).

To find out a little bit more about the (alleged) collision, can you
do the following, using the sqlite3 executable (it may already be
installed on your system)?

Go to the db subdir of your repository and run:

  sqlite3 rep-cache.db "select * from rep_cache where hash='9db457be74545c184242e57208bf1d56db1f15b2'"

I think you'll get back at least two rows. The schema of the table is:

( hash TEXT NOT NULL PRIMARY KEY,
  revision INTEGER NOT NULL,
  offset INTEGER NOT NULL,
  size INTEGER NOT NULL,
  expanded_size INTEGER NOT NULL )
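
If the sqlite3 executable isn't available, the same lookup can be sketched with Python's standard sqlite3 module. This is only an illustration: the function name lookup_rep_cache is made up here, and the query simply mirrors the one given above.

```python
import sqlite3


def lookup_rep_cache(db_path, sha1_hex):
    """Return all rep_cache rows whose hash column equals sha1_hex.

    db_path is the path to rep-cache.db inside the repository's db subdir.
    The helper name and signature are hypothetical, not part of Subversion.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Parameter binding avoids quoting problems with the hash string.
        cur = conn.execute("SELECT * FROM rep_cache WHERE hash = ?", (sha1_hex,))
        return cur.fetchall()
    finally:
        conn.close()
```

Each returned tuple follows the column order of the schema (hash, revision, offset, size, expanded_size), so the second element of each row is the revision to look at.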

The revision values that you get back might be interesting for
further investigation (perhaps by looking at them in the original
repo). Maybe you can run 'svn log -v' on those revisions, and 'svn cat
URL_OF_FILE@REV' for each of the affected files (at the corresponding
revisions) to see their contents (and perhaps sha1sum them with a
command-line tool).

-- 
Johan


Re: mailer.py commit says TypeError: must be unicode, not str

2018-02-01 Thread Kenneth Porter

[moving discussion to dev list as I think this is now the correct fix.]

--On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter 
 wrote:



--On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
 wrote:


fp = builtins.open(file, 'w+') # avoid namespace clash with
                               # trimmed-down svn_fs_open()


I'm now thinking the problem is in the open call, and that I'm somehow
getting a Python 3 open function even though I've got Python 2.7
installed. Should the mode be 'wb' instead of 'w+'? That would ensure
that the raw data from the Subversion object is getting dumped into the
temporary file without interpretation. I don't understand why update
(denoted by the plus) is wanted. The temp file isn't being read from.


Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object 
to temporary file. Update isn't needed, and the code just needs to dump the 
raw data into a file for the external diff to access, so no 
encoding/decoding should occur. Hence we should open the file in binary 
mode. I just tested this edit and it seems to cure the problem.
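
A minimal sketch of the difference between the two modes, assuming the open() in question behaves like io.open (which is an assumption about the environment, not something confirmed in fs.py). The helper dump_raw and the sample bytes are made up for illustration.

```python
import io
import os
import tempfile


def dump_raw(path, data, mode):
    """Write data to path using io.open with the given mode."""
    with io.open(path, mode) as fp:
        fp.write(data)


raw = b"caf\xc3\xa9\n"  # raw bytes, as a byte stream would deliver them

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    try:
        dump_raw(path, raw, "w+")  # text mode: expects text, not raw bytes
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False

    dump_raw(path, raw, "wb")  # binary mode: bytes are written untouched
    with io.open(path, "rb") as fp:
        round_trip = fp.read()
finally:
    os.remove(path)
```

With io.open, the text-mode write of raw bytes raises TypeError, while the 'wb' write stores the bytes unchanged, which is what an external diff reading the temporary file needs.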


It looks like this line has been unchanged since it was originally added in
r843330, and it hasn't changed in Troy's swig-py3 branch either.



From my initial report in the users list:





I'm using mailer.py in my post-commit hook and it's throwing a Unicode type 
error during the diff phase. Digging through the source code, I figured out 
that it's happening during the creation of the two temporary files for 
diff'ing. Somehow the output file is getting opened in Unicode text mode 
but the input source (the Subversion object stream) is a raw byte stream. 
The write call fails.


OS: CentOS 7.4
subversion-python-1.7.14-11.el7_4.x86_64
python-2.7.5-58.el7.x86_64




Re: mailer.py commit says TypeError: must be unicode, not str

2018-02-01 Thread Troy Curtis Jr
On Thu, Feb 1, 2018, 12:34 PM Kenneth Porter  wrote:

> [moving discussion to dev list as I think this is now the correct fix.]
>
> --On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter
>  wrote:
>
> > --On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
> >  wrote:
> >
> >> fp = builtins.open(file, 'w+') # avoid namespace clash with
> >>                                # trimmed-down svn_fs_open()
> >
> > I'm now thinking the problem is in the open call, and that I'm somehow
> > getting a Python 3 open function even though I've got Python 2.7
> > installed. Should the mode be 'wb' instead of 'w+'? That would ensure
> > that the raw data from the Subversion object is getting dumped into the
> > temporary file without interpretation. I don't understand why update
> > (denoted by the plus) is wanted. The temp file isn't being read from.
>
>
>
That seems strange. I'd expect it on py3, but it's certainly odd on py2.
Perhaps your locale is set to utf8? I'll have to research whether that
even makes sense.

> Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
> to temporary file. Update isn't needed, and the code just needs to dump the
> raw data into a file for the external diff to access, so no
> encoding/decoding should occur. Hence we should open the file in binary
> mode. I just tested this edit and it seems to cure the problem.
>
> It looks like this line has been unchanged since it was originally added in
> r843330, and it hasn't changed in Troy's swig-py3 branch either.
>

I've been leaning heavily on the test coverage for validating my py3
updates. At first glance it looks like this FileDiff isn't referenced in
any existing test. I'll add a test and confirm the behavior, and then test
with your fix, unless you'd like to do so.

Troy


> From my initial report in the users list:
>
> 
> 
>
> I'm using mailer.py in my post-commit hook and it's throwing a Unicode type
> error during the diff phase. Digging through the source code, I figured out
> that it's happening during the creation of the two temporary files for
> diff'ing. Somehow the output file is getting opened in Unicode text mode
> but the input source (the Subversion object stream) is a raw byte stream.
> The write call fails.
>
> OS: CentOS 7.4
> subversion-python-1.7.14-11.el7_4.x86_64
> python-2.7.5-58.el7.x86_64
>
>
>