Re: Disabling automatic setting of svn:executable property

2011-05-30 Thread Nico Kadel-Garcia
On Mon, May 30, 2011 at 2:26 AM, Markus Schaber
 wrote:
> Oh, sorry, I just read that someone else posted the same suggestion.
>
> Sorry for the duplication.
>
> Regards,
> Markus

There's a potential risk with the approach: CygWin uses UNIX-compatible
end-of-line characters, while TortoiseSVN and other Windows-based
clients use Windows end-of-line characters. The result can be *CHAOS* if
you typically set source files, such as .html, .php, .c, .sh, or
.pl files, to use "svn:eol-style", or expect files to be automatically
set in Windows or UNIX style as you switch between a working copy
checked out in Windows and one checked out in CygWin.


Re: Disabling automatic setting of svn:executable property

2011-05-30 Thread Ryan Schmidt

On May 30, 2011, at 11:26, Nico Kadel-Garcia wrote:

> There's a potential risk with the approach: CygWin uses UNIX-compatible
> end-of-line characters, while TortoiseSVN and other Windows-based
> clients use Windows end-of-line characters. The result can be *CHAOS* if
> you typically set source files, such as .html, .php, .c, .sh, or
> .pl files, to use "svn:eol-style", or expect files to be automatically
> set in Windows or UNIX style as you switch between a working copy
> checked out in Windows and one checked out in CygWin.

*Not* setting svn:eol-style to some value will lead to chaos, as you use 
different editors with different ideas of what a line ending is, and you start 
getting files with inconsistent line endings. *Setting* svn:eol-style to some 
value should prevent said chaos, by preventing you from committing files with 
inconsistent line endings. Now you just need to choose what value you want to 
use for svn:eol-style. Choices LF and CRLF will behave the same on every 
platform, so this may be desired if your working copies are shared between 
platforms (for example between Windows and Mac, or between Windows and Cygwin). 
Setting svn:eol-style to native means that if you check out under Windows, you 
get CRLF line endings whereas if you check out under Mac or Cygwin you get LF 
line endings; *this* might be the chaos you're contemplating above. But it will 
only be chaos within your working copy, on your machine, and only if your 
editors don't know how to deal with files of that line ending style; by virtue 
of having svn:eol-style set (to any value), you will be prevented from 
committing that chaos to the repository until you have resolved it.
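Ryan's description can be modelled in a few lines. The following is a minimal Python sketch, not Subversion's implementation. It assumes the repository copy is stored with LF line endings. The function names and the platform strings "windows" and "cygwin" are made up for illustration. It shows that "native" depends on the client platform while fixed styles do not, and that mixed line endings are refused at commit time.

```python
import re

# Hypothetical model of svn:eol-style handling (illustration only).
# Assumption: the repository copy of a file is stored with LF endings.
EOLS = {"LF": "\n", "CRLF": "\r\n", "CR": "\r"}


def eol_kinds(text: str) -> set:
    """Distinct line-ending sequences in text (CRLF is matched before CR or LF)."""
    return set(re.findall(r"\r\n|\r|\n", text))


def to_repository(text: str) -> str:
    """Commit side: refuse mixed line endings, then normalize to LF."""
    if len(eol_kinds(text)) > 1:
        raise ValueError("inconsistent line endings; fix the file before committing")
    return re.sub(r"\r\n|\r", "\n", text)


def to_working_copy(repo_text: str, eol_style: str, platform: str) -> str:
    """Checkout side: 'native' depends on the client platform, fixed styles do not."""
    if eol_style == "native":
        # Assumption: a native Windows client uses CRLF; Cygwin, Mac and Unix use LF.
        eol = "\r\n" if platform == "windows" else "\n"
    else:
        eol = EOLS[eol_style]
    return repo_text.replace("\n", eol)


repo = to_repository("int main(void)\r\n{\r\n}\r\n")
print(repr(to_working_copy(repo, "native", "windows")))  # CRLF endings
print(repr(to_working_copy(repo, "native", "cygwin")))   # LF endings
print(repr(to_working_copy(repo, "LF", "windows")))      # LF endings on every platform
```

With a fixed style such as LF or CRLF, the working-copy bytes are the same whichever client did the checkout. That is the property Ryan points to for working copies shared between Windows and Cygwin.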




svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Torsten Krah
I want to load a repository with a fresh dump and did:

svnadmin -q dump /repo1 | svnadmin load /repo2

This is the error I get:

svnadmin: Path 'Projektprofile/EMS(Newsletter,
Infomails, ?\192?\166).doc' is not in UTF-8


How can I fix this error? I am unable to load the dump into a new
repository.
What is causing this, and are there any known workarounds?

svn version: svn, version 1.6.12 (r955767)

The source repository is an FSFS repository created with 1.5; the
destination is an FSFS repository created with the 1.6 version above.

regards

Torsten





Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Torsten Krah
Some more info about this problem:

svnadmin verify tells me the revision in question is OK in the source
repo.
Using vim to view the revision dump shows those 2 UTF-8 chars at the end
of the path, which I guess are causing the trouble:

Projektprofile/EMS(Newsletter, Infomails, À¦).doc

Maybe someone got some nice ideas ;)





Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
1.6 checks that paths are in UTF-8 at the time they enter the
repository.  This was always required but not always enforced.

Solution is to recode the pathnames (those that are neither in ASCII nor
in UTF-8).  If none of the third-party dump manipulation tools can do
that, then you could patch svnsync or one of those tools to do the
recoding.  (just inject a filename-recoding editor at the right place)

Torsten Krah wrote on Mon, May 30, 2011 at 22:51:39 +0200:
> Some more info about this problem:
> 
> svnadmin verify tells me the revision in question is OK in the source
> repo.
> Using vim to view the revision dump shows those 2 UTF-8 chars at the end

It doesn't show "two UTF-8 characters", since the filename contains two
bytes which do not form a valid UTF-8 sequence.

> of the path, which I guess are causing the trouble:
> 
> Projektprofile/EMS(Newsletter, Infomails, À¦).doc
> 
> Maybe someone got some nice ideas ;)
> 
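Daniel's point can be checked directly. The following is a small Python 3 sketch; the helper name is made up. svnadmin printed the offending bytes in decimal as ?\192?\166, which are 0xC0 0xA6. That pair is not a valid UTF-8 sequence, because 0xC0 never occurs in UTF-8. The bytes only look like the two characters "À¦" when read as Latin-1, which is presumably what vim did here.

```python
# The two bytes svnadmin reported as "?\192?\166" (decimal 192 and 166).
bad_bytes = b"\xc0\xa6"
path = b"Projektprofile/EMS(Newsletter, Infomails, " + bad_bytes + b").doc"


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(path))          # False: 0xC0 can never appear in UTF-8
print(bad_bytes.decode("latin-1"))  # À¦  (the same bytes viewed as Latin-1)
```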




Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Torsten Krah
On Tuesday, 31.05.2011, at 00:30 +0300, Daniel Shahaf wrote:
> 1.6 checks that paths are in UTF-8 at the time they enter the
> repository.  This was always required but not always enforced.

OK, so 1.6 does something that <1.6 did not do but should have.

> 
> Solution is to recode the pathnames (those that are neither in ASCII
> nor in UTF-8).

Sorry, but your "solution" seems a little odd to me.
If <1.6 did not enforce this and 1.6 does - why does 1.6 not recode it
at the time it does encounter such "things" - at least via some optional
command line option?

Do you really want to tell me that Subversion (the "tool" used to manage
my code) is not able to load its own "dump", or at least to provide some
"fix" tool itself if it did things not "right" before? Why should I
need to bother with "third-party" tools here? This should be done by
svn, shouldn't it?

> If none of the third-party dump manipulation tools can do
> that, 

Which "third-party" tools you have in mind are able to do that for me?

> then you could patch svnsync or one of those tools to do the
> recoding.  (just inject a filename-recoding editor at the right place)

Of course I'll take the source, patch it and get my repo working
again ... nice joke - it was a joke, right?

> 
> It doesn't show "two UTF-8 characters", since the filename contains
> two bytes which do not form a valid UTF-8 sequence.

You're right, my fault.





Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Stefan Sperling
On Tue, May 31, 2011 at 12:30:54AM +0300, Daniel Shahaf wrote:
> 1.6 checks that paths are in UTF-8 at the time they enter the
> repository.  This was always required but not always enforced.
> 
> Solution is to recode the pathnames (those that are neither in ASCII nor
> in UTF-8).

Yes, that's what needs to be done. Pathnames must be encoded in UTF-8.
Unfortunately it seems that this invalid pathname somehow entered the
repository when a server version was used that didn't enforce UTF-8
encoding.

I would try to edit the dump file with a hex editor and replace the
offending two bytes with two spaces (or the proper UTF-8 character
if you know what should be there and the UTF-8 sequence has the same
number of bytes). I hope the number of paths affected by this problem
is small enough to keep this solution practical.
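Stefan's hex-editor fix can also be scripted, which may help with a dump too large to edit by hand. The following is a minimal Python sketch under stated assumptions. The function names are hypothetical. It also assumes the bad bytes only need fixing in the "Node-path:" and "Node-copyfrom-path:" header lines. It keeps the replacement the same length, as Stefan suggests, and it leaves file contents alone: the same two bytes can legitimately occur inside binary content such as the .doc file itself.

```python
from typing import Iterable, Iterator

# Dump-file header lines that carry repository paths. Assumption: only these
# headers need fixing; file contents and properties are passed through as-is.
PATH_HEADERS = (b"Node-path: ", b"Node-copyfrom-path: ")


def patch_path_headers(lines: Iterable[bytes], bad: bytes, good: bytes) -> Iterator[bytes]:
    """Yield dump lines, replacing bad with good only on path header lines.

    bad and good must have the same length, mirroring an in-place hex edit.
    """
    if len(bad) != len(good):
        raise ValueError("replacement must have the same number of bytes")
    for line in lines:
        if line.startswith(PATH_HEADERS):
            line = line.replace(bad, good)
        yield line


def filter_dump(infile, outfile, bad: bytes, good: bytes) -> None:
    """Copy a dump stream between binary file objects, patching path headers."""
    for line in patch_path_headers(infile, bad, good):
        outfile.write(line)
```

To use it in the pipeline from the original post, a script could call `filter_dump(sys.stdin.buffer, sys.stdout.buffer, b"\xc0\xa6", b"  ")` between `svnadmin dump` and `svnadmin load`. This is a line-based shortcut, not a real dump parser. A line inside file content that happens to start with "Node-path: " would also be touched, and the same path inside property values such as svn:mergeinfo would not be.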

> If none of the third-party dump manipulation tools can do that,

... then we should provide our users with a way of fixing it,
as we did for e.g. badly encoded revision properties.

> then you could patch svnsync or one of those tools to do the
> recoding.  (just inject a filename-recoding editor at the right place)

Daniel, please keep in mind that this is the *users* list.
Maybe Torsten would like to try this, but I doubt that modifying
Subversion's code is the kind of advice he was looking for.
And I really don't think that this suggestion is something that people
who are not familiar with Subversion's code base should attempt to do.
If people modify the code without understanding it well, the chances of
unintentionally breaking things are way too high.

It's bad enough that Torsten has to edit the dump file to fix this.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Torsten Krah
> I would try to edit the dump file with a hex editor and replace the
> offending two bytes with two spaces (or the proper UTF-8 character
> if you know what should be there and the UTF-8 sequence has the same
> number of bytes).

OK, let's take a hex editor and get rid of those bad sequences.

>  I hope the number of paths affected by this problem
> is small enough to keep this solution practical.

I hope so too. Let's see how to split my dump to get the hex editor
solution working.

> 
> > If none of the third-party dump manipulation tools can do that,
> 
> ... then we should provide our users with a way of fixing it,
> as we did for e.g. badly encoded revision properties.

That would be a "nice-to-have" feature :).

> Daniel, please keep in mind that this is the *users* list.

Yes.

> Maybe Torsten would like to try this, but I doubt that modifying
> Subversion's code is the kind of advice he was looking for.

You're right ;-)

> And I really don't think that this suggestion is something that people
> who are not familiar with Subversion's code base should attempt to do.
> If people modify the code without understanding it well, the chances of
> unintentionally breaking things are way too high.
> 

I could try, but as you said, I am not familiar with the code base, and
I bet things would be even worse after my modifications ;-).

> It's bad enough that Torsten has to edit the dump file to fix this.

But I will take this "red pill" to see where the journey ends :-D





Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Stefan Sperling
On Mon, May 30, 2011 at 11:47:30PM +0200, Torsten Krah wrote:
> If <1.6 did not enforce this and 1.6 does - why does 1.6 not recode it
> at the time it does encounter such "things" - at least via some optional
> command line option?

I think that is something we should add, yes.
We should also make svnadmin verify complain if paths are not in UTF-8.
That is two issues to file into our tracker right there.

Note that svnsync already has this kind of feature to handle badly
encoded revision properties.

> Do you really want to tell me that Subversion (the "tool" used to manage
> my code) is not able to load its own "dump", or at least to provide some
> "fix" tool itself if it did things not "right" before? Why should I
> need to bother with "third-party" tools here? This should be done by
> svn, shouldn't it?

Note that the API documentation for Subversion has always been saying
that paths are expected to be in UTF-8. It's just that the code didn't
enforce it. What probably happened here is that some third-party client
was used to add this file originally, and this third party client did
not convert the pathname to UTF-8 before sending it to the repository.
The standard svn client has been converting paths to UTF-8 since before 1.0.

Of course, that does not excuse the Subversion server's behaviour.
It should have verified the input and rejected the commit as invalid.
Alas, this verification step was only added in 1.6.

> > If none of the third-party dump manipulation tools can do
> > that, 
> 
> Which "third-party" tools you have in mind are able to do that for me?

A nice one is svndumptool: http://svn.borg.ch/svndumptool/
But it doesn't look like it has the feature you need.

I suppose the kind of corruption problem you are having is very rare.
If this was a common problem there would already be tool support for fixing it.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
Torsten Krah wrote on Mon, May 30, 2011 at 23:47:30 +0200:
> On Tuesday, 31.05.2011, at 00:30 +0300, Daniel Shahaf wrote:
> > Solution is to recode the pathnames (those that are neither in ASCII
> > nor in UTF-8).  
> 
> Sorry but your "solution" seems really a little bit odd to me.
> If <1.6 did not enforce this and 1.6 does - why does 1.6 not recode it
> at the time it does encounter such "things" - at least via some optional
> command line option?
> 
> Do you really want to tell me that Subversion (the "tool" used to manage
> my code) is not able to load its own "dump", or at least to provide some
> "fix" tool itself if it did things not "right" before? Why should I
> need to bother with "third-party" tools here? This should be done by
> svn, shouldn't it?
> 

As Stefan said, it would be nice if Subversion itself could fix that,
given that old released versions produced such (malformed) filesystems.

To my knowledge, currently there is no code in Subversion itself to do
this, hence my suggestion to use third-party tools.

> > If none of the third-party dump manipulation tools can do
> > that, 
> 
> Which "third-party" tools you have in mind are able to do that for me?
> 

I know there are a couple of dumpfile manipulator tools that are
regularly suggested around this list, but I don't have a specific
recommendation.

One of the other list members might be able to answer this question.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
Torsten Krah wrote on Mon, May 30, 2011 at 23:47:30 +0200:
> On Tuesday, 31.05.2011, at 00:30 +0300, Daniel Shahaf wrote:
> > then you could patch svnsync or one of those tools to do the
> > recoding.  (just inject a filename-recoding editor at the right place)
> 
> Of course I'll take the source, patch it and get my repo working
> again ... nice joke - it was a joke, right?

That's how I'd solve the problem.

But then, I'm not a tech support person but a Subversion committer who
is already familiar with FSFS and dumpstream format.

Stefan Sperling wrote on Mon, May 30, 2011 at 23:54:17 +0200:
> On Tue, May 31, 2011 at 12:30:54AM +0300, Daniel Shahaf wrote:
> > then you could patch svnsync or one of those tools to do the
> > recoding.  (just inject a filename-recoding editor at the right place)
> 
> Daniel, please keep in mind that this is the *users* list.
> Maybe Torsten would like to try this, but I doubt that modifying
> Subversion's code is the kind of advice he was looking for.
> And I really don't think that this suggestion is something that people
> who are not familiar with Subversion's code base should attempt to do.
> If people modify the code without understanding it well, the chances of
> unintentionally breaking things are way too high.
> 

I'm usually very conservative when it comes to telling people "Don't edit anything
under $REPOS/db/ unless you can score A+ in an oral test on 'structure'
at 3am."

To the case at hand:

* There is probably a tool that allows performing the needed conversion.

* If there isn't, I think writing a "recode fspaths" filter using our APIs
  isn't terribly hard (perhaps with some pointers on which APIs to start
  at).  What do you mean by "breaking things"?

> It's bad enough that Torsten has to edit the dump file to fix this.

In the general case, I expect there are repositories Out There that
contain fspaths in multiple encodings: say, UTF-8 and latin1 (and
possibly latin15 too) in the same filesystem.  That's going to be a mess
to fix no matter what tools you use.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
> > Maybe Torsten would like to try this, but I doubt that modifying
> > Subversion's code is the kind of advice he was looking for.
> 
> You're right ;-)

I was assuming that someone would point out a tool that does the
recoding at some point in the next 24 hours, which would render this
particular suggestion moot.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
Stefan Sperling wrote on Tue, May 31, 2011 at 00:08:46 +0200:
> On Mon, May 30, 2011 at 11:47:30PM +0200, Torsten Krah wrote:
> > If <1.6 did not enforce this and 1.6 does - why does 1.6 not recode it
> > at the time it does encounter such "things" - at least via some optional
> > command line option?
> 
> I think that is something we should add, yes.

How would you handle a repository that contains the following
nodes/fspaths:

/foo/bår    (in UTF-8)
/foo/bår    (in latin1)

?


How would you handle a repository that contains:
/foo/barÉ   (in latin1)
/foo/barŠ   (in latin2)

?
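Daniel's first question can be made concrete with a short Python sketch; the variable names are illustrative. Today the two "bår" paths are distinct byte strings, so they are two different nodes, and only the Latin-1 one fails the UTF-8 check. Once the Latin-1 one is recoded to UTF-8, both name the same path. Any automatic recoding therefore has to decide what to do about such collisions.

```python
utf8_node = b"/foo/b\xc3\xa5r"   # "bår" with å encoded as UTF-8 (C3 A5)
latin1_node = b"/foo/b\xe5r"      # "bår" with å encoded as Latin-1 (E5)

# Today these are two different paths, and only the second fails the UTF-8 check.
distinct_today = utf8_node != latin1_node
try:
    latin1_node.decode("utf-8")
    latin1_is_utf8 = True
except UnicodeDecodeError:
    latin1_is_utf8 = False

# After recoding the non-UTF-8 one from Latin-1, both name the same path.
collide = utf8_node.decode("utf-8") == latin1_node.decode("latin-1")
print(distinct_today, latin1_is_utf8, collide)  # True False True
```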


> We should also make svnadmin verify complain if paths are not in UTF-8.

+1.

The validation that 'load' and 'commit' trigger is path_valid() in
fs_loader.c.
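Until svnadmin verify complains about such paths, a dump can be checked ahead of a load. The following is a minimal Python sketch of that idea. It checks only the UTF-8 rule discussed in this thread and is not a port of path_valid(). The function name is hypothetical, and, as above, it only looks at the "Node-path:" and "Node-copyfrom-path:" header lines.

```python
from typing import Iterable, List, Tuple

# Dump header lines that carry repository paths (assumption: enough for a pre-check).
PATH_HEADERS = (b"Node-path: ", b"Node-copyfrom-path: ")


def find_non_utf8_paths(lines: Iterable[bytes]) -> List[Tuple[int, bytes]]:
    """Return (line number, raw path) for each path header that is not valid UTF-8."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(PATH_HEADERS):
            continue
        raw = line.split(b": ", 1)[1].rstrip(b"\n")
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((lineno, raw))
    return problems
```

Running it over a binary file object opened on the dump (for example `open("repo.dump", "rb")`) would list the paths that a 1.6 `svnadmin load` can be expected to reject.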


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Stefan Sperling
On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> How would you handle a repository that contains the following
> nodes/fspaths:
> 
> /foo/bår    (in UTF-8)
> /foo/bår    (in latin1)
> 
> ?
> 
> 
> How would you handle a repository that contains:
> /foo/barÉ   (in latin1)
> /foo/barŠ   (in latin2)
> 
> ?

All the ISO-8859 (latin) encodings are single-byte encodings.
It's not possible to know what the encoding is supposed to be if
paths in different ISO-8859 encodings entered the repository.
They all decode to different but valid strings of characters.

In the first iteration of this feature I would simply assume one
user-specified source encoding and try to convert data that isn't
UTF-8 from the source encoding to UTF-8.
In case multiple single-byte encodings are present this means that some
characters will be wrong but the repository will work again without
manual intervention. In case multiple multi-byte encodings other than
UTF-8 are present this approach can fail and might require manual fixing
(no worse than the current situation).
This could still be improved upon if necessary.
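Stefan's first iteration can be sketched in a few lines of Python; the function name is hypothetical and this is not an existing svnadmin feature. Valid UTF-8 is kept as-is, and anything else is reinterpreted in one user-specified source encoding. The example also shows his caveat. If the user says Latin-1 but a path was really written in Latin-2, the result is valid UTF-8 with the wrong character.

```python
def recode_path(raw: bytes, source_encoding: str) -> str:
    """Leave valid UTF-8 alone; otherwise reinterpret raw in source_encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # With latin-1 this always succeeds (every byte maps to a character);
        # a multi-byte source encoding can still raise, needing manual fixing.
        return raw.decode(source_encoding)


# User-specified source encoding: Latin-1. One path was really written in Latin-2.
latin1_path = "/foo/barÉ".encode("latin-1")     # b'/foo/bar\xc9'
latin2_path = "/foo/barŠ".encode("iso8859_2")   # b'/foo/bar\xa9'

print(recode_path(latin1_path, "latin-1"))  # /foo/barÉ  (correct)
print(recode_path(latin2_path, "latin-1"))  # /foo/bar©  (valid UTF-8, wrong character)
```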
 
> > We should also make svnadmin verify complain if paths are not in UTF-8.
> 
> +1.
> 
> The validation that 'load' and 'commit' trigger is path_valid() in
> fs_loader.c.

Thanks for the hint. I'm now running tests on a patch for this.


Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

2011-05-30 Thread Daniel Shahaf
#define MBE multi-byte encoding
#define SBE single-byte encoding

Stefan Sperling wrote on Tue, May 31, 2011 at 01:07:02 +0200:
> On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> > How would you handle a repository that contains the following
> > nodes/fspaths:
> > 
> > /foo/bår    (in UTF-8)
> > /foo/bår    (in latin1)
> > 
> > ?
> > 
> > 
> > How would you handle a repository that contains:
> > /foo/barÉ   (in latin1)
> > /foo/barŠ   (in latin2)
> > 
> > ?
> 
> All the ISO-8859 (latin) encodings are single-byte encodings.
> It's not possible to know what the encoding is supposed to be if
> paths in different ISO-8859 encodings entered the repository.
> They all decode to different but valid strings of characters.
> 
> In the first iteration of this feature I would simply assume one
> user-specified source encoding and try to convert data that isn't
> UTF-8 from the source encoding to UTF-8.
> In case multiple single-byte encodings are present this means that some
> characters will be wrong but the repository will work again without
> manual intervention. In case multiple multi-byte encodings other than
> UTF-8 are present this approach can fail and might require manual fixing
> (no worse than the current situation).
> This could still be improved upon if necessary.

True, I had overlooked these points.

One thing that jumps to mind is to have a list of encodings to
try --- i.e.,

   svnadmin load --recode-paths-from=MBE1,MBE2,SBE

would attempt to interpret paths as UTF-8, failing that as MBE1, failing
that as MBE2, failing that as SBE.
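The fallback list Daniel describes can be sketched as follows. The `--recode-paths-from` option is only a proposal in this thread, and the Python function below is a hypothetical stand-in, not svnadmin code. UTF-8 is tried first, then each listed encoding in order, much like vim's 'fileencodings'. Order matters: Latin-1 accepts any byte sequence, so nothing listed after it would ever be tried. A multi-byte encoding listed earlier can also accept bytes that were really written in a single-byte encoding.

```python
from typing import Sequence, Tuple


def recode_with_fallbacks(raw: bytes, encodings: Sequence[str]) -> Tuple[str, str]:
    """Try UTF-8 first, then each encoding in order.

    Returns (decoded path, encoding that worked). Raises ValueError if none fits.
    """
    for enc in ("utf-8", *encodings):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"no listed encoding can decode {raw!r}")


print(recode_with_fallbacks("/foo/bår".encode("utf-8"), ["latin-1"]))
print(recode_with_fallbacks(b"EMS(\xc0\xa6).doc", ["ascii", "latin-1"]))
```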

(I know you use vim, so: compare the 'fencs' option in vim).


Re: Disabling automatic setting of svn:executable property

2011-05-30 Thread Nico Kadel-Garcia
On Mon, May 30, 2011 at 2:43 PM, Ryan Schmidt
 wrote:
>
> On May 30, 2011, at 11:26, Nico Kadel-Garcia wrote:
>
>> There's a potential risk with the approach: CygWin uses UNIX-compatible
>> end-of-line characters, while TortoiseSVN and other Windows-based
>> clients use Windows end-of-line characters. The result can be *CHAOS* if
>> you typically set source files, such as .html, .php, .c, .sh, or
>> .pl files, to use "svn:eol-style", or expect files to be automatically
>> set in Windows or UNIX style as you switch between a working copy
>> checked out in Windows and one checked out in CygWin.
>
> *Not* setting svn:eol-style to some value will lead to chaos, as you use 
> different editors with different ideas of what a line ending is, and you 
> start getting files with inconsistent line endings. *Setting* svn:eol-style 
> to some value should prevent said chaos, by

Then the editor, or the developer's practice, is broken. In these days
of shared network file systems and of replicating developed components
via NFS, CIFS, SCP, and HTTP download, it is a dangerous presumption
that the EOL can be reset on a client-by-client basis. CygWin is the
best example of this: files checked out and replicated with the
CygWin-based SVN will have one EOL under such "clever" approaches,
while files checked out with TortoiseSVN will have another. The
configurable EOL handling that Subversion supports as an option is
hideously dangerous in such environments.

There are a few cases where an OS-specific EOL is useful, but they're
rare. Markup languages have a standard EOL written into them, and so do C,
Perl, Ruby, Java, and all the other programming languages. It's only
really useful for poorly implemented configuration files that weren't
written with such a standard in mind, and for certain forms of stored
text files, and most editors and display tools handle those just fine.
(WordPad versus Notepad, for example, works well for Windows users.)

I've actually seen this play out in C++ and Java and PHP and HTML in
the last 5 years. People checking out repositories on one OS to a
shared network directory, such as their Windows box with TortoiseSVN
for the superior interface, were alarmed to find the code mangled when
they did work with C, or replicated files and tried to import them
elsewhere *without* the same EOL settings. Chaos ensued repeatedly.