Re: filename encodings and conversion failure

2022-12-23 Thread Daniel Sahlberg
On Thu, 22 Dec 2022 at 23:40, Karl Berry wrote:

> Clearly those UTF-8 code points cannot be "converted" by svn to the
> 7-bit ASCII locale that is "C". Fine; I don't expect it to.  Is there a
> way to force svn to complete the checkout anyway? That is, just check
> out the file and let the name be whatever the bytes are. I don't
> understand why any "conversion" by svn is necessary merely to operate on
> files.
>

Not at all related to this issue except it also concerns filenames: It is
possible to commit files with a filename that works on only one platform,
making a checkout/update fail on other platforms.

Example: Commit a file with ? (question mark) in the filename on Linux and
check out the file on Windows.

[[[
D:\temp>svn co https://svn.apache.org/repos/private/pmc/subversion/pr/ private_wc
[...]
svn: E155009: Failed to run the WC DB work queue associated with 'D:\temp\private_wc\YYY_folder', work item 54 (file-install XY?Z.html 1 0 1 1)
svn: E720123: Can't move 'D:\temp\private_wc\.svn\tmp\svn-C3A15B21' to 'D:\temp\private_wc\XY?Z.html': The filename, directory name, or volume label syntax is incorrect.
]]]

(The above example is from the Subversion private repository. I've masked
the actual folder and file names, but it should be reproducible for anyone
with access to the repository.)

This is a case where a conversion might /be/ necessary (although I don't
have a concrete idea of what the conversion should be). Or else these files
should just be ignored on checkout.
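
As a rough illustration of the "ignore or convert" idea above, a client could flag names that Windows will reject before trying to install them. This is only a sketch: the helper name is hypothetical, the reserved set is the commonly documented one, and control characters and reserved device names such as CON or NUL are not handled. It is not how svn behaves today.

```python
# Characters that Windows rejects in a file name component.
# Assumption: the commonly documented reserved set; control characters
# and reserved device names (CON, NUL, ...) are deliberately not covered.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def windows_unsafe_chars(name):
    """Return the sorted reserved characters found in one path component."""
    return sorted(set(name) & WINDOWS_RESERVED)


print(windows_unsafe_chars("XY?Z.html"))  # ['?']
print(windows_unsafe_chars("XYZ.html"))   # []
```

A check like this could drive either choice mentioned above: skip the file with a warning, or apply some yet-to-be-defined renaming.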

I'm just mentioning this in case someone looks at the code and decides to
make changes to the conversions.

Kind regards,
Daniel


Re: filename encodings and conversion failure

2022-12-23 Thread Daniel Shahaf
Karl Berry wrote on Thu, 22 Dec 2022 22:40 +00:00:
> A file with a name that has some "eight-bit" UTF-8 bytes (fn...-utf8.tex)
> was committed to one of my repositories. When I try to check it out in
> the C locale, svn complains:
>
> $ echo $LC_ALL
> C
> $ svn update
> svn: E22: Can't convert string from 'UTF-8' to native encoding:
> svn: E22: fn{U+00B1}{U+00D7}{U+00F7}{U+00A7}{U+00B6}-utf8.tex
>
> Or, in ls terms:
> $ ls --quoting-style escape fn??*-utf8.tex
> fn\302\261\303\227\303\267\302\247\302\266-utf8.tex
>
> Clearly those UTF-8 code points cannot be "converted" by svn to the
> 7-bit ASCII locale that is "C". Fine; I don't expect it to.  Is there a
> way to force svn to complete the checkout anyway?

Perhaps «export LC_ALL=C.UTF-8», if your platform has that encoding?
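
To make the difference between the two locales concrete, here is a small sketch using the exact bytes from the ls output quoted above. It assumes that the "native encoding" of the C locale can be approximated by plain ASCII; it illustrates the encoding mismatch only and is not svn's actual conversion code.

```python
# File name bytes exactly as shown by `ls --quoting-style escape` in the report.
raw = b"fn\302\261\303\227\303\267\302\247\302\266-utf8.tex"

# The bytes are valid UTF-8; list the non-ASCII code points they decode to.
name = raw.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 127]
print(code_points)  # ['U+00B1', 'U+00D7', 'U+00F7', 'U+00A7', 'U+00B6']


def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encodable(name, "ascii"))  # False: roughly the situation under LC_ALL=C
print(encodable(name, "utf-8"))  # True: roughly the situation under C.UTF-8
```

The code points match the ones in the E22 error message, which is why a UTF-8 locale lets the update proceed.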

Good questions in the rest of the email but I'm ENOTIME to deal with them at 
the moment.

Cheers,

Daniel

> That is, just check
> out the file and let the name be whatever the bytes are. I don't
> understand why any "conversion" by svn is necessary merely to operate on
> files.
>
> Sure, the name may show up as garbage when I do things in my terminal,
> but that's my problem, not svn's. I didn't ask (and don't want) svn to
> convert anything.
>
> Incidentally, this is not about UTF-8 specifically. The same commit
> included names in SJIS and EUC encodings (they are test files for a new
> feature in Japanese TeX). The question is, in general, why svn needs to
> "convert" filenames at all.
>
> I did some searching both in the mailing list archives and on the web,
> to no avail. People had related problems, but I didn't see this (more
> basic) question being asked.
>
> This is with a somewhat old svn that I compiled myself:
> svn, version 1.13.0 (r1867053)
>    compiled Nov 10 2019, 18:06:58 on x86_64-unknown-linux-gnu
>
> I'm guessing svn behavior in this regard has not changed since 1.13.0,
> but if I'm wrong about that, sorry for the noise, and I'll happily
> recompile the latest.
>
> Thanks for any info,
> Karl


Re: filename encodings and conversion failure

2022-12-23 Thread Daniel Shahaf
Daniel Sahlberg wrote on Fri, 23 Dec 2022 08:58 +00:00:
> Example: Commit a file with ? (question mark) in the filename on Linux and
> check out the file on Windows.

Or case-colliding files:

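# Create two directories whose names differ only in case, then update.
url=$(svn info --show-item=url)
svn mkdir -m 'Add case-colliding directories' -- "$url/foo" "$url/FOO"
svn up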
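
A minimal sketch of how such collisions could be detected ahead of time, for example in a pre-commit check. The function name is hypothetical, and `str.casefold()` is only an approximation of what a case-insensitive filesystem treats as equal.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by case and return the colliding groups."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() is an approximation of filesystem case-insensitivity.
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["foo", "FOO", "bar"]))  # [['FOO', 'foo']]
```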

> This is a case where a conversion might /be/ necessary (although I don't
> have a concrete idea of what the conversion should be). Or else these files
> should just be ignored on checkout.
>
> I'm just mentioning this in case someone looks at the code and decides to
> make changes to the conversions.

Ditto.

Cheers,

Daniel


Re: filename encodings and conversion failure

2022-12-23 Thread Nico Kadel-Garcia
On Fri, Dec 23, 2022 at 3:58 AM Daniel Sahlberg wrote:
>
> On Thu, 22 Dec 2022 at 23:40, Karl Berry wrote:
>>
>> Clearly those UTF-8 code points cannot be "converted" by svn to the
>> 7-bit ASCII locale that is "C". Fine; I don't expect it to.  Is there a
>> way to force svn to complete the checkout anyway? That is, just check
>> out the file and let the name be whatever the bytes are. I don't
>> understand why any "conversion" by svn is necessary merely to operate on
>> files.
>
>
> Not at all related to this issue except it also concerns filenames: It is 
> possible to commit files with a filename that works on only one platform, 
> making a checkout/update fail on other platforms.
>
> Example: Commit a file with ? (question mark) in the filename on Linux and
> check out the file on Windows.

Yes. The source code for HylaFAX had this exact problem, since it had
MixedCaseFileNames.c and Mixedcasefilenames.c. They can be checked
out in the same working copy on UNIX, Linux, and macOS easily; on
Windows it's not so easy due to the "case-insensitive" file systems.

Nico Kadel-Garcia



Re: filename encodings and conversion failure

2022-12-23 Thread Karl Berry
> Perhaps «export LC_ALL=C.UTF-8», if your platform has that encoding?

Yes, thanks, that is one of the workarounds. But that's not my
question.

My question is, why can't svn just treat the filenames as bytes? I
remain baffled by the need to unconditionally convert to/from UTF-8 (or
any other encoding). Nothing in my environment ("C" in all respects)
says to do this, as far as I know.
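
For what it's worth, here is a sketch of what "treating filenames as bytes" looks like at the program level. It uses Python's bytes-path variants of the standard os functions, which pass names through without locale conversion on POSIX systems. It says nothing about svn's internals, and the shortened file name is only an example based on the one in the report.

```python
import os
import tempfile

# A shortened version of the reported name, kept as raw bytes throughout.
raw_name = b"fn\302\261\303\227-utf8.tex"

with tempfile.TemporaryDirectory() as tmp:
    # Work with a bytes directory path so every result stays in bytes.
    tmp_bytes = os.fsencode(tmp)
    path = os.path.join(tmp_bytes, raw_name)

    # Create the file without ever decoding its name.
    with open(path, "wb") as handle:
        handle.write(b"test\n")

    # Listing a bytes path returns bytes names, whatever the locale is.
    names = os.listdir(tmp_bytes)
    print(names)
```

The name round-trips unchanged because no step interprets it as text; how it is displayed is left to the terminal.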

We (this is the TeX Live svn repository, by the way) also have the other
problems mentioned so far: case clashes and Windows special characters
cause trouble too. But those seem different in kind to me. Those
problems are induced by the operating system and/or filesystem, and I
don't expect svn to solve them for me.

In contrast, the 
> svn: E22: Can't convert string from 'UTF-8' to native encoding:
error seems to be induced purely by svn on its own.

I expect there is a good reason for the behavior, since svn behavior is
usually sensible. I just can't imagine what that reason is. And I wish
there was a way to override it. Just give me the bytes, dear svn!

Thanks,
Karl