Re: filename encodings and conversion failure
On Thu, 22 Dec 2022 at 23:40, Karl Berry wrote:

> Clearly those UTF-8 code points cannot be "converted" by svn to the
> 7-bit ASCII locale that is "C". Fine; I don't expect it to. Is there a
> way to force svn to complete the checkout anyway? That is, just check
> out the file and let the name be whatever the bytes are. I don't
> understand why any "conversion" by svn is necessary merely to operate on
> files.

Not at all related to this issue, except that it also concerns filenames: it is possible to commit a file with a name that works on only one platform, making a checkout/update fail on other platforms.

Example: commit a file with ? (question mark) in the filename on Linux, then check out the file on Windows.

[[[
D:\temp>svn co https://svn.apache.org/repos/private/pmc/subversion/pr/ private_wc
[...]
svn: E155009: Failed to run the WC DB work queue associated with 'D:\temp\private_wc\YYY_folder', work item 54 (file-install XY?Z.html 1 0 1 1)
svn: E720123: Can't move 'D:\temp\private_wc\.svn\tmp\svn-C3A15B21' to 'D:\temp\private_wc\XY?Z.html': The filename, directory name, or volume label syntax is incorrect.
]]]

(The above example is from the Subversion private repository. I've masked the actual folders/filenames, but it should be reproducible for anyone with access to the repository.)

This is a case where a conversion might /be/ necessary (although I don't have a concrete idea of what the conversion should be). Or else these files should just be ignored on checkout.

I'm just mentioning this in case someone looks at the code and decides to make changes to the conversions.

Kind regards,
Daniel
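For readers who want to spot such names before they reach a Windows checkout, here is a minimal Python sketch. It is not part of Subversion; the helper name `windows_invalid_chars` is hypothetical, and it only checks for the characters Windows reserves in file names (plus ASCII control characters), not reserved device names or trailing dots/spaces.

```python
# Characters that Windows does not allow in file or directory names.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def windows_invalid_chars(name: str) -> list[str]:
    """Return the characters in `name` that Windows rejects, in order of appearance.

    Hypothetical helper for illustration; ASCII control characters (0-31)
    are also rejected by Windows and are reported as well.
    """
    return [ch for ch in name if ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32]


if __name__ == "__main__":
    # "XY?Z.html" is the masked name from the error message above.
    for name in ["XY?Z.html", "XYZ.html"]:
        bad = windows_invalid_chars(name)
        print(name, "->", bad if bad else "ok")
```

Something like this could run in a pre-commit check on the committing side, since the failure otherwise only shows up later on the Windows client.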
Re: filename encodings and conversion failure
Karl Berry wrote on Thu, 22 Dec 2022 22:40 +00:00:
> A file with a name that has some "eight-bit" UTF-8 bytes (fn...-utf8.tex)
> was committed to one of my repositories. When I try to check it out in
> the C locale, svn complains:
>
> $ echo $LC_ALL
> C
> $ svn update
> svn: E22: Can't convert string from 'UTF-8' to native encoding:
> svn: E22: fn{U+00B1}{U+00D7}{U+00F7}{U+00A7}{U+00B6}-utf8.tex
>
> Or, in ls terms:
> $ ls --quoting-style escape fn??*-utf8.tex
> fn\302\261\303\227\303\267\302\247\302\266-utf8.tex
>
> Clearly those UTF-8 code points cannot be "converted" by svn to the
> 7-bit ASCII locale that is "C". Fine; I don't expect it to. Is there a
> way to force svn to complete the checkout anyway?

Perhaps «export LC_ALL=C.UTF-8», if your platform has that encoding?

Good questions in the rest of the email but I'm ENOTIME to deal with them at the moment.

Cheers,

Daniel

> That is, just check
> out the file and let the name be whatever the bytes are. I don't
> understand why any "conversion" by svn is necessary merely to operate on
> files.
>
> Sure, the name may show up as garbage when I do things in my terminal,
> but that's my problem, not svn's. I didn't ask (and don't want) svn to
> convert anything.
>
> Incidentally, this is not about UTF-8 specifically. The same commit
> included names in SJIS and EUC encodings (they are test files for a new
> feature in Japanese TeX). The question is, in general, why svn needs to
> "convert" filenames at all.
>
> I did some searching both in the mailing list archives and on the web,
> to no avail. People had related problems, but I didn't see this (more
> basic) question being asked.
>
> This is with a somewhat old svn that I compiled myself:
> svn, version 1.13.0 (r1867053)
>    compiled Nov 10 2019, 18:06:58 on x86_64-unknown-linux-gnu
>
> I'm guessing svn behavior in this regard has not changed since 1.13.0,
> but if I'm wrong about that, sorry for the noise, and I'll happily
> recompile the latest.
>
> Thanks for any info,
> Karl
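To make the quoted error concrete, the following Python sketch takes the bytes exactly as `ls` printed them, shows that they decode to the code points in svn's message, and shows that those code points have no 7-bit ASCII representation. It only illustrates the encoding arithmetic; it is not a description of how svn performs the conversion internally.

```python
# The file name bytes exactly as `ls --quoting-style escape` printed them
# (Python bytes literals accept the same octal escapes).
raw_name = b"fn\302\261\303\227\303\267\302\247\302\266-utf8.tex"

# Decoding as UTF-8 succeeds and yields the code points listed in the error.
decoded = raw_name.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in decoded if ord(ch) > 127]
print(code_points)

# Encoding to 7-bit ASCII (the character set of the "C" locale) is the step
# that cannot succeed for these characters.
try:
    decoded.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```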
Re: filename encodings and conversion failure
Daniel Sahlberg wrote on Fri, 23 Dec 2022 08:58 +00:00:
> Example: Commit a file with ? (questionmark) in the filename on Linux and
> checkout the file on Windows.

Or case-colliding files:

    url=`svn info --show-item=url`
    svn mkdir -- $url/foo $url/FOO
    svn up

> This is a case where a conversion might /be/ necessary (although I don't
> have a concrete idea of what the conversion should be). Or else these files
> should just be ignored on checkout.
>
> I'm just mentioning this in case someone looks at the code and decides make
> changes to the conversions.

Ditto.

Cheers,

Daniel
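A rough way to find case-colliding names such as `foo` and `FOO` in a directory listing is sketched below in Python. The function name `case_collisions` is hypothetical, and `str.casefold()` is only an approximation: real case-insensitive file systems apply their own folding rules, so treat the result as a hint rather than a guarantee.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by case.

    Approximates what a case-insensitive file system would treat as the
    same entry; uses str.casefold(), which is an assumption, not the
    exact rule any particular file system applies.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one spelling maps to the same key.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(case_collisions(["foo", "FOO", "bar"]))
```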
Re: filename encodings and conversion failure
On Fri, Dec 23, 2022 at 3:58 AM Daniel Sahlberg wrote:
>
> On Thu, 22 Dec 2022 at 23:40, Karl Berry wrote:
>>
>> Clearly those UTF-8 code points cannot be "converted" by svn to the
>> 7-bit ASCII locale that is "C". Fine; I don't expect it to. Is there a
>> way to force svn to complete the checkout anyway? That is, just check
>> out the file and let the name be whatever the bytes are. I don't
>> understand why any "conversion" by svn is necessary merely to operate on
>> files.
>
>
> Not at all related to this issue except it also concerns filenames: It is
> possible to commit files with a filename that works on only one platform,
> making a checkout/update fail on other platforms.
>
> Example: Commit a file with ? (questionmark) in the filename on Linux and
> checkout the file on Windows.

Yes. The source code for HylaFAX had this exact problem, since it had MixedCaseFileNames.c and Mixedcasefilenames.c. They can easily be checked out in the same working copy on UNIX, Linux, and MacOS; on Windows it's not so easy because of the "case-insensitive" file systems.

Nico Kadel-Garcia

> [[[
> D:\temp>svn co https://svn.apache.org/repos/private/pmc/subversion/pr/
> private_wc
> [...]
> svn: E155009: Failed to run the WC DB work queue associated with
> 'D:\temp\private_wc\YYY_folder', work item 54 (file-install XY?Z.html 1 0 1 1)
> svn: E720123: Can't move 'D:\temp\private_wc\.svn\tmp\svn-C3A15B21' to
> 'D:\temp\private_wc\XY?Z.html': The filename, directory name, or volume label
> syntax is incorrect.
> ]]]
>
> (The above example is from the Subversion private repository, I've masked the
> actual folders/filenames but it should be reproducible for anyone with access
> to the repository).
>
> This is a case where a conversion might /be/ necessary (although I don't have
> a concrete idea of what the conversion should be). Or else these files should
> just be ignored on checkout.
>
> I'm just mentioning this in case someone looks at the code and decides make
> changes to the conversions.
>
> Kind regards,
> Daniel
Re: filename encodings and conversion failure
    Perhaps «export LC_ALL=C.UTF-8», if your platform has that encoding?

Yes, thanks, that is one of the workarounds. But that's not my question.

My question is, why can't svn just treat the filenames as bytes? I remain baffled by the need to unconditionally convert to/from UTF-8 (or any other encoding). Nothing in my environment ("C" in all respects) says to do this, as far as I know.

We (this is the TeX Live svn repository, by the way) also have the other problems mentioned so far: case clashes and Windows special characters cause trouble too. But those seem different in kind to me. Those problems are induced by the operating system and/or filesystem, and I don't expect svn to solve them for me.

In contrast, the
> svn: E22: Can't convert string from 'UTF-8' to native encoding:
error seems to be induced purely by svn on its own.

I expect there is a good reason for the behavior, since svn behavior is usually sensible. I just can't imagine what that reason is. And I wish there was a way to override it. Just give me the bytes, dear svn!

Thanks,
Karl
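As a small illustration of the "filenames as bytes" view, the Python sketch below creates a file whose name is given as raw bytes and lists the directory as bytes, with no text encoding involved. It assumes a POSIX-like system where file names are opaque byte strings; the helper name `roundtrip_bytes_name` is hypothetical, and the sketch says nothing about why svn converts names.

```python
import os
import tempfile

# The raw file name bytes from the original report.
raw_name = b"fn\302\261\303\227\303\267\302\247\302\266-utf8.tex"


def roundtrip_bytes_name(name: bytes) -> list[bytes]:
    """Create an empty file with the given raw name and list its directory as bytes.

    Hypothetical helper for illustration. Passing bytes paths to os functions
    keeps the name as bytes end to end (assumes a POSIX-like file system).
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        path = os.path.join(tmp_bytes, name)
        with open(path, "wb"):
            pass  # an empty file is enough to exercise the name
        return os.listdir(tmp_bytes)


if __name__ == "__main__":
    print(roundtrip_bytes_name(raw_name))
```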