On 2010-08-11 13:42:35 +0200, Stefan Sperling wrote:
> On Wed, Aug 11, 2010 at 12:31:48AM +0200, Vincent Lefevre wrote:
> > On 2010-08-10 20:59:00 +0200, Stefan Sperling wrote:
> > > Right now, if the filename cannot be represented in the current locale,
> > > you get this error: "svn: Can't convert string from 'UTF-8' to native
> > > encoding"
> >
> > which is bad and prevents users from writing POSIX-conforming scripts
> > using svn, i.e. under the POSIX locale (except on systems where the
> > POSIX locale uses UTF-8, but I don't know any).
>
> There's no reason your script could not configure a UTF-8 locale if that
> is needed to represent filenames which exist in the repository.
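A minimal sketch of the distinction at issue, in Python and assuming a Linux system whose filesystem stores names as opaque bytes (e.g. ext4 or tmpfs): under the C locale, a UTF-8 name passed as raw bytes survives a create/list round trip untouched, whereas a strict conversion to the locale's charset (ASCII under the C locale on glibc) fails in much the same way as the svn error quoted above. Both helper names are hypothetical, and `strict_ascii_conversion` only mimics that failure mode; it is not Subversion's code.

```python
import locale
import os
import tempfile


def roundtrip_raw_filename(name: bytes) -> bytes:
    """Create a file called `name` (raw bytes) in a fresh temporary
    directory and return the name exactly as the directory listing
    reports it. Hypothetical helper, for illustration only."""
    # Same situation as a script that sets LC_ALL=C.
    locale.setlocale(locale.LC_ALL, "C")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        # Bytes paths go to the open() and readdir() system calls as-is;
        # no charset conversion happens, so the locale plays no part.
        fd = os.open(os.path.join(tmp_b, name), os.O_CREAT | os.O_WRONLY, 0o644)
        os.close(fd)
        (listed,) = os.listdir(tmp_b)
        return listed


def strict_ascii_conversion(name: bytes) -> bytes:
    """Hypothetical helper that mimics (does not reproduce) a strict
    UTF-8 -> native conversion when the native charset is ASCII."""
    # Raises UnicodeEncodeError for any non-ASCII character.
    return name.decode("utf-8").encode("ascii")


if __name__ == "__main__":
    utf8_name = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
    print(roundtrip_raw_filename(utf8_name) == utf8_name)
    try:
        strict_ascii_conversion(utf8_name)
    except UnicodeEncodeError as exc:
        print("conversion failed:", exc.reason)
```

The round trip is only byte-exact where the filesystem treats names as opaque bytes; some filesystems (HFS+, for instance) normalize Unicode names, so the listed name can differ there.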
Configuring a UTF-8 locale can yield non-portable behavior. There's a good reason why various scripts set "LC_ALL=C". Moreover, there's no portable way to select a UTF-8 locale. And the POSIX API doesn't need a UTF-8 locale to handle filenames with top-bit-set bytes.

> We agree on the point that Subversion should use a single character
> set for all filenames in the same working copy.
> Because how should Subversion behave if some filenames convert fine to
> the current character set, and some do not? E.g. what if my encoding
> configuration setting is en_US.ISO8859-1? Should Subversion use ISO8859-1
> for some filenames, and UTF-8 for those which cannot be represented in
> ISO8859-1? That gets really confusing.
>
> It seems that this conversation leads to the question of why Subversion
> even bothers with checking the locale at all. It might as well always
> create filenames in UTF-8, and leave the user with apparently mangled
> filenames if they don't use a UTF-8 locale.
>
> But that isn't a solution either, because now you have lots of
> non-UTF-8 users complaining that Subversion cannot represent their
> filenames properly, where previously it worked fine.

That's why I suggested making the encoding configurable.

> > It's not pointless, or at least, something else needs to be done.
> > Currently "svn up" fails to work, and that's a problem.
>
> It doesn't fail if locales are used consistently.

It fails even if locales are used consistently.

> I don't think this problem is specific to Subversion.

I haven't seen such problems with other tools.

> Other tools also suffer from the fact that POSIX doesn't specify a
> standard for defining filename encodings. Maybe we can find a good
> solution by looking around at how other tools handle this.

Most tools just ignore the encoding of filenames.

> However, I'd expect many will just assume that the user wants filenames
> to be encoded according to the current locale.
> If everybody follows this convention, there is no problem, apart from
> user errors during locale configuration.

You're asking the user, and even all users on the system where the files are shared, to stick with a single locale. This is not acceptable: it is contrary to POSIX requirements, and it is also a problem for SSH (where the user would need to use the same charset on both sides). Under these conditions, the only possibility is to encode the filenames in UTF-8 anyway. So why not enforce that?

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)