On Tue, Aug 10, 2010 at 07:44:35PM +0200, Vincent Lefevre wrote:
> On 2010-08-10 17:42:57 +0200, Stefan Sperling wrote:
> > There are extensions in some systems like Linux, where filename encoding
> > can be specified at mount time and a process can query this information.
> > But the actual encoding of filenames might still differ (e.g. due to user
> > error). But more importantly since there is no common standard I don't
> > see how you'd solve this problem in a portable way.
>
> This is easy (at least from the specification point of view): once the
> encoding has been determined[*], typically at checkout time, store the
> encoding in the WC metadata (with the current WC layout, that would be
> some file under the .svn directory), so that the next time the svn
> client is used for this WC, the same encoding will be used, avoiding
> inconsistencies (such as currently obtained by two "svn up" under two
> different locales).
I doubt this can be made to work properly. A feature like that is just
asking people to shoot themselves in the foot.

People simply should not mix character sets like that in their working
copies. There should be a project-wide convention about the encoding
used for filenames, and everyone should be using that encoding (unless
there really is a project-specific need to have filenames in multiple
encodings for some reason, but that's really rare -- and whoever does
this should be smart enough to deal with the consequences).

Right now, if the filename cannot be represented in the current locale,
you get this error:
  "svn: Can't convert string from 'UTF-8' to native encoding"
The native encoding is determined by the locale, but that does not matter.
The point is that, wherever the encoding configuration happens to come
from, if the configured encoding cannot represent the character string
stored as UTF-8 in the repository, what is Subversion supposed to do?
It cannot really do anything with a filename it cannot represent in the
character set configured by the user, other than throw an error.

The filename conversion to UTF-8 and back must not be lossy, because to
uniquely identify a file the client needs to send the same UTF-8 byte
sequence it got from the server back to the server. And it needs to keep
doing so for backwards compatibility.
This is biting us on Mac OS X by the way, because some characters have
multiple representations in Unicode (e.g. precomposed and decomposed
forms), and hence multiple UTF-8 byte sequences, see
http://subversion.tigris.org/issues/show_bug.cgi?id=2464

> [*] There are several ways to do that, such as:
>   1. Use a charset specified by the user in the svn config file.

That provides no advantage over checking the current locale.

>   2. Use the current locale.

That's what's being done. But we're not writing the information down
in the working copy meta data, and doing so is quite pointless as
described above.

Stefan
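
P.S. To make the round-trip requirement concrete, here is a minimal
Python sketch. It is not Subversion code: the helper names to_native()
and round_trips(), the example filename, and the choice of encodings
are all made up for illustration. It only shows that (a) a name that the
configured encoding cannot represent leaves no option but an error, and
(b) the same visible name can exist as two different UTF-8 byte
sequences, which is roughly what issue 2464 is about.

```python
import unicodedata


def to_native(utf8_name: bytes, native_encoding: str) -> bytes:
    """Convert a repository filename (UTF-8 bytes) to a native encoding.

    Hypothetical helper. Raises UnicodeEncodeError when the name cannot
    be represented in native_encoding.
    """
    return utf8_name.decode("utf-8").encode(native_encoding)


def round_trips(utf8_name: bytes, native_encoding: str) -> bool:
    """Return True if UTF-8 -> native -> UTF-8 yields identical bytes."""
    try:
        native = to_native(utf8_name, native_encoding)
    except UnicodeEncodeError:
        # The configured encoding cannot represent the name at all.
        return False
    return native.decode(native_encoding).encode("utf-8") == utf8_name


# The same visible name in two Unicode forms:
# NFC uses one precomposed code point for the accented "e",
# NFD uses a plain "e" followed by a combining acute accent.
nfc_name = unicodedata.normalize("NFC", "caf\u00e9.txt").encode("utf-8")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9.txt").encode("utf-8")

if __name__ == "__main__":
    print(round_trips(nfc_name, "latin-1"))  # representable, lossless
    print(round_trips(nfc_name, "ascii"))    # not representable
    print(nfc_name == nfd_name)              # same name, different bytes
```

If something on the client side silently hands back the decomposed form
where the repository stored the precomposed one (or the other way
around), the client ends up sending a byte sequence the server never
gave it, and the file can no longer be identified by name.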