On 12.08.2015 00:11, Dave Huang wrote:
> On Aug 11, 2015, at 15:35, Branko Čibej <br...@wandisco.com> wrote:
>> On 10.08.2015 18:46, Attila Soki wrote:
>>> hi,
>>>
>>> i saw the entry "reimplement UTF-8 fuzzy conversion using utf8proc
>>> (r1511676)"
>>> in the changelog and hoped this would be the fix for
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>>>
>>> but after a quick test it seems to be still broken.
>> In my not even a bit humble opinion, what's broken is Apple's HFS, not
>> Subversion.
> Exactly what is broken in Apple's HFS? MacOS uses one of the Unicode
> Normalization Forms. Perhaps it's not the same one that Windows uses, but
> there's nothing wrong with that.

Yay for misunderstandings. :)

The problem with HFS is that it normalizes paths: regardless of how your
file names are (de)normalized when you create them, they're stored in
HFS in NFD form. For example, if someone on Linux or Windows creates a
file named "grölsch" and commits it, the Subversion client on the Mac
will get a broken working copy on the next update: you'll see "grölsch"
on disk and "grölsch" in the working copy database, but they'll be
different strings, because the on-disk name has been converted to NFD.

FWIW, HFS is the only filesystem I'm aware of that does this. Every
other filesystem, including all Windows filesystems, stores and returns
paths in the exact form they're given. This is true of mounted
filesystems on OSX, too; if you mount a remote ext4 filesystem via NFS,
it will behave differently in this respect than a native HFS volume. The
problem isn't even specific to Subversion; it's encountered by any
software on OSX that has to interact with other filesystems.

This is broken. The filesystem should not be in the business of changing
the (meta)data that it's supposed to store.

> While it's unfortunate that SVN didn't handle this correctly from the start,
> it doesn't make it Apple's fault.

See above. It's a fundamental design bug that ignores the common sense
of all other filesystem implementations.

> Unicode 2.0 talked about normalization/canonicalization in 1996, and TR 15
> has been around since about the same time--both predating SVN's development
> by years. Of course, most people weren't thinking about Unicode back then,
> and a filename was considered to be some opaque string of bytes, so I don't
> particularly blame SVN either. If anything, Unicode should've just declared
> one canonical form instead of giving options. But while HFS(+) is old and is
> due for an overhaul, its use of Unicode NFD isn't broken.

So I'll skip commenting on all this because it's based on a fundamental
misunderstanding of what we're seeing here.
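
For illustration, here is a minimal Python sketch of the mismatch
described above. It is not Subversion code and makes no claim about how
a fix would be implemented; `same_name` is a hypothetical helper that
only shows why a byte-for-byte comparison fails while a
normalization-aware one succeeds.

```python
import unicodedata

# The same visible name in two Unicode normalization forms.
# NFC uses the precomposed "ö" (U+00F6), as a Linux or Windows client
# would typically commit it.
name_nfc = unicodedata.normalize("NFC", "gr\u00f6lsch")
# NFD splits it into "o" + combining diaeresis (U+0308), which is the
# form the name takes once it has been stored on an HFS volume.
name_nfd = unicodedata.normalize("NFD", name_nfc)


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(name_nfc == name_nfd)           # False: different code point sequences
print(len(name_nfc), len(name_nfd))   # 7 8
print(name_nfc.encode("utf-8"))       # b'gr\xc3\xb6lsch'
print(name_nfd.encode("utf-8"))       # b'gro\xcc\x88lsch'
print(same_name(name_nfc, name_nfd))  # True
```

Both strings render identically, which is why the two names look the
same on disk and in the working copy database even though they do not
compare equal.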
Suffice it to say that normalizing Unicode representations in databases
is a very, very bad idea.

The bottom line is: to work around this bug, Subversion needs to make
changes both on the client side, which implies rather fundamental
changes in the working copy structure, and on the server side, to handle
requests made by older clients. I'm working on this, but slowly, because
the changes are potentially very destructive and there are other, far
more important things to do.

-- Brane