#define MBE multi-byte encoding
#define SBE single-byte encoding

Stefan Sperling wrote on Tue, May 31, 2011 at 01:07:02 +0200:
> On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> > How would you handle a repository that contains the following
> > nodes/fspaths:
> >
> >   /foo/bår (in UTF-8)
> >   /foo/bår (in latin1)
> >
> > ?
> >
> >
> > How would you handle a repository that contains:
> >   /foo/barÉ (in latin1)
> >   /foo/barŠ (in latin2)
> >
> > ?
>
> All the ISO-8859 (latin) encodings are single-byte encodings.
> It's not possible to know what the encoding is supposed to be if
> paths in different ISO-8859 encodings entered the repository.
> They all decode to different but valid strings of characters.
>
> In the first iteration of this feature I would simply assume one
> user-specified source encoding and try to convert data that isn't
> UTF-8 from the source encoding to UTF-8.
> In case multiple single-byte encodings are present this means that some
> characters will be wrong but the repository will work again without
> manual intervention. In case multiple multi-byte encodings other than
> UTF-8 are present this approach can fail and might require manual fixing
> (no worse than the current situation).
> This could still be improved upon if necessary.

True, I had overlooked these points.

One thing that jumps to mind is to have a list of encodings to try, i.e.,

    svnadmin load --recode-paths-from=MBE1,MBE2,SBE

would attempt to interpret paths as UTF-8, failing that as MBE1, failing
that as MBE2, failing that as SBE.

(I know you use vim, so: compare the 'fencs' option in vim.)
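
To make the try-each-encoding-in-order idea concrete, here is a rough
sketch in Python. It is not Subversion code: recode_path() is a
hypothetical helper invented for illustration, and the encoding names are
Python codec names rather than anything svnadmin accepts. It assumes that
"failing" simply means a strict decode raises an error.

```python
from typing import Sequence, Tuple


def recode_path(raw: bytes, fallbacks: Sequence[str]) -> Tuple[str, str]:
    """Decode raw path bytes, trying UTF-8 first, then each fallback in order.

    Hypothetical helper: returns (decoded_path, encoding_that_worked).
    """
    for encoding in ["utf-8", *fallbacks]:
        try:
            # Strict decoding: any invalid byte sequence raises an error,
            # which is the signal to move on to the next candidate.
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("path %r is not valid in any candidate encoding" % raw)


if __name__ == "__main__":
    # Valid UTF-8 is accepted as-is, before any fallback is consulted.
    print(recode_path("/foo/bår".encode("utf-8"), ["latin-1"]))

    # The same bytes give different, equally "valid" results depending on
    # which single-byte encoding comes first in the list.
    print(recode_path(b"/foo/bar\xa9", ["latin-1"]))
    print(recode_path(b"/foo/bar\xa9", ["iso-8859-2"]))
```

Since a single-byte encoding such as latin1 accepts every byte value, it
only makes sense as the last entry in the list: nothing after it would
ever be tried. The last two calls also show Stefan's point that the SBE
cannot be guessed from the bytes alone.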
