> -----Original Message-----
> From: Daniel Shahaf [mailto:d...@daniel.shahaf.name] 
> Sent: 22 February 2011 09:34
> To: Johan Corveleyn
> Cc: Thomas STEININGER; Stephen Connolly; users@subversion.apache.org
> Subject: Re: Re: Antwort: Re: problem with mutated vowel in log-message-contents
> 
> Daniel Shahaf wrote on Tue, Feb 22, 2011 at 11:26:25 +0200:
> > Johan Corveleyn wrote on Tue, Feb 22, 2011 at 09:43:25 +0100:
> > > So, all that being said, what Daniel means is that you could apply
> > > something like:
> > > 
> > >     svn propedit --revprop -r $REV --editor-cmd 'perl -pi -e "s/\\xfc/\\xc3\\xbc/g"'
> > > 
> > > to all revisions (REV) that need to be corrected (either a list that
> > > you make up manually, or something automated with "svn propget
> > > --revprop" combined with "sed", or something similar ...).
> > 
> > By the way, please don't consider this a generic solution.  It's a 
> > *shortcut*, which is probably okay for ü, but WILL corrupt your log 
> > messages if you adapt it for §.
> 
> ... because the latin1 byte sequence for § is part of some 
> UTF-8 byte sequences.

Which is why you should probably use iconv(1) or any of the APIs listed here:

http://www.unicodetools.com/

instead of dicking around with perl or sed and hard-coded, hand-crafted
single-character mappings.  There's potentially a lot more than just u-umlaut
to worry about.
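
To make the point concrete, here is a minimal sketch in Python (my choice of
language for illustration; nobody in the thread used it) of a per-message
conversion using the built-in codecs, which do the same job as
"iconv -f ISO-8859-1 -t UTF-8". The helper names to_utf8 and naive_fix are
hypothetical. It assumes each log message is, as a whole, either valid UTF-8 or
Latin-1; a Latin-1 message that happens to also be valid UTF-8 would be left
untouched, so treat it as a starting point rather than a finished tool.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8, assuming it is either UTF-8 or Latin-1."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave it alone
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so interpret every byte as a Latin-1 character.
        return raw.decode("latin-1").encode("utf-8")


def naive_fix(raw: bytes) -> bytes:
    """The per-byte shortcut from the thread, adapted for the section sign."""
    return raw.replace(b"\xa7", b"\xc2\xa7")


latin1_msg = "f\u00fcr \u00a7 5".encode("latin-1")  # bytes: f FC r 20 A7 20 5
utf8_msg = "gar\u00e7on".encode("utf-8")            # c-cedilla is C3 A7 in UTF-8

print(to_utf8(latin1_msg))           # whole message converted in one step
print(to_utf8(utf8_msg) == utf8_msg) # a UTF-8 message passes through unchanged
print(naive_fix(utf8_msg))           # the A7 inside C3 A7 gets rewritten
```

The last line shows the failure Daniel warns about: byte 0xA7 is also the
second byte of the UTF-8 encoding of c-cedilla, so the blind substitution breaks
a message that was already correct. Byte 0xFC never occurs in valid UTF-8,
which is why the u-umlaut shortcut gets away with it.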

Tony.

> 
> (I'm assuming that at least some log messages are already in UTF-8.)
> 
