On 25/09/2013 07:52, Konstantin Preißer wrote:
> Hi all,
>
>> -----Original Message-----
>> From: kpreis...@apache.org [mailto:kpreis...@apache.org]
>> Sent: Tuesday, September 24, 2013 9:11 PM
>
>> --- tomcat/site/trunk/xdocs/whoweare.xml (original)
>> +++ tomcat/site/trunk/xdocs/whoweare.xml Tue Sep 24 19:10:44 2013
>> @@ -100,6 +100,9 @@ A complete list of all the Apache Commit
>> <p><b>Costin Manolache</b> (costin at apache.org)<br/></p>
>> <!--Your bio goes here-->
>>
>> +<p><b>Konstantin Preißer</b> (kpreisser at apache.org)<br/></p>
>
> When editing the whoweare.xml, I wrote the "ß" character (sharp s)
> which is now displayed as "ÃŸ" in the commit message, because the
> source XML file is encoded in UTF-8 (the default encoding for XML
> files).
>
> As far as I understand, SVN needs to treat changes in text files at
> byte-level, not at character-level, to be independent from character
> encodings. Therefore e.g. ".patch" files don't have a character
> encoding as they describe changes at byte-level.
>
> However, when the Commit E-Mail is sent, the bytes need to be
> converted to characters, and it seems the SVN commit diff is
> interpreted as ISO-8859-1 (or Windows-1252). Therefore, the UTF-8
> bytes 0xC3 0x9F are displayed as "ÃŸ", instead of "ß".
>
> What would be the preferred way to handle such issues? One way I can
> think of would be to XML-encode such characters ("ß" as "&#223;").
> However, personally I would rather not do this, but write such
> characters directly ("ß"), so that the source is better readable (and
> encodings like UTF-8 guarantee that the characters are interpreted
> the same on each system, independently from the system language or
> geographic location).

I don't like the idea of using XML encoding at all.

> Could it be possible to change the SVN Commit E-Mail system so that it
> may interpret diffs as UTF-8 instead of ISO-8859-1 (assuming all
> files which contain bytes > 0x7F are encoded as UTF-8)? (Or, that it
> tries to decode it as UTF-8, and if it fails, decode it as
> ISO-8859-1?)

This is a question for infra. If UTF-8 fails then ISO-8859-1 is going to
fail as well.

Mark
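
For illustration, the byte-level behaviour discussed in this thread can be reproduced with a short Python sketch. It is only a sketch: the helper name `decode_diff` is hypothetical and is not part of the SVN commit mailer, and it assumes the mailer receives the diff as raw bytes. It shows how the UTF-8 bytes 0xC3 0x9F turn into two characters when read as Windows-1252, and what the proposed "try UTF-8, then ISO-8859-1" fallback might look like.

```python
def decode_diff(raw: bytes) -> str:
    """Decode diff bytes as UTF-8, falling back to ISO-8859-1.

    Hypothetical helper for illustration; not the actual mailer code.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Python's ISO-8859-1 codec maps every byte value to a code point,
        # so this branch does not raise for any input.
        return raw.decode("iso-8859-1")


# "ß" (U+00DF) is two bytes in UTF-8: 0xC3 0x9F.
sharp_s_utf8 = "ß".encode("utf-8")

# Reading those bytes as Windows-1252 yields U+00C3 followed by U+0178,
# which is the garbled pair seen in the commit mail.
mojibake = sharp_s_utf8.decode("cp1252")

# The same bytes decoded with the fallback helper come back as "ß".
fixed = decode_diff(sharp_s_utf8)

# A lone 0xDF byte (ß in ISO-8859-1) is not valid UTF-8, so the helper
# takes the fallback branch and still returns "ß".
legacy = decode_diff(b"\xdf")
```

Note that in Python the fallback branch cannot raise, but it can still produce the wrong characters if the file was written in an encoding other than UTF-8 or ISO-8859-1.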