Re: SVN Blame Returns Corrupt Data

Stefan Sperling Fri, 11 Oct 2013 10:27:06 -0700

On Fri, Oct 11, 2013 at 09:52:31AM -0700, Ben Reser wrote:
> On 10/11/13 9:22 AM, Branko Čibej wrote:
> > You'd have to extend Subversion's file type detection to detect UTF-16.
> > See svn_io_detect_mimetype2 in line 3333 in this file:
> > 
> > http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/io.c?view=markup
> > Subversion currently only looks at the first 1k Bytes of a file. It may
> > be enough to check that this initial part of the file contains only
> > valid UTF-16 (BE or LE) codes.
> 
> Even if all we looked for is the BOM it might be helpful enough.  I
> suspect the development tools producing UTF-16 are including BOMs.
> Windows seems to be fond of including them, Notepad puts one even on
> UTF-8.
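
To make the two detection ideas above concrete, here is a rough sketch in
Python rather than C. It is only an illustration: detect_utf16 and
SNIFF_BYTES are made-up names, not anything in Subversion, and it assumes
the file starts with a BOM. It looks for a UTF-16 BOM and then checks that
the rest of the sniffed prefix decodes cleanly.

```python
import codecs

SNIFF_BYTES = 1024  # hypothetical constant; the thread mentions a 1k prefix


def detect_utf16(head: bytes):
    """Return 'utf-16-le', 'utf-16-be', or None for a file's leading bytes.

    Heuristic only: without a BOM this sketch gives up and returns None.
    """
    head = head[:SNIFF_BYTES]
    # A UTF-32 LE BOM begins with the UTF-16 LE BOM, so rule UTF-32 out first.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return None
    for bom, name in ((codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if head.startswith(bom):
            decoder = codecs.getincrementaldecoder(name)()
            try:
                # final=False tolerates a code unit cut off at the prefix end.
                decoder.decode(head[len(bom):], final=False)
            except UnicodeDecodeError:
                return None  # BOM present, but the prefix is not valid UTF-16
            return name
    return None
```

Note that the validity check is weak on its own, since almost any even-length
byte sequence decodes as UTF-16 unless it contains unpaired surrogates; the
BOM carries most of the signal here.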


Couldn't Subversion automatically convert UTF-16 files to UTF-8 before
processing them for diff/merge/blame, and convert output written to
the original files back to UTF-16?

That would require some work because existing streams, strings, and files
passed around in the code would need to be wrapped so that translation
between the internal and external encodings is seamless.
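
As a rough illustration of that wrapping idea, sketched in Python rather
than in Subversion's C: the helper names below (to_internal, to_external,
process) are hypothetical, and the sketch assumes the file carries a UTF-16
BOM. The content is decoded to UTF-8 on the way in, handed to a
byte-oriented operation, and re-encoded on the way out.

```python
import codecs

_BOMS = ((codecs.BOM_UTF16_LE, "utf-16-le"),
         (codecs.BOM_UTF16_BE, "utf-16-be"))


def to_internal(raw: bytes):
    """Return (utf8_bytes, original_encoding or None) for file content."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name).encode("utf-8"), name
    return raw, None  # no UTF-16 BOM: pass the bytes through untouched


def to_external(utf8: bytes, encoding):
    """Convert UTF-8 content back to the encoding recorded by to_internal."""
    if encoding is None:
        return utf8
    bom = next(b for b, name in _BOMS if name == encoding)
    return bom + utf8.decode("utf-8").encode(encoding)


def process(raw: bytes, operation):
    """Run a UTF-8, byte-oriented operation on possibly-UTF-16 content."""
    internal, encoding = to_internal(raw)
    return to_external(operation(internal), encoding)
```

For example, process(data, some_line_based_step) lets the step split on
plain "\n" bytes even when the file on disk stores each newline as a
two-byte UTF-16 code unit.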

But I don't see why such an approach couldn't be made to work in principle.
It might even result in some spring cleaning in the code base and pave the
way for improved handling of file formats such as XML for diff and merge.

What do you think? Is it worth adding this to our project ideas page?
