Daniel Shahaf <d.s <at> daniel.shahaf.name> writes:

> 
> LiuYan 刘研 wrote on Thu, Nov 18, 2010 at 02:53:37 +0000:
> > Daniel Shahaf <d.s <at> daniel.shahaf.name> writes:
> > 
> > > 
> > > Stefan Sperling wrote on Wed, Nov 17, 2010 at 18:13:44 +0100:
> > > > On Wed, Nov 17, 2010 at 03:06:19PM +0000, LiuYan 刘研 wrote:
> > > > > I mean, if the revprops files are not in UTF-8 encoding, don't return REPORT
> > > 
> > > Small correction: it's meaningless to talk about the encoding of the
> > > revprop files; it's only meaningful to talk about the encoding of the
> > > value of a given property.
> > > 
> > > (At the revprop files level, the values are binary, and the rest of the
> > > data in those files is always ASCII.)
> > > 
> > > 
> > 
> > You're right Daniel, but in such situation, these revprop files can be
> > treated as readable text files:
> 
> This is simply not true: if you apply 'iconv -f latin1 -t utf-8' to
> a revprop file, you will CORRUPT that revprop file.
> 
> 

You're right, Daniel: simply applying 'iconv' to a revprop file will
CORRUPT it, because the data length value has to be changed too.
So I wrote a small script to do the conversion, as I mentioned at
http://article.gmane.org/gmane.comp.version-control.subversion.user/101383

The script does the following:
1. find the affected revprop files
2. change the svn:log value length from "V 85" to "V 98"
3. convert the svn:log value to a UTF-8 encoded string
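
The length on the "V" line counts bytes, not characters, so it depends on the
encoding of the value. As a rough sketch (not part of my script), the new
length can be checked like this. It assumes the snippet itself is saved as
UTF-8 and that the new value is the two lines below joined by one newline;
byte_len is just a helper name made up for this example.

```shell
#!/bin/sh
# Print the number of bytes a string occupies in a given encoding.
# Assumption: this file is saved as UTF-8, so the literals below are UTF-8.
byte_len() {
    # $1 = target encoding name understood by iconv, $2 = the text
    printf '%s' "$2" | iconv -f UTF-8 -t "$1" | wc -c | tr -d ' '
}

# The replacement svn:log value: two lines joined by a single newline.
value='Standard project directories initialized by cvs2svn.
由 cvs2svn 初始化的标准项目文件夹'

# 52 ASCII bytes + 1 newline + 9 ASCII bytes + 12 CJK characters * 3 bytes
echo "UTF-8 bytes: $(byte_len UTF-8 "$value")"
```

This prints "UTF-8 bytes: 98", which is where the "V 98" comes from.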

Here's the small script. Be aware that the script file itself is in GBK encoding.

administra...@cmtel-svr-hr-db /cygdrive/d/SVNRepositories/repos/cmcc/db
$ cat fix-cvs2svn.sh
IFS=$'\n'
grep -i -r -n "Standard project directories initialized by cvs2svn" revprops/* \
        | cut -d ":" -f 1 > affected_files.txt
#grep result sample
#0/1:8:Standard project directories initialized by cvs2svn.由 cvs2svn
#0/133:8:Standard project directories initialized by cvs2svn.由 cvs2svn

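for file in `cat affected_files.txt`
do
        echo "$file"

        #${file:9}: strip out 'revprops/'
        dest_file="fix/${file:9}"

        cp --force --preserve --verbose "$file" "fix-backup/${file:9}"

        #line 7 is the svn:log length line; 98 is the byte length of the
        #new two-line value once it is encoded as UTF-8
        #(the Chinese line means "Standard project folders initialized by cvs2svn")
        gawk 'FNR==7 {print "V 98"}
              FNR==8 {print "Standard project directories initialized by cvs2svn."}
              FNR==9 {print "由 cvs2svn 初始化的标准项目文件夹"}
              FNR<7 || FNR==10 {print $0}' "$file" | iconv --from GBK --to UTF-8 > "$dest_file"
done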
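
After the conversion it may be worth checking that every file under fix/ now
decodes as UTF-8. This is only a sketch and not part of the script above:
is_utf8 and check_tree are names made up for this example, and the check
assumes the other property values in these files are ASCII or UTF-8 as well.

```shell
#!/bin/sh
# Return 0 if the file decodes cleanly as UTF-8, non-zero otherwise.
# iconv exits with an error when it meets an invalid input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Print every regular file below directory $1 that is not valid UTF-8.
check_tree() {
    find "$1" -type f | while IFS= read -r f; do
        is_utf8 "$f" || echo "not valid UTF-8: $f"
    done
}
```

Running `check_tree fix` should print nothing when all converted files are
clean. Note that it only checks the encoding, not whether the "V" length
matches the value.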

