Daniel Shahaf <d.s <at> daniel.shahaf.name> writes:

>
> LiuYan 刘研 wrote on Thu, Nov 18, 2010 at 02:53:37 +0000:
> > Daniel Shahaf <d.s <at> daniel.shahaf.name> writes:
> >
> > >
> > > Stefan Sperling wrote on Wed, Nov 17, 2010 at 18:13:44 +0100:
> > > > On Wed, Nov 17, 2010 at 03:06:19PM +0000, LiuYan 刘研 wrote:
> > > > > I mean, if the revprops files are not in UTF-8 encoding, don't return REPORT
> > >
> > > Small correction: it's meaningless to talk about the encoding of the
> > > revprop files; it's only meaningful to talk about the encoding of the
> > > value of a given property.
> > >
> > > (At the revprop files level, the values are binary, and the rest of the
> > > data in those files is always ASCII.)
> > >
> >
> > You're right Daniel, but in such situation, these revprop files can be treated
> > as readable text files:
>
> This is simply not true: if you apply 'iconv -f latin1 -t utf-8' to
> a revprop file, you will CORRUPT that revprop file.
>
You're right, Daniel: simply applying 'iconv' to a revprop file will CORRUPT
it, because the data length value has to be changed too.

So I wrote a small script to do the conversion, as I mentioned at
http://article.gmane.org/gmane.comp.version-control.subversion.user/101383

The script does the following:
1. finds the affected revprop files
2. changes the svn:log value length from "V 85" to "V 98"
3. converts the svn:log value to a UTF-8 encoded string

Here's the small script. Be aware that the script file itself is in GBK
encoding.

administra...@cmtel-svr-hr-db /cygdrive/d/SVNRepositories/repos/cmcc/db
$ cat fix-cvs2svn.sh
IFS=$'\n'
grep -i -r -n "Standard project directories initialized by cvs2svn" revprops/* | cut -d ":" -f 1 > affected_files.txt
#grep result sample
#0/1:8:Standard project directories initialized by cvs2svn.由 cvs2svn
#0/133:8:Standard project directories initialized by cvs2svn.由 cvs2svn

for file in `cat affected_files.txt`
do
    echo "$file"
    #${file:9}: strip out 'revprops/'
    dest_file="fix/${file:9}"
    cp --force --preserve --verbose "$file" "fix-backup/${file:9}"
    # Line 7 is the svn:log length, lines 8-9 are the svn:log value;
    # lines 1-6 and line 10 are copied unchanged.
    # The Chinese text means "Standard project folders initialized by cvs2svn".
    gawk 'FNR==7 {print "V 98"} FNR==8{print "Standard project directories initialized by cvs2svn."} FNR==9{print "由 cvs2svn 初始化的标准项目文件夹"} FNR<7 || FNR==10 {print $0}' "$file" | iconv --from GBK --to UTF-8 > "$dest_file"
done
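
As a side note on step 2: the number after "V" is the length of the value in bytes, so it could be computed from the UTF-8 string instead of being typed in by hand. Below is a minimal sketch of that idea, assuming the snippet itself is saved as UTF-8 (unlike the GBK script above). The names utf8_byte_length and log_value are made up for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print the number of
# bytes in a string. "wc -c" counts bytes, not characters, in any locale.
utf8_byte_length() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# The two-line svn:log value that the gawk command writes, including the
# line break between the English and the Chinese sentence.
log_value=$'Standard project directories initialized by cvs2svn.\n由 cvs2svn 初始化的标准项目文件夹'

# Build the length line that precedes the value in the revprop file.
echo "V $(utf8_byte_length "$log_value")"
```

For this particular log message the result agrees with the "V 98" used in the script: 52 bytes for the English sentence, 1 for the line break, and 45 for the second line (12 CJK characters at 3 bytes each plus 9 ASCII bytes).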