Geir,

I looked through some scripts that I wrote to help me sync the GNU
nano repository and came across a Perl script that might help you
quickly identify all log messages that are not representable in
ASCII (and hence possibly not valid UTF-8).

Attached is the source of the script. To use it, you will need the
libsvn Perl bindings (on Debian, install the `libsvn-perl` package),
and you will need to edit line 20 to change the URL of the Subversion
repository that you wish to examine.

Example output for svn://svn.sv.gnu.org/nano is:
------------------------------------------------------------------------
r619
Added Galician translation by Jacobo Tarr<jtar...@trasno.net>.

------------------------------------------------------------------------
r757
Updated Galician translation; thanks, Jacobo Tarr

------------------------------------------------------------------------
r826
Galician translation brought up to date for 1.1.2 by Jacobo Tarr

------------------------------------------------------------------------
r954
Galician translation update (Jacobo Tarr.

------------------------------------------------------------------------
r958
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r962
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1009
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E Sandoe <s...@ii.uib.no>.
Updated Norwegian nynorsk translation, by Kjetil Torgrim Homme
<kjeti...@linpro.no>.

------------------------------------------------------------------------
r1013
Moved no.po to nn.po.
New Norwegian bokm欠translation, by Stig E sand𠼳...@users.sourceforge.net>.
Added missing entries to THANKS.

------------------------------------------------------------------------
r1047
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1070
Norwegian bokm欠translation updates (Stig E Sandoe).

------------------------------------------------------------------------
r1071
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1072
Norwegian bokm欠translation updates (Stig E Sand𩮍

------------------------------------------------------------------------
r1125
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1133
French translation updates (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1258
French translation update (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1259
Spanish translation updates (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1299
Updated Spanish translation (Ricardo Javier Cⳤenes Medina).

------------------------------------------------------------------------
r1301
Updated French translation (Jean-Philippe Gu곡rd).

------------------------------------------------------------------------
r1500
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1537
Updated French translation by Jean-Philippe Gu곡rd.

------------------------------------------------------------------------
r1923
Updated French translation by Jean-Philippe Guérard.

------------------------------------------------------------------------
r2102
spell Ulf H峮hammar's name right

------------------------------------------------------------------------
r2373
in do_credits(), display Florian König's name properly in UTF-8 mode;
since we can't dynamically set that element of the array to its UTF-8
equivalent when in UTF-8 mode, we have to use the ISO-8859-1 version and
pass every string in the credits through make_mbstring() to make sure
they're all UTF-8 (sigh)

------------------------------------------------------------------------
r2784
rework the credits handling to display Florian König's name properly
whether we're in a UTF-8 locale or not.  This requires a minor hack, but
it's better than requiring a massive function that we only use once

------------------------------------------------------------------------
r2898
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r3924
Update French manpages by Jean-Philippe Guérard.

------------------------------------------------------------------------
r4181
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4182
per Jean-Philippe Guérard's updates, in doc/man/fr/*.1,
doc/man/fr/nanorc.5, fix copyright notices; the copyrights are
disclaimed on these translations, but the copyrights of the untranslated
works also apply

------------------------------------------------------------------------
r4208
in print_opt_full(), use strlenpt() instead of strlen(), so that tabs
are placed properly when displaying translated strings in UTF-8, as
found by Jean-Philippe Guérard

------------------------------------------------------------------------

The entries that look corrupted are the ones whose log message is
stored in ISO-8859-1 instead of UTF-8; the ones that display correctly
are already valid UTF-8.
#! /usr/bin/env perl
use strict;
use warnings;

use Encode qw( from_to );
use SVN::Ra;

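sub is_ascii {
  # Treat a missing log message as empty to avoid "uninitialized" warnings.
  my @chars = split(//, shift // '');

  for my $c (@chars) {
    # Any code point of 128 or above is outside ASCII.
    return 0 if ord($c) >= 128;
  }

  1;
}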

my $ra = SVN::Ra->new("svn://svn.sv.gnu.org/nano");

# Walk the whole history (r1 to HEAD) and print every revision whose
# log message contains non-ASCII characters.
$ra->get_log('', 1, $ra->get_latest_revnum, 0, 1, 0, sub {
    my ($paths, $rev_num, $user, $datetime, $log_msg) = @_;

    if (not is_ascii($log_msg)) {
      print "------------------------------------------------------------------------\n";
      print "r", $rev_num, "\n";
      print $log_msg, "\n";
    }
  });

print "------------------------------------------------------------------------\n";
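
Since a non-ASCII message may still be valid UTF-8, a further check could separate the two cases. What follows is only a sketch and not part of the attached script: `is_valid_utf8` and `latin1_to_utf8` are hypothetical helper names, the log messages are assumed to arrive as raw byte strings, and the conversion assumes that anything failing the UTF-8 check really is ISO-8859-1.

```perl
use strict;
use warnings;

use Encode qw( decode from_to );

# True if the byte string decodes cleanly as strict UTF-8.
# (Hypothetical helper, not part of the attached script.)
sub is_valid_utf8 {
  my ($bytes) = @_;
  # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
  # decode() from modifying its argument.
  my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
  my $ok = eval { decode('UTF-8', $bytes, $check); 1 };
  return $ok ? 1 : 0;
}

# Return a UTF-8 copy of a message, converting from ISO-8859-1 only
# when the input is not already valid UTF-8 (assumption: whatever is
# not UTF-8 is ISO-8859-1).
sub latin1_to_utf8 {
  my ($bytes) = @_;
  return $bytes if is_valid_utf8($bytes);
  from_to($bytes, 'ISO-8859-1', 'UTF-8');  # converts $bytes in place
  return $bytes;
}
```

In the callback, one could then print only the revisions for which `is_valid_utf8($log_msg)` is false. Note that a short ISO-8859-1 string can occasionally happen to be valid UTF-8 as well, so the check is a heuristic rather than a proof.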
