Package: libwww-mediawiki-client-perl
Version: 0.31-2+wuth3
Severity: important
Tags: patch

Dear Maintainer,

With mediawiki 1.19.5-1 (Wheezy, and perhaps earlier), the way the
server transmits its character encoding has changed.  It no longer
specifies the character encoding in the content of the page but
instead uses an HTTP header.

mvs refuses to upload any changes to a mediawiki page without first
determining the character encoding, which it no longer knows how to do.

The attached patch fixes this issue.
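
For illustration, the header-based lookup can be sketched in isolation as
below.  This is only a minimal sketch using core Perl: the helper name
charset_from_content_type is hypothetical and is not part of
WWW::Mediawiki::Client, and the sample header values are assumptions about
what a server might send.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mediawiki::Client.
# Returns the charset named in a Content-Type header value,
# or undef when the header is missing or names no charset.
sub charset_from_content_type {
    my ($header) = @_;
    return unless defined $header;

    # Accept an optional pair of double quotes around the value and
    # stop at whitespace or ';' so later parameters are not captured.
    if ( $header =~ m/charset\s*=\s*"?([^\s;"]+)"?/i ) {
        return $1;
    }
    return;
}

# Example header values (assumed, for illustration only)
print charset_from_content_type('text/html; charset=UTF-8') // 'none', "\n";
print charset_from_content_type('text/html') // 'none', "\n";
```

Returning undef when no charset is present lets a caller fall back to
scanning the page content for a meta tag.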

-- System Information:
Debian Release: 7.0
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-0.bpo.2-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libwww-mediawiki-client-perl depends on:
ii  libexception-class-perl  1.32-1
ii  libvcs-lite-perl         0.09-1
ii  libwww-perl              6.04-1
ii  libxml-libxml-perl       2.0001+dfsg-1
ii  perl                     5.14.2-21

libwww-mediawiki-client-perl recommends no packages.

libwww-mediawiki-client-perl suggests no packages.

-- no debconf information
Index: libwww-mediawiki-client-perl-0.31/lib/WWW/Mediawiki/Client.pm
===================================================================
--- libwww-mediawiki-client-perl-0.31.orig/lib/WWW/Mediawiki/Client.pm	2013-07-08 16:37:23.000000000 -0600
+++ libwww-mediawiki-client-perl-0.31/lib/WWW/Mediawiki/Client.pm	2013-07-08 17:23:03.000000000 -0600
@@ -1564,19 +1564,36 @@
 }
 
 sub _get_server_encoding {
+    # Determine the character set used by the server.
+    # Assumes the same character set is used on all pages.
+
     my ($self) = @_;
+
+    # Get a sample page
     my $url = $self->_get_version_url;
     my $res = $self->{ua}->get($url);
+
+    # Use the character set in the response header, if provided
+    my $contenttypeheader = $res->header( 'content-type' );
+    # if defined will be like: "text/html; charset=UTF-8"
+    my $charsetheader;
+    $charsetheader = $1
+        if defined $contenttypeheader and $contenttypeheader =~ m/charset=(.*)/;
+    return $charsetheader if ($charsetheader);
+
+    # No character set defined in the header.  Look instead in the content.
     my $doc = $res->decoded_content;
     my $p = HTML::TokeParser->new(\$doc);
     while ( my $t = $p->get_tag("meta") ) {
         next unless defined $t->[1]->{'http-equiv'}
-     and ($t->[1]->{'http-equiv'} eq 'Content-Type'
-     or $t->[1]->{'http-equiv'} eq 'Content-type');
+          and ($t->[1]->{'http-equiv'} eq 'Content-Type'
+               or $t->[1]->{'http-equiv'} eq 'Content-type');
         my $cont = $t->[1]->{'content'};
         $cont =~ m/charset=(.*)/;
         return $1;
     }
+
+    # No character set found anywhere.  Return nothing.
 }
 
 sub _get_page_headline {
