Package: libwww-mediawiki-client-perl
Version: 0.31-2+wuth3
Severity: important
Tags: patch
Dear Maintainer,

With MediaWiki 1.19.5-1 (Wheezy, and perhaps earlier versions), the way the server transmits its character encoding has changed. It no longer specifies the character encoding in the content of the page but instead uses an HTTP header. mvs refuses to upload any changes to a MediaWiki page without first determining the character encoding, which it no longer knows how to do.

The attached patch fixes this issue.

-- System Information:
Debian Release: 7.0
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-0.bpo.2-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libwww-mediawiki-client-perl depends on:
ii  libexception-class-perl  1.32-1
ii  libvcs-lite-perl         0.09-1
ii  libwww-perl              6.04-1
ii  libxml-libxml-perl       2.0001+dfsg-1
ii  perl                     5.14.2-21

libwww-mediawiki-client-perl recommends no packages.

libwww-mediawiki-client-perl suggests no packages.

-- no debconf information
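Because the charset now arrives only in the Content-Type response header, that header has to be parsed first. The following is a minimal sketch in core Perl, not code from the patch or from WWW::Mediawiki::Client; `charset_from_content_type` is a hypothetical helper name. It also sidesteps a pitfall in the attached patch: `$charsetheader = $1;` runs even when the match fails, and because Perl leaves `$1` unchanged after a failed match, it could still hold a value from an earlier successful match (an absent header would also trigger an "uninitialized value" warning if warnings are enabled). Capturing the match in list context yields undef instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mediawiki::Client): return the
# charset parameter of a Content-Type header value, or undef if absent.
sub charset_from_content_type {
    my ($content_type) = @_;
    return unless defined $content_type;    # header not sent at all

    # List-context capture: on a failed match $charset is simply undef,
    # so no stale $1 from an earlier match can leak through.
    # Accepts an optionally quoted value, e.g. charset="utf-8".
    my ($charset) = $content_type =~ m/\bcharset\s*=\s*"?([^";\s]+)"?/i;
    return $charset;
}

print charset_from_content_type('text/html; charset=UTF-8'), "\n";   # prints UTF-8
```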
Index: libwww-mediawiki-client-perl-0.31/lib/WWW/Mediawiki/Client.pm
===================================================================
--- libwww-mediawiki-client-perl-0.31.orig/lib/WWW/Mediawiki/Client.pm	2013-07-08 16:37:23.000000000 -0600
+++ libwww-mediawiki-client-perl-0.31/lib/WWW/Mediawiki/Client.pm	2013-07-08 17:23:03.000000000 -0600
@@ -1564,19 +1564,36 @@
 }
 
 sub _get_server_encoding {
+    # Determine the character set used by the server.
+    # Assumes the same character set is used on all pages.
+
     my ($self) = @_;
+
+    # Get a sample page
     my $url = $self->_get_version_url;
     my $res = $self->{ua}->get($url);
+
+    # Use the character set in the response header, if provided
+    my $contenttypeheader = $res->header( 'content-type' );
+    # if defined will be like: "text/html; charset=UTF-8"
+    my $charsetheader = $contenttypeheader;
+    $charsetheader =~ m/charset=(.*)/;
+    $charsetheader = $1;
+    return $charsetheader if ($charsetheader);
+
+    # No character set defined in the header. Look instead in the content.
     my $doc = $res->decoded_content;
     my $p = HTML::TokeParser->new(\$doc);
     while ( my $t = $p->get_tag("meta") ) {
         next unless defined $t->[1]->{'http-equiv'}
-		and ($t->[1]->{'http-equiv'} eq 'Content-Type'
-		  or $t->[1]->{'http-equiv'} eq 'Content-type');
+            and ($t->[1]->{'http-equiv'} eq 'Content-Type'
+              or $t->[1]->{'http-equiv'} eq 'Content-type');
         my $cont = $t->[1]->{'content'};
         $cont =~ m/charset=(.*)/;
         return $1;
     }
+
+    # No character set found anywhere. Return nothing.
 }
 
 sub _get_page_headline {
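The fallback branch of the patch accepts only two spellings of the http-equiv value, `Content-Type` and `Content-type`. Comparing with `lc` accepts any capitalization. The sketch below is illustrative only: `charset_from_meta` is a hypothetical helper, and a simple regex scan of `<meta>` tags stands in for HTML::TokeParser so that it runs with core Perl alone. Unlike a real HTML parser, it does not handle comments, scripts, or a literal `>` inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: find <meta http-equiv="Content-Type" content="...">
# and return its charset parameter, or undef if none is found.
sub charset_from_meta {
    my ($html) = @_;
    return unless defined $html;

    # Simplification (assumption): regex tag scan instead of HTML::TokeParser.
    while ($html =~ m/<meta\b([^>]*)>/gi) {
        my $attr_text = $1;
        my %attr;
        # Collect name=value pairs; values may be double-, single-, or unquoted.
        while ($attr_text =~ m/([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g) {
            $attr{ lc $1 } = defined $2 ? $2 : defined $3 ? $3 : $4;
        }
        # lc() matches "Content-Type", "Content-type", "CONTENT-TYPE", ...
        next unless defined $attr{'http-equiv'}
            and lc $attr{'http-equiv'} eq 'content-type';
        next unless defined $attr{content};

        my ($charset) = $attr{content} =~ m/\bcharset\s*=\s*([^;\s]+)/i;
        return $charset if defined $charset;
    }
    return;    # no usable meta tag: nothing found
}
```

For example, `charset_from_meta('<meta http-equiv="Content-type" content="text/html; charset=ISO-8859-1" />')` returns `ISO-8859-1`.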