Package: libwww-perl
Version: 5.813-1
Severity: important
Here is what I tried to do:

------------cut------------
#!/usr/bin/perl

use strict;
use warnings;
use encoding 'iso-8859-2';
use Encode;
use LWP::UserAgent;
use HTTP::Request;

my $POST_URL = "http://somewhere.net/webservice.php";

my $xml = <<"EOT";
<?xml version="1.0" encoding="utf-8" ?>
<PACKET>
<TEXT>Árvíztűrő tükörfúrógép</TEXT>
</PACKET>
EOT

my $ua = LWP::UserAgent->new();
my $request = HTTP::Request->new('POST', $POST_URL);
my $content = encode('utf-8', $xml);
$request->header('Content-Type' => 'text/xml; charset=utf-8');
$request->header('Content-Length' => length($content));
$request->content($content);
my $response = $ua->request($request);
------------cut------------

Here is what I get when Perl tries to execute the last line:

------------cut------------
failed: 500 Wide character in syswrite
Content-Type: text/plain
Client-Date: Wed, 25 Mar 2009 13:21:30 GMT
Client-Warning: Internal response

500 Wide character in syswrite
------------cut------------

The message in the <TEXT> tag is a test phrase containing every accented character of the Hungarian alphabet. It is encoded as 'iso-8859-2' in the source file. Thanks to the 'use encoding' pragma, Perl decodes it into a character string (utf8 flag on) when it reads the source.

After some bug hunting, I identified the source of the problem in /usr/share/perl5/LWP/Protocol/http.pm:

202: my $req_buf = $socket->format_request($method, $fullpath, @h);
...
235: if ($has_content) {
...
249: my $buf = $req_buf . $$content_ref; # <--- HERE

If $$content_ref contains a byte string (a string with byte semantics) and $req_buf is a character string (a string with character semantics), then on concatenation the contents of $$content_ref are upgraded to character semantics, normally by treating each byte as an 'iso-8859-1' character. This conversion happens even if $req_buf contains only ASCII characters.
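A note on the upgrade step just described, offered as a hedged sketch rather than a confirmed diagnosis: in Perl 5.8 and 5.10 the 'use encoding' pragma also sets the global ${^ENCODING} variable, and with it set, Perl appears to upgrade byte strings by decoding them from that encoding instead of from ISO-8859-1. Since 'use encoding' was deprecated in later Perl releases, the script below only simulates that upgrade with Encode::decode; the header string is a hypothetical stand-in for $req_buf. It compares both interpretations of the UTF-8 bytes for "Á" (0xC3 0x81):

------------cut------------
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical ASCII-only header buffer that nevertheless carries the
# utf8 flag, as $req_buf did in the report.
my $req_buf = "POST /webservice.php HTTP/1.1\r\n\r\n";
utf8::upgrade($req_buf);

# Request body as a byte string: the UTF-8 encoding of "Á" (U+00C1).
my $content = encode('UTF-8', "\x{C1}");

# Plain concatenation with no 'use encoding' in effect: the byte string
# is upgraded as ISO-8859-1, one character per byte, all below 256.
my $buf = $req_buf . $content;
my $latin1_body = substr($buf, length $req_buf);

# Simulated upgrade under "use encoding 'iso-8859-2'" (assumption: the
# pragma makes Perl decode upgraded bytes as ISO-8859-2).
my $latin2_body = decode('iso-8859-2', $content);

# Print the code points produced by each interpretation.
for my $pair (['ISO-8859-1 upgrade', $latin1_body],
              ['ISO-8859-2 upgrade', $latin2_body]) {
    my ($label, $str) = @$pair;
    printf "%s: %s\n", $label,
        join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}
------------cut------------

Under that assumption the first byte, 0xC3, becomes U+0102 (A with breve) instead of U+00C3. That code point is above 255, which would be consistent with the "Wide character in syswrite" error, whereas a plain ISO-8859-1 upgrade never produces such a character.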
In my example, this means that Perl converts my UTF-8 encoded test phrase into a string in which the consecutive bytes of each UTF-8 sequence masquerade as separate characters.

What I don't understand is this: LWP::UserAgent should still be able to send the resulting string (semantically wrong, but syntactically right) over the wire, since it contains only characters with code points < 256. So where do those "wide characters" (which I assume are characters with code points >= 256) come from?

Anyway, the problem can be worked around by adding the following lines after line #202 (line 202 itself is repeated for context):

my $req_buf = $socket->format_request($method, $fullpath, @h);
use Encode;
if (Encode::is_utf8($req_buf)) {
    Encode::_utf8_off($req_buf);
}

This simply makes sure that the buffer holding the HTTP headers does not have the 'utf8' flag turned on. I can only hope that the $req_buf returned by format_request does not contain non-ASCII characters (it shouldn't), because _utf8_off only drops the flag and does not convert anything. With this change, the concatenation above leaves $$content_ref alone and the request is posted without errors.
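The same workaround can be tried outside LWP. The sketch below is an illustration under assumptions, not the LWP code: the header string is a hypothetical stand-in for what format_request returns, and it swaps the Encode::_utf8_off call for the core function utf8::downgrade, which converts the buffer back to bytes and dies if it finds a character above 255 instead of silently exposing internal UTF-8 bytes:

------------cut------------
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for format_request() output: ASCII-only headers
# that nevertheless carry the utf8 flag.
my $req_buf = "POST /webservice.php HTTP/1.1\r\n"
            . "Content-Type: text/xml; charset=utf-8\r\n\r\n";
utf8::upgrade($req_buf);

# Body already encoded to UTF-8 bytes, as in the original script.
my $content = encode('UTF-8', "\x{C1}rv\x{ED}zt\x{171}r\x{151}");

# Turn the header buffer back into a byte string. This dies if it holds
# a character above 255, rather than passing its internal UTF-8 bytes
# through the way Encode::_utf8_off would.
utf8::downgrade($req_buf);

# Byte string . byte string: no upgrade happens, so the body bytes are
# copied unchanged and no wide characters can appear.
my $buf = $req_buf . $content;

printf "utf8 flag: %s; body bytes intact: %s\n",
    (utf8::is_utf8($buf) ? 'on' : 'off'),
    (substr($buf, length $req_buf) eq $content ? 'yes' : 'no');
------------cut------------

For ASCII-only headers both approaches produce identical bytes. They differ only if a header contains a non-ASCII character: utf8::downgrade emits Latin-1 bytes or dies, while _utf8_off always succeeds but sends the character's UTF-8 bytes.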
-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.28.7prana (PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl        3.56-1+b1      A collection of modules that parse
ii  libhtml-tagset-perl        3.20-2         Data tables pertaining to HTML
ii  libhtml-tree-perl          3.23-1         represent and create HTML syntax t
ii  liburi-perl                1.35.dfsg.1-1  Manipulates and accesses URI strin
ii  netbase                    4.34           Basic TCP/IP networking system
ii  perl [libdigest-md5-perl]  5.10.0-19      Larry Wall's Practical Extraction

Versions of packages libwww-perl recommends:
ii  libcompress-zlib-perl      2.012-1        Perl module for creation and manip
pn  libhtml-format-perl        <none>         (no description available)
ii  libmailtools-perl          2.03-1         Manipulate email in perl programs

Versions of packages libwww-perl suggests:
ii  libio-socket-ssl-perl      1.16-1         Perl module implementing object or

-- no debconf information