Package: libxml-dom-perl
Version: 1.43-4
Severity: important

*** Please type your report below this line ***

If an XML file is read and written back with XML::DOM, for example:

 ( new XML::DOM::Parser) -> parsefile ('in.xml') -> printToFile ('out.xml') ;

with a simple XML file (in.xml) like: 

 <?xml version="1.0" encoding="UTF-8"?>
 <blah>&#227;</blah>

(note the non-ASCII &#227;, an a with tilde), the resulting output file
(out.xml) is not valid UTF-8, even though UTF-8 is stated in the XML
declaration. The character appears to be written in the encoding of the
default locale instead (ISO-8859-1 here, I presume).

This can be checked with xmllint:

 [EMAIL PROTECTED]:~$ xmllint out.xml 
 out.xml:2: parser error : Input is not proper UTF-8, indicate encoding !
 Bytes: 0xE3 0x3C 0x2F 0x62
 <blah>ã</blah>
      ^
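
The 0xE3 byte above is what Perl itself writes for U+00E3 when the output
handle has no encoding layer. The sketch below uses core Perl only and does
not touch XML::DOM; that printToFile opens its file without such a layer is
an assumption, not something checked against the module source. The helper
name bytes_written is made up for the illustration.

```perl
use strict;
use warnings;

# U+00E3 (a with tilde), the character that &#227; refers to.
my $text = "\x{e3}";
# Force the internal UTF-8 representation, as parser output usually has.
utf8::upgrade($text);

# Hypothetical helper: write a string to an in-memory file opened with the
# given mode (including any PerlIO layer) and return the bytes produced.
sub bytes_written {
    my ($mode, $string) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "cannot open in-memory file: $!";
    print {$fh} $string;
    close($fh) or die "cannot close in-memory file: $!";
    return $buffer;
}

# Format a byte string as space-separated hex values.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '0x%02X', ord } split //, $bytes;
}

my $raw  = bytes_written('>', $text);                  # no encoding layer
my $utf8 = bytes_written('>:encoding(UTF-8)', $text);  # explicit UTF-8 layer

print 'no layer:    ', hex_dump($raw),  "\n";   # no layer:    0xE3
print 'UTF-8 layer: ', hex_dump($utf8), "\n";   # UTF-8 layer: 0xC3 0xA3
```

Without a layer the character is written as the single Latin-1 byte that
xmllint rejects; with the UTF-8 layer it becomes the two-byte sequence a
UTF-8 document needs.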

In addition, methods such as XML::DOM::Element::getAttribute do not
return UTF-8 encoded Perl strings, so Unicode data is corrupted when it
is read from or written to the document.
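
A possible workaround, as a rough sketch only: serialise the document with
toString and do the encoding explicitly. It assumes that XML::DOM hands back
Perl character strings (not UTF-8 bytes) from toString and getData, which I
have not verified for every version. The document is parsed from a string
here only to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode);
use XML::DOM;

# Same content as in.xml from the report, parsed from a string.
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n<blah>&#227;</blah>\n};
my $doc = XML::DOM::Parser->new->parse($xml);

# Write through an explicit UTF-8 layer instead of calling printToFile.
open(my $out, '>:encoding(UTF-8)', 'out.xml')
    or die "cannot open out.xml: $!";
print {$out} $doc->toString;
close($out) or die "cannot close out.xml: $!";

# Values read from the document can be encoded the same way before they
# are passed to anything that expects UTF-8 bytes.
my $value = $doc->getDocumentElement->getFirstChild->getData;
my $bytes = encode('UTF-8', $value);

# Release the document's circular references.
$doc->dispose;
```

If the assumption holds, xmllint should accept the resulting out.xml.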

Regards,

--
David

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.4.27-2-386
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages libxml-dom-perl depends on:
ii  libwww-perl                   5.803-4    WWW client/server library for Perl
ii  libxml-parser-perl            2.34-4     Perl module for parsing XML files
ii  libxml-perl                   0.08-1     Perl modules for working with XML
ii  libxml-regexp-perl            0.03-7     Perl module for regular expression
ii  perl                          5.8.7-3    Larry Wall's Practical Extraction 

libxml-dom-perl recommends no packages.

-- no debconf information
