Package: debiandoc-sgml
Version: 1.2.25
Severity: normal

Dear Maintainer,

I've noticed that Emacs (nXML, in particular) tends to get confused
about the encoding of debiandoc2dbk's output files. This seems to be
because it is incapable of understanding the comment declaring (in
English) that the file is encoded in UTF-8. Yes, even though the spec
says (in section 4.3.3, "Character Encoding in Entities",
<http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding>):

,----
| Entities encoded in UTF-16 must and entities encoded in UTF-8 may
| begin with the Byte Order Mark described by Annex H of [ISO/IEC
| 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE
| character, #xFEFF). This is an encoding signature, not part of either
| the markup or the character data of the XML document. XML processors
| must be able to use this character to differentiate between UTF-8 and
| UTF-16 encoded documents.
`----

and:

,----
| In the absence of external character encoding information (such as
| MIME headers), parsed entities which are stored in an encoding other
| than UTF-8 or UTF-16 must begin with a text declaration (see 4.3.1 The
| Text Declaration) containing an encoding declaration:
`----

The following patch solves this by specifying the encoding as UTF-8 in
the XML declaration.

diff --git a/tools/lib/Format/XML.pm b/tools/lib/Format/XML.pm
index 6b5fa06..5e1b807 100644
--- a/tools/lib/Format/XML.pm
+++ b/tools/lib/Format/XML.pm
@@ -166,7 +166,7 @@ sub _output_end_book
             my $file = "$prefix$idx$content$extension";
             push_output( 'file', File::Spec->catfile( "$directory", "$file" ) );
-            output( "<?xml version='1.0'?>\n" );
+            output( "<?xml version='1.0' encoding='utf-8'?>\n" );
             output( "<!-- -*- DocBook -*- -->\n" );
         }
         output( $chapter{ $chapter_id } );
@@ -184,7 +184,7 @@ sub _output_end_book
 }
 sub _html_head
 {
-    output( "<?xml version='1.0'?>\n" );
+    output( "<?xml version='1.0' encoding='utf-8'?>\n" );
     output( "<!-- -*- DocBook -*- -->\n" );
     output( "<!DOCTYPE book PUBLIC \"-//OASIS//DTD DocBook XML V4.5//EN\"\n" );
     output( " \"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd\" [\n" );
@@ -199,7 +199,6 @@ sub _html_head
     $language_tag = 'en' if $language_tag eq undef;
     output( "<book lang=\"$language_tag\">\n" );
-    output( "<!-- This is UTF-8 encoded. -->\n" );
     output( "\n" );
     output( "<title>$title</title>\n" );
     output( "\n" );

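For reference, here is a rough sketch (in Python, not part of the patch) of the detection order that the quoted section of the spec implies: a Byte Order Mark first, then the encoding declaration, then the UTF-8 default. The function name `xml_encoding` and the regular expression are my own illustration and are simplified; external information such as MIME headers and encodings like UTF-32 are ignored. It is only meant to show why an explicit declaration helps tools that do not apply the default.

```python
import codecs
import re

# Matches the encoding pseudo-attribute inside an XML declaration
# that sits at the very start of the data (simplified pattern).
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    # 1. A Byte Order Mark is an encoding signature and wins.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 2. Otherwise use the encoding declaration, if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. With neither, the entity has to be UTF-8 (or UTF-16 with a BOM,
    #    which was handled above).
    return "utf-8"


if __name__ == "__main__":
    before = b"<?xml version='1.0'?>\n<!-- -*- DocBook -*- -->\n"
    after = b"<?xml version='1.0' encoding='utf-8'?>\n<!-- -*- DocBook -*- -->\n"
    print(xml_encoding(before), xml_encoding(after))
```

Both the old and the new declaration come out as UTF-8 under these rules; the difference is that after the patch the answer is stated in the file itself rather than left to the reader's default.
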
-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 3.2.0-1-686-pae (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages debiandoc-sgml depends on:
ii  libhtml-parser-perl  3.69-1+b1
ii  libroman-perl        1.23-1
ii  libtext-format-perl  0.53-1
ii  perl                 5.14.2-7
ii  sgml-base            1.26+nmu1
ii  sgml-data            2.0.6
ii  sgmlspl              1.03ii-32
ii  sp                   1.3.4-1.2.1-47.1

Versions of packages debiandoc-sgml recommends:
ii  ghostscript          9.05~dfsg-2
ii  texinfo              4.13a.dfsg.1-8
ii  texlive              2009-15
ii  texlive-latex-extra  2009-10

Versions of packages debiandoc-sgml suggests:
pn  debiandoc-sgml-doc  1.1.22
pn  latex-cjk-all       <none>
pn  texlive-lang-all    <none>

-- no debconf information

-- 
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!