Package: tidy
Version: 20091223cvs-1
Severity: normal

With "tidy -asxhtml -utf8 --add-xml-decl yes", the generated XML
declaration doesn't specify the encoding. As a consequence, the XML
processor cannot reliably determine the encoding from it. For instance,
libxml2 will assume that the output encoding should be US-ASCII (though
it will still be able to read UTF-8 sequences as required), so that

  echo é | tidy -asxhtml -utf8 --add-xml-decl yes | xmllint -

gives:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="generator" content="HTML Tidy for Linux (vers 25 March 2009), see 
www.w3.org" />
<title></title>
</head>
<body>
&#xE9;
</body>
</html>

Note that "é" has been written as a character reference (&#xE9;)
because no encoding is declared.

Note that the behavior of xmllint won't change:

  https://bugzilla.gnome.org/show_bug.cgi?id=350208
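Until tidy writes the encoding itself, one possible stopgap is to patch
the declaration before xmllint sees it. This is only a minimal sketch,
not part of tidy or libxml2: add_xml_encoding is a hypothetical helper
name, and it assumes tidy writes the declaration on line 1 in the exact
form <?xml version="..."?> with no other pseudo-attributes.

```shell
#!/bin/sh
# Hypothetical helper (not a tidy option): add encoding="utf-8" to an
# XML declaration that lacks one. Only line 1 is rewritten, and only if
# it is exactly <?xml version="..."?>; anything else passes through.
# In a POSIX basic regex, "?" and "<" are literal characters.
add_xml_encoding() {
    sed '1s/^<?xml version="\([^"]*\)"?>/<?xml version="\1" encoding="utf-8"?>/'
}

# Usage (assumes tidy and xmllint are installed):
#   echo é | tidy -asxhtml -utf8 --add-xml-decl yes | add_xml_encoding | xmllint -
```

Declarations that already carry an encoding (or a standalone
pseudo-attribute) do not match the pattern and are left unchanged.
Whether xmllint then keeps "é" as a literal UTF-8 character should be
checked against the installed libxml2 version.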

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages tidy depends on:
ii  libc6                      2.11.2-7      Embedded GNU C Library: Shared lib
ii  libtidy-0.99-0             20091223cvs-1 HTML syntax checker and reformatte

tidy recommends no packages.

Versions of packages tidy suggests:
ii  tidy-doc                   20091223cvs-1 HTML syntax checker and reformatte

-- no debconf information