Package: debiandoc-sgml
Version: 1.2.27
Severity: important
Tags: patch

In the kernel-handbook source we have:

Check the <url id="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux&amp;src=linux-2.6" name="current bug list">

And in the HTML output this becomes:

Check the <code><a href="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux%5C%7C[amp%20]%5C%7Csrc=linux-2.6">current bug list</a></code>

I'm attaching a fix. Please upload and ask for a freeze exception, as
this causes real breakage in debian-kernel-handbook.

Ben.

-- System Information:
Debian Release: wheezy/sid
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 3.2.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages debiandoc-sgml depends on:
ii  libhtml-parser-perl  3.69-2
ii  libroman-perl        1.23-1
ii  libtext-format-perl  0.56-1
ii  perl                 5.14.2-12
ii  sgml-base            1.26+nmu3
ii  sgml-data            2.0.8
ii  sgmlspl              1.03ii-32
ii  sp                   1.3.4-1.2.1-47.1+b1

Versions of packages debiandoc-sgml recommends:
ii  ghostscript          9.05~dfsg-6
ii  texinfo              4.13a.dfsg.1-10
pn  texlive              <none>
pn  texlive-latex-extra  <none>

Versions of packages debiandoc-sgml suggests:
ii  debiandoc-sgml-doc  1.1.22
pn  latex-cjk-all       <none>
pn  texlive-lang-all    <none>

-- no debconf information
From 9b2a5f95132b499d85eb6c17b57cf8e3a7748ac6 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <b...@decadent.org.uk>
Date: Thu, 16 Aug 2012 04:16:54 +0100
Subject: [PATCH] Fix mangling of '&' in URLs

SGML entities, e.g. '&amp;' are converted on input to SDATA sequences
e.g. '\|[amp ]\|'.  These need to be converted back to literal
characters or entities on output, depending on the format.

Currently we fail to do this because:

1. The driver normalizes URLs by squashing multiple spaces.  Since the
   spaces are significant in matching of the SDATA sequences, they are
   not converted (by any back-end).  Change it to trim leading and
   trailing space only; URLs should not normally contain any spaces
   anyway.

2. The HTML and XML back-ends further normalize URLs using the URL
   class.  This results in the SDATA sequences being URL-encoded, and
   so they are not matched in the subsequent conversion to CDATA.
   Swap the order of conversion so that URL-encoding is done last.

   This is not theoretically correct: we should convert to literal
   text, then URL-encode, then HTML/XML-encode.  However we know that
   '&' and ';' will not be URL-escaped and therefore the result should
   be the same.
---
 debian/changelog           |    7 +++++++
 tools/lib/Format/Driver.pm |    2 +-
 tools/lib/Format/HTML.pm   |   10 +++++++---
 tools/lib/Format/XML.pm    |   10 +++++++---
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index 2f58e28..db764f6 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+debiandoc-sgml (1.2.27+nmu1) UNRELEASED; urgency=low
+
+  * Non-maintainer upload.
+  * Fix handling of entities (e.g. &amp;) in URLs.
+
+ -- Ben Hutchings <b...@decadent.org.uk>  Thu, 16 Aug 2012 03:55:35 +0100
+
 debiandoc-sgml (1.2.27) unstable; urgency=low
 
   * Rebuild with debhelper sgml-base >=1.26+nmu2.  Closes: #675474
diff --git a/tools/lib/Format/Driver.pm b/tools/lib/Format/Driver.pm
index 368711f..b707291 100644
--- a/tools/lib/Format/Driver.pm
+++ b/tools/lib/Format/Driver.pm
@@ -918,7 +918,7 @@ sub end_httppath
 sub start_url
 {
     ( $element, $event ) = @_;
-    my $id = _normalize( _a( 'ID' ) );
+    my $id = _trim( _a( 'ID' ) );
     my $name = _a( 'NAME' );
     $name = "" if ( $name eq '\|\|' ) || ( $name eq '\|urlname\|' )
         || ( $name eq $id );
diff --git a/tools/lib/Format/HTML.pm b/tools/lib/Format/HTML.pm
index 590bd79..564b420 100644
--- a/tools/lib/Format/HTML.pm
+++ b/tools/lib/Format/HTML.pm
@@ -956,7 +956,7 @@ sub _output_httppath
 }
 sub _output_url
 {
-    my $url = URI->new( $_[0] );
+    my $url = URI->new( _to_cdata( $_[0] ) );
     $_[1] = $_[0] if $_[1] eq "";
     output( "<code><a href=\"$url\">" );
     _cdata( $_[1] );
@@ -966,7 +966,7 @@ sub _output_url
 ## ----------------------------------------------------------------------
 ## data output subroutines
 ## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
 {
     ( $_ ) = @_;
 
@@ -976,7 +976,11 @@ sub _cdata
     # SDATA
     s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
 
-    output( $_ );
+    return $_;
+}
+sub _cdata
+{
+    output( _to_cdata( $_[0] ) );
 }
 sub _sdata
 {
diff --git a/tools/lib/Format/XML.pm b/tools/lib/Format/XML.pm
index 5e1b807..7d852ef 100644
--- a/tools/lib/Format/XML.pm
+++ b/tools/lib/Format/XML.pm
@@ -769,7 +769,7 @@ sub _output_httppath
 }
 sub _output_url
 {
-    my $url = URI->new( $_[0] );
+    my $url = URI->new( _to_cdata( $_[0] ) );
     $_[1] = $_[0] if $_[1] eq "";
     output( "<ulink url=\"$url\">" );
     _cdata( $_[1] );
@@ -779,14 +779,18 @@ sub _output_url
 ## ----------------------------------------------------------------------
 ## data output subroutines
 ## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
 {
     ( $_ ) = @_;
 
     # SDATA
     s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
 
-    output( $_ );
+    return $_;
+}
+sub _cdata
+{
+    output( _to_cdata( $_[0] ) );
 }
 sub _sdata
 {
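
For anyone who wants to see the two causes from the commit message in isolation, here is a rough sketch in Python rather than the package's Perl. Everything in it is a hypothetical stand-in: `normalize`, `trim`, `to_cdata` and `url_encode` only imitate the behaviour described above, the `SDATA` table and the padding of its `[amp   ]` key are assumptions, and `urllib.parse.quote` is used in place of the Perl URI class.

```python
import re
from urllib.parse import quote

# Assumed stand-in for a back-end's SDATA table; the real key padding may differ.
SDATA = {"[amp   ]": "&amp;"}

# Same pattern as in the patch: backslash, pipe, [name + spaces], backslash, pipe.
SDATA_RE = re.compile(r"\\\|(\[\w+\s*\])\\\|")


def normalize(text):
    """Old driver behaviour as described: squash every run of whitespace."""
    return " ".join(text.split())


def trim(text):
    """New driver behaviour: strip leading and trailing whitespace only."""
    return text.strip()


def to_cdata(text):
    """Replace known SDATA sequences; unknown ones are left alone (a simplification)."""
    return SDATA_RE.sub(lambda m: SDATA.get(m.group(1), m.group(0)), text)


def url_encode(text):
    """Percent-encode everything except unreserved and reserved URL characters."""
    return quote(text, safe=":/?#[]@!$&'()*+,;=%")


# The URL as the back-end might receive it, with '&amp;' already turned into SDATA.
RAW = " http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux\\|[amp   ]\\|src=linux-2.6 "

# Before the patch: squash spaces, URL-encode, then try to convert SDATA.
old = to_cdata(url_encode(normalize(RAW)))

# After the patch: trim only, convert SDATA, URL-encode last.
new = url_encode(to_cdata(trim(RAW)))

print(old)
print(new)
```

Under these assumptions the old ordering reproduces the mangled `href` quoted in the report, while the new ordering leaves `&amp;` intact because neither `&` nor `;` is percent-escaped.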