Package: debiandoc-sgml
Version: 1.2.27
Severity: important
Tags: patch

In the kernel-handbook source we have:

    Check the <url
    id="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux&amp;src=linux-2.6"
    name="current bug list">

And in the HTML output this becomes:

    Check the <code><a
    href="http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux%5C%7C[amp%20]%5C%7Csrc=linux-2.6">current
    bug list</a></code>
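
Percent-decoding the mangled part of that href hints at where it came
from.  A minimal sketch in core Perl; the variable names are only
illustrative:

```perl
use strict;
use warnings;

# Query string copied from the href in the HTML output above.
my $query = 'src=linux%5C%7C[amp%20]%5C%7Csrc=linux-2.6';

# Undo percent-encoding: each %XX becomes the character with that hex code.
( my $decoded = $query ) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;

print "$decoded\n";    # src=linux\|[amp ]\|src=linux-2.6
```

The decoded text looks like the SDATA form of '&amp;' with its padding
collapsed to a single space, which was then URL-encoded instead of being
converted back to an entity.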

I'm attaching a fix.  Please upload and ask for a freeze exception, as
this causes real breakage in debian-kernel-handbook.

Ben.

-- System Information:
Debian Release: wheezy/sid
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 3.2.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages debiandoc-sgml depends on:
ii  libhtml-parser-perl  3.69-2
ii  libroman-perl        1.23-1
ii  libtext-format-perl  0.56-1
ii  perl                 5.14.2-12
ii  sgml-base            1.26+nmu3
ii  sgml-data            2.0.8
ii  sgmlspl              1.03ii-32
ii  sp                   1.3.4-1.2.1-47.1+b1

Versions of packages debiandoc-sgml recommends:
ii  ghostscript          9.05~dfsg-6
ii  texinfo              4.13a.dfsg.1-10
pn  texlive              <none>
pn  texlive-latex-extra  <none>

Versions of packages debiandoc-sgml suggests:
ii  debiandoc-sgml-doc  1.1.22
pn  latex-cjk-all       <none>
pn  texlive-lang-all    <none>

-- no debconf information
From 9b2a5f95132b499d85eb6c17b57cf8e3a7748ac6 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <b...@decadent.org.uk>
Date: Thu, 16 Aug 2012 04:16:54 +0100
Subject: [PATCH] Fix mangling of '&' in URLs

SGML entities, e.g. '&amp;', are converted on input to SDATA sequences,
e.g. '\|[amp   ]\|'.  These need to be converted back to literal
characters or entities on output, depending on the format.  Currently
we fail to do this because:

1. The driver normalizes URLs by squashing multiple spaces.  Since the
spaces are significant when matching SDATA sequences, the squashed
sequences are not converted (by any back-end).

Change it to trim leading and trailing space only; URLs should not
normally contain any spaces anyway.

2. The HTML and XML back-ends further normalize URLs using the URL
class.  This results in the SDATA sequences being URL-encoded, and so
they are not matched in the subsequent conversion to CDATA.

Swap the order of conversion so that URL-encoding is done last.  This
is not theoretically correct: we should convert to literal text, then
URL-encode, then HTML/XML-encode.  However, we know that '&' and ';'
will not be URL-escaped and therefore the result should be the same.
---
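
Not part of the commit message: the two problems described above can be
sketched in standalone Perl.  This is only an illustration under stated
assumptions.  The helper names (normalize, trim, to_cdata, url_escape)
and the one-entry %sdata table are hypothetical stand-ins, not the real
code in tools/lib/Format.  url_escape merely imitates the escaping seen
in the HTML output and is not the URI module, and leaving unknown SDATA
keys untouched is a simplification.

```perl
use strict;
use warnings;

# Hypothetical SDATA table.  The key keeps the exact padding shown in the
# commit message (three spaces); this is an assumption for illustration.
my %sdata = ( '[amp   ]' => '&amp;' );

# Convert SDATA sequences such as \|[amp   ]\| to their mapped text.
# Sequences whose key is not in the table are left untouched.
sub to_cdata {
    my ($text) = @_;
    $text =~ s/(\\\|(\[\w+\s*\])\\\|)/exists $sdata{$2} ? $sdata{$2} : $1/ge;
    return $text;
}

# Old behaviour: squash every run of whitespace to a single space.
sub normalize {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# New behaviour: strip leading and trailing whitespace only.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Stand-in for the URL class: percent-encode backslash, pipe and space.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([\\| ])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

my $id = ' http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=linux\|[amp   ]\|src=linux-2.6 ';

# Problem 1: squashing changes the key, so the table lookup fails.
my $squashed = to_cdata( normalize($id) );

# Problem 2: URL-encoding first hides the sequence from to_cdata.
my $old = to_cdata( url_escape( normalize($id) ) );

# Patched order: trim, convert SDATA, then URL-encode last.
my $new = url_escape( to_cdata( trim($id) ) );

print "squashed: $squashed\n";    # ...src=linux\|[amp ]\|src=linux-2.6
print "old:      $old\n";         # ...src=linux%5C%7C[amp%20]%5C%7Csrc=linux-2.6
print "new:      $new\n";         # ...src=linux&amp;src=linux-2.6
```

Only the last ordering yields '&amp;' in the attribute value.  It works
in this sketch for the reason given above: url_escape, like the real URL
class as described, leaves '&' and ';' alone.
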
 debian/changelog           |    7 +++++++
 tools/lib/Format/Driver.pm |    2 +-
 tools/lib/Format/HTML.pm   |   10 +++++++---
 tools/lib/Format/XML.pm    |   10 +++++++---
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/debian/changelog b/debian/changelog
index 2f58e28..db764f6 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+debiandoc-sgml (1.2.27+nmu1) UNRELEASED; urgency=low
+
+  * Non-maintainer upload.
+  * Fix handling of entities (e.g. &amp;) in URLs.
+
+ -- Ben Hutchings <b...@decadent.org.uk>  Thu, 16 Aug 2012 03:55:35 +0100
+
 debiandoc-sgml (1.2.27) unstable; urgency=low
 
   * Rebuild with debhelper sgml-base >=1.26+nmu2. Closes: #675474
diff --git a/tools/lib/Format/Driver.pm b/tools/lib/Format/Driver.pm
index 368711f..b707291 100644
--- a/tools/lib/Format/Driver.pm
+++ b/tools/lib/Format/Driver.pm
@@ -918,7 +918,7 @@ sub end_httppath
 sub start_url
 {
     ( $element, $event ) = @_;
-    my $id = _normalize( _a( 'ID' ) );
+    my $id = _trim( _a( 'ID' ) );
     my $name =  _a( 'NAME' );
     $name = "" if ( $name eq '\|\|' ) || ( $name eq '\|urlname\|' )
 	|| ( $name eq $id );
diff --git a/tools/lib/Format/HTML.pm b/tools/lib/Format/HTML.pm
index 590bd79..564b420 100644
--- a/tools/lib/Format/HTML.pm
+++ b/tools/lib/Format/HTML.pm
@@ -956,7 +956,7 @@ sub _output_httppath
 }
 sub _output_url
 {
-    my $url = URI->new( $_[0] );
+    my $url = URI->new( _to_cdata( $_[0] ) );
     $_[1] = $_[0] if $_[1] eq "";
     output( "<code><a href=\"$url\">" );
     _cdata( $_[1] );
@@ -966,7 +966,7 @@ sub _output_url
 ## ----------------------------------------------------------------------
 ## data output subroutines
 ## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
 {
     ( $_ ) = @_;
 
@@ -976,7 +976,11 @@ sub _cdata
     # SDATA
     s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
 
-    output( $_ );
+    return $_;
+}
+sub _cdata
+{
+    output( _to_cdata( $_[0] ) );
 }
 sub _sdata
 {
diff --git a/tools/lib/Format/XML.pm b/tools/lib/Format/XML.pm
index 5e1b807..7d852ef 100644
--- a/tools/lib/Format/XML.pm
+++ b/tools/lib/Format/XML.pm
@@ -769,7 +769,7 @@ sub _output_httppath
 }
 sub _output_url
 {
-    my $url = URI->new( $_[0] );
+    my $url = URI->new( _to_cdata( $_[0] ) );
     $_[1] = $_[0] if $_[1] eq "";
     output( "<ulink url=\"$url\">" );
     _cdata( $_[1] );
@@ -779,14 +779,18 @@ sub _output_url
 ## ----------------------------------------------------------------------
 ## data output subroutines
 ## ----------------------------------------------------------------------
-sub _cdata
+sub _to_cdata
 {
     ( $_ ) = @_;
 
     # SDATA
     s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g;
 
-    output( $_ );
+    return $_;
+}
+sub _cdata
+{
+    output( _to_cdata( $_[0] ) );
 }
 sub _sdata
 {
