Attaching an updated patch which corrects many errors in the original description of the bug.
Ben. -- Ben Hutchings I say we take off; nuke the site from orbit. It's the only way to be sure.
From b0a67f447be96d4b5df683da6f280d4b24eb3ff4 Mon Sep 17 00:00:00 2001 From: Ben Hutchings <b...@decadent.org.uk> Date: Thu, 16 Aug 2012 04:16:54 +0100 Subject: [PATCH] Fix mangling of '&' in URLs SDATA entities, e.g. '&' are converted on input to different escape sequences, e.g. '\|[amp ]\|'. These need to be converted back to literal characters or entities on output, depending on the format. Currently we fail to do this because: 1. The HTML and XML back-ends only do URL-normalization; they never process the SDATA escape sequences. Change them to do CDATA conversion before normalizing URLs. This is not theoretically correct: we should convert to literal text, then normalize the URL, then convert to CDATA. However we know that '&' and ';' will not be URL-escaped and therefore the result should be the same. 2. The driver does some basic normalization of the URL by squashing multiple spaces. Since the spaces are significant in matching of the SDATA escape sequences, they are then not converted by any back-end. Change it to trim leading and trailing space only; URLs should not normally contain any spaces anyway. --- debian/changelog | 7 +++++++ tools/lib/Format/Driver.pm | 2 +- tools/lib/Format/HTML.pm | 10 +++++++--- tools/lib/Format/XML.pm | 10 +++++++--- 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/debian/changelog b/debian/changelog index 2f58e28..a8b1dc1 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,3 +1,10 @@ +debiandoc-sgml (1.2.27+nmu1) UNRELEASED; urgency=low + + * Non-maintainer upload. + * Fix mangling of '&' in URLs. + + -- Ben Hutchings <b...@decadent.org.uk> Thu, 16 Aug 2012 03:55:35 +0100 + debiandoc-sgml (1.2.27) unstable; urgency=low * Rebuild with debhelper sgml-base >=1.26+nmu2. Closes: #675474 diff --git a/tools/lib/Format/Driver.pm b/tools/lib/Format/Driver.pm index 368711f..b707291 100644 --- a/tools/lib/Format/Driver.pm +++ b/tools/lib/Format/Driver.pm @@ -918,7 +918,7 @@ sub end_httppath sub start_url { ( $element, $event ) = @_; - my $id = _normalize( _a( 'ID' ) ); + my $id = _trim( _a( 'ID' ) ); my $name = _a( 'NAME' ); $name = "" if ( $name eq '\|\|' ) || ( $name eq '\|urlname\|' ) || ( $name eq $id ); diff --git a/tools/lib/Format/HTML.pm b/tools/lib/Format/HTML.pm index 590bd79..564b420 100644 --- a/tools/lib/Format/HTML.pm +++ b/tools/lib/Format/HTML.pm @@ -956,7 +956,7 @@ sub _output_httppath } sub _output_url { - my $url = URI->new( $_[0] ); + my $url = URI->new( _to_cdata( $_[0] ) ); $_[1] = $_[0] if $_[1] eq ""; output( "<code><a href=\"$url\">" ); _cdata( $_[1] ); @@ -966,7 +966,7 @@ sub _output_url ## ---------------------------------------------------------------------- ## data output subroutines ## ---------------------------------------------------------------------- -sub _cdata +sub _to_cdata { ( $_ ) = @_; @@ -976,7 +976,11 @@ sub _cdata # SDATA s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g; - output( $_ ); + return $_; +} +sub _cdata +{ + output( _to_cdata( $_[0] ) ); } sub _sdata { diff --git a/tools/lib/Format/XML.pm b/tools/lib/Format/XML.pm index 5e1b807..7d852ef 100644 --- a/tools/lib/Format/XML.pm +++ b/tools/lib/Format/XML.pm @@ -769,7 +769,7 @@ sub _output_httppath } sub _output_url { - my $url = URI->new( $_[0] ); + my $url = URI->new( _to_cdata( $_[0] ) ); $_[1] = $_[0] if $_[1] eq ""; output( "<ulink url=\"$url\">" ); _cdata( $_[1] ); @@ -779,14 +779,18 @@ sub _output_url ## ---------------------------------------------------------------------- ## data output subroutines ## ---------------------------------------------------------------------- -sub _cdata +sub _to_cdata { ( $_ ) = @_; # SDATA s/\\\|(\[\w+\s*\])\\\|/$sdata{ $1 }/g; - output( $_ ); + return $_; +} +sub _cdata +{ + output( _to_cdata( $_[0] ) ); } sub _sdata {
signature.asc
Description: This is a digitally signed message part