Hi,

(If you're in a hurry, you might want to skip to the end of the mail, since
the conclusion is partly unrelated to my in-depth analysis of the problem)

It seems PHD is completely innocent of this bug. Upgrading PHD didn't seem to
help (I'm not 100% sure that the upgrade took effect, though I did throw out
the old version entirely). Judging from configure.php, the error occurs when
asking PHP's built-in DOMDocument class to validate, before PHD is called at
all.

Instead, the manual.xml validation errors reported here seem to be caused by
recent changes in libxml2, as suggested in this post:
  http://www.mail-archive.com/x...@gnome.org/msg07188.html

Apparently, libxml2 was recently changed to pass on the current default
namespace when expanding entity references. The change was made to fix this
bug:
  https://bugzilla.gnome.org/show_bug.cgi?id=502960

This problem seems to occur when the referenced entity contains complete
nodes that don't carry an xmlns= declaration of their own. The workaround
suggested in the first post above is to explicitly declare a default namespace
inside the node(s) generated by the entity.
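
To make the shape of that workaround concrete, here is a minimal sketch. It
uses Python's standard-library ElementTree (which is built on expat, not
libxml2), so it does not reproduce the error; it only shows what an entity
with its own default namespace declaration looks like. The entity name
example.authors, its content and the DocBook namespace URI are assumptions
made for illustration, not taken from the real manual.xml.

```python
import xml.etree.ElementTree as ET

# Assumption: the manual uses the DocBook 5 namespace.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical miniature document. The entity holds a complete element
# that declares the default namespace itself, instead of relying on the
# element that references the entity.
DOCUMENT = f"""<!DOCTYPE book [
  <!ENTITY example.authors "<para xmlns='{DOCBOOK_NS}'>Some Author</para>">
]>
<book xmlns="{DOCBOOK_NS}">&example.authors;</book>
"""

root = ET.fromstring(DOCUMENT)
para = root[0]  # the element that came out of the entity

# ElementTree writes namespaced tags as "{uri}localname".
print(root.tag)
print(para.tag)
```

Since the element inside the entity names its namespace explicitly, a parser
never has to look outside the entity to resolve it.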

From a quick look at the source, the following seems to happen when the
"Namespace default prefix was not found" error occurs:
 * A new node (presumably from an entity reference) is created, which gets the
   current default namespace passed in (presumably from the node that
   references the entity) (xmlSAX2StartElementNs() in SAX2.c)
 * The new node's default namespace is looked up using the xmlSearchNs()
   function from tree.c. Its documentation says: "We don't allow to cross
   entities boundaries. If you don't declare the namespace within those you
   will be in troubles !!! A warning is generated to cover this case."
 * The namespace cannot be found, which generates the "Namespace default
   prefix was not found" error.
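
The lookup described in these steps can be modelled roughly as follows. This
is a toy model in Python, not libxml2's actual code: the Node class, the
is_entity_boundary flag and search_ns() are hypothetical names, and the only
behaviour taken from the source is that the search walks up the tree and
refuses to cross an entity boundary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical, heavily simplified stand-in for a tree node."""
    name: str
    parent: Optional["Node"] = None
    is_entity_boundary: bool = False
    # Maps prefix -> namespace URI; the key None means the default namespace.
    ns_decls: dict = field(default_factory=dict)


def search_ns(node: Node, prefix: Optional[str] = None) -> Optional[str]:
    """Walk up from node looking for a declaration of prefix.

    Mirrors the documented rule that the search does not cross entity
    boundaries: hitting one ends the search without a result.
    """
    current = node
    while current is not None:
        if current.is_entity_boundary:
            return None
        if prefix in current.ns_decls:
            return current.ns_decls[prefix]
        current = current.parent
    return None


NS = "http://example.org/ns"  # placeholder URI, assumption for this sketch

book = Node("book", ns_decls={None: NS})
entity = Node("&example;", parent=book, is_entity_boundary=True)
bare = Node("para", parent=entity)                          # no xmlns of its own
declared = Node("para", parent=entity, ns_decls={None: NS})  # the workaround

print(search_ns(bare))      # None: the search stops at the entity boundary
print(search_ns(declared))  # the URI declared on the node itself
```

In this model the declaration on the referencing element is never reached
from inside the entity, which is why only the node that declares its own
namespace gets a result.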

This seems related to this (old) bug, marked wontfix:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=174793
This post from 2002 suggests that the xmlSearchNs() behaviour described above
is actually a libxml2 bug, but one whose workaround is good practice anyway:
  http://mail.gnome.org/archives/xslt/2002-January/msg00022.html

That post concerns explicit namespace prefixes that are used inside an entity
but not declared in the entity itself.

However, my suspicion is that the recent change in libxml2 now causes the
libxml2 bug from 2002 to show up as well for entities that don't declare a
default namespace (previously, the node inside the entity just wouldn't have
any namespace associated with it, which was probably not correct either, but
did validate, since xmlSAX2StartElementNs() didn't try to check the namespace
at all).


So it seems the "official" workaround is to declare a default XML namespace
inside each entity declaration (I haven't tried this). I don't know enough
about the XML (NS) specification to know whether this is a real solution or
whether libxml2 should really be fixed instead...

However, this problem really doesn't seem to be Debian-specific, so upstream
should be seeing the same problems. Looking at the latest upstream version of
the documentation, it seems that upstream has already applied the workaround.
See, for example, the frontpage.authors entity at:
  http://svn.php.net/repository/phpdoc/en/trunk/contributors.ent

Closer inspection shows that this was fixed in r290424 and r290427:
  matth...@xanthe:$ svn log -c 290424 -c 290427 \
      http://svn.php.net/repository/phpdoc/en/trunk
  ------------------------------------------------------------------------
  r290424 | bjori | 2009-11-09 16:58:06 +0100 (Mon, 09 Nov 2009) | 3 lines

  Add namespace declaration to all "free standing elements"
  # See https://bugzilla.gnome.org/show_bug.cgi?id=502960

  ------------------------------------------------------------------------
  r290427 | bjori | 2009-11-09 17:04:52 +0100 (Mon, 09 Nov 2009) | 3 lines

  Adding namespace declaration to newly introduced entities
  # See https://bugzilla.gnome.org/show_bug.cgi?id=502960

  ------------------------------------------------------------------------




So, it turns out we can easily fix this problem by packaging the latest
version of the PHP documentation. Considering that we're currently shipping a
version from 2008, that seems like a good idea anyway.

It's not exactly obvious to me how the documentation is organized and how to
get an orig tarball from upstream's svn (it seems that the Debian package
merges a few directories from SVN?), so I haven't tested this theory. There
are probably more things to fix when upgrading, though.


Lior, could you take care of this upgrade? If not, I can see if I can prepare
something, since I'd like to preserve php-doc (though I don't have too much
time, of course :-p).

Gr.

Matthijs
