ID:               36790
 User updated by:  exaton at free dot fr
 Reported By:      exaton at free dot fr
 Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: WinXP SP2
 PHP Version:      5CVS-2006-03-19 (snap)
 New Comment:

Thanks for answering so quickly. I feel, however, that in minimizing
the code for the report I may not have presented the case fully. I'm
leaving the Bogus status though, as I do understand that this is beyond
PHP itself.

There is a real transitivity (round-trip) problem :

// Load XML file that will not cause any formatting problems

$doc = new DOMDocument();
$doc -> load('file.xml');
$doc -> formatOutput = TRUE;

// Add a child to its root element

$root = $doc -> documentElement;
$child = $doc -> createElement('child');
$root -> appendChild($child);

// Check XML that will be saved to file (it is fine, well formatted)
// and save it

pre_ent_dump($doc -> saveXML()); // (1)
$doc -> save('fileResult.xml');

// Open file saved with good formatting

$docRes = new DOMDocument();
$docRes -> load('fileResult.xml');
$docRes -> formatOutput = TRUE;

// Check XML of that file... Its good formatting means that it
// now contains newlines and indentation, which cause problems.
// This formatting is OK for now (surprisingly...)

pre_ent_dump($docRes -> saveXML()); // (2)

// Add another child to its root element

$root = $docRes -> documentElement;
$child = $docRes -> createElement('child');
$root -> appendChild($child);

// Check XML again -- this time it really is broken

pre_ent_dump($docRes -> saveXML()); // (3)

-----

With the non-problematic contents of file.xml being :

<?xml version="1.0" standalone="yes"?>
<root></root>

Then the good contents at (1) that are saved to fileResult.xml are :

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
</root>

But even though, after re-opening that properly formatted file, the
contents at (2) are not yet broken, in (3), after adding a new child,
they are :

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
<child/></root>
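
The same round trip can be reproduced without touching the filesystem.
This is only a minimal sketch: it assumes that loadXML() on the saved
string behaves like load() on the saved file, and it prints the raw XML
instead of going through pre_ent_dump().

```php
<?php
// Sketch: the steps from the comment above, using strings instead of files.
$doc = new DOMDocument();
$doc->loadXML('<?xml version="1.0" standalone="yes"?><root></root>');
$doc->formatOutput = true;
$doc->documentElement->appendChild($doc->createElement('child'));
$saved = $doc->saveXML();      // (1) indented as expected

// Re-load the formatted output. By default the newlines and spaces
// used for indentation are kept as text nodes under <root>.
$docRes = new DOMDocument();
$docRes->loadXML($saved);
$docRes->formatOutput = true;
$docRes->documentElement->appendChild($docRes->createElement('child'));
$result = $docRes->saveXML();  // (3) the new child is not indented

echo $saved, "-----\n", $result;
```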

So as you see, it is not possible to use output formatting throughout :
it would have to be turned off when saving to file and only turned on
for debugging... Which is fine in theory, but in practice the files thus
produced are pretty illegible, which defeats a good part of the purpose
of XML in the first place :-/
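
One possible way around this, which is not discussed in this thread :
DOMDocument also has a preserveWhiteSpace property. Setting it to FALSE
before loading asks libxml2 to drop whitespace-only text nodes, so the
re-loaded root element no longer has mixed content. A minimal sketch,
assuming that the indentation whitespace in the file is not meaningful
data :

```php
<?php
// Stand-in for the contents of fileResult.xml (already formatted).
$formatted = "<?xml version=\"1.0\" standalone=\"yes\"?>\n"
           . "<root>\n  <child/>\n</root>\n";

$docRes = new DOMDocument();
$docRes->preserveWhiteSpace = false;  // must be set before loading
$docRes->loadXML($formatted);
$docRes->formatOutput = true;

// Add another child; the root now holds element children only,
// so the output can be re-indented.
$docRes->documentElement->appendChild($docRes->createElement('child'));
$out = $docRes->saveXML();

echo $out;
```

This only helps when the whitespace really is just indentation ; real
text content next to child elements would still disable the formatting.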

Do you think it would be worth my pointing this out to someone on the
libxml2 team ?

Thanks again.


Previous Comments:
------------------------------------------------------------------------

[2006-03-19 23:32:43] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

First the formatting is handled by libxml2 so not a DOM issue. Second,
you are playing with an element containing mixed content so formatting
is different than content containing only child elements or text
content.
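
To see whether an element falls into the mixed content case described
here, one can look for text nodes among its direct children. The helper
below is hypothetical (it is not part of the DOM extension) and is only
a sketch of the check, not libxml2's actual logic :

```php
<?php
// Hypothetical helper: true if $element has at least one text node
// among its direct children (whitespace-only text counts too).
function has_text_child(DOMElement $element)
{
    foreach ($element->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            return true;
        }
    }
    return false;
}

$plain = new DOMDocument();
$plain->loadXML('<root><a></a></root>');

$mixed = new DOMDocument();
$mixed->loadXML('<root> <a></a></root>');

$plainHasText = has_text_child($plain->documentElement);
$mixedHasText = has_text_child($mixed->documentElement);

var_dump($plainHasText, $mixedHasText);
```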

------------------------------------------------------------------------

[2006-03-19 20:58:17] exaton at free dot fr

Description:
------------
It appears that DOM output formatting activated with the DOMDocument's
formatOutput property fails as soon as the document's root element
($doc -> documentElement) contains a text node (even besides other
nodes).

Reproduce code:
---------------
function pre_ent_dump($var) {
        printf("<pre>\n%s</pre>\n", htmlentities($var));
}
function pre_var_dump($var) {
        echo "<pre>\n"; var_dump($var); echo "</pre>\n";
}

$doc = new DOMDocument();
$doc -> load('file.xml');
$doc -> formatOutput = TRUE;

pre_ent_dump($doc -> saveXML()); // (1)

$root = $doc -> documentElement;

pre_var_dump($root -> textContent); // (2)

$child = $doc -> createElement('child');
$root -> appendChild($child);

pre_ent_dump($doc -> saveXML()); // (3)

Actual result:
--------------
[Note : this report is only long because of many examples with slight
variations]

PHP Version 5.1.3RC2-dev (2006-03-19 15:30)
(rolled out specially to confirm the issue, also occurring in 5.1.2)
Apache2 2.0.55, WinXP SP2

The reproduce code shows what we are doing : simply loading the
contents of an XML file, dumping those contents, getting the root
element, dumping the possible text content of itself and its
descendants, adding a child to the root element, and dumping the whole
document contents again. Everything depends, therefore, on the contents
of file.xml prior to execution.

When file.xml is :

<?xml version="1.0" standalone="yes"?>
<root></root>

The contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root/>

<?xml version="1.0" standalone="yes"?>
<root>
  <child/>
</root>

As expected. But when file.xml is :

<?xml version="1.0" standalone="yes"?>
<root> </root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root> </root>

<?xml version="1.0" standalone="yes"?>
<root> <child/></root>

(1) appears to be as expected, but in (3) the new child is no longer on
a new line and indented. That absence of formatting does not occur with
just any sort of child to the root element however, as we now
demonstrate with the following contents for file.xml :

<?xml version="1.0" standalone="yes"?>
<root><a></a></root>

The contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root>
  <a/>
</root>

<?xml version="1.0" standalone="yes"?>
<root>
  <a/>
  <child/>
</root>

Both as expected. If we add a text node to the root element though (we
use whitespace but this seems to apply to any sequence of characters) :

<?xml version="1.0" standalone="yes"?>
<root> <a></a></root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root> <a/></root>

<?xml version="1.0" standalone="yes"?>
<root> <a/><child/></root>

Now we realize that *both* outputs are broken (the <a /> in (1) should
be on its own line, like above).
Interestingly, the text nodes only cause trouble when they are at the
root. When the contents of file.xml are :

<?xml version="1.0" standalone="yes"?>
<root><a>
</a></root>

Then the contents output at (1) and (3) are :

<?xml version="1.0" standalone="yes"?>
<root>
  <a>
</a>
</root>

<?xml version="1.0" standalone="yes"?>
<root>
  <a>
</a>
  <child/>
</root>

As they should be.

I have tried working around by loading with
loadXML(file_get_contents('file.xml')), by going to and fro with
SimpleXML, by cancelling and setting formatOutput again at different
stages... Obviously the problem is not there.
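
Another workaround that might be tried on a document that is already
loaded is to remove the whitespace-only text nodes before saving. The
function name below is hypothetical, and the sketch assumes that such
nodes carry no meaning in the document :

```php
<?php
// Hypothetical helper: remove every whitespace-only text node.
function strip_blank_text_nodes(DOMDocument $doc)
{
    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before modifying the tree.
    $blanks = iterator_to_array(
        $xpath->query('//text()[normalize-space(.) = ""]')
    );
    foreach ($blanks as $node) {
        $node->parentNode->removeChild($node);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root> <a></a></root>');
strip_blank_text_nodes($doc);

$doc->formatOutput = true;
$doc->documentElement->appendChild($doc->createElement('child'));
$out = $doc->saveXML();

echo $out;
```

Text nodes with actual characters are left alone, so an element with
real mixed content would still be written out unformatted.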

I do not know if or how this is related to bugs 23726 or 35673 ; it
seems to me I have more details here, but I won't be surprised if you
tell me the problem is at the libxml2 level...

Thanks in advance.

P.S. Side note -- nothing to do here, but the bug report CAPTCHA image
does not seem to work under Firefox 1.5 win32 (works fine in IE), nor
does it accept the CAPTCHA if I get the source, view the image in IE,
and type it in Firefox.


------------------------------------------------------------------------

