Package: libgumbo1
Version: 0.10.1+dfsg-2.1
Severity: minor

When parsing a file, a spurious newline is added before </body>.
For instance, on
<!DOCTYPE html>
<html><head></head><body><p>Test</p></body></html>

I get

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head/><body><p>Test</p>
</body></html>

when converting it with the following Perl script:

----------------------------------------------------------------------------
#!/usr/bin/env perl

use strict;
use HTML::Gumbo;
use XML::LibXML;
use open ':encoding(UTF-8)';

binmode STDIN, ':encoding(UTF-8)';

my %voidelem = map { $_ => 1 }
  qw(area base br col embed hr img input keygen link meta param
     source track wbr);

my $doc = XML::LibXML::Document->createDocument('1.0', 'utf-8');
my @nodes;

HTML::Gumbo->new->parse
  (do { local $/; <> },
   format => 'callback',
   callback => sub {
     my ($event) = shift;
     if ($event =~ /^document (start|end)$/ )
       { }
     elsif ($event eq 'start' )
       {
         my ($tag, $attr) = @_;
         my $element;
         if (@nodes)
           {
             $element = $doc->createElement($tag);
             $nodes[-1]->appendChild($element);
           }
         else
           {
             $element = $doc->createElementNS
               ('http://www.w3.org/1999/xhtml', $tag);
             $doc->setDocumentElement($element);
           }
         while (@$attr)
           { $element->setAttribute(splice @$attr, 0, 2); }
         push @nodes, $element unless $voidelem{$tag};
       }
     elsif ($event eq 'end')
       {
         $_[0] eq $nodes[-1]->nodeName
           or die "internal error";
         pop @nodes;
       }
     else
       {
         my $node;
         if ($event =~ /^(text|space)$/)
           { $node = $doc->createTextNode($_[0]); }
         elsif ($event eq 'comment')
           { $node = $doc->createComment($_[0]); }
         elsif ($event eq 'cdata')
           { $node = $doc->createCDATASection($_[0]); }
         else
           { die "unknown event"; }
         $nodes[-1]->appendChild($node);
       }
   }
  );

$doc->toFH(*STDOUT, 0);
----------------------------------------------------------------------------

I also obtain the newline with the "string" output format instead
of "callback":

----------------------------------------------------------------------------
#!/usr/bin/env perl

use strict;
use HTML::Gumbo;
use open ':encoding(UTF-8)';

binmode STDIN, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

print HTML::Gumbo->new->parse(do { local $/; <> });
----------------------------------------------------------------------------

This is with the Perl HTML::Gumbo module, but I suppose that the bug
is in the library itself.

In practice, this space shouldn't matter, but one never knows as it
might theoretically have an effect on some CSS rules, for instance.
It also breaks the idempotence property: by applying the above script
with string output format several times with pipes, one gets another
newline for each invocation.

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.11.0-2-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=POSIX (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libgumbo1 depends on:
ii  libc6  2.24-14

libgumbo1 recommends no packages.

libgumbo1 suggests no packages.

-- no debconf information