Your message dated Sat, 17 May 2025 13:04:19 +0000
with message-id <e1ughd9-00bz3k...@fasolo.debian.org>
and subject line Bug#1104789: fixed in libhtml-gumbo-perl 0.18-5
has caused the Debian Bug report #1104789,
regarding libhtml-gumbo-perl: erratic behavior on the unsupported template HTML
element - GUMBO_NODE_TEMPLATE node type
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)
--
1104789: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104789
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: libhtml-gumbo-perl
Version: 0.18-4+b1
Severity: serious
Tags: security upstream
Justification: security
Forwarded: https://github.com/ruz/HTML-Gumbo/issues/6
X-Debbugs-Cc: Debian Security Team <t...@security.debian.org>
I get erratic behavior on the template HTML element, e.g. on
the HTML file "<template>". For instance:
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>\217¥�¾U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>)�>\220U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>q'N$uU</head><body></body></html>
One can see random output, which may include control characters
(above, I have changed them to \217 and \220 as Emacs shows them,
to avoid such control characters in the mail message).
With valgrind:
$ valgrind perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
==64955== Memcheck, a memory error detector
==64955== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==64955== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==64955== Command: perl -C -MHTML::Gumbo -e print\ HTML::Gumbo-\>new-\>parse('\<template\>',\ format\ =\>\ 'string');
==64955==
==64955== Conditional jump or move depends on uninitialised value(s)
==64955== at 0x484DC89: strlen (vg_replace_strmem.c:505)
==64955== by 0x2AD7DF: ??? (in /usr/bin/perl)
==64955== by 0x486D6CE: tree_to_string (Gumbo.xs:189)
==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955== by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955== by 0x486E41B: parse_to_string_cb (Gumbo.xs:505)
==64955== by 0x486ED4B: common_parse.isra.0 (Gumbo.xs:545)
==64955== by 0x486F09C: XS_HTML__Gumbo_parse_to_string (Gumbo.xs:559)
==64955== by 0x20B3E7: ??? (in /usr/bin/perl)
==64955== by 0x290C95: Perl_runops_standard (in /usr/bin/perl)
==64955== by 0x179E51: perl_run (in /usr/bin/perl)
==64955==
<html><head></head><body></body></html>
==64955==
==64955== HEAP SUMMARY:
==64955== in use at exit: 592,160 bytes in 2,369 blocks
==64955== total heap usage: 7,166 allocs, 4,797 frees, 1,159,576 bytes allocated
==64955==
==64955== LEAK SUMMARY:
==64955== definitely lost: 18,102 bytes in 19 blocks
==64955== indirectly lost: 50,698 bytes in 23 blocks
==64955== possibly lost: 514,100 bytes in 2,318 blocks
==64955== still reachable: 9,260 bytes in 9 blocks
==64955== of which reachable via heuristic:
==64955== newarray : 1,056 bytes in 33 blocks
==64955== suppressed: 0 bytes in 0 blocks
==64955== Rerun with --leak-check=full to see details of leaked memory
==64955==
==64955== Use --track-origins=yes to see where uninitialised values come from
==64955== For lists of detected and suppressed errors, rerun with: -s
==64955== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
So uninitialized data is used for the output.
If I use "format => 'callback'" (with a callback) instead of
"format => 'string'", then I get the following error:
Unknown node type at /usr/lib/x86_64-linux-gnu/perl5/5.40/HTML/Gumbo.pm line 298, <> line 1.
(which is better from the security point of view, but prevents one
from parsing some modern HTML documents).
It apparently comes from Gumbo.xs, where there are two occurrences of
croak("Unknown node type");
I suspect that this is the first one as the second one corresponds to
text node types.
The cause is probably the most recent node type GUMBO_NODE_TEMPLATE
from the Gumbo library (libgumbo):
typedef enum {
/** Document node. v will be a GumboDocument. */
GUMBO_NODE_DOCUMENT,
/** Element node. v will be a GumboElement. */
GUMBO_NODE_ELEMENT,
/** Text node. v will be a GumboText. */
GUMBO_NODE_TEXT,
/** CDATA node. v will be a GumboText. */
GUMBO_NODE_CDATA,
/** Comment node. v will be a GumboText, excluding comment delimiters. */
GUMBO_NODE_COMMENT,
/** Text node, where all contents is whitespace. v will be a GumboText. */
GUMBO_NODE_WHITESPACE,
/** Template node. This is separate from GUMBO_NODE_ELEMENT because many
* client libraries will want to ignore the contents of template nodes, as
* the spec suggests. Recursing on GUMBO_NODE_ELEMENT will do the right thing
* here, while clients that want to include template contents should also
* check for GUMBO_NODE_TEMPLATE. v will be a GumboElement. */
GUMBO_NODE_TEMPLATE
} GumboNodeType;
This node type was added in 2015:
https://github.com/google/gumbo-parser/commit/4383a40605ee7872a8e2de58553383a13d919153
but most of the HTML::Gumbo code predates this change.
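To illustrate the suspected cause, here is a minimal sketch in C of a
node-type dispatch that sends GUMBO_NODE_TEMPLATE down the same path as
GUMBO_NODE_ELEMENT, since both carry a GumboElement. The enum is a local
copy of the one quoted above so that the sketch compiles without libgumbo;
NodeKind and classify_node are hypothetical names, and this is not the
actual Gumbo.xs code or any upstream fix.

```c
/* Local mirror of libgumbo's GumboNodeType as quoted in this report, so the
 * sketch builds without the library. Real code would include <gumbo.h>. */
typedef enum {
  GUMBO_NODE_DOCUMENT,
  GUMBO_NODE_ELEMENT,
  GUMBO_NODE_TEXT,
  GUMBO_NODE_CDATA,
  GUMBO_NODE_COMMENT,
  GUMBO_NODE_WHITESPACE,
  GUMBO_NODE_TEMPLATE
} GumboNodeType;

/* Hypothetical classification, used only in this sketch: it records which
 * member of the node's "v" union is valid for a given node type. */
typedef enum {
  NODE_KIND_DOCUMENT, /* v is a GumboDocument */
  NODE_KIND_ELEMENT,  /* v is a GumboElement */
  NODE_KIND_TEXT,     /* v is a GumboText */
  NODE_KIND_UNKNOWN   /* value outside the known enum range */
} NodeKind;

/* Map a node type to the kind of payload it carries. Template nodes are
 * grouped with elements because both store a GumboElement. */
static NodeKind classify_node(GumboNodeType type) {
  switch (type) {
    case GUMBO_NODE_DOCUMENT:
      return NODE_KIND_DOCUMENT;
    case GUMBO_NODE_ELEMENT:
    case GUMBO_NODE_TEMPLATE:
      return NODE_KIND_ELEMENT;
    case GUMBO_NODE_TEXT:
    case GUMBO_NODE_CDATA:
    case GUMBO_NODE_COMMENT:
    case GUMBO_NODE_WHITESPACE:
      return NODE_KIND_TEXT;
  }
  /* Reached only for values not listed above, e.g. a node type added by a
   * newer library version; the caller should report an error here rather
   * than read the union. */
  return NODE_KIND_UNKNOWN;
}
```

A caller that wants to skip template contents, as the enum comment
suggests, could still test for GUMBO_NODE_TEMPLATE before recursing.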
-- System Information:
Debian Release: trixie/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500,
'stable-security'), (500, 'stable-debug'), (500, 'proposed-updates-debug'),
(500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 6.11.10-amd64 (SMP w/12 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE,
TAINT_UNSIGNED_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages libhtml-gumbo-perl depends on:
ii libc6 2.41-7
ii libgumbo3 0.13.0+dfsg-2
ii libhtml-tree-perl 5.07-3
ii perl 5.40.1-3
ii perl-base [perlapi-5.40.0] 5.40.1-3
libhtml-gumbo-perl recommends no packages.
libhtml-gumbo-perl suggests no packages.
-- no debconf information
--
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
--- End Message ---
--- Begin Message ---
Source: libhtml-gumbo-perl
Source-Version: 0.18-5
Done: gregor herrmann <gre...@debian.org>
We believe that the bug you reported is fixed in the latest version of
libhtml-gumbo-perl, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 1104...@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
gregor herrmann <gre...@debian.org> (supplier of updated libhtml-gumbo-perl
package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmas...@ftp-master.debian.org)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Format: 1.8
Date: Sat, 17 May 2025 14:44:37 +0200
Source: libhtml-gumbo-perl
Architecture: source
Version: 0.18-5
Distribution: unstable
Urgency: medium
Maintainer: Debian Perl Group <pkg-perl-maintain...@lists.alioth.debian.org>
Changed-By: gregor herrmann <gre...@debian.org>
Closes: 1104789
Changes:
libhtml-gumbo-perl (0.18-5) unstable; urgency=medium
.
* Add patch to fix wrong code path with GUMBO_NODE_TEMPLATE.
Thanks to Vincent Lefevre for the bug report and Niko Tyni for the patch.
(Closes: #1104789)
* Declare compliance with Debian Policy 4.7.2.
Checksums-Sha1:
 68102b221a867b1aa089b8d31226e44cfb8b45c3 2461 libhtml-gumbo-perl_0.18-5.dsc
 a5259e6ee406119f1460561796a86b92f03ac917 4036 libhtml-gumbo-perl_0.18-5.debian.tar.xz
Checksums-Sha256:
 ff02cc0bc8b1b6f44d45cd1c815bfc0177c93c733d724e7035ebb38ab5f85b4d 2461 libhtml-gumbo-perl_0.18-5.dsc
 60e0ad8713c19f94f08ee1a0bbcf46664a65bdd6ab3ce726866d29e1044dd930 4036 libhtml-gumbo-perl_0.18-5.debian.tar.xz
Files:
 8c87d11826a0b0755177fe607cf6677b 2461 perl optional libhtml-gumbo-perl_0.18-5.dsc
 57ecfb5e55a04711ecf97899df3ffdb6 4036 perl optional libhtml-gumbo-perl_0.18-5.debian.tar.xz
-----BEGIN PGP SIGNATURE-----
iQKTBAEBCgB9FiEE0eExbpOnYKgQTYX6uzpoAYZJqgYFAmgohUpfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEQx
RTEzMTZFOTNBNzYwQTgxMDREODVGQUJCM0E2ODAxODY0OUFBMDYACgkQuzpoAYZJ
qgZGdw//XTTnXtYsEPQgxb2xEfyA8Kj86XPibSzc55iin3mStND69a21MQULT8gk
WPMWh7hbUKybVKyHOpixwlHP6pKmW3BOWdayi+AQnoYbA58Ln1PTuxspT4kGgLfG
tb64SYy8FyM9yaNKBLpuLsf0mYHdSATYg8STG8pV5Tlk7QZ36rznA8wBUF3jP+PP
rNSTs0Z9bajNuDAIATyMLPbn4gMqoKfQ179xZMyDvNtNonAE0STdpTLAoJ2jJCHo
Q8YEN8IYb806Vb/Th3t57QH2GqA6Vw7zSSk5wkqJXTdiy6jY/Al8Lnpvw8JqFvR+
zYWSJiMxXd2j8q4asYoycO4npqXr0u9x+IG05GB3+giGJtYppSeLX7o5Yw1XHdzD
U+AqBin5ESKHpnZTmN0VTXjmmXrvh94XMRinkEPq4MCmSGatLNvVE+X3htbre0ST
vw3QXxUYD6WtvJkU9hjg1+FNxvwiKi6LYlgggAcu7S49Qb3GBvQw0DacTRFQ2jPq
9Pt8Phv9Va8Ui93fTdwigzAJZ1YwILywn1YWfAj1TZwl8JiP3g9T/nERW2sgiJ9j
zhQzVPTqMbYuKNRK60OHK89vPl1WlzMZP77cdbi3GukI+qzpH8eCgSVU03qGabyW
jbagpedYnO/LzaBgOaMmF4/ilQ6kB/TgIynSB181q3PQrVdBb6w=
=WovM
-----END PGP SIGNATURE-----
--- End Message ---