Package: libhtml-gumbo-perl
Version: 0.18-4+b1
Severity: serious
Tags: security upstream
Justification: security
Forwarded: https://github.com/ruz/HTML-Gumbo/issues/6
X-Debbugs-Cc: Debian Security Team <t...@security.debian.org>

I get erratic behavior on the template HTML element, e.g. on the
HTML file "<template>". For instance:

$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>\217¥�¾U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>)�>\220U</head><body></body></html>
$ perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
<html><head>q'N$uU</head><body></body></html>

One can see random output, which may include control characters
(above, I have changed them to \217 and \220 as Emacs shows them,
to avoid such control characters in the mail message).

With valgrind:

$ valgrind perl -C -MHTML::Gumbo -e "print HTML::Gumbo->new->parse('<template>', format => 'string');"
==64955== Memcheck, a memory error detector
==64955== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==64955== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==64955== Command: perl -C -MHTML::Gumbo -e print\ HTML::Gumbo-\>new-\>parse('\<template\>',\ format\ =\>\ 'string');
==64955==
==64955== Conditional jump or move depends on uninitialised value(s)
==64955==    at 0x484DC89: strlen (vg_replace_strmem.c:505)
==64955==    by 0x2AD7DF: ??? (in /usr/bin/perl)
==64955==    by 0x486D6CE: tree_to_string (Gumbo.xs:189)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E2C4: walk_tree.isra.0 (Gumbo.xs:55)
==64955==    by 0x486E41B: parse_to_string_cb (Gumbo.xs:505)
==64955==    by 0x486ED4B: common_parse.isra.0 (Gumbo.xs:545)
==64955==    by 0x486F09C: XS_HTML__Gumbo_parse_to_string (Gumbo.xs:559)
==64955==    by 0x20B3E7: ??? (in /usr/bin/perl)
==64955==    by 0x290C95: Perl_runops_standard (in /usr/bin/perl)
==64955==    by 0x179E51: perl_run (in /usr/bin/perl)
==64955==
<html><head></head><body></body></html>
==64955==
==64955== HEAP SUMMARY:
==64955==     in use at exit: 592,160 bytes in 2,369 blocks
==64955==   total heap usage: 7,166 allocs, 4,797 frees, 1,159,576 bytes allocated
==64955==
==64955== LEAK SUMMARY:
==64955==    definitely lost: 18,102 bytes in 19 blocks
==64955==    indirectly lost: 50,698 bytes in 23 blocks
==64955==      possibly lost: 514,100 bytes in 2,318 blocks
==64955==    still reachable: 9,260 bytes in 9 blocks
==64955==                       of which reachable via heuristic:
==64955==                         newarray           : 1,056 bytes in 33 blocks
==64955==         suppressed: 0 bytes in 0 blocks
==64955== Rerun with --leak-check=full to see details of leaked memory
==64955==
==64955== Use --track-origins=yes to see where uninitialised values come from
==64955== For lists of detected and suppressed errors, rerun with: -s
==64955== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

So, uninitialized data are used for the output.

If I use "format => 'callback'" (with a callback) instead of
"format => 'string'", then I get the following error:

Unknown node type at /usr/lib/x86_64-linux-gnu/perl5/5.40/HTML/Gumbo.pm line 298, <> line 1.

(which is better from the security point of view, but prevents
one from parsing some modern HTML documents).

It apparently comes from Gumbo.xs, where there are two occurrences of

  croak("Unknown node type");

I suspect that this is the first one, as the second one corresponds
to text node types.

The cause is probably the most recent node type GUMBO_NODE_TEMPLATE
from the Gumbo library (libgumbo):

typedef enum {
  /** Document node.  v will be a GumboDocument. */
  GUMBO_NODE_DOCUMENT,
  /** Element node.  v will be a GumboElement. */
  GUMBO_NODE_ELEMENT,
  /** Text node.  v will be a GumboText. */
  GUMBO_NODE_TEXT,
  /** CDATA node. v will be a GumboText. */
  GUMBO_NODE_CDATA,
  /** Comment node.  v will be a GumboText, excluding comment delimiters. */
  GUMBO_NODE_COMMENT,
  /** Text node, where all contents is whitespace.  v will be a GumboText. */
  GUMBO_NODE_WHITESPACE,
  /** Template node.  This is separate from GUMBO_NODE_ELEMENT because many
   * client libraries will want to ignore the contents of template nodes, as
   * the spec suggests.  Recursing on GUMBO_NODE_ELEMENT will do the right thing
   * here, while clients that want to include template contents should also
   * check for GUMBO_NODE_TEMPLATE.  v will be a GumboElement.  */
  GUMBO_NODE_TEMPLATE
} GumboNodeType;

This node type was added in 2015:

  https://github.com/google/gumbo-parser/commit/4383a40605ee7872a8e2de58553383a13d919153

but most of the HTML::Gumbo code predates this change.

-- System Information:
Debian Release: trixie/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable-debug'), (500, 'proposed-updates-debug'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.11.10-amd64 (SMP w/12 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libhtml-gumbo-perl depends on:
ii  libc6                          2.41-7
ii  libgumbo3                      0.13.0+dfsg-2
ii  libhtml-tree-perl              5.07-3
ii  perl                           5.40.1-3
ii  perl-base [perlapi-5.40.0]     5.40.1-3

libhtml-gumbo-perl recommends no packages.

libhtml-gumbo-perl suggests no packages.

-- no debconf information

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
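
Illustration of the suspected cause (a sketch only, not the actual Gumbo.xs code): according to the enum comment quoted in the report, a GUMBO_NODE_TEMPLATE node carries a GumboElement, so a dispatch on the node type could probably treat it like GUMBO_NODE_ELEMENT instead of falling through to an "unknown" branch. The enum below is a local copy so that the sketch compiles without the libgumbo headers; NodeKind and classify_node are hypothetical names introduced only for this example.

```c
/* Local copy of the GumboNodeType enum quoted in the report, so that
 * this sketch compiles without the libgumbo headers. */
typedef enum {
    GUMBO_NODE_DOCUMENT,
    GUMBO_NODE_ELEMENT,
    GUMBO_NODE_TEXT,
    GUMBO_NODE_CDATA,
    GUMBO_NODE_COMMENT,
    GUMBO_NODE_WHITESPACE,
    GUMBO_NODE_TEMPLATE
} GumboNodeType;

/* Hypothetical classification, used only in this sketch: which member
 * of the node's union "v" is valid for a given node type. */
typedef enum {
    KIND_DOCUMENT,  /* v is a GumboDocument */
    KIND_ELEMENT,   /* v is a GumboElement (tag name, children) */
    KIND_TEXT,      /* v is a GumboText */
    KIND_UNKNOWN    /* value outside the enum */
} NodeKind;

/* Map a node type to the kind of payload it carries. */
NodeKind classify_node(GumboNodeType type)
{
    switch (type) {
    case GUMBO_NODE_DOCUMENT:
        return KIND_DOCUMENT;
    case GUMBO_NODE_ELEMENT:
    case GUMBO_NODE_TEMPLATE:  /* also a GumboElement, per the enum comment */
        return KIND_ELEMENT;
    case GUMBO_NODE_TEXT:
    case GUMBO_NODE_CDATA:
    case GUMBO_NODE_COMMENT:
    case GUMBO_NODE_WHITESPACE:
        return KIND_TEXT;
    }
    /* Reached only for values not listed in the enum. */
    return KIND_UNKNOWN;
}
```

Listing every enumerator explicitly, with no default label, lets compilers that warn about unhandled enum values (e.g. GCC's -Wswitch) flag a node type added later, rather than leaving it to a run-time fallback.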