Hi! On Thu, 2024-02-22 at 12:10:06 +0100, Niels Thykier wrote: > Package: debian-policy > Severity: wishlist
> While the parser is technically correct given the Policy description above, > I cannot see any case where this is a desirable outcome. Instead, I think we > should classify this as a syntax error such that users are instructed to > indent `${shlib:Depends}` token. Currently both the dpkg C and Perl parsers are extremely lax (even more than what the deb822(5) manual page or the Debian Policy permit! No matter the outcome of this discussion, I'm going to make both these parsers at least as strict as the current dpkg documentation. I've played with this a bit and got the attached patch, which would restrict field names to: ^[_]*[A-Za-z]+[A-Za-z0-9-]*[A-Za-z0-9]+$ Which I think would at least cover all current fields in the Debian archive, including debconf templates. And requires somewhat well formed fields, such as not allowing: - starting or ending with «-» (for OpenPGP ASCII Armor protection at the start, and oddness at the end), - any special character at all, - short field names (of 1 character), - starting with a number (although perhaps this one is questionable, and it creates asymmetry) - using an underscore in anything not on the front of the field name. Perhaps the allowed syntax should even be defined by context, so that the initial underscores are only allowed for debconf template files. (This would be made strict for the Perl parsers for the build tools, strict for the C parser in charge of building packages or installing them anew, and lax with warnings for existing installations to avoid bricking potential current users with such field names.) But see below. > A simple fix would be to forbid `$` at the start of the field. Though, I > think at this point it would make sense to prune more special characters > from the list like `%&{}[]()/\\<>|$` from anywhere in the field. Note that > we definitely need to keep `_` as it is used in debconf template files for > translations. Both single and double underscore is used here, though they > always come first in the field name. > > The reason why I want a tighter bound is that currently, the following > things are also considered valid field names: > > > |foo(>=1:2.0), > foo(>=2:2.0), > ,foo(>=3:2.0) > ,foo(>=4:2.0)[amd64]<!nodeb> > )': We can make inverted crying smileys as field names! > `Foo`: Now with markdown formatting of the field name! > $(foo): Can we trick something into running shell commands? > /etc/passwd: Maybe path traversal > > > In all cases, I see no value to allowing these absurd field names and only > potential downsides. I was pondering about this and while I'm in general a strong proponent of rather strict parsing, to avoid garbage in garbage out, that external users would otherwise have to then protect against and deal with. In this case I could see how making the parsing overly strict could stymie potential innovation, like when debconf started using «_» to mark some fields in a special way. I mean we can always go around and carve out new exceptions, but that requires way more effort, and then one needs to justify those new uses, and convince others, which might be hard (once the allowed format is very strict)! We also do not have visibility over privately used field names. And there's the problem that allowing new syntax might require one release cycle to be able to reintroduce it, if the previous dpkg/apt need to be able to support it. Perhaps, as you mention, a more lenient parser that just restricts field start (from at least anything that can easily appear in a dependency field, for example) would accomplish most of the benefit, w/o blocking potential future directions? Say something like the following untested regex: ^[_]*[A-Za-z0-9]+[!-9;-~]+$ (Or perhaps if we are allowing bracket characters, then require them to be balanced, but that might be too much effort, if we are never going to make use of them.) Alternatively we could have hard and soft restriction, hard with something like the above regex producing errors, and soft like the first regex, which would only produce warnings (although that might still be enough of a deterrent that one would still need to update parsers anyway for new syntax). But in any case, this latter regex would not allow for the exact same extensibility that debconf used at the time to prefix the field with a special character (although it very much preserves syntax freedom at the end), which perhaps is an indicator, that we should simply restrict it all, and if a new syntax is required we can always introduce it later on? New syntax that comes to mind could be stuff like: Field-Name(something): value !Field: value Or things along these lines. > But someone will eventually try to do this. We already received a bug report > against debhelper long ago about being able to do path traversal via > questionable package names. Largely, this was considered a non-issue because > dpkg in its default configuration would abort and debhelper has a "run > arbitrary code via the upstream build system" feature anyway. Do you have a reference on hand to that report? Because if dpkg fails I'd assume that might be a side effect from something else, and it might make sense to make sure this is handled explicitly. Thanks, Guillem
From 9b284029b29b3f27cc478198f7de73dfed61cf6d Mon Sep 17 00:00:00 2001 From: Guillem Jover <guil...@debian.org> Date: Fri, 16 Aug 2024 19:59:51 +0200 Subject: [PATCH] Dpkg::Control::HashCore: Restrict field names FIXME: discuss, update manual pages, extend commit message. --- scripts/Dpkg/Control/HashCore.pm | 4 ++-- scripts/t/Dpkg_Control.t | 28 +++++++++++++++++++++++++++- 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/scripts/Dpkg/Control/HashCore.pm b/scripts/Dpkg/Control/HashCore.pm index 08f35c57b..cea329cf4 100644 --- a/scripts/Dpkg/Control/HashCore.pm +++ b/scripts/Dpkg/Control/HashCore.pm @@ -221,8 +221,8 @@ sub parse { my ($name, $value) = split /\s*:\s*/, $_, 2; if (defined $name and $name =~ m/^\S+?$/) { $parabody = 1; - if ($lead eq '-') { - $self->parse_error($desc, g_('field cannot start with a hyphen')); + if ($name !~ m{^[_]*[A-Za-z]+[A-Za-z0-9-]*[A-Za-z0-9]+$}) { + $self->parse_error($desc, g_('field name %s uses invalid format'), $name); } if (exists $self->{$name}) { unless ($$self->{allow_duplicate}) { diff --git a/scripts/t/Dpkg_Control.t b/scripts/t/Dpkg_Control.t index 29311b8c8..8df78c227 100644 --- a/scripts/t/Dpkg_Control.t +++ b/scripts/t/Dpkg_Control.t @@ -20,7 +20,7 @@ use Test::More; use Test::Dpkg qw(:needs :paths); BEGIN { - plan tests => 24; + plan tests => 35; use_ok('Dpkg::Control'); use_ok('Dpkg::Control::Info'); @@ -116,6 +116,32 @@ Architecture: all Depends: hello ', "Dump of second binary package of $datadir/control-1"); +# Check parsing deb822 data. +my %deb822_bogus = ( + no_separator => 'SomethingNotAField = with a value', + no_field => ': only value', + bad_start_hyphen => '-Bad-Field: hyphen-value', + bad_cont_line => ' continuation line outside of field', + bad_dup_field => "Duped-Field: value-1\n" . + "Duped-Field: value-2\n", + bad_short_field => 'A: short', + bad_end_hyphen => 'Field-: value-', + bad_end_underscore => 'Field-End-Underscore_: value', + bad_prefix_number => '0field: 0value', + bad_prefix_special => '$field: $value', + bad_middle_special => 'Field$name: value$mixed', +); + +while (my ($name, $data) = each %deb822_bogus) { + open $io, '<', \$data or die "canno open io string\n"; + $c = Dpkg::Control::Hash->new(); + eval { + $c->parse($io, "parse bogus deb822 data $name"); + }; + ok($@, "failed to parse bogus deb822 data $name"); + close $io; +} + # Check OpenPGP armored signatures in source control files my $dsc; -- 2.45.2