Hi! On Thu, 2015-09-03 at 12:16:21 +0200, Johannes Schauer wrote: > Package: dpkg > Version: 1.18.2 > Severity: wishlist
> it would be great if one could access Debian packaging information from > source packages of all formats in a unified way. My main motivation is > sbuild which could then just extract debian/changelog from a source > package to check in which chroot to build the package without having > first to extract the whole source package (an operation which might take > considerable time and disk space for large packages). > > Attached you find a partly working proposal for how such a functionality > could look like. Thanks for that! And sorry for the delay, I lost track of this one. :/ > It introduces the new --debian-tarfile option to dpkg-source which will > work similarly to `dpkg-deb --ctrl-tarfile` in that it would write a > tarball representation of the Debian packaging directory to standard > output independent of the source package format. Right, as agreed over IRC at the time. > With the attached patch this already nearly works for the trivial case > (format 2.0 and format 3.0 (quilt)). It does not *fully* work because > dpkg info messages are *also* printed to standard output. Hence, the > resulting combined output on standard output is not a proper tar file as > it has dpkg-source messages in the beginning (but removing it in an > editor will recover a working tarball). > > Maybe dpkg info and warning messages could be written to standard error > instead? Sure, you just need to use something like: report_options(info_fh => \*STDERR); in the same way --print-format is doing. :) > In ./scripts/dpkg-source.pl I found that the new option could share some > code with the --extract option, so I did. Right, I think it would be better instead to factor out the common code into functions and then have separate codepaths for each option, that's something I've been meaning to do anyway also for the build ones, and possibly try to move that into the modules. I can do that if you'd like. > In scripts/Dpkg/Source/Package/V2.pm there also seems to be lots of > opportunity for sharing the sanity checks of the do_extract method with > the new do_debian_tarfile method. If the --debian-tarfile option should > do the same checks as the --extract option (it probably should) then I > could move these sanity checks into a third function that would then be > used by do_extract as well as by do_debian_tarfile. Indeed, I'd rather avoid the duplication. Also a detached function to validate could be useful on its own right, so that one could do that also from say dpkg-source. > Now some thoughts about the interface I chose. Here is an excerpt from > man/dpkg-source.1: > > > .TP > > .RI "\fB\-\-debian-tarfile\fP \fIfilename\fP.dsc [\fIpath\fP]" > > Extracts the \fB./debian\fP packaging directory of a given source package > > and > > sends it to standard output in > > .B tar > > format. The \fB./debian\fP directory component will be part of all files in > > the > > output. Together with > > .BR tar (1) > > this can be used to extract a particular packaging file from the source > > package > > without unpacking the whole source package first. > > For source formats >= 2.0, this action never requires the upstream source. > > For source format 1.0, this action always requires the upstream source. > > 1. Since the output of --debian-tarfile always contains the content of > the ./debian directory one might think that it would make sense to > strip off the ./debian directory prefix and have the content of the > ./debian directory in the root of the created tarball. I decided > against this for two reasons: > > - the *.debian.tar.gz file of source formats >= 2.0 also contains > the debian directory, so I think one would expect the same layout > of the tarball produced by --debian-tarfile > > - it is easier to strip off a leading directory with tar than to add > one if one so desires > > 2. For source format 1.0, the upstream tarball is *always* required. > Even if no hunks in the diff.gz patch an existing file, the upstream > tarball might still contain a file ./debian/foo but we can only know > this if we have the upstream tarball as well. Without the upstream > tarball, the output of this option would otherwise be unreliable. > > 3. I thought about the option of an optional wildcard path so that by > selecting only a subset of files in the ./debian directory, the > upstream tarball would only be required if the requested file is > patching an existing one. But this also fails for the same reason as > the last point: the wildcard path might match a file in the upstream > tarball and we can only check for this if the upstream tarball is > present. > > 4. If an option were added that is not a wildcard but a precise path, > then it would be possible to not require the upstream tarball if that > path in the diff is not patching an existing file. The resulting > created tarball would then only carry a single file in it (which also > looks strange). I thought this option to be very limited and since > source format 1.0 packages are in the minority and since I currently > do not have a good use case where the upstream tarball would not be > present as well, I decided to not make this kind of feature part of > the initial implementation of this proposal. Should the use for it > arise, it can always be added later without breaking the interface. ISTR we covered and agreed to all this on IRC at the time, as it seemed like the sanest options. > What do you think? I think it looks good, it might need a bit more work, but that's it. :) A quick review: > >From 5f12e7d29e7741d67443379db0ea50307abc00bd Mon Sep 17 00:00:00 2001 > From: Johannes 'josch' Schauer <[email protected]> > Date: Thu, 3 Sep 2015 12:15:38 +0200 > Subject: [PATCH] proposal for dpkg-source --debian-tarfile > --- a/man/dpkg-source.1 > +++ b/man/dpkg-source.1 > @@ -121,6 +121,19 @@ This command can take supplementary parameters depending > on the source format. > It will error out for formats where this operation doesn't mean anything. > > .TP > +.RI "\fB\-\-debian-tarfile\fP \fIfilename\fP.dsc [\fIpath\fP]" Please use pathname instead of path. > +Extracts the \fB./debian\fP packaging directory of a given source package and > +sends it to standard output in > +.B tar > +format. The \fB./debian\fP directory component will be part of all files in > the And start new sentences after a dot on their own line, as groff handles that just fine, and reduces text reflow and diff when it needs to be changed. > +output. Together with > +.BR tar (1) > +this can be used to extract a particular packaging file from the source > package > +without unpacking the whole source package first. > +For source formats >= 2.0, this action never requires the upstream source. > +For source format 1.0, this action always requires the upstream source. > + > +.TP > .BR \-? ", " \-\-help > Show the usage message and exit. > .TP > diff --git a/scripts/Dpkg/Source/Package.pm b/scripts/Dpkg/Source/Package.pm > index 60ba3e7..91369bc 100644 > --- a/scripts/Dpkg/Source/Package.pm > +++ b/scripts/Dpkg/Source/Package.pm > @@ -540,6 +540,39 @@ sub do_extract { > 'source package; use one of the subclasses'; > } > > +=item $p->debian_tarfile() I'm not sure the function member name is descriptive enough in this context of what it is doing. > diff --git a/scripts/Dpkg/Source/Package/V1.pm > b/scripts/Dpkg/Source/Package/V1.pm > index db81962..412c532 100644 > --- a/scripts/Dpkg/Source/Package/V1.pm > +++ b/scripts/Dpkg/Source/Package/V1.pm > @@ -173,6 +173,10 @@ sub do_extract { > } > } > > +sub do_debian_tarfile { > + error(g_('cannot extract debian directory from source format 1.0')); > +} Please move the actual format name (here and all other similar error messages) into a format string so that the translations can be shared. > sub can_build { > my ($self, $dir) = @_; > > diff --git a/scripts/Dpkg/Source/Package/V2.pm > b/scripts/Dpkg/Source/Package/V2.pm > index 99fab9c..15d586e 100644 > --- a/scripts/Dpkg/Source/Package/V2.pm > +++ b/scripts/Dpkg/Source/Package/V2.pm > @@ -195,6 +195,40 @@ sub do_extract { > unless $self->{options}{skip_patches}; > } > > +sub do_debian_tarfile { > + my $self = shift; > + > + my ($debianfile, %seen); > + my $re_ext = compression_get_file_extension_regex(); > + my $basename = $self->get_basename(); > + my $basenamerev = $self->get_basename(1); > + foreach my $file ($self->get_files()) { > + my $uncompressed = $file; > + $uncompressed =~ s/\.$re_ext$/.*/; > + $uncompressed =~ s/\.$re_ext\.asc$/.*.asc/; > + error(g_('duplicate files in %s source package: %s'), 'v2.0', > + $uncompressed) if $seen{$uncompressed}; > + $seen{$uncompressed} = 1; > + if ($file =~ /^\Q$basename\E\.orig\.tar\.$re_ext$/) { > + } elsif ($file =~ /^\Q$basename\E\.orig\.tar\.$re_ext\.asc$/) { > + } elsif ($file =~ > /^\Q$basename\E\.orig-([[:alnum:]-]+)\.tar\.$re_ext$/) { > + } elsif ($file =~ > /^\Q$basename\E\.orig-([[:alnum:]-]+)\.tar\.$re_ext\.asc$/) { > + } elsif ($file =~ /^\Q$basenamerev\E\.debian\.tar\.$re_ext$/) { > + $debianfile = $file; > + } else { > + error(g_('unrecognized file for a %s source package: %s'), > + 'v2.0', $file); > + } > + } > + info(g_('unpacking %s'), $debianfile); > + my $dscdir = $self->{basedir}; > + my $fh = Dpkg::Compression::FileHandle->new(filename => > "$dscdir$debianfile"); > + while(<$fh>) { > + print; > + } > + close $fh; This loop uses readline, and although tar is a pretty textual format and so are usually the files inside the packaging directory, it still might end up slurping big files. I'd say you should use here File::Copy's copy function instead. Thanks, Guillem

