Package: devscripts Version: 2.10.69+squeeze2 Severity: wishlist Tags: patch
Hi, in a (bit longish) thread on debian-devel@l.d.o[1] there was some discussion about enabling uscan to remove files from upstream archives according to some information given in some control file. There was no real consensus about what control file to use. The implementation below is based on using debian/copyright but is easy to switch to other files in case some other consensus might be reached. The attached patch does the following: 1. If (and only if) the debian/copyright file is Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ and if it contains a non-empty field Files-Excluded containing a space separated list of globs (as used by find and for specifying file lists in machine readable debian/control files). The deletion process will loop over every expression and is using the find command to delete the according globs. 2. If files matching are contained in the source tarball this will be repackaged except if the option --no-exclusion is given at uscan command line or if USCAN_NO_EXCLUSION is set in /etc/devscripts.conf or ~/.devscripts. The removal is implemented for all tar compression methods as well as for zip archives (which are unpackaged using unzip). This means if the conditions for file exclusion as given above are fullfilled the patch below works similar as --repack. 3. If the tarball did not contained any of the globs in debian/copyright::Files-Excluded it will be left untouched. 4. In case something was removed the version string will be appended by '+dfsg' to express the fact that the content of the original source was changed. Note: There was no real consensus whether to use this suffix or rather '~dfsg'. This could be solved by some additional configuration option that could be added later. For some moment I also had the idea to obtain the suffix which is "wanted" by the maintainer either from debian/watch:dversionmangle or debian/changelog but I droped this idea because I did not found a reliable method to make a safe guess. 5. Sometimes upstream tarballs are dirty and unpack a load of files into the current directory. The patch tries to behave reasonable and checks whether it could move those files into a dir named $pkg-$newversion (in case no such file or directory just exists in such a dirty tarball). Also some non-dirty but quite generically named directories (like "source") are renamed to "$pkg-$newversion". I have tested the code with five different packages in Debian Med repository which show different problematic directory structures and also different compression methods. The according copyright files including Files-Excluded are commited to the following locations: Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/ampliconnoise/trunk/ Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/conquest-dicom-server/trunk/ Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/imagej/trunk/ Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/saint/trunk/ Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/tm-align/trunk/ When applying the patch and using `uscan --verbose --force-download` I get the actual resulting orig.tar.gz as I want it to be. Remark: Regarding the implementation there was some uncertainity about the actual Perl module to use. In the patch below script I decided to stick to Dpkg::Control and left the code for Parse::DebControl as a comment which could pretty easily could replace the other parser. Please consider applying this patch after possibly further discussion on debian-devel@l.d.o. Kind regards Andreas. PS: Some more skilled Perl programmer might see some room for enhancing the code - just be warned that Perl is not my native language. [1] http://lists.debian.org/debian-devel/2012/08/msg00380.html -- Package-specific info: --- /etc/devscripts.conf --- --- ~/.devscripts --- Not present -- System Information: Debian Release: 6.0.5 Architecture: i386 (i686) Kernel: Linux 2.6.36-xenU-4814-i386 (SMP w/1 CPU core) Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages devscripts depends on: ii dpkg-dev 1.15.8.12 Debian package development tools ii libc6 2.11.3-3 Embedded GNU C Library: Shared lib ii perl 5.10.1-17squeeze3 Larry Wall's Practical Extraction Versions of packages devscripts recommends: pn at <none> (no description available) ii bsd-mailx [mailx] 8.1.2-0.20100314cvs-1 simple mail user agent ii curl 7.21.0-2.1+squeeze2 Get a file from an HTTP, HTTPS or ii dctrl-tools 2.14.5 Command-line tools to process Debi pn debian-keyring <none> (no description available) pn debian-maintainers <none> (no description available) ii dput 0.9.6.1+squeeze1 Debian package upload tool pn equivs <none> (no description available) ii fakeroot 1.14.4-1 Gives a fake root environment ii git [git-core] 1:1.7.2.5-3 fast, scalable, distributed revisi ii git-core 1:1.7.2.5-3 fast, scalable, distributed revisi ii gnupg 1.4.10-4 GNU privacy guard - a free PGP rep pn libauthen-sasl-per <none> (no description available) pn libcrypt-ssleay-pe <none> (no description available) pn libjson-perl <none> (no description available) pn libparse-debcontro <none> (no description available) pn libsoap-lite-perl <none> (no description available) pn libterm-size-perl <none> (no description available) ii libtimedate-perl 1.2000-1 collection of modules to manipulat ii liburi-perl 1.54-2 module to manipulate and access UR ii libwww-perl 5.836-1 Perl HTTP/WWW client/server librar pn libyaml-syck-perl <none> (no description available) pn lintian <none> (no description available) ii lsb-release 3.2-23.2squeeze1 Linux Standard Base version report ii lynx-cur [www-brow 2.8.8dev.5-1 Text-mode WWW Browser with NLS sup ii lzma 4.43-14 Compression method of 7z format in ii mailx 1:20071201-3 Transitional package for mailx ren ii man-db 2.5.7-8 on-line manual pager ii openssh-client [ss 1:5.5p1-6+squeeze2 secure shell (SSH) client, for sec ii patch 2.6-2 Apply a diff file to an original ii patchutils 0.3.1-2 Utilities to work with patches ii sensible-utils 0.0.4 Utilities for sensible alternative pn strace <none> (no description available) ii unzip 6.0-4 De-archiver for .zip files ii w3m [www-browser] 0.5.2-9 WWW browsable pager with excellent pn wdiff <none> (no description available) ii wget 1.12-2.1 retrieves files from the web ii xz-utils 5.0.0-2 XZ-format compression utilities Versions of packages devscripts suggests: ii build-essential 11.5 Informational list of build-essent pn cvs-buildpackage <none> (no description available) pn devscripts-el <none> (no description available) pn gnuplot <none> (no description available) pn libfile-desktopentry-p <none> (no description available) pn libnet-smtp-ssl-perl <none> (no description available) ii mutt 1.5.20-9+squeeze2 text-based mailreader supporting M pn svn-buildpackage <none> (no description available) ii w3m 0.5.2-9 WWW browsable pager with excellent -- no debconf information
diff --git a/scripts/uscan.1 b/scripts/uscan.1 index 86b3078..4f53df1 100644 --- a/scripts/uscan.1 +++ b/scripts/uscan.1 @@ -428,6 +428,10 @@ Give verbose output. .B \-\-no\-verbose Don't give verbose output. (This is the default behaviour.) .TP +.B \-\-no\-exclusion +Do not automatically exclude files mentioned in +\fIdebian/copyright\fR field \fBFiles-Excluded\fR +.TP .B \-\-debug Dump the downloaded web pages to stdout for debugging your watch file. .TP @@ -501,6 +505,10 @@ equivalent to the \fB\-\-destdir\fR option. If this is set to \fIyes\fR, then after having downloaded a bzip tar, lzma tar, xz tar, or zip archive, \fBuscan\fR will repack it to a gzip tar. This is equivalent to the \fB\-\-repack\fR option. +.B USCAN_NO_EXCLUSION +If this is set to \fIyes\fR, files mentioned in the field \fBFiles-Excluded\fR +of \fIdebian/copyright\fR will be ignored and no exclusion of files will be +tried. This is equivalent to the \fB\-\-no-exclusion\fR option. .SH "EXIT STATUS" The exit status gives some indication of whether a newer version was found or not; one is advised to read the output to determine exactly diff --git a/scripts/uscan.pl b/scripts/uscan.pl index 8723fb4..e118142 100755 --- a/scripts/uscan.pl +++ b/scripts/uscan.pl @@ -45,6 +45,11 @@ BEGIN { } } } +# Dpkg::Control::Hash prefered by James McCoy (who did the last three uscan.pl edits using a debian.org e-mail address) +use Dpkg::Control::Hash; +# Parse::DebControl suggested by Jonas Smedegaard +# use Parse::DebControl; + my $CURRENT_WATCHFILE_VERSION = 3; my $progname = basename($0); @@ -70,6 +75,7 @@ sub dehs_die ($); sub dehs_output (); sub quoted_regex_replace ($); sub safe_replace ($$); +sub get_main_source_dir($$$); sub usage { print <<"EOF"; @@ -136,6 +142,8 @@ Options: --no-conf, --noconf Don\'t read devscripts config files; must be the first option given + --no-exclusion no automatic exclusion of files mentioned in + debian/copyright field Files-Excluded --help Show this message --version Show version information @@ -178,6 +186,7 @@ my $dehs_start_output = 0; my $pkg_report_header = ''; my $timeout = 20; my $user_agent_string = 'Debian uscan ###VERSION###'; +my $no_exclusion = 0; if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) { $modified_conf_msg = " (no configuration files read)"; @@ -194,6 +203,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) { 'USCAN_DEHS_OUTPUT' => 'no', 'USCAN_USER_AGENT' => '', 'USCAN_REPACK' => 'no', + 'USCAN_NO_EXCLUSION' => 'no', 'DEVSCRIPTS_CHECK_DIRNAME_LEVEL' => 1, 'DEVSCRIPTS_CHECK_DIRNAME_REGEX' => 'PACKAGE(-.+)?', ); @@ -231,6 +241,8 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) { or $config_vars{'USCAN_DEHS_OUTPUT'}='no'; $config_vars{'USCAN_REPACK'} =~ /^(yes|no)$/ or $config_vars{'USCAN_REPACK'}='no'; + $config_vars{'USCAN_NO_EXCLUSION'} =~ /^(yes|no)$/ + or $config_vars{'USCAN_NO_EXCLUSION'}='no'; $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'} =~ /^[012]$/ or $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'}=1; @@ -261,7 +273,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) { # Now read the command line arguments my $debug = 0; my ($opt_h, $opt_v, $opt_destdir, $opt_download, $opt_force_download, - $opt_report, $opt_passive, $opt_symlink, $opt_repack); + $opt_report, $opt_passive, $opt_symlink, $opt_repack, $opt_no_exclusion); my ($opt_verbose, $opt_level, $opt_regex, $opt_noconf); my ($opt_package, $opt_uversion, $opt_watchfile, $opt_dehs, $opt_timeout); my $opt_download_version; @@ -293,6 +305,7 @@ GetOptions("help" => \$opt_h, "useragent=s" => \$opt_user_agent, "noconf" => \$opt_noconf, "no-conf" => \$opt_noconf, + "no-exclusion" => \$opt_no_exclusion, "download-current-version" => \$opt_download_current_version, ) or die "Usage: $progname [options] [directories]\nRun $progname --help for more details\n"; @@ -316,6 +329,7 @@ $timeout = 20 unless defined $timeout and $timeout > 0; $symlink = $opt_symlink if defined $opt_symlink; $verbose = $opt_verbose if defined $opt_verbose; $dehs = $opt_dehs if defined $opt_dehs; +$no_exclusion = $opt_no_exclusion if defined $opt_no_exclusion; $user_agent_string = $opt_user_agent if defined $opt_user_agent; $download_version = $opt_download_version if defined $opt_download_version; if ($dehs) { @@ -1429,6 +1443,56 @@ EOF } } + if ( !$no_exclusion ) { + my $data = Dpkg::Control::Hash->new(); + $data->load('debian/copyright'); + # my $parser = new Parse::DebControl(1); + # my $data = $parser->parse_file('debian/copyright', {discardCase=>1,singleBlock=>1,}); + my $okformat = qr'http://www.debian.org/doc/packaging-manuals/copyright-format/1.0'; + if ($data->{'format'} =~ m{^$okformat/?$} and $data->{'files-excluded'} ) { + my $tempdir = tempdir ( "uscanXXXX", TMPDIR => 1, CLEANUP => 1 ); + my $globpattern = "*"; + my $hidden = ".[!.]*"; + if (defined glob("$tempdir/$hidden")) { + $globpattern .= " $hidden"; + } + my $absdestdir = abs_path($destdir); + unless ( system("cd $tempdir; tar -xaf \"$absdestdir/$newfile_base\" 2>/dev/null") == 0 ) { + print "-- $newfile_base is no tarball. Try unzip.\n" if $verbose; + # try unzip if tar fails - we do want to do something sensible even if no --repack was specified + system('command -v unzip >/dev/null 2>&1') >> 8 == 0 + or die("unzip binary not found. This would serve as fallback because tar just failed.\n"); + system('unzip', '-q', '-a', '-d', $tempdir, "$destdir/$newfile_base") == 0 + or die("Repacking from zip to tar.gz failed (could not unzip)\n"); + } + my $main_source_dir = get_main_source_dir($tempdir, $pkg, $newversion); + unless ( -d $main_source_dir ) { + print STDERR "Error: $main_source_dir is no directory"; + } + my $nfiles_before = `find $main_source_dir | wc -l`; + foreach (grep {/\//} split /\s+/, $data->{"files-excluded"}) { + # delete trailing '/' because otherwise find -path will fail + s?/+$?? ; + # use rm -rf to enable deleting non-empty directories + `find $main_source_dir -path "$main_source_dir/$_" | xargs rm -rf`; + }; + foreach (grep {/^[^\/]+$/} split /\s+/, $data->{"files-excluded"}) { + `find $main_source_dir -type f -name $_ -delete`; + }; + my $nfiles_after = `find $main_source_dir | wc -l`; + if ( $nfiles_before == $nfiles_after ) { + print "-- Source tree remains identical - no need for repacking.\n" if $verbose; + } else { + my $excludefuffix = '+dfsg' ; + my $suffix = 'gz' ; + my $newfile_base_dfsg = "${pkg}_${newversion}${excludefuffix}.orig.tar.$suffix" ; + system("cd $tempdir; GZIP='-n -9' tar --owner=root --group=root --mode=a+rX -czf \"$absdestdir/$newfile_base_dfsg\" $globpattern") == 0 + or die("Excluding files failed (could not create tarball)\n"); + $symlink = 'no' # prevent symlinking or renaming + } + } + } + my @renames = ( [qr/\.(tar\.gz|tgz)$/, 'gz'], [qr/\.(tar\.bz2|tbz2?)$/, 'bz2'], @@ -2005,3 +2069,50 @@ sub safe_replace($$) { return 1; } } + +sub get_main_source_dir($$$) { + my ($tempdir, $pkg, $newversion) = @_; + my $fcount = 0; + my $main_source_dir = '' ; + my $any_dir = '' ; + opendir DIR, $tempdir or die "opendir $tempdir: $!"; + my @files = readdir DIR ; + closedir DIR ; + foreach my $file (@files) { + unless ($file =~ /^\.\.?/) { + $fcount++; + if ( -d $tempdir.'/'.$file ) { + $any_dir = $tempdir . '/' . $file ; + $main_source_dir = $any_dir if ( $file =~ /^$pkg\w*$newversion$/i ) ; + } + } + } + if ( $fcount == 1 and $main_source_dir ) { + return $main_source_dir ; + } + if ( $fcount == 1 and $any_dir ) { + # Unusual base dir in tarball - should be rather something like ${pkg}-${newversion} + $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion ; + move($any_dir, $main_source_dir) or die("Unable to move $any_dir directory $main_source_dir\n"); + return $main_source_dir ; + } + print "-- Dirty tarball found.\n" if $verbose; + if ( $main_source_dir ) { # if tarball is dirty but does contain a $pkg-$newversion dir we will not undirty but leave it as is + print "-- No idea how to create proper tarball structure - leaving as is.\n" if $verbose; + return $tempdir; + } + print "-- Move files to subdirectory $pkg-$newversion.\n" if $verbose; + $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion ; + mkdir($main_source_dir) or die("Unable to create temporary source directory $main_source_dir\n"); + foreach my $file (@files) { + unless ($file =~ /^\.\.?/) { + # move("${tempdir}/$file", $main_source_dir) or die("Unable to move ${tempdir}/$file directory $main_source_dir\n"); + unless ( move("${tempdir}/$file", $main_source_dir) ) { + # HELP: why can't perl move not move directories???? + print "Perl move seems to be not able to ` move(\"${tempdir}/$file\", $main_source_dir) ` ... trying system mv\n" if $debug; + system( "mv ${tempdir}/$file $main_source_dir" ) ; + } + } + } + return $main_source_dir; +}