* Andreas Tille <ti...@debian.org> [2013-10-29 10:33]:

On Mon, Oct 28, 2013 at 07:06:37PM -0400, James McCoy wrote:
Thanks Rafael for the feedback and Andreas for continued patience.

On Mon, Oct 28, 2013 at 10:10:00PM +0100, Andreas Tille wrote:
On Mon, Oct 28, 2013 at 07:44:57PM +0100, Rafael Laboissiere wrote:
It would be preferable that you had created a side branch in the Git
repository for your changes, such that the merge would be trivial to
do.

This really wouldn't have made much of a difference. It's trivial to add Andrea's repo as a remote and then the functionality is the same as if the branch were in devscripts' repo. At the time that Andreas started work on this, devscripts wasn't in collab-maint so it made sense to just push his changes to a user repo on Alioth so people could access and review the changes.

For the convenience of the developers of devscript, I am attaching below a patch generated with git format-patch that contains an appropriate log message. I cherry-picked the commits made by Andreas Tille and Gregor Herrmann into the Andreas' repo, which concern only the implementation of the Files-Excluded feature. I did also some improvements on my own. his patch works for me on the praat package, but was not extensively tested. At any rate, this "slim" patch introduces a single feature, namely the possibility of excluding files from upstream tarballs according to information in debain/copyright.

If I had push access to the devscripts repository, I could create a side branch there with this patch.

Best,

Rafael

>From 8192d448f5817cb95142124b42fd23a0ff55771d Mon Sep 17 00:00:00 2001
From: Rafael Laboissiere <raf...@laboissiere.net>
Date: Tue, 29 Oct 2013 10:11:33 +0100
Subject: [PATCH] Enable uscan to selectively remove files from upstream
 arquives

[N.B.: This patch includes the original patch proposed by Andreas
Tille in Bug#685787 and further improvements by Gregor Herrmann and
Andreas himself, posted at:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=685787

This is the follow up of a long discussion in the debian-devel mailing
list:

http://lists.debian.org/debian-devel/2012/08/msg00380.html]

The changes in this commit enable uscan to remove files from upstream
archives according to some information given in some control file.
The current implementation is based on using debian/copyright but is
easy to switch to another file.

The changes do the following:

 1. If (and only if) the debian/copyright file is

     Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

    and if it contains a non-empty field Files-Excluded containing a
    space separated list of globs (as used by find and for specifying
    file lists in machine readable debian/control files). The deletion
    process will loop over every expression and is using the find
    command to delete the according globs.

 2. If files matching are contained in the source tarball this will
    be repackaged except if the option --no-exclusion is given at
    uscan command line or if USCAN_NO_EXCLUSION is set in
    /etc/devscripts.conf or ~/.devscripts.  The removal is implemented
    for all tar compression methods as well as for zip archives (which
    are unpackaged using unzip).  This means if the conditions for
    file exclusion as given above are fullfilled the patch below
    works similar as --repack.

 3. If the tarball did not contained any of the globs in
    debian/copyright::Files-Excluded it will be left untouched.

 4. In case something was removed the version string will be appended by
    '+dfsg' to express the fact that the content of the original source
    was changed.

 5. Sometimes upstream tarballs are dirty and unpack a load of files
    into the current directory.  The patch tries to behave reasonable
    and checks whether it could move those files into a dir named
      $pkg-$newversion
    (in case no such file or directory just exists in such a dirty
    tarball).  Also some non-dirty but quite generically named
    directories (like "source") are renamed to "$pkg-$newversion".

The BEGIN block in uscal.pl has also been changed.  In the previous
version, it was used for requiring module LWP::UserAgent and, in case
of failure, inform the user that it sould install the libwww-perl
package.  This has been generalized throuhg a subroutine
require_module, which is called for both LWP::UserAgent and Try::Tiny.

The appropriate documentation has been added to uscan.1.  Appropriate
Build-Depends and Suggests on libtry-tiny-perl have been set in
debian/control.
---
 debian/control   |   2 +
 scripts/uscan.1  |   8 +++
 scripts/uscan.pl | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 151 insertions(+), 8 deletions(-)

diff --git a/debian/control b/debian/control
index d48f7f2..b9dc690 100644
--- a/debian/control
+++ b/debian/control
@@ -16,6 +16,7 @@ Build-Depends: debhelper (>= 9),
                libparse-debcontrol-perl,
                libterm-size-perl,
                libtimedate-perl,
+               libtry-tiny-perl,
                liburi-perl,
                libwww-perl,
                lsb-release,
@@ -50,6 +51,7 @@ Recommends: at,
             libencode-locale-perl,
             libjson-perl,
             libparse-debcontrol-perl,
+            libtry-tiny-perl,
             liburi-perl,
             libwww-perl,
             lintian,
diff --git a/scripts/uscan.1 b/scripts/uscan.1
index af4e57f..fb53f3e 100644
--- a/scripts/uscan.1
+++ b/scripts/uscan.1
@@ -444,6 +444,10 @@ Give verbose output.
 .B \-\-no\-verbose
 Don't give verbose output.  (This is the default behaviour.)
 .TP
+.B \-\-no\-exclusion
+Do not automatically exclude files mentioned in
+\fIdebian/copyright\fR field \fBFiles-Excluded\fR
+.TP
 .B \-\-debug
 Dump the downloaded web pages to stdout for debugging your watch file.
 .TP
@@ -517,6 +521,10 @@ equivalent to the \fB\-\-destdir\fR option.
 If this is set to \fIyes\fR, then after having downloaded a bzip tar,
 lzma tar, xz tar, or zip archive, \fBuscan\fR will repack it to a gzip tar.
 This is equivalent to the \fB\-\-repack\fR option.
+.B USCAN_NO_EXCLUSION
+If this is set to \fIyes\fR, files mentioned in the field \fBFiles-Excluded\fR
+of \fIdebian/copyright\fR will be ignored and no exclusion of files will be
+tried.  This is equivalent to the \fB\-\-no-exclusion\fR option.
 .SH "EXIT STATUS"
 The exit status gives some indication of whether a newer version was
 found or not; one is advised to read the output to determine exactly
diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 976b368..19ef495 100755
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -27,6 +27,7 @@ use strict;
 use Cwd;
 use Cwd 'abs_path';
 use Dpkg::IPC;
+use Dpkg::Control::Hash;
 use File::Basename;
 use File::Copy;
 use File::Temp qw/tempfile tempdir/;
@@ -36,15 +37,21 @@ use lib '/usr/share/devscripts';
 use Devscripts::Versort;
 use Text::ParseWords;
 BEGIN {
-    eval { require LWP::UserAgent; };
-    if ($@) {
-	my $progname = basename($0);
-	if ($@ =~ /^Can\'t locate LWP\/UserAgent\.pm/) {
-	    die "$progname: you must have the libwww-perl package installed\nto use this script\n";
-	} else {
-	    die "$progname: problem loading the LWP::UserAgent module:\n  $@\nHave you installed the libwww-perl package?\n";
+    sub require_module ($$) {
+	my ($mod, $pkg) = @_;
+	eval qq{ require $mod; };
+	if ($@) {
+	    my $progname = basename($0);
+	    my $pm_file = join ("/", split (/::/, $mod)) . ".pm";
+	    if ($@ =~ /^Can\'t locate $pm_file/) {
+		die "$progname: you must have the $pkg package installed\nto use this script\n";
+	    } else {
+		die "$progname: problem loading the $mod module:\n  $@\nHave you installed the $pkg package?\n";
+	    }
 	}
     }
+    require_module ('LWP::UserAgent', 'libwww-perl');
+    require_module ('Try::Tiny', 'libtry-tiny-perl');
 }
 my $CURRENT_WATCHFILE_VERSION = 3;
 
@@ -72,6 +79,7 @@ sub uscan_die (@);
 sub dehs_output ();
 sub quoted_regex_replace ($);
 sub safe_replace ($$);
+sub get_main_source_dir ($$$$$);
 
 sub usage {
     print <<"EOF";
@@ -138,6 +146,8 @@ Options:
     --no-conf, --noconf
                    Don\'t read devscripts config files;
                    must be the first option given
+    --no-exclusion no automatic exclusion of files mentioned in
+                   debian/copyright field Files-Excluded
     --help         Show this message
     --version      Show version information
 
@@ -180,6 +190,7 @@ my $dehs_start_output = 0;
 my $pkg_report_header = '';
 my $timeout = 20;
 my $user_agent_string = 'Debian uscan ###VERSION###';
+my $no_exclusion = 0;
 
 if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
     $modified_conf_msg = "  (no configuration files read)";
@@ -196,6 +207,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 		       'USCAN_DEHS_OUTPUT' => 'no',
 		       'USCAN_USER_AGENT' => '',
 		       'USCAN_REPACK' => 'no',
+		       'USCAN_NO_EXCLUSION' => 'no',
 		       'DEVSCRIPTS_CHECK_DIRNAME_LEVEL' => 1,
 		       'DEVSCRIPTS_CHECK_DIRNAME_REGEX' => 'PACKAGE(-.+)?',
 		       );
@@ -233,6 +245,8 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 	or $config_vars{'USCAN_DEHS_OUTPUT'}='no';
     $config_vars{'USCAN_REPACK'} =~ /^(yes|no)$/
 	or $config_vars{'USCAN_REPACK'}='no';
+    $config_vars{'USCAN_NO_EXCLUSION'} =~ /^(yes|no)$/
+	or $config_vars{'USCAN_NO_EXCLUSION'}='no';
     $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'} =~ /^[012]$/
 	or $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'}=1;
 
@@ -263,7 +277,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 # Now read the command line arguments
 my $debug = 0;
 my ($opt_h, $opt_v, $opt_destdir, $opt_download, $opt_force_download,
-    $opt_report, $opt_passive, $opt_symlink, $opt_repack);
+    $opt_report, $opt_passive, $opt_symlink, $opt_repack, $opt_no_exclusion);
 my ($opt_verbose, $opt_level, $opt_regex, $opt_noconf);
 my ($opt_package, $opt_uversion, $opt_watchfile, $opt_dehs, $opt_timeout);
 my $opt_download_version;
@@ -295,6 +309,7 @@ GetOptions("help" => \$opt_h,
 	   "useragent=s" => \$opt_user_agent,
 	   "noconf" => \$opt_noconf,
 	   "no-conf" => \$opt_noconf,
+	   "no-exclusion" => \$opt_no_exclusion,
 	   "download-current-version" => \$opt_download_current_version,
 	   )
     or die "Usage: $progname [options] [directories]\nRun $progname --help for more details\n";
@@ -318,6 +333,7 @@ $timeout = 20 unless defined $timeout and $timeout > 0;
 $symlink = $opt_symlink if defined $opt_symlink;
 $verbose = $opt_verbose if defined $opt_verbose;
 $dehs = $opt_dehs if defined $opt_dehs;
+$no_exclusion = $opt_no_exclusion if defined $opt_no_exclusion;
 $user_agent_string = $opt_user_agent if defined $opt_user_agent;
 $download_version = $opt_download_version if defined $opt_download_version;
 
@@ -1480,6 +1496,64 @@ EOF
 	}
     }
 
+    my $excludesuffix = '+dfsg';
+    my ($newfile_base_dfsg);
+    if ( !$no_exclusion ) {
+        my $data ;
+        $data = Dpkg::Control::Hash->new();
+        Try::Tiny::try {
+            $data->load('debian/copyright');
+        } Try::Tiny::catch {
+            print "-- No machine readable debian/copyright file.\n" if ( $verbose ) ;
+            $data->{'format'} = '' ;
+        } ;
+        my $okformat = qr'http://www.debian.org/doc/packaging-manuals/copyright-format/[.\d]+';
+        print "-- Wrong format of debian/copyright file to profit from Files-Excluded.\n" if ( $data->{'files-excluded'} and $data->{'format'} !~ m{^$okformat/?$} and $verbose ) ;
+        if ($data->{'format'} =~ m{^$okformat/?$} and $data->{'files-excluded'} ) {
+            my $tempdir = tempdir ( "uscanXXXX", TMPDIR => 1, CLEANUP => 1 );
+            my $globpattern = "*";
+            my $hidden = ".[!.]*";
+            if (defined glob("$tempdir/$hidden")) {
+                $globpattern .= " $hidden";
+            }
+            my $absdestdir = abs_path($destdir);
+            unless ( system("cd $tempdir; tar -xaf \"$absdestdir/$newfile_base\" 2>/dev/null") == 0 ) {
+                print "-- $newfile_base is no tarball.  Try unzip.\n" if $verbose;
+                # try unzip if tar fails - we do want to do something sensible even if no --repack was specified
+                system('command -v unzip >/dev/null 2>&1') >> 8 == 0
+                   or die("unzip binary not found. This would serve as fallback because tar just failed.\n");
+                system('unzip', '-q', '-a', '-d', $tempdir, "$destdir/$newfile_base") == 0
+                   or die("Repacking from zip to tar.gz failed (could not unzip)\n");
+            }
+            # Some source archives contain a useless __MACOSX dir which would prevent a reasonable
+            # normalising of the +dfsg.orig archive - so removing it in advance in case it should
+            # be removed anyway helps creating normalised source archives.
+            my $exclude__MACOSX = grep( /\s*\/?__MACOSX\/?\s*/, $data->{"files-excluded"} );
+            my $main_source_dir = get_main_source_dir($tempdir, $pkg, $newversion, $excludesuffix, $exclude__MACOSX);
+            unless ( -d $main_source_dir ) {
+                print STDERR "Error: $main_source_dir is no directory";
+            }
+            my $nfiles_before = `find "$main_source_dir" | wc -l`;
+            foreach (split /\s+/, $data->{"files-excluded"}) {
+                # delete trailing '/' because otherwise find -path will fail
+                s?/+$?? ;
+                # use -depth to enable deleting directories
+                system('find',$main_source_dir,'-depth','-path',"$main_source_dir/$_",qw(-exec rm -rf {} ;))==0 or
+                    die "failure to run find properly";
+            };
+            my $nfiles_after = `find "$main_source_dir" | wc -l`;
+            if ( $nfiles_before == $nfiles_after && ! $exclude__MACOSX ) {
+                print "-- Source tree remains identical - no need for repacking.\n" if $verbose;
+            } else {
+                my $suffix = 'gz' ;
+                $newfile_base_dfsg = "${pkg}_${newversion}${excludesuffix}.orig.tar.$suffix" ;
+                system("cd $tempdir; GZIP='-n -9' tar --owner=root --group=root --mode=a+rX -czf \"$absdestdir/$newfile_base_dfsg\" $globpattern") == 0
+                   or die("Excluding files failed (could not create tarball)\n");
+                $symlink = 'files-excluded' # prevent symlinking or renaming
+            }
+        }
+    }
+
     my @renames = (
 	[qr/\.(tar\.gz|tgz)$/, 'gz'],
 	[qr/\.(tar\.bz2|tbz2?)$/, 'bz2'],
@@ -1506,6 +1580,8 @@ EOF
 		print "    and symlinked $renamed_base to it\n";
 	    } elsif ($symlink eq 'rename') {
 		print "    and renamed it as $renamed_base\n";
+	    } elsif ($symlink eq 'files-excluded') {
+		print "    and removed files from it in $newfile_base_dfsg\n";
 	    }
 	} elsif ($dehs) {
 	    my $msg = "Successfully downloaded updated package $newfile_base";
@@ -1514,6 +1590,8 @@ EOF
 		$msg .= " and symlinked $renamed_base to it";
 	    } elsif ($symlink eq 'rename') {
 		$msg .= " and renamed it as $renamed_base";
+	    } elsif ($symlink eq 'files-excluded') {
+		$msg .= " and removed files from it in $newfile_base_dfsg\n";
 	    } else {
 		$dehs_tags{'target'} = $newfile_base;
 	    }
@@ -1524,6 +1602,8 @@ EOF
 		print "    and symlinked $renamed_base to it\n";
 	    } elsif ($symlink eq 'rename') {
 		print "    and renamed it as $renamed_base\n";
+	    } elsif ($symlink eq 'files-excluded') {
+		print "    and removed files from it in $newfile_base_dfsg\n";
 	    }
 	}
 	last;
@@ -2066,3 +2146,56 @@ sub safe_replace($$) {
 	return 1;
     }
 }
+
+sub get_main_source_dir($$$$$) {
+    my ($tempdir, $pkg, $newversion, $excludesuffix, $exclude__MACOSX) = @_;
+    my $fcount = 0;
+    my $main_source_dir = '' ;
+    my $any_dir = '' ;
+    opendir DIR, $tempdir or die "opendir $tempdir: $!";
+    my @files = readdir DIR ;
+    closedir DIR ;
+    foreach my $file (@files) {
+	unless ($file =~ /^\.\.?/) {
+            if ( $exclude__MACOSX && $file =~ /^__MACOSX$/ ) {
+                `rm -rf ${tempdir}/__MACOSX` ;
+                next ;
+            }
+            $fcount++;
+	    if ( -d $tempdir.'/'.$file ) {
+                $any_dir = $tempdir . '/' . $file ;
+                # check whether there is some dir in upstream source which looks reasonable
+                # If such dir exists, we do not try to undirty the directory structure
+                $main_source_dir = $any_dir if ( $file =~ /^$pkg\w*$newversion$/i ) ;
+            }
+        }
+    }
+    if ( $fcount == 1 and $main_source_dir ) {
+        return $main_source_dir ;
+    }
+    if ( $fcount == 1 and $any_dir ) {
+        # Unusual base dir in tarball - should be rather something like ${pkg}-${newversion}
+        $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion . $excludesuffix . '.orig';
+        move($any_dir, $main_source_dir) or die("Unable to move $any_dir directory $main_source_dir\n");
+        return $main_source_dir ;
+    }
+    print "-- Dirty tarball found.\n" if $verbose;
+    if ( $main_source_dir ) { # if tarball is dirty but does contain a $pkg-$newversion dir we will not undirty but leave it as is
+        print "-- No idea how to create proper tarball structure - leaving as is.\n" if $verbose;
+        return $tempdir;
+    }
+    print "-- Move files to subdirectory $pkg-$newversion.\n" if $verbose;
+    $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion . $excludesuffix . '.orig';
+    mkdir($main_source_dir) or die("Unable to create temporary source directory $main_source_dir\n");
+    foreach my $file (@files) {
+	unless ($file =~ /^\.\.?/) {
+	    if ( -d "${tempdir}/$file" ) {
+                # HELP: why can't perl move not move directories????
+                system( "mv ${tempdir}/$file $main_source_dir" ) ;
+            } else {
+                move("${tempdir}/$file", $main_source_dir) or die("Unable to move ${tempdir}/$file directory $main_source_dir\n");
+            }
+        }
+    }
+    return $main_source_dir;
+}
-- 
1.8.4.rc3

Reply via email to