Package: devscripts
Version: 2.10.69+squeeze2
Severity: wishlist
Tags: patch

Hi,

in a (bit longish) thread on debian-devel@l.d.o[1] there was some
discussion about enabling uscan to remove files from upstream archives
according to some information given in some control file.  There was no
real consensus about what control file to use.  The implementation below
is based on using debian/copyright but is easy to switch to other files
in case some other consensus might be reached.

The attached patch does the following:

 1. If (and only if) the debian/copyright file is

     Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

    and if it contains a non-empty field Files-Excluded containing a
    space separated list of globs (as used by find and for specifying
    file lists in machine readable debian/control files). The deletion
    process will loop over every expression and is using the find
    command to delete the according globs.

 2. If files matching are contained in the source tarball this will
    be repackaged except if the option --no-exclusion is given at
    uscan command line or if USCAN_NO_EXCLUSION is set in
    /etc/devscripts.conf or ~/.devscripts.  The removal is implemented
    for all tar compression methods as well as for zip archives (which
    are unpackaged using unzip).  This means if the conditions for
    file exclusion as given above are fullfilled the patch below
    works similar as --repack.

 3. If the tarball did not contained any of the globs in
    debian/copyright::Files-Excluded it will be left untouched.

 4. In case something was removed the version string will be appended by
    '+dfsg' to express the fact that the content of the original source
    was changed.  Note: There was no real consensus whether to use this
    suffix or rather '~dfsg'.  This could be solved by some additional
    configuration option that could be added later.  For some moment I
    also had the idea to obtain the suffix which is "wanted" by the
    maintainer either from debian/watch:dversionmangle or
    debian/changelog but I droped this idea because I did not found
    a reliable method to make a safe guess.

 5. Sometimes upstream tarballs are dirty and unpack a load of files
    into the current directory.  The patch tries to behave reasonable
    and checks whether it could move those files into a dir named
      $pkg-$newversion
    (in case no such file or directory just exists in such a dirty
    tarball).  Also some non-dirty but quite generically named
    directories (like "source") are renamed to "$pkg-$newversion".

I have tested the code with five different packages in Debian Med
repository which show different problematic directory structures and
also different compression methods.  The according copyright files
including Files-Excluded are commited to the following locations:

  Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/ampliconnoise/trunk/
  Vcs-Svn: 
svn://svn.debian.org/debian-med/trunk/packages/conquest-dicom-server/trunk/
  Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/imagej/trunk/
  Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/saint/trunk/
  Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/tm-align/trunk/

When applying the patch and using `uscan --verbose --force-download`
I get the actual resulting orig.tar.gz as I want it to be.

Remark:
Regarding the implementation there was some uncertainity about the
actual Perl module to use.  In the patch below script I decided to
stick to Dpkg::Control and left the code for Parse::DebControl as a
comment which could pretty easily could replace the other parser.

Please consider applying this patch after possibly further discussion on
debian-devel@l.d.o.

Kind regards

      Andreas.

PS: Some more skilled Perl programmer might see some room for enhancing
    the code - just be warned that Perl is not my native language.

[1] http://lists.debian.org/debian-devel/2012/08/msg00380.html

-- Package-specific info:

--- /etc/devscripts.conf ---

--- ~/.devscripts ---
Not present

-- System Information:
Debian Release: 6.0.5
Architecture: i386 (i686)

Kernel: Linux 2.6.36-xenU-4814-i386 (SMP w/1 CPU core)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages devscripts depends on:
ii  dpkg-dev               1.15.8.12         Debian package development tools
ii  libc6                  2.11.3-3          Embedded GNU C Library: Shared lib
ii  perl                   5.10.1-17squeeze3 Larry Wall's Practical Extraction 

Versions of packages devscripts recommends:
pn  at                 <none>                (no description available)
ii  bsd-mailx [mailx]  8.1.2-0.20100314cvs-1 simple mail user agent
ii  curl               7.21.0-2.1+squeeze2   Get a file from an HTTP, HTTPS or 
ii  dctrl-tools        2.14.5                Command-line tools to process Debi
pn  debian-keyring     <none>                (no description available)
pn  debian-maintainers <none>                (no description available)
ii  dput               0.9.6.1+squeeze1      Debian package upload tool
pn  equivs             <none>                (no description available)
ii  fakeroot           1.14.4-1              Gives a fake root environment
ii  git [git-core]     1:1.7.2.5-3           fast, scalable, distributed revisi
ii  git-core           1:1.7.2.5-3           fast, scalable, distributed revisi
ii  gnupg              1.4.10-4              GNU privacy guard - a free PGP rep
pn  libauthen-sasl-per <none>                (no description available)
pn  libcrypt-ssleay-pe <none>                (no description available)
pn  libjson-perl       <none>                (no description available)
pn  libparse-debcontro <none>                (no description available)
pn  libsoap-lite-perl  <none>                (no description available)
pn  libterm-size-perl  <none>                (no description available)
ii  libtimedate-perl   1.2000-1              collection of modules to manipulat
ii  liburi-perl        1.54-2                module to manipulate and access UR
ii  libwww-perl        5.836-1               Perl HTTP/WWW client/server librar
pn  libyaml-syck-perl  <none>                (no description available)
pn  lintian            <none>                (no description available)
ii  lsb-release        3.2-23.2squeeze1      Linux Standard Base version report
ii  lynx-cur [www-brow 2.8.8dev.5-1          Text-mode WWW Browser with NLS sup
ii  lzma               4.43-14               Compression method of 7z format in
ii  mailx              1:20071201-3          Transitional package for mailx ren
ii  man-db             2.5.7-8               on-line manual pager
ii  openssh-client [ss 1:5.5p1-6+squeeze2    secure shell (SSH) client, for sec
ii  patch              2.6-2                 Apply a diff file to an original
ii  patchutils         0.3.1-2               Utilities to work with patches
ii  sensible-utils     0.0.4                 Utilities for sensible alternative
pn  strace             <none>                (no description available)
ii  unzip              6.0-4                 De-archiver for .zip files
ii  w3m [www-browser]  0.5.2-9               WWW browsable pager with excellent
pn  wdiff              <none>                (no description available)
ii  wget               1.12-2.1              retrieves files from the web
ii  xz-utils           5.0.0-2               XZ-format compression utilities

Versions of packages devscripts suggests:
ii  build-essential        11.5              Informational list of build-essent
pn  cvs-buildpackage       <none>            (no description available)
pn  devscripts-el          <none>            (no description available)
pn  gnuplot                <none>            (no description available)
pn  libfile-desktopentry-p <none>            (no description available)
pn  libnet-smtp-ssl-perl   <none>            (no description available)
ii  mutt                   1.5.20-9+squeeze2 text-based mailreader supporting M
pn  svn-buildpackage       <none>            (no description available)
ii  w3m                    0.5.2-9           WWW browsable pager with excellent

-- no debconf information
diff --git a/scripts/uscan.1 b/scripts/uscan.1
index 86b3078..4f53df1 100644
--- a/scripts/uscan.1
+++ b/scripts/uscan.1
@@ -428,6 +428,10 @@ Give verbose output.
 .B \-\-no\-verbose
 Don't give verbose output.  (This is the default behaviour.)
 .TP
+.B \-\-no\-exclusion
+Do not automatically exclude files mentioned in
+\fIdebian/copyright\fR field \fBFiles-Excluded\fR
+.TP
 .B \-\-debug
 Dump the downloaded web pages to stdout for debugging your watch file.
 .TP
@@ -501,6 +505,10 @@ equivalent to the \fB\-\-destdir\fR option.
 If this is set to \fIyes\fR, then after having downloaded a bzip tar,
 lzma tar, xz tar, or zip archive, \fBuscan\fR will repack it to a gzip tar.
 This is equivalent to the \fB\-\-repack\fR option.
+.B USCAN_NO_EXCLUSION
+If this is set to \fIyes\fR, files mentioned in the field \fBFiles-Excluded\fR
+of \fIdebian/copyright\fR will be ignored and no exclusion of files will be
+tried.  This is equivalent to the \fB\-\-no-exclusion\fR option.
 .SH "EXIT STATUS"
 The exit status gives some indication of whether a newer version was
 found or not; one is advised to read the output to determine exactly
diff --git a/scripts/uscan.pl b/scripts/uscan.pl
index 8723fb4..e118142 100755
--- a/scripts/uscan.pl
+++ b/scripts/uscan.pl
@@ -45,6 +45,11 @@ BEGIN {
 	}
     }
 }
+# Dpkg::Control::Hash prefered by James McCoy (who did the last three uscan.pl edits using a debian.org e-mail address)
+use Dpkg::Control::Hash;
+# Parse::DebControl suggested by Jonas Smedegaard
+# use Parse::DebControl;
+
 my $CURRENT_WATCHFILE_VERSION = 3;
 
 my $progname = basename($0);
@@ -70,6 +75,7 @@ sub dehs_die ($);
 sub dehs_output ();
 sub quoted_regex_replace ($);
 sub safe_replace ($$);
+sub get_main_source_dir($$$);
 
 sub usage {
     print <<"EOF";
@@ -136,6 +142,8 @@ Options:
     --no-conf, --noconf
                    Don\'t read devscripts config files;
                    must be the first option given
+    --no-exclusion no automatic exclusion of files mentioned in
+                   debian/copyright field Files-Excluded
     --help         Show this message
     --version      Show version information
 
@@ -178,6 +186,7 @@ my $dehs_start_output = 0;
 my $pkg_report_header = '';
 my $timeout = 20;
 my $user_agent_string = 'Debian uscan ###VERSION###';
+my $no_exclusion = 0;
 
 if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
     $modified_conf_msg = "  (no configuration files read)";
@@ -194,6 +203,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 		       'USCAN_DEHS_OUTPUT' => 'no',
 		       'USCAN_USER_AGENT' => '',
 		       'USCAN_REPACK' => 'no',
+		       'USCAN_NO_EXCLUSION' => 'no',
 		       'DEVSCRIPTS_CHECK_DIRNAME_LEVEL' => 1,
 		       'DEVSCRIPTS_CHECK_DIRNAME_REGEX' => 'PACKAGE(-.+)?',
 		       );
@@ -231,6 +241,8 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 	or $config_vars{'USCAN_DEHS_OUTPUT'}='no';
     $config_vars{'USCAN_REPACK'} =~ /^(yes|no)$/
 	or $config_vars{'USCAN_REPACK'}='no';
+    $config_vars{'USCAN_NO_EXCLUSION'} =~ /^(yes|no)$/
+	or $config_vars{'USCAN_NO_EXCLUSION'}='no';
     $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'} =~ /^[012]$/
 	or $config_vars{'DEVSCRIPTS_CHECK_DIRNAME_LEVEL'}=1;
 
@@ -261,7 +273,7 @@ if (@ARGV and $ARGV[0] =~ /^--no-?conf$/) {
 # Now read the command line arguments
 my $debug = 0;
 my ($opt_h, $opt_v, $opt_destdir, $opt_download, $opt_force_download,
-    $opt_report, $opt_passive, $opt_symlink, $opt_repack);
+    $opt_report, $opt_passive, $opt_symlink, $opt_repack, $opt_no_exclusion);
 my ($opt_verbose, $opt_level, $opt_regex, $opt_noconf);
 my ($opt_package, $opt_uversion, $opt_watchfile, $opt_dehs, $opt_timeout);
 my $opt_download_version;
@@ -293,6 +305,7 @@ GetOptions("help" => \$opt_h,
 	   "useragent=s" => \$opt_user_agent,
 	   "noconf" => \$opt_noconf,
 	   "no-conf" => \$opt_noconf,
+	   "no-exclusion" => \$opt_no_exclusion,
 	   "download-current-version" => \$opt_download_current_version,
 	   )
     or die "Usage: $progname [options] [directories]\nRun $progname --help for more details\n";
@@ -316,6 +329,7 @@ $timeout = 20 unless defined $timeout and $timeout > 0;
 $symlink = $opt_symlink if defined $opt_symlink;
 $verbose = $opt_verbose if defined $opt_verbose;
 $dehs = $opt_dehs if defined $opt_dehs;
+$no_exclusion = $opt_no_exclusion if defined $opt_no_exclusion;
 $user_agent_string = $opt_user_agent if defined $opt_user_agent;
 $download_version = $opt_download_version if defined $opt_download_version;
 if ($dehs) {
@@ -1429,6 +1443,56 @@ EOF
 	}
     }
 
+    if ( !$no_exclusion ) {
+        my $data = Dpkg::Control::Hash->new();
+        $data->load('debian/copyright');
+        # my $parser = new Parse::DebControl(1);
+        # my $data = $parser->parse_file('debian/copyright', {discardCase=>1,singleBlock=>1,});
+        my $okformat = qr'http://www.debian.org/doc/packaging-manuals/copyright-format/1.0';
+        if ($data->{'format'} =~ m{^$okformat/?$} and $data->{'files-excluded'} ) {
+            my $tempdir = tempdir ( "uscanXXXX", TMPDIR => 1, CLEANUP => 1 );
+            my $globpattern = "*";
+            my $hidden = ".[!.]*";
+            if (defined glob("$tempdir/$hidden")) {
+                $globpattern .= " $hidden";
+            }
+            my $absdestdir = abs_path($destdir);
+            unless ( system("cd $tempdir; tar -xaf \"$absdestdir/$newfile_base\" 2>/dev/null") == 0 ) {
+                print "-- $newfile_base is no tarball.  Try unzip.\n" if $verbose;
+                # try unzip if tar fails - we do want to do something sensible even if no --repack was specified
+                system('command -v unzip >/dev/null 2>&1') >> 8 == 0
+                   or die("unzip binary not found. This would serve as fallback because tar just failed.\n");
+                system('unzip', '-q', '-a', '-d', $tempdir, "$destdir/$newfile_base") == 0
+                   or die("Repacking from zip to tar.gz failed (could not unzip)\n");
+            }
+            my $main_source_dir = get_main_source_dir($tempdir, $pkg, $newversion);
+            unless ( -d $main_source_dir ) {
+                print STDERR "Error: $main_source_dir is no directory";
+            }
+            my $nfiles_before = `find $main_source_dir | wc -l`;
+            foreach (grep {/\//} split /\s+/, $data->{"files-excluded"}) {
+                # delete trailing '/' because otherwise find -path will fail
+                s?/+$?? ;
+                # use rm -rf to enable deleting non-empty directories
+                `find $main_source_dir -path "$main_source_dir/$_" | xargs rm -rf`;
+            };
+            foreach (grep {/^[^\/]+$/} split /\s+/, $data->{"files-excluded"}) {
+                `find $main_source_dir -type f -name $_ -delete`;
+            };
+            my $nfiles_after = `find $main_source_dir | wc -l`;
+            if ( $nfiles_before == $nfiles_after ) {
+                print "-- Source tree remains identical - no need for repacking.\n" if $verbose;
+            } else {
+                my $excludefuffix = '+dfsg' ;
+                my $suffix = 'gz' ;
+                my $newfile_base_dfsg = "${pkg}_${newversion}${excludefuffix}.orig.tar.$suffix" ;
+                system("cd $tempdir; GZIP='-n -9' tar --owner=root --group=root --mode=a+rX -czf \"$absdestdir/$newfile_base_dfsg\" $globpattern") == 0
+                   or die("Excluding files failed (could not create tarball)\n");
+                $symlink = 'no' # prevent symlinking or renaming
+            }
+        }
+    }
+
     my @renames = (
 	[qr/\.(tar\.gz|tgz)$/, 'gz'],
 	[qr/\.(tar\.bz2|tbz2?)$/, 'bz2'],
@@ -2005,3 +2069,50 @@ sub safe_replace($$) {
 	return 1;
     }
 }
+
+sub get_main_source_dir($$$) {
+    my ($tempdir, $pkg, $newversion) = @_;
+    my $fcount = 0;
+    my $main_source_dir = '' ;
+    my $any_dir = '' ;
+    opendir DIR, $tempdir or die "opendir $tempdir: $!";
+    my @files = readdir DIR ;
+    closedir DIR ;
+    foreach my $file (@files) {
+	unless ($file =~ /^\.\.?/) {
+            $fcount++;
+	    if ( -d $tempdir.'/'.$file ) {
+                $any_dir = $tempdir . '/' . $file ;
+                $main_source_dir = $any_dir if ( $file =~ /^$pkg\w*$newversion$/i ) ;
+            }
+        }
+    }
+    if ( $fcount == 1 and $main_source_dir ) {
+        return $main_source_dir ;
+    }
+    if ( $fcount == 1 and $any_dir ) {
+        # Unusual base dir in tarball - should be rather something like ${pkg}-${newversion}
+        $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion ;
+        move($any_dir, $main_source_dir) or die("Unable to move $any_dir directory $main_source_dir\n");
+        return $main_source_dir ;
+    }
+    print "-- Dirty tarball found.\n" if $verbose;
+    if ( $main_source_dir ) { # if tarball is dirty but does contain a $pkg-$newversion dir we will not undirty but leave it as is
+        print "-- No idea how to create proper tarball structure - leaving as is.\n" if $verbose;
+    	return $tempdir;
+    }
+    print "-- Move files to subdirectory $pkg-$newversion.\n" if $verbose;
+    $main_source_dir = $tempdir . '/' . $pkg . '-' . $newversion ;
+    mkdir($main_source_dir) or die("Unable to create temporary source directory $main_source_dir\n");
+    foreach my $file (@files) {
+	unless ($file =~ /^\.\.?/) {
+            # move("${tempdir}/$file", $main_source_dir) or die("Unable to move ${tempdir}/$file directory $main_source_dir\n");
+            unless ( move("${tempdir}/$file", $main_source_dir) ) {
+        	# HELP: why can't perl move not move directories????
+                print "Perl move seems to be not able to ` move(\"${tempdir}/$file\", $main_source_dir) ` ... trying system mv\n" if $debug;
+                system( "mv ${tempdir}/$file $main_source_dir" ) ;
+            }
+        }
+    }
+    return $main_source_dir;
+}

Reply via email to