Package: dpkg-dev
Version: 1.15.4.1
Severity: normal
File: /usr/bin/dpkg-source
Tags: patch

As seen in (but unrelated to) bug #554612, running dpkg-source -b is slow on
big trees, where the main contender is Dpkg::Source::Patch::add_diff_file.

%Time    Sec.     #calls   sec/call  F  name
32.52   69.8110    44702   0.001562     Dpkg::Source::Patch::add_diff_file
29.95   64.2837    44829   0.001434     Dpkg::IPC::fork_and_exec
10.22   21.9370    44829   0.000489     Dpkg::IPC::wait_child
 7.14   15.3309    97591   0.000157     File::Spec::Unix::abs2rel
 4.25    9.1206   585681   0.000016     File::Spec::Unix::canonpath

Here, the main problem is obviously forking 44829 times diff -u, while the
vast majority of files in the orig tarball haven't been touched (which is
mostly true on all packages).

The attached patch (which may have style and correctness issue) implements
a very simple check in perl (so, without a fork) to see if files differ
before running diff. The result is stunning:

>From 3 minutes and 30 seconds on iceape in format 3.0 (quilt), dpkg-source
-b goes down to 35 seconds (where 15 are spent bunzipping).

This is where the time is spent, now:

%Time    Sec.     #calls   sec/call  F  name
24.41   14.1649      128   0.110663     Dpkg::IPC::wait_child
19.46   11.2948    97594   0.000116     File::Spec::Unix::abs2rel
13.77    7.9901    44703   0.000179     Dpkg::Source::Patch::add_diff_file
 9.37    5.4382   585699   0.000009     File::Spec::Unix::canonpath

It looks much better.

My gut feeling is that it should improve run time on most if not all packages.
(and more importantly, will help big packages maintainers)

I'm pretty sure reading by blocks of a multiple of 4k instead of reading line
by line could be faster, too.

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages dpkg-dev depends on:
ii  binutils                      2.20-2     The GNU assembler, linker and bina
ii  bzip2                         1.0.5-3    high-quality block-sorting file co
ii  dpkg                          1.15.4.1   Debian package management system
ii  libtimedate-perl              1.1900-1   Time and date functions for Perl
ii  lzma                          4.43-14    Compression method of 7z format in
ii  make                          3.81-7     An utility for Directing compilati
ii  patch                         2.5.9-5    Apply a diff file to an original
ii  perl [perl5]                  5.10.1-6   Larry Wall's Practical Extraction 
ii  perl-modules                  5.10.1-6   Core Perl modules

Versions of packages dpkg-dev recommends:
ii  build-essential           11.4           Informational list of build-essent
ii  fakeroot                  1.14.3         Gives a fake root environment
ii  gcc [c-compiler]          4:4.3.3-9+nmu1 The GNU C compiler
ii  gcc-4.1 [c-compiler]      4.1.2-27       The GNU C compiler
ii  gcc-4.2 [c-compiler]      4.2.4-6        The GNU C compiler
ii  gcc-4.3 [c-compiler]      4.3.4-6        The GNU C compiler
ii  gcc-4.4 [c-compiler]      4.4.2-2        The GNU C compiler
ii  gnupg                     1.4.10-2       GNU privacy guard - a free PGP rep
ii  gpgv                      1.4.10-2       GNU privacy guard - signature veri

Versions of packages dpkg-dev suggests:
ii  debian-keyring [debian-mainta 2009.08.27 GnuPG (and obsolete PGP) keys of D

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/perl5/Dpkg/Source/Patch.pm (from dpkg-dev 
package)
debsums: changed file /usr/share/perl5/Dpkg/Source/Package/V2.pm (from dpkg-dev 
package)
--- /usr/share/perl5/Dpkg/Source/Patch.pm
+++ /usr/share/perl5/Dpkg/Source/Patch.pm
@@ -58,6 +58,19 @@
 
 sub add_diff_file {
     my ($self, $old, $new, %opts) = @_;
+    open(OLD, "<", $old);
+    open(NEW, "<", $new);
+    my $match = 1;
+    while (<OLD>) {
+        if ($_ ne <NEW>) {
+            $match = 0;
+            last;
+        }
+    }
+    close OLD;
+    close NEW;
+    return 1 if ($match);
+
     $opts{"include_timestamp"} = 0 unless exists $opts{"include_timestamp"};
     my $handle_binary = $opts{"handle_binary_func"} || sub {
         my ($self, $old, $new) = @_;

Reply via email to