Package: dpkg-dev
Version: 1.15.4.1
Severity: normal
File: /usr/bin/dpkg-source
Tags: patch
As seen in (but unrelated to) bug #554612, running dpkg-source -b is slow on big trees, where the main offender is Dpkg::Source::Patch::add_diff_file.

%Time     Sec.   #calls  sec/call  F  name
32.52  69.8110    44702  0.001562     Dpkg::Source::Patch::add_diff_file
29.95  64.2837    44829  0.001434     Dpkg::IPC::fork_and_exec
10.22  21.9370    44829  0.000489     Dpkg::IPC::wait_child
 7.14  15.3309    97591  0.000157     File::Spec::Unix::abs2rel
 4.25   9.1206   585681  0.000016     File::Spec::Unix::canonpath

Here, the main problem is obviously forking diff -u 44829 times, while the vast majority of files in the orig tarball haven't been touched (which is mostly true of all packages). The attached patch (which may have style and correctness issues) implements a very simple check in perl (so, without a fork) to see if files differ before running diff.

The result is stunning: from 3 minutes and 30 seconds on iceape in format 3.0 (quilt), dpkg-source -b goes down to 35 seconds (of which 15 are spent bunzipping).

This is where the time is spent now:

%Time     Sec.   #calls  sec/call  F  name
24.41  14.1649      128  0.110663     Dpkg::IPC::wait_child
19.46  11.2948    97594  0.000116     File::Spec::Unix::abs2rel
13.77   7.9901    44703  0.000179     Dpkg::Source::Patch::add_diff_file
 9.37   5.4382   585699  0.000009     File::Spec::Unix::canonpath

It looks much better. My gut feeling is that it should improve run time on most if not all packages (and, more importantly, will help maintainers of big packages).

I'm pretty sure reading in blocks of a multiple of 4k instead of reading line by line could be faster, too.
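The closing remark about reading in blocks can be sketched roughly as follows. This is only an illustration, not code from dpkg: the helper name files_identical, its optional $blocksize argument and the 64 KiB default are all assumptions, and no timing has been measured for it. It also compares file sizes first, so files of different length are rejected without being read.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Dpkg::Source::Patch.
# Returns 1 if both files have identical contents, 0 otherwise
# (including when either file cannot be read).
sub files_identical {
    my ($old, $new, $blocksize) = @_;
    $blocksize ||= 64 * 1024;    # a multiple of 4k; the value is a guess

    # Files of different size can never match, so skip reading them.
    my $old_size = (stat($old))[7];
    my $new_size = (stat($new))[7];
    return 0 unless defined $old_size && defined $new_size;
    return 0 if $old_size != $new_size;

    open(my $old_fh, '<:raw', $old) or return 0;
    open(my $new_fh, '<:raw', $new) or return 0;

    my $match = 1;
    while (1) {
        my $old_read = read($old_fh, my $old_buf, $blocksize);
        my $new_read = read($new_fh, my $new_buf, $blocksize);
        # A read error or differing content means "not identical".
        if (!defined $old_read || !defined $new_read || $old_buf ne $new_buf) {
            $match = 0;
            last;
        }
        last if $old_read == 0;    # both files reached end of file
    }
    close($old_fh);
    close($new_fh);
    return $match;
}
```

With such a helper, add_diff_file could return early when files_identical($old, $new) is true and only fork diff -u otherwise. Whether this actually beats the line-by-line loop would need to be profiled.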
-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages dpkg-dev depends on:
ii  binutils                      2.20-2          The GNU assembler, linker and bina
ii  bzip2                         1.0.5-3         high-quality block-sorting file co
ii  dpkg                          1.15.4.1        Debian package management system
ii  libtimedate-perl              1.1900-1        Time and date functions for Perl
ii  lzma                          4.43-14         Compression method of 7z format in
ii  make                          3.81-7          An utility for Directing compilati
ii  patch                         2.5.9-5         Apply a diff file to an original
ii  perl [perl5]                  5.10.1-6        Larry Wall's Practical Extraction
ii  perl-modules                  5.10.1-6        Core Perl modules

Versions of packages dpkg-dev recommends:
ii  build-essential               11.4            Informational list of build-essent
ii  fakeroot                      1.14.3          Gives a fake root environment
ii  gcc [c-compiler]              4:4.3.3-9+nmu1  The GNU C compiler
ii  gcc-4.1 [c-compiler]          4.1.2-27        The GNU C compiler
ii  gcc-4.2 [c-compiler]          4.2.4-6         The GNU C compiler
ii  gcc-4.3 [c-compiler]          4.3.4-6         The GNU C compiler
ii  gcc-4.4 [c-compiler]          4.4.2-2         The GNU C compiler
ii  gnupg                         1.4.10-2        GNU privacy guard - a free PGP rep
ii  gpgv                          1.4.10-2        GNU privacy guard - signature veri

Versions of packages dpkg-dev suggests:
ii  debian-keyring [debian-mainta 2009.08.27      GnuPG (and obsolete PGP) keys of D

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/perl5/Dpkg/Source/Patch.pm (from dpkg-dev package)
debsums: changed file /usr/share/perl5/Dpkg/Source/Package/V2.pm (from dpkg-dev package)
--- /usr/share/perl5/Dpkg/Source/Patch.pm
+++ /usr/share/perl5/Dpkg/Source/Patch.pm
@@ -58,6 +58,19 @@
 
 sub add_diff_file {
     my ($self, $old, $new, %opts) = @_;
+    open(OLD, "<", $old);
+    open(NEW, "<", $new);
+    my $match = 1;
+    while (<OLD>) {
+        if ($_ ne <NEW>) {
+            $match = 0;
+            last;
+        }
+    }
+    close OLD;
+    close NEW;
+    return 1 if ($match);
+
     $opts{"include_timestamp"} = 0 unless exists $opts{"include_timestamp"};
     my $handle_binary = $opts{"handle_binary_func"} || sub {
         my ($self, $old, $new) = @_;
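
One correctness point in the patch above: the loop stops as soon as the old file runs out, so a new file that only gains data at the end (including the case of an empty old file) still ends with $match set to 1, and the results of open are not checked. As a possible alternative, here is a minimal sketch that delegates the check to File::Compare, which ships with perl. The helper name unchanged is hypothetical, and swapping in File::Compare is a suggestion, not what the patch does.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper, not part of dpkg: true only when both files
# could be read and their contents are equal.
sub unchanged {
    my ($old, $new) = @_;
    # compare() returns 0 if the files are equal, 1 if they differ,
    # and -1 on error, so errors fall through to the diff path.
    return compare($old, $new) == 0 ? 1 : 0;
}
```

In add_diff_file this would replace the whole open/while/close sequence with a single line such as: return 1 if unchanged($old, $new);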