Proposed enhancement to diff:

Running diff on two very different files can take a very long time
and a lot of memory.
diff -q runs the same full algorithm even though the exit status is
known at the first difference.

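I propose ending the comparison at the first difference if both
of these hold:
  diff is invoked with -q
  diff is not invoked with -w, -i, or -b
    (with those options, files that differ byte for byte may
    still compare as equal, so the full comparison is needed)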
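
To illustrate why stopping at the first difference is cheap, here is a
minimal sketch of a block-wise comparison that returns as soon as two
blocks disagree. It is not code from diff: the name streams_differ and
the 8192-byte buffer size are made up for this example. Memory use is
constant regardless of file size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of diff): report whether two open
 * streams differ, stopping at the first differing block.
 * Returns 0 if identical, 1 if different, -1 on read error.
 */
int
streams_differ(FILE *a, FILE *b)
{
	char bufa[8192], bufb[8192];
	size_t na, nb;

	assert(a != NULL && b != NULL);
	for (;;) {
		na = fread(bufa, 1, sizeof(bufa), a);
		nb = fread(bufb, 1, sizeof(bufb), b);
		if (ferror(a) || ferror(b))
			return -1;
		/* A short read on one side only means the lengths differ. */
		if (na != nb || memcmp(bufa, bufb, na) != 0)
			return 1;
		if (na == 0)
			return 0;	/* both streams hit EOF together */
	}
}
```

A caller would fopen() both files, call streams_differ(), and print
"Files X and Y differ" when it returns 1.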

The changes pass the regression tests and all the tests I've tried.
I believe the changes are not machine dependent.
I invite criticism and counterexamples.

Example:

$ ls -l trash.120403 trash.120711
-rw-------  1 gwes  users  249686538 Apr  3  2012 trash.120403
-rw-r--r--  1 gwes  users  142356923 Jul 11  2012 trash.120711

$ time diff -q trash.120403 trash.120711
diff: 
    1m51.52s real     1m47.66s user     0m2.46s system

top output:

load averages:  1.02,  0.91,  0.58                        xxxx.oat.com 15:41:54
49 processes: 47 idle, 2 on processor
CPU0 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU1 states: 98.4% user,  0.0% nice,  1.6% system,  0.0% interrupt,  0.0% idle
Memory: Real: 403M/785M act/tot Free: 796M Cache: 312M Swap: 0K/1248M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
18740 gwes      57    0  362M  333M onproc/1  biowait   1:05 95.61% diff


$ time work/newdiff/diff -q trash.120403 trash.120711
Files trash.120403 and trash.120711 differ
    0m0.00s real     0m0.00s user     0m0.00s system

The code changes:

$ diff -u diff.h work/newdiff/diff.h
--- diff.h      Thu May 15 16:29:15 2014
+++ work/newdiff/diff.h Thu May 15 15:57:30 2014
@@ -64,6 +64,10 @@
 #define D_PROTOTYPE    0x080   /* Display C function prototype */
 #define D_EXPANDTABS   0x100   /* Expand tabs to spaces */
 #define D_IGNOREBLANKS 0x200   /* Ignore white space changes */
+                               /* test for possible return at first difference */
+#define CANBRIEFRETURN(flags) (((flags) & (D_FOLDBLANKS | D_IGNORECASE \
+                                       | D_IGNOREBLANKS \
+                                       )) == 0)
 
 /*
  * Status values for print_status() and diffreg() return values
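
As a quick check of what the new macro accepts, the sketch below
evaluates it for a few flag combinations. The values of D_EXPANDTABS
and D_IGNOREBLANKS are taken from the context lines above; the values
of D_FOLDBLANKS and D_IGNORECASE are assumptions made for this example,
and check_canbriefreturn is a made-up name. The three tested flags
presumably correspond to -b, -i, and -w.

```c
#include <assert.h>

#define D_FOLDBLANKS	0x010	/* assumed value for this example */
#define D_IGNORECASE	0x040	/* assumed value for this example */
#define D_EXPANDTABS	0x100	/* value from the diff.h context lines */
#define D_IGNOREBLANKS	0x200	/* value from the diff.h context lines */

/* Same test as in the patch: true when none of the three flags is set. */
#define CANBRIEFRETURN(flags) (((flags) & (D_FOLDBLANKS | D_IGNORECASE \
					| D_IGNOREBLANKS)) == 0)

void
check_canbriefreturn(void)
{
	/* No flags, or only unrelated flags: early return is allowed. */
	assert(CANBRIEFRETURN(0));
	assert(CANBRIEFRETURN(D_EXPANDTABS));
	/* Any one of the three tested flags disables the early return. */
	assert(!CANBRIEFRETURN(D_FOLDBLANKS));
	assert(!CANBRIEFRETURN(D_IGNORECASE));
	assert(!CANBRIEFRETURN(D_EXPANDTABS | D_IGNOREBLANKS));
}
```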

$ diff -u diffreg.c work/newdiff/diffreg.c 
--- diffreg.c   Thu May 15 16:29:15 2014
+++ work/newdiff/diffreg.c      Thu May 15 16:31:19 2014
@@ -366,6 +366,15 @@
                status |= 1;
                goto closem;
        }
+       if ((diff_format == D_BRIEF) && CANBRIEFRETURN(flags)) {
+               anychange = 1;
+               if (flags & D_HEADER) {
+                       diff_output("%s %s %s\n",
+                               diffargs, file1, file2);
+                       flags &= ~D_HEADER;
+               }
+               goto closem;
+       }
        if (lflag) {
                /* redirect stdout to pr */
                int pfd[2];
