On Tue, 20 Apr 2004 21:03:15 +0200 Wolfgang Pfeiffer <[EMAIL PROTECTED]> wrote:
> My goal is to get easily rid of identical files on a system:
I did something like this once for a whole filesystem with a bash script. md5sum'ing *everything* is wasteful of time and CPU cycles, since (probably) most of the things you'll md5sum won't have duplicates.

Instead, what I did was to get an ls of all the directories in which I wanted to search for duplicates (I used "find -type d -exec ls ..." since I was doing it over a filesystem). I made sure the flags for ls were such that I'd get a column with filesizes and a column with pathnames, and I directed the output into a file.

Then, once that was done, I sorted the file (using the "sort" command) with the filesize column as the sort key, then used uniq (with appropriate flags to consider only the filesize column) to trim out lines for which no other file had the same size. Then I md5sum'd all of the remaining files (output into a file), and used uniq on that file to find duplicate md5sums.

That's a pretty brute-force way to do it, but it works. I'm now waiting for someone else to point out a much more elegant solution. Heh.

-c

-- 
Chris Metzler [EMAIL PROTECTED]
(remove "snip-me." to email)

"As a child I understood how to give; I have forgotten this grace since
I have become civilized." - Chief Luther Standing Bear
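P.S. For the archives, here's a rough sketch of that pipeline. It's untested, assumes GNU find/sort/awk/uniq/xargs, the starting path is just a placeholder, and I've used awk where I originally used uniq flags to compare only the size column. Filenames containing newlines will confuse it.

  # 1. Record size and path for every regular file, sorted by size.
  find /some/dir -type f -printf '%s %p\n' | sort -n > sizes

  # 2. Keep only the lines whose size occurs more than once
  #    (two passes over the same file: first count sizes, then filter).
  awk 'NR==FNR { n[$1]++; next } n[$1] > 1' sizes sizes > candidates

  # 3. md5sum just those candidates, sort by checksum, and print the
  #    groups whose first 32 characters (the md5 digest) repeat.
  cut -d' ' -f2- candidates | xargs -d '\n' md5sum | sort |
      uniq -w32 --all-repeated=separate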