On Saturday, December 12, 2009, Andy Hayward <[email protected]> wrote:
> On Fri, Dec 11, 2009 at 23:24, STeve Andre' <[email protected]> wrote:
>> I am wondering if there is a port or otherwise available
>> code which is good at comparing large numbers of files in
>> an arbitrary number of directories? I always try to avoid
>> wheel re-creation when possible. I'm trying to help someone
>> with large piles of data, most of which is identical
>> across N directories. Most. It's the 'across dirs' part
>> that involves the effort, hence my avoidance of thinking
>> on it if I can help it. ;-)
>
> sysutils/fdupes
>
> -- ach
>
If you have a database available, you can store file hashes and use SQL. I used Postgres for the job and had reasonable performance on a collection of 10 million files. I stored directory paths in one table and filename, size, and SHA-1 hash in another. Scripting the table creation was fairly easy... -N
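A minimal sketch of that two-table approach, using SQLite rather than Postgres so it runs without a server; the table and column names here are illustrative, not the ones from the original setup:

```python
import hashlib
import os
import sqlite3


def sha1_of(path, bufsize=1 << 16):
    """Hash a file's contents incrementally to avoid loading it whole."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def index_tree(conn, root):
    """Walk a directory tree, storing dirs in one table and files in another."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS dirs ("
                "id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    cur.execute("CREATE TABLE IF NOT EXISTS files ("
                "dir_id INTEGER REFERENCES dirs(id), "
                "name TEXT, size INTEGER, sha1 TEXT)")
    for dirpath, _dirnames, filenames in os.walk(root):
        cur.execute("INSERT OR IGNORE INTO dirs (path) VALUES (?)", (dirpath,))
        cur.execute("SELECT id FROM dirs WHERE path = ?", (dirpath,))
        dir_id = cur.fetchone()[0]
        for name in filenames:
            full = os.path.join(dirpath, name)
            cur.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                        (dir_id, name, os.path.getsize(full), sha1_of(full)))
    conn.commit()


def duplicates(conn):
    """Return (sha1, copy-count) for content that appears more than once."""
    return conn.execute(
        "SELECT sha1, COUNT(*) FROM files "
        "GROUP BY size, sha1 HAVING COUNT(*) > 1").fetchall()
```

Once the tables are populated, finding which directories share a given file is a join between the two tables; the GROUP BY on (size, sha1) is what does the cross-directory comparison work for you.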

