Hello Julian Andres,

On 2012-03-04 at 01:07:42 you wrote:
> On Sun, Mar 04, 2012 at 12:31:16AM +0100, Timo Weingärtner wrote:
> > The initial comparison was with hardlink, which got OOM killed with a
> > hundred backups of my home directory. Last night I compared it to duff
> > and rdfind, which would have happily linked files with different
> > st_mtime and st_mode.
>
> You might want to try hardlink 0.2~rc1. In any case, I don't think we
> need yet another such tool in the archive. If you want that algorithm,
> we can implement it in hardlink 0.2 using probably about 10 lines. I had
> that locally and it works, so if you want it, we can add it and avoid
> the need for one more hack in that space.

And why is lighttpd in the archive? Apache can do the same ...

> hardlink 0.2 is written in C, and uses a binary tree to map
> (dev_t, off_t) to a struct file which contains the stat information
> plus name for linking. It requires two allocations per file, one for
> the struct file with the filename, and one for the node in the tree
> (well, actually we only need the node for the first file with a
> specific (dev_t, off_t) tuple). A node has 3 pointers.

The "hardlink" I used at that time was written in Python and definitely
didn't do it the way I want.

hadori is written in C++11, which IMHO makes it look a little more
readable. It started with the tree-based map and multimap; now it uses
the unordered_ (hash-based) versions, which made it twice as fast in a
typical workload.

The main logic is in hadori.C, handle_file, and uses:

    std::unordered_map<ino_t, inode const> kept;
    std::unordered_map<ino_t, ino_t> to_link;
    std::unordered_multimap<off_t, ino_t> sizes;

class inode contains a struct stat, a file name and an Adler checksum,
but I plan to drop the last one because I think the hashing option is no
great gain.
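In case it is useful, here is a condensed sketch of what handle_file does
with those three maps. This is not the actual hadori source: content_equal()
and do_link() are stand-ins for the real comparison and linking code, the
inode class is reduced to the two members the sketch needs, and it assumes
a single filesystem, since the maps are keyed by ino_t alone.

    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <unordered_map>
    #include <sys/stat.h>
    #include <unistd.h>

    class inode {
    public:
        struct stat st;
        std::string name;
    };

    std::unordered_map<ino_t, inode const> kept;  // first file seen per inode
    std::unordered_map<ino_t, ino_t> to_link;     // duplicate inode -> kept inode
    std::unordered_multimap<off_t, ino_t> sizes;  // size -> kept inodes of that size

    // Stand-in for the real byte-wise comparison.
    bool content_equal(inode const & a, inode const & b)
    {
        std::ifstream fa(a.name, std::ios::binary), fb(b.name, std::ios::binary);
        char ba[65536], bb[65536];
        do {
            fa.read(ba, sizeof ba);
            fb.read(bb, sizeof bb);
            if (fa.gcount() != fb.gcount() ||
                !std::equal(ba, ba + fa.gcount(), bb))
                return false;
        } while (fa && fb);
        return fa.eof() && fb.eof();
    }

    // Stand-in for the real linking code; a robust version would link
    // to a temporary name first and rename over the target.
    void do_link(inode const & keep, std::string const & path)
    {
        ::unlink(path.c_str());
        ::link(keep.name.c_str(), path.c_str());
    }

    void handle_file(std::string const & path, struct stat const & st)
    {
        if (kept.count(st.st_ino))
            return;                          // another name of a kept inode

        auto l = to_link.find(st.st_ino);
        if (l != to_link.end()) {            // inode already judged a duplicate
            do_link(kept.at(l->second), path);
            return;
        }

        inode cur = { st, path };

        // Candidates: kept inodes of the same size. As said above, equal
        // st_mode and st_mtime are required before linking (the st_uid and
        // st_gid checks are an additional assumption of this sketch).
        auto r = sizes.equal_range(st.st_size);
        for (auto it = r.first; it != r.second; ++it) {
            inode const & cand = kept.at(it->second);
            if (cand.st.st_mode == st.st_mode &&
                cand.st.st_uid == st.st_uid &&
                cand.st.st_gid == st.st_gid &&
                cand.st.st_mtime == st.st_mtime &&
                content_equal(cand, cur)) {
                to_link.insert(std::make_pair(st.st_ino, it->second));
                do_link(cand, path);
                return;
            }
        }

        // No duplicate found: this inode becomes the kept copy for its size.
        kept.insert(std::make_pair(st.st_ino, cur));
        sizes.insert(std::make_pair(st.st_size, st.st_ino));
    }

The nice property of this layout is that file contents are only read when
another kept inode with the same size and metadata exists; everything else
is decided from the stat information alone.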
Regards
Timo