Jesse Becker and others suggested:
> http://users.softlab.ntua.gr/~ttsiod/rsbep.html
I tried it and it works, mostly, but definitely has some warts.

To start with I gave it a negative control - a file so badly corrupted it should NOT have been able to recover it.

% ssh remotePC 'dd if=/dev/sda1 bs=8192' >img.orig
% cat img.orig | bzip2 >img.bz2.orig
% cat img.bz2.orig | rsbep > img.bz2.rsbep
% cat img.bz2.rsbep | pockmark -maxgap 100000 -maxrun 10000 >img.bz2.rsbep.pox
% cat img.bz2.rsbep.pox | rsbep -d -v >img.bz2.restored
rsbep: number of corrected failures : 9725096
rsbep: number of uncorrectable blocks : 0

img.orig is a Windows XP partition with all empty space filled with 0x0 bytes. That is then compressed with bzip2, then run through rsbep (the one from the link above), then corrupted with pockmark. Pockmark is my own little concoction; when used as shown it stamps runs of 0x0 bytes onto the file, each run starting (1 to MAXGAP) bytes after the previous one and lasting (1 to MAXRUN) bytes. The gap and the run length are each chosen at random from those ranges for every new gap/run. This should corrupt around 10% of the file, which I assumed would render it unrecoverable.

Notice in the file sizes below that the overall size did not change when the file was run through pockmark. rsbep did not note any errors it couldn't correct. However, the size of the restored file is not the same as the orig.

 4056976560 2010-06-08 17:51 img.bz2.restored
 4639143600 2010-06-08 16:19 img.bz2.rsbep.pox
 4639143600 2010-06-08 16:13 img.bz2.rsbep
 4056879025 2010-06-08 14:40 img.bz2.orig
20974431744 2010-06-07 15:23 img.orig

% bunzip2 -tvv img.bz2.restored
img.bz2.restored:
[1: huff+mtf data integrity (CRC) error in data

So at the very least rsbep sometimes says it has recovered a file when it has not. I didn't really expect it to rescue this particular input, but it really should have handled it better.
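For anyone who wants to reproduce the test: pockmark is not published, so the following is only a minimal Python sketch of the behaviour described above, not the actual tool. It assumes the gap and run lengths are drawn uniformly from 1..MAXGAP and 1..MAXRUN; the function name, the `seed` argument and the in-memory interface are all hypothetical.

```python
import random


def pockmark(data, maxgap, maxrun, seed=None):
    """Return a copy of `data` with random runs of 0x00 bytes stamped in.

    Hypothetical reimplementation: each gap is uniform in 1..maxgap and
    each run is uniform in 1..maxrun (an assumption, not the real tool).
    The output always has the same length as the input.
    """
    rng = random.Random(seed)
    out = bytearray(data)
    n = len(out)
    pos = 0
    while True:
        pos += rng.randint(1, maxgap)      # bytes left untouched
        if pos >= n:
            break
        run = rng.randint(1, maxrun)       # bytes to overwrite
        end = min(pos + run, n)            # never write past the end
        out[pos:end] = bytes(end - pos)    # bytes(k) is k zero bytes
        pos = end
    return bytes(out)


if __name__ == "__main__":
    sample = b"\xff" * 1_000_000
    damaged = pockmark(sample, maxgap=100000, maxrun=10000, seed=42)
    print(len(damaged), damaged.count(0) / len(damaged))
```

Under that uniform assumption the expected damaged fraction is roughly (MAXRUN/2) / (MAXGAP/2 + MAXRUN/2): about 9% for -maxgap 100000 -maxrun 10000 and about 1% for -maxgap 1000000 -maxrun 10000.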
I reran it with a less damaged file like this:

% cat img.bz2.rsbep | pockmark -maxgap 1000000 -maxrun 10000 >img.bz2.rsbep.pox2
% cat img.bz2.rsbep.pox2 | rsbep -d -v >img.bz2.restored2
rsbep: number of corrected failures : 46025036
rsbep: number of uncorrectable blocks : 0
% bunzip2 img.bz2.restored2
bunzip2: Can't guess original name for img.bz2.restored2 -- using img.bz2.restored2.out
bunzip2: img.bz2.restored2: trailing garbage after EOF ignored
% md5sum img.bz2.restored2.out img.orig
7fbaec7143c3a17a31295a803641aa3c  img.bz2.restored2.out
7fbaec7143c3a17a31295a803641aa3c  img.orig

This time it was able to recover the corrupted file, but again, it created an output file which was a different size. Is this always the case? It seems to be, at least for the size of file used here:

% cat img.bz2.orig | rsbep | rsbep -d > nopox.bz2

nopox.bz2 is also 4056976560 bytes. The decoded output is always 97535 bytes larger than the original, which may bear some relation to the z=ERR_BURST_LEN parameter, as

97535 / 765 = 127.496732

which is suspiciously close to 255/2. Or that could just be a coincidence. In any case, bunzip2 was able to handle the crud on the end, but this would have been a problem for other binary files.

The other thing that is frankly bizarre is the number of "corrected" failures for the 2nd case vs. the first. The 2nd should have 10X fewer bad bytes than the first, but the rsbep status messages indicate 4.73X MORE. However, the number of bad bytes reported for the 2nd is almost exactly 1% of the file, as it should be.

All of this suggests that rsbep does not correctly handle files which are "too" corrupted. It gives the wrong number of corrected failures and thinks that it has corrected everything when it has not done so. Worse, even when it does work the output file was never (in any of the test cases) the same size as the input file. I think this program has potential but it needs a bit of work to sand the rough edges off.
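The figures quoted above can be cross-checked with a few lines of Python. This is just arithmetic on the sizes and counts already listed; the variable names are made up for illustration.

```python
# Sizes in bytes, copied from the directory listing above.
orig_bz2 = 4056879025      # img.bz2.orig
restored = 4056976560      # img.bz2.restored (and nopox.bz2)
encoded = 4639143600       # img.bz2.rsbep and img.bz2.rsbep.pox

# "corrected failures" reported by rsbep -d -v in the two runs.
corrected_run1 = 9725096   # -maxgap 100000  (heavier damage)
corrected_run2 = 46025036  # -maxgap 1000000 (lighter damage)

extra = restored - orig_bz2                  # trailing bytes added by decode
ratio = corrected_run2 / corrected_run1      # run 2 count relative to run 1
frac1 = corrected_run1 / encoded             # corrected share of file, run 1
frac2 = corrected_run2 / encoded             # corrected share of file, run 2
same_ratio = encoded * 223 == restored * 255 # is encoded:decoded exactly 255:223?

print(f"extra bytes after decode : {extra}")
print(f"extra / 765              : {extra / 765:.6f}")
print(f"run2 / run1 corrections  : {ratio:.2f}")
print(f"corrected fraction run 1 : {frac1:.4%}")
print(f"corrected fraction run 2 : {frac2:.4%}")
print(f"encoded:decoded = 255:223: {same_ratio}")
```

The first run's count works out to only about 0.2% of the encoded file, well below the roughly 10% that was stamped, while the second run's count is about 1%. The last check shows the encoded and decoded sizes are in an exact 255:223 ratio. Reading that as whole Reed-Solomon (255,223) blocks, with the decoder emitting the padding instead of trimming to the original length, is an assumption and has not been confirmed from the rsbep source.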
I will have a look at it, but won't have a chance to do so for a couple of weeks.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech