I wasn't thrilled with the limitations of rsbep and eventually wrote a program, rsbd (Reed-Solomon for Block Devices), to do this. rsbd reuses the RS encoding/decoding routines from the rsbep distribution (written by Phil Karn), but the rest is new code. rsbd uses message digests (SHA1) that let it skip the RS decode step on data that is not corrupted, which speeds up "decode" a lot. It also keeps track of erasures, and so can restore up to 32 erasures (rsbep is limited to 16) in a block of 255 bytes. rsbd does everything it can to verify data integrity, and either aborts on error or optionally slogs on anyway while noting the locations of bad output. I do not believe there are any cases where it will output bad data and not tell you. (There could be a bug somewhere, though.)
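To illustrate the digest shortcut and the erasure arithmetic described above, here is a minimal Python sketch. It is not rsbd's code: `recover_section`, `within_capacity` and the `rs_decode` callable are hypothetical names, and the figure of 32 parity bytes per 255-byte block is an assumption inferred from the 32-erasure limit (for a Reed-Solomon code, 2*errors + erasures must not exceed the number of parity bytes).

```python
import hashlib

# Assumption: each 255-byte RS block carries 32 parity bytes.
PARITY_BYTES = 32


def within_capacity(errors, erasures, parity=PARITY_BYTES):
    """Standard Reed-Solomon bound: 2*errors + erasures <= parity bytes.

    An error at an unknown position costs two parity bytes; an erasure
    (a bad byte whose position is known) costs only one.
    """
    return 2 * errors + erasures <= parity


def recover_section(payload, stored_digest, rs_decode):
    """Return (data, used_rs) for one section of data.

    rs_decode is a hypothetical callable standing in for the real
    Reed-Solomon decoder; it takes the damaged bytes and returns the
    repaired bytes.
    """
    # Fast path: the digest matches, so the RS decode is skipped entirely.
    if hashlib.sha1(payload).digest() == stored_digest:
        return payload, False

    # Slow path: repair with RS, then check the digest again so that
    # bad data is never returned silently.
    repaired = rs_decode(payload)
    if hashlib.sha1(repaired).digest() != stored_digest:
        raise ValueError("section still corrupt after RS decode")
    return repaired, True
```

With these assumptions, `within_capacity(16, 0)` and `within_capacity(0, 32)` both hold, which matches the 16 versus 32 figures above: knowing where the damage is doubles what can be repaired.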
rsbd can be retrieved from rsbd.sourceforge.net. Here is an example (uses one core on a dual Opteron 280, single SATA disk on the machine) that corresponds roughly to the size of a DVD-R:

  % cat test | time rsbd -e >test.rsbd
  90.98user 14.36system 3:04.17elapsed 57%CPU
  % cat test.rsbd | time rsbd -d -c >restored
  Output size: 4056879025
  input size: 4648980480
  Input blocks total: 9080040
  Input blocks erased: 0
  Neighborhoods processed: 4451
  Sections processed: 35602
  Sections Spec. Blk verified: ddgst good: 35602
  Sections Spec. Blk verified: ddgst bad: 0
  Sections Spec. Blk verified: RS: ddgst good: 0
  Sections Spec. Blk verified: RS: ddgst bad: 0
  Sections Spec. Blk reverified: ddgst good: 0
  Sections Spec. Blk reverified: ddgst bad: 0
  Sections Spec. Blk reverified: RS: ddgst good: 0
  Sections Spec. Blk reverified: RS: ddgst bad: 0
  Sections Spec. Blk corrupt: ddgst good: 0
  Sections Spec. Blk corrupt: ddgst bad: 0
  Sections Spec. Blk corrupt: RS: ddgst good: 0
  Sections Spec. Blk corrupt: RS: ddgst bad: 0
  RSblks total: 18192622
  RSblks clean: 18192622
  RSblks corrected: 0
  RSblks excess erasures: 0
  RSblks uncorrectable: 0
  RSblks avg corr. bytes: 0
  RSblks max corr. bytes: 0
  50.24user 12.43system 2:28.60elapsed 42%CPU
  % cat test.rsbd | \
      pockmark -bs 512 -maxgap 4000 -maxrun 40 > test.rsbd.pox
  % cat test.rsbd.pox | time rsbd -d -c >restored
  Output size: 4056879025
  input size: 4648980480
  Input blocks total: 9080040
  Input blocks erased: 91639
  Neighborhoods processed: 4451
  Sections processed: 35602
  Sections Spec. Blk verified: ddgst good: 12608
  Sections Spec. Blk verified: ddgst bad: 17152
  Sections Spec. Blk verified: RS: ddgst good: 17152
  Sections Spec. Blk verified: RS: ddgst bad: 0
  Sections Spec. Blk reverified: ddgst good: 0
  Sections Spec. Blk reverified: ddgst bad: 5842
  Sections Spec. Blk reverified: RS: ddgst good: 5842
  Sections Spec. Blk reverified: RS: ddgst bad: 0
  Sections Spec. Blk corrupt: ddgst good: 0
  Sections Spec. Blk corrupt: ddgst bad: 0
  Sections Spec. Blk corrupt: RS: ddgst good: 0
  Sections Spec. Blk corrupt: RS: ddgst bad: 0
  RSblks total: 18192622
  RSblks clean: 6454984
  RSblks corrected: 11737638
  RSblks excess erasures: 0
  RSblks uncorrectable: 0
  RSblks avg corr. bytes: 3.7
  RSblks max corr. bytes: 19
  246.49user 11.73system 5:10.14elapsed 83%CPU
  % md5sum test restored
  b22a361554771045df4424e547eaa558 restored
  b22a361554771045df4424e547eaa558 test

From the above one can see that it doesn't waste time doing RS decoding unless it needs to. Consequently the decode of a file which isn't corrupt runs faster than the encode (2:28 versus 3:04 elapsed).

Somewhat more on subject for this group, the current version of rsbd is completely single threaded. There is plenty of room here for parallelization. For instance, for each "neighborhood" the sha1 digests are independent and can be done 8 at a time; the RS encode/decode is performed 8*511 times, all of which are completely independent; and the XOR step is performed on blocks of 512 consecutive bytes 4096*255/512 times, again all independent. However, the [255,4096] <-> [4096,255] transpose of a byte array, once per neighborhood, isn't going to be as trivial to split into threads, and that could be rate limiting.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
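As a rough postscript to the parallelization notes above, the Python sketch below hashes the sections of one neighborhood concurrently and performs the byte-array transpose as a single serial step. It is not taken from rsbd: the function names, the use of a thread pool, and the idea that a neighborhood arrives as a list of separate section buffers are all assumptions for illustration. Only the 255 x 4096 shape and the count of 8 come from the text.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS = 255, 4096  # neighborhood shape quoted in the text


def transpose(buf, rows, cols):
    """Transpose a row-major rows x cols byte array into cols x rows."""
    out = bytearray(rows * cols)
    for r in range(rows):
        # Row r of the input becomes column r of the output, i.e. every
        # rows-th byte of the output starting at offset r.
        out[r::rows] = buf[r * cols:(r + 1) * cols]
    return bytes(out)


def digest_sections(sections, workers=8):
    """Compute the SHA-1 digest of each section, up to `workers` at a time.

    The digests do not depend on one another, so the order of execution
    is irrelevant; pool.map still returns them in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: hashlib.sha1(s).digest(), sections))
```

For example, `transpose(b"abcdef", 2, 3)` gives `b"adbecf"`, and transposing twice returns the original buffer. The transpose is left serial here on purpose, since that is the step the text flags as the likely bottleneck.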