I am the author of the updated "rsbep" package that was mentioned in this thread, and I was contacted by David Mathog (just "mathog" henceforth) about the issues he had.
"mathog" reported a difference in the decoded output sizes when using "rsbep" directly. But as David N. Lombard pointed out, the usage scenario that "mathog" followed was not a sanctioned one: both the article on my site and the package instructions (README) refer to the "freeze"/"melt" scripts, which decode the shielded data into the correct output size.

The reason for what mathog experienced is a bit complex, but it is explained on my site (http://users.softlab.ece.ntua.gr/~ttsiod/rsbep.html), and it boils down to this: in order to withstand storage errors, the Reed-Solomon encoded data have to be interleaved. Basically, the x86 ASM Reed-Solomon code that I "inherited" from the original "rsbep" and use in my package adds 32 bytes of parity data to each 223-byte block of input, turning it into a 255-byte block. These parity bytes allow the detection and correction of up to 16 errors in the encoded 255-byte block, as well as the detection (without correction) of up to 32 errors.

This alone won't work for storage media, however, since they work or fail on sector boundaries (512 bytes for disks, 2048 bytes for CDs/DVDs). So my package interleaves the encoded data inside blocks of 1040400 bytes (containing 4080 of the 255-byte Reed-Solomon encoded blocks). Thanks to the interleaving, the loss of a sector only impacts ONE byte in each of the 512 encoded blocks that pass through it. If interested, you can read more details on my page, where I explain how the idea works.

The end result is that:
- the interleaved stream can lose 127 contiguous sectors (65024 contiguous bytes) and still be recoverable;
- the interleaved stream can lose 128-255 contiguous sectors and detect the error (and report it, but not fix it);
- beyond that number of errors (which corresponds, after de-interleaving, to more than 32 bad bytes in an encoded 255-byte block), the Reed-Solomon code is lost...
Given the interleaving that my package performs on the encoded bytes, the only way this can happen is by losing a contiguous stream of more than 32x4080 bytes, i.e. 130560 bytes. A storage error that causes this much loss (255 contiguous sectors!) is a lost cause anyway, at least as far as my needs go. If you want to be able to recover from this or even larger losses, you can do so by increasing the block size from my chosen 255x4080 (1040400 bytes) to something even bigger, and by adapting my interleaving code (rsbep.c, "distribute" function).

To summarize, "mathog"'s pockmark app is not representative of what happens in storage media: they NEVER fail at the byte level, they fail at the sector level.

So what should you do if you want to be 100% sure of failure detection? Simple: if you review my freeze/melt scripts, you will see that all I do to the "to-be-shielded-stream" is (a) add a "magic marker" and (b) add the file size, so that "melt.sh" can chop the output down to the right size. If you want bullet-proof validity checks, you can easily add the MD5 or SHA sum of the input data to the "to-be-shielded-stream", so that the "melting" process can check it and be 100% certain of either a correct restoration or a detected failure, even in the face of unrecoverable stream corruption (more than 130K lost). Note, however, that this is not necessary if you use an algorithm that can detect errors in the decoded stream (which is how I use my rsbep, i.e. always on a stream generated by gzip, bzip2, etc).

Hope this clarifies things.

Kind regards,
Thanassis Tsiodras, Dr.-Ing.

--
What I gave, I have; what I spent, I had; what I kept, I lost.
-Old Epitaph

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf